ts_headline calls an equivalent of ts_lexize to break up the text. Of
course, there is also an algorithm that processes the tokens and generates
the headline. I would be really surprised if that algorithm somehow
depended on the language, since it only processes the tokens. So Oleg is
right when he says ts_lexize is something to be checked.

I will try to replicate what you are trying to do. In the meantime, can you
run the same ts_headline call under psql multiple times and paste the
results?
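
Something along these lines would do. This is only a sketch: the sample
text and the query term are placeholders I made up, and I am assuming your
configuration is named 'polish' as in your earlier mails. The to_tsvector
calls are there just to time the parsing and dictionary step on its own,
without the headline generation.

```sql
-- Sketch only: sample text and query term are placeholders; 'polish' is
-- assumed to be the name of the text search configuration under test.
\timing on

-- Run each statement several times and compare the reported times.
SELECT ts_headline('polish',
                   repeat('lorem ipsum dolor sit amet ', 100),
                   to_tsquery('polish', 'ipsum'));

SELECT ts_headline('english',
                   repeat('lorem ipsum dolor sit amet ', 100),
                   to_tsquery('english', 'ipsum'));

-- For comparison: parsing plus dictionary lookups, no headline generation.
SELECT length(to_tsvector('polish',
                          repeat('lorem ipsum dolor sit amet ', 100)));
SELECT length(to_tsvector('english',
                          repeat('lorem ipsum dolor sit amet ', 100)));
```

If the to_tsvector timings show the same gap between the two configs as
the ts_headline timings, that would point at the dictionaries rather than
at the headline algorithm.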

-Sushant.

2009/11/19 Wojciech Knapik <webmas...@wolniartysci.pl>

>
> Oleg Bartunov wrote:
>
>>> Yes, for 4-word texts the results are similar.
>>> Try that with a longer text and the difference becomes more and more
>>> significant. For the lorem ipsum text, 'polish' is about 4 times slower
>>> than 'english'. For 5 repetitions of the text, it's 6 times, for 10
>>> repetitions - 7.5 times...
>>>
>>
>> Again, I see nothing unclear here, since dictionaries (as specified
>> in configuration) apply to ALL words in document. The more words in
>> document, the more overhead.
>>
>
> You're missing the point. I'm not surprised that the function takes more
> time for larger input texts - that's obvious. The thing is, the computation
> times rise more steeply when the Polish config is used. Steeply enough, that
> the difference between the Polish and English configs becomes enormous in
> practical cases.
>
> Now this may be expected behaviour, but since I don't know if it is, I
> posted to the mailing lists to find out. If you're saying this is ok and
> there's nothing to fix here, then there's nothing more to discuss and we may
> consider the thread closed.
> If not, ts_headline deserves a closer look.
>
> cheers,
> Wojciech Knapik
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
