Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-19 Thread Wojciech Knapik
ts_headline calls the equivalent of ts_lexize to break the text. Of course there is an algorithm to process the tokens and generate the headline. I would be really surprised if the algorithm to generate the headline is somehow dependent on language (as it only processes the tokens). So Oleg is right

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-19 Thread Tom Lane
Wojciech Knapik webmas...@wolniartysci.pl writes: Tom Lane wrote: I tried to duplicate this test, but got no further than here: ERROR: syntax error CONTEXT: line 174 of configuration file /home/tgl/testversion/share/postgresql/tsearch_data/polish.affix: L E C

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-19 Thread Wojciech Knapik
Tom Lane wrote: *SNIP* So about 55% of the time is going into affix pattern matching. I wonder whether that couldn't be made faster. A lot of the cycles are spent on coping with variable-length characters --- perhaps the ispell code should convert to wchar representation before doing this?
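A minimal sketch of the idea in the message above, written in Python for illustration only. It is not the PostgreSQL ispell code, and every function name here is hypothetical. It contrasts walking a UTF-8 buffer character by character (each lookup has to decode sequence lengths from the start) with decoding once into fixed-width code points, after which affix comparisons reduce to plain index arithmetic:

```python
def utf8_char_len(first_byte: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    if first_byte >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def nth_char_multibyte(buf: bytes, n: int) -> bytes:
    """Return the n-th character of a UTF-8 buffer.

    The buffer must be walked from the start, so every lookup costs O(n).
    """
    pos = 0
    for _ in range(n):
        pos += utf8_char_len(buf[pos])
    return buf[pos:pos + utf8_char_len(buf[pos])]


def to_wide(buf: bytes) -> list[int]:
    """Decode once into fixed-width code points (analogous to a wchar array)."""
    return [ord(ch) for ch in buf.decode("utf-8")]


def ends_with_wide(word: list[int], suffix: list[int]) -> bool:
    """Suffix test on code points: no per-character length decoding needed."""
    n = len(suffix)
    return n <= len(word) and word[len(word) - n:] == suffix
```

With this layout, a word would be converted with `to_wide` once and then tested against many affix patterns with `ends_with_wide`, so the variable-length decoding cost is paid per word rather than per comparison. Whether that helps the real ispell code would have to be confirmed by profiling.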

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Oleg Bartunov
Wojciech, your polish_english and polish configurations use an ispell dictionary and are slow, while the english configuration doesn't contain ispell. So, what's your complaint? Try adding an ispell dictionary to the english configuration and see the timings. Oleg On Wed, 18 Nov 2009, Wojciech Knapik wrote: Hello

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Wojciech Knapik
Tom Lane wrote: I tested on 8.3.1 on G5/OSX 10.5.8 and Xeon/Gentoo AMD64-2008.0 (2.6.21), then switched both installations to 8.3.8 (both packages compiled from source, but provided by the distro - port/emerge). The Polish dictionaries and config were created according to this article (it's

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Wojciech Knapik
your polish_english and polish configurations use an ispell dictionary and are slow, while the english configuration doesn't contain ispell. So, what's your complaint? Try adding an ispell dictionary to the english configuration and see the timings. Oh, so this is not anomalous? These are the expected speeds for an

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Oleg Bartunov
On Wed, 18 Nov 2009, Wojciech Knapik wrote: your polish_english and polish configurations use an ispell dictionary and are slow, while the english configuration doesn't contain ispell. So, what's your complaint? Try adding an ispell dictionary to the english configuration and see the timings. Oh, so this is not

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Wojciech Knapik
Oleg Bartunov wrote: your polish_english and polish configurations use an ispell dictionary and are slow, while the english configuration doesn't contain ispell. So, what's your complaint? Try adding an ispell dictionary to the english configuration and see the timings. Oh, so this is not anomalous? These are the

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Pavel Stehule
2009/11/18 Oleg Bartunov o...@sai.msu.su: On Wed, 18 Nov 2009, Wojciech Knapik wrote: your polish_english and polish configurations use an ispell dictionary and are slow, while the english configuration doesn't contain ispell. So, what's your complaint? Try adding an ispell dictionary to the english configuration

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Oleg Bartunov
On Wed, 18 Nov 2009, Wojciech Knapik wrote: Yes, for 4-word texts the results are similar. Try that with a longer text and the difference becomes more and more significant. For the lorem ipsum text, 'polish' is about 4 times slower than 'english'. For 5 repetitions of the text, it's 6 times,

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Wojciech Knapik
Oleg Bartunov wrote: Yes, for 4-word texts the results are similar. Try that with a longer text and the difference becomes more and more significant. For the lorem ipsum text, 'polish' is about 4 times slower than 'english'. For 5 repetitions of the text, it's 6 times, for 10 repetitions -

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Sushant Sinha
ts_headline calls the equivalent of ts_lexize to break the text. Of course there is an algorithm to process the tokens and generate the headline. I would be really surprised if the algorithm to generate the headline is somehow dependent on language (as it only processes the tokens). So Oleg is right when

[HACKERS] Very bad FTS performance with the Polish config

2009-11-17 Thread Wojciech Knapik
Hello This has been discussed in #postgresql and posted to -performance a couple days ago, but no solution has been found. The discussion can be found here: http://archives.postgresql.org/pgsql-performance/2009-11/msg00162.php I just finished implementing a search engine for my site and

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-17 Thread Euler Taveira de Oliveira
Wojciech Knapik wrote: PS. This issue is not related to the loading time of dictionaries, or calls to ts_headline for results that won't be displayed. So what? Could you post the profiling of that query? -- Euler Taveira de Oliveira http://www.timbira.com/

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-17 Thread Wojciech Knapik
Euler Taveira de Oliveira wrote: PS. This issue is not related to the loading time of dictionaries, or calls to ts_headline for results that won't be displayed. So what? Could you post the profiling of that query? Polish: http://pastie.textmate.org/private/8lhmnbvde43lfjoxc52r1q English:

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-17 Thread Euler Taveira de Oliveira
Wojciech Knapik wrote: Euler Taveira de Oliveira wrote: PS. This issue is not related to the loading time of dictionaries, or calls to ts_headline for results that won't be displayed. So what? Could you post the profiling of that query? I was talking about gprof (--enable-profiling),

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-17 Thread Tom Lane
Wojciech Knapik webmas...@wolniartysci.pl writes: I tested on 8.3.1 on G5/OSX 10.5.8 and Xeon/Gentoo AMD64-2008.0 (2.6.21), then switched both installations to 8.3.8 (both packages compiled from source, but provided by the distro - port/emerge). The Polish dictionaries and config were