Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Tom Lane
Teodor Sigaev <[EMAIL PROTECTED]> writes: >> I think Teodor's solution is wrong as it stands, because if the subquery >> finds matches for mapcfg and maptokentype, but none of those rows >> produce a non-null ts_lexize result, it will instead emit one row with a >> null result, which is not what sh

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Teodor Sigaev
I think Teodor's solution is wrong as it stands, because if the subquery finds matches for mapcfg and maptokentype, but none of those rows produce a non-null ts_lexize result, it will instead emit one row with a null result, which is not what should happen. But concatenation with NULL will have r

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Uh, how will that help? AFAICS it still has to call ts_lexize with >> every dictionary. > No, ts_lexize is no longer in the seq scan filter, but in the sort key > that's calculated only for those rows that match the filter 'map
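
As a rough illustration of the point under discussion (whether ts_lexize runs for every row of pg_ts_config_map or only for rows that already passed the mapcfg/maptokentype filter), here is a toy Python model. It is not PostgreSQL code: the row contents, the `lexize` stand-in, and the `DictionaryLoadError` exception are all hypothetical names invented for the sketch, and the real planner does not promise any particular evaluation order for filter clauses.

```python
# Toy model of the two evaluation orders; all names here are hypothetical.

class DictionaryLoadError(Exception):
    """Stands in for the encoding error raised while loading a dictionary."""

# Hypothetical rows modelled on pg_ts_config_map: (mapcfg, maptokentype, mapdict).
CONFIG_MAP = [
    ("cs", "word", "simple"),
    ("danish", "word", "danish_stem"),
]

calls = []  # records which dictionaries were actually "loaded"

def lexize(mapdict, token):
    """Pretend ts_lexize: assume danish_stem cannot be loaded in this database."""
    calls.append(mapdict)
    if mapdict == "danish_stem":
        raise DictionaryLoadError(mapdict)
    return [token.lower()]

def lexize_in_filter(cfg, tokentype, token):
    """lexize is part of the scan filter, so it is evaluated for every row."""
    out = []
    for c, t, d in CONFIG_MAP:
        result = lexize(d, token)  # runs even for rows of other configurations
        if result is not None and c == cfg and t == tokentype:
            out.append((d, result))
    return out

def lexize_after_filter(cfg, tokentype, token):
    """The cheap filter runs first; lexize only sees rows that matched it."""
    out = []
    for c, t, d in CONFIG_MAP:
        if c == cfg and t == tokentype:
            out.append((d, lexize(d, token)))
    return out
```

In this model, `lexize_in_filter("cs", "word", "Voda")` fails on the unrelated danish_stem row, while `lexize_after_filter` with the same arguments only touches the `simple` dictionary.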

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Heikki Linnakangas
Tom Lane wrote: > Teodor Sigaev <[EMAIL PROTECTED]> writes: >>> Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict, >>> $1). That means that it will call ts_lexize on every dictionary, which >>> will try to load every dictionary. And loading danish_stem dictionary >>> fails in

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Tom Lane
Teodor Sigaev <[EMAIL PROTECTED]> writes: >> Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict, >> $1). That means that it will call ts_lexize on every dictionary, which >> will try to load every dictionary. And loading danish_stem dictionary >> fails in latin2 encoding, becau

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-10 Thread Teodor Sigaev
Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict, $1). That means that it will call ts_lexize on every dictionary, which will try to load every dictionary. And loading danish_stem dictionary fails in latin2 encoding, because of the problem with the stopword file. Attached

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-07 Thread Oleg Bartunov
On Fri, 7 Sep 2007, Heikki Linnakangas wrote: Pavel Stehule wrote: postgres=# select ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody'); ERROR: character 0xc3a5 of encoding "UTF8" has no equivalent in "LATIN2" CONTEXT: SQL function "ts_debug" statement 1 I can reproduce t

Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-07 Thread Heikki Linnakangas
Pavel Stehule wrote: > postgres=# select ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody'); > ERROR: character 0xc3a5 of encoding "UTF8" has no equivalent in "LATIN2" > CONTEXT: SQL function "ts_debug" statement 1 I can reproduce that. In fact, you don't need the custom config or diction

[HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-07 Thread Pavel Stehule
Hello, last time I checked a utf8 database. Now I checked a latin2-encoded database. I used the dictionaries from the last test. client_encoding | utf8 lc_collate | cs_CZ.iso-8859-2 lc_ctype | cs_CZ.iso-8859-2 lc_messages | cs_CZ