Re: [HACKERS] Full-text search default vs specified configuration
On Fri, 22 Feb 2008, Richard Huxton wrote:

> Tom Lane wrote:
>> Richard Huxton <[EMAIL PROTECTED]> writes:
>>> Would there be any support for two changes in 8.4 though?
>>> 1. Tag tsvector/tsquery's with the (oid of) their configuration?
>>> 2. Either warn or require CASCADE on changes to a
>>> configuration/dictionary that could impact existing indexes etc.
>>
>> IIRC, the current behavior is intentional --- Oleg and Teodor argued
>> that tsvector values are relatively independent of small changes in
>> configuration and we should *not* force people to, say, reindex their
>> tables every time they add or subtract a stopword. If we had some
>> measure of whether a TS configuration change was "critical" or not,
>> it might make sense to restrict critical changes; but I fear that
>> would be kind of hard to determine.
>
> Well, clearly in my example it didn't impact operation at all, but it's
> an accident waiting to happen (and more importantly, a hard one to track
> down). It's like running SQL-ASCII encoding, everything just ticks along
> only to cause problems a month later.
>
> What about the warning: "This may affect existing indexes - please
> check". Would that cause anyone problems?
>
> What worries me is that it might take 10 messages on general/sql list to
> figure out the problem. This was reported as "words with many hits
> causes problems".

He just didn't read documentation thoroughly.

> Maybe it's just a matter of getting the message out: "always specify the
> config or never specify the config".

Probably, just stress this in documentation.

Regards,
Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Full-text search default vs specified configuration
Tom Lane wrote:
> Richard Huxton <[EMAIL PROTECTED]> writes:
>> Would there be any support for two changes in 8.4 though?
>> 1. Tag tsvector/tsquery's with the (oid of) their configuration?
>> 2. Either warn or require CASCADE on changes to a
>> configuration/dictionary that could impact existing indexes etc.
>
> IIRC, the current behavior is intentional --- Oleg and Teodor argued
> that tsvector values are relatively independent of small changes in
> configuration and we should *not* force people to, say, reindex their
> tables every time they add or subtract a stopword. If we had some
> measure of whether a TS configuration change was "critical" or not,
> it might make sense to restrict critical changes; but I fear that
> would be kind of hard to determine.

Well, clearly in my example it didn't impact operation at all, but it's
an accident waiting to happen (and more importantly, a hard one to track
down). It's like running SQL-ASCII encoding, everything just ticks along
only to cause problems a month later.

What about the warning: "This may affect existing indexes - please
check". Would that cause anyone problems?

What worries me is that it might take 10 messages on general/sql list to
figure out the problem. This was reported as "words with many hits
causes problems".

Maybe it's just a matter of getting the message out: "always specify the
config or never specify the config".

--
Richard Huxton
Archonet Ltd
Re: [HACKERS] Full-text search default vs specified configuration
Richard Huxton <[EMAIL PROTECTED]> writes:
> Would there be any support for two changes in 8.4 though?
> 1. Tag tsvector/tsquery's with the (oid of) their configuration?
> 2. Either warn or require CASCADE on changes to a
> configuration/dictionary that could impact existing indexes etc.

IIRC, the current behavior is intentional --- Oleg and Teodor argued
that tsvector values are relatively independent of small changes in
configuration and we should *not* force people to, say, reindex their
tables every time they add or subtract a stopword. If we had some
measure of whether a TS configuration change was "critical" or not,
it might make sense to restrict critical changes; but I fear that
would be kind of hard to determine.

regards, tom lane
[HACKERS] Full-text search default vs specified configuration
I've been looking at a problem someone encountered with ts_headline:
http://archives.postgresql.org/pgsql-general/2008-02/msg01035.php

It turns out the problem was mixing ts_headline() with to_tsquery()
where the configuration wasn't the default. Fair enough, and in
retrospect it's obvious. However, I fear it's going to be a pretty
common error. It's also one that's not easy to catch: you can test a
configuration, but you can't see what configuration generated a
particular tsvector / tsquery (afaict).

I realise there was a lot of discussion during 8.3 development about
what was wanted from a default config, and I'm guessing there's nothing
that can be done for 8.3.x.

Would there be any support for two changes in 8.4 though?

1. Tag tsvector/tsquery's with the (oid of) their configuration?
This could then generate a warning/error if you are running a tsquery
against the wrong tsvector / combining two incompatible tsvectors etc.

2. Either warn or require CASCADE on changes to a
configuration/dictionary that could impact existing indexes etc.
I've done it once myself, where a stopword dictionary was changed from
accept=true to accept=false. That change is OK (as long as you don't
mind rogue tokens in your tsvectors) but others are probably not.

--
Richard Huxton
Archonet Ltd
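
For readers following the thread, proposal 1 above (tagging each value with the configuration that built it) can be sketched as a toy model. This is Python, not PostgreSQL code, and it makes no claim about how tsvector or tsquery are stored. The names CONFIGS, TaggedVector, TaggedQuery, to_vector, to_query and matches, and the "english_like" configuration with its crude plural stripping, are all hypothetical and exist only for illustration.

```python
import warnings
from dataclasses import dataclass

# Toy "configurations": each maps a raw word to a lexeme, or None to drop it.
# Illustrative stand-ins only, not PostgreSQL's real dictionaries.
STOPWORDS = {"the", "a", "of"}


def simple_config(word):
    """Lower-case the word and keep it, stopwords included."""
    return word.lower()


def english_like_config(word):
    """Drop stopwords and strip a trailing 's' (a crude stand-in for stemming)."""
    w = word.lower()
    if w in STOPWORDS:
        return None
    if w.endswith("s") and len(w) > 3:
        w = w[:-1]
    return w


CONFIGS = {"simple": simple_config, "english_like": english_like_config}


@dataclass(frozen=True)
class TaggedVector:
    config: str          # name of the configuration that produced the lexemes
    lexemes: frozenset


@dataclass(frozen=True)
class TaggedQuery:
    config: str
    lexemes: frozenset


def _normalise(config, text):
    """Run every whitespace-separated word through the named configuration."""
    fn = CONFIGS[config]
    return frozenset(lex for lex in map(fn, text.split()) if lex is not None)


def to_vector(config, text):
    return TaggedVector(config, _normalise(config, text))


def to_query(config, text):
    return TaggedQuery(config, _normalise(config, text))


def matches(vector, query):
    """AND-match the query against the vector, warning on a config mismatch."""
    if vector.config != query.config:
        warnings.warn(
            f"query built with {query.config!r} but vector built with "
            f"{vector.config!r}; results may be surprising"
        )
    return query.lexemes <= vector.lexemes
```

In this model, a vector built from "The cats of Moscow" with "english_like" holds the lexemes "cat" and "moscow". A query for "cats" built with the same configuration matches it, while the same query built with "simple" looks for "cats", fails to match, and triggers the warning. Without the tag, the second case would fail silently, which is the hard-to-trace behaviour described in the message above.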