Re: [GENERAL] changing text search treatment of punctuation
> In general there seem to be a lot of ways that people wish they could tweak the text search parser, and telling them to write their own parser isn't a very helpful response for most folk. I don't have an idea about how to improve the situation, but it seems like something that should be thought about.

We (with Oleg) have thought hard about it and haven't found a solution yet. A configurable parser should be:
- fast
- flexible
- not error-prone
- comfortable to use for a non-programmer (or at least for a non-C programmer)

It might be a table-driven state machine (just put TParserStateAction into table(s), with some caching for the first step), but it's complex to operate, and changes to the states would need to be proven correct before they are put into use.

-- 
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
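To make the table-driven idea concrete, here is a minimal sketch in Python. It is not the PostgreSQL parser and does not use TParserStateAction; the state names, character classes, and actions below are all hypothetical, chosen only to show how token boundaries could be moved by editing data rather than code.

```python
from typing import Dict, List, Tuple

# Hypothetical character classes; the real parser distinguishes many more.
def char_class(ch: str) -> str:
    if ch.isalnum():
        return "alnum"
    if ch in "-/":
        return "joiner"
    return "other"

# Transition table: (state, character class) -> (next state, action).
# "append" adds the character to the current token, "emit" closes the
# current token, and "skip" ignores the character.
Table = Dict[Tuple[str, str], Tuple[str, str]]

TRANSITIONS: Table = {
    ("start", "alnum"): ("word", "append"),
    ("start", "joiner"): ("start", "skip"),
    ("start", "other"): ("start", "skip"),
    ("word", "alnum"): ("word", "append"),
    ("word", "joiner"): ("start", "emit"),
    ("word", "other"): ("start", "emit"),
}

def tokenize(text: str, table: Table = TRANSITIONS) -> List[str]:
    """Split text into tokens by walking the transition table."""
    state = "start"
    current: List[str] = []
    tokens: List[str] = []
    for ch in text:
        state, action = table[(state, char_class(ch))]
        if action == "append":
            current.append(ch)
        elif action == "emit" and current:
            tokens.append("".join(current))
            current = []
        # "skip" needs no work
    if current:  # flush the token still open at end of input
        tokens.append("".join(current))
    return tokens

print(tokenize("home/work"))
```

In this sketch '/' and '-' share one class, so changing how '/' is treated means editing char_class or one table row. A missing table entry raises KeyError, which is a small instance of the "not error-prone" problem listed above.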
Re: [GENERAL] changing text search treatment of punctuation
On Wed, 2 Jul 2008, Tom Lane wrote:

> John DeSoi [EMAIL PROTECTED] writes:
>> Is there an easy way to change '/' to be treated like '-' ? I've looked over the documentation several times and could not find anything. Even just a way to get the two tokens 'home' and 'work' without the joined form would be helpful.
>
> Seems like the simplest solution is just to apply regexp_replace(text, '/', '-', 'g') before letting the text search stuff have the string. If you're using a trigger to update a tsvector column, this would be pretty trivial to do within the trigger.
>
> In general there seem to be a lot of ways that people wish they could tweak the text search parser, and telling them to write their own parser isn't a very helpful response for most folk. I don't have an idea about how to improve the situation, but it seems like something that should be thought about.

Sure, we thought about this. The most difficult part of a user-configurable parser (we thought about table-driven finite automata) is making the design foolproof. There should be algorithms for testing the validity of finite automata, but we don't know of any effective one.

Regards,
Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
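As an illustration of what "testing validity" could cover at the cheap end, the following Python sketch runs three structural checks on a transition table: every state handles every character class, every transition targets a known state, and every state is reachable from the start state. The table layout and all names are hypothetical. These checks only catch malformed tables; they say nothing about whether the tokens produced are the ones a user intended, which is the harder problem described above.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Table = Dict[Tuple[str, str], Tuple[str, str]]

def validate(table: Table, start: str, classes: Sequence[str]) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems: List[str] = []
    # A state is "known" if it is the start state or has outgoing transitions.
    states = {state for (state, _cls) in table} | {start}

    # Check 1: every transition must lead to a known state.
    for (state, cls), (nxt, _action) in table.items():
        if nxt not in states:
            problems.append(f"({state}, {cls}) leads to unknown state {nxt}")

    # Check 2: every state must handle every character class.
    for state in sorted(states):
        for cls in classes:
            if (state, cls) not in table:
                problems.append(f"state {state} has no transition for {cls}")

    # Check 3: every state must be reachable from the start state (BFS).
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for cls in classes:
            entry = table.get((state, cls))
            if entry is not None and entry[0] not in seen:
                seen.add(entry[0])
                queue.append(entry[0])
    for state in sorted(states - seen):
        problems.append(f"state {state} is unreachable from {start}")

    return problems

CLASSES = ["alnum", "other"]
GOOD: Table = {
    ("start", "alnum"): ("word", "append"),
    ("start", "other"): ("start", "skip"),
    ("word", "alnum"): ("word", "append"),
    ("word", "other"): ("start", "emit"),
}

print(validate(GOOD, "start", CLASSES))
```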
[GENERAL] changing text search treatment of punctuation
Text with the '/' character gets treated as a file path, e.g.

    select * from to_tsvector('english', 'home/work');

gives only the single token:

    'home/work':1

Changing '/' to '-' gives

    'home':2 'work':3 'home-work':1

which is much more desirable for this application.

Is there an easy way to change '/' to be treated like '-' ? I've looked over the documentation several times and could not find anything. Even just a way to get the two tokens 'home' and 'work' without the joined form would be helpful.

Thanks,

John DeSoi, Ph.D.
Re: [GENERAL] changing text search treatment of punctuation
John DeSoi [EMAIL PROTECTED] writes:
> Is there an easy way to change '/' to be treated like '-' ? I've looked over the documentation several times and could not find anything. Even just a way to get the two tokens 'home' and 'work' without the joined form would be helpful.

Seems like the simplest solution is just to apply regexp_replace(text, '/', '-', 'g') before letting the text search stuff have the string. If you're using a trigger to update a tsvector column, this would be pretty trivial to do within the trigger.

In general there seem to be a lot of ways that people wish they could tweak the text search parser, and telling them to write their own parser isn't a very helpful response for most folk. I don't have an idea about how to improve the situation, but it seems like something that should be thought about.

regards, tom lane
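The suggestion above does the replacement inside the database. Where the text passes through application code first, the same pre-processing could be done on the client before the string reaches to_tsvector. A minimal Python sketch, assuming a hypothetical helper name:

```python
import re

def slashes_to_hyphens(text: str) -> str:
    """Client-side counterpart of regexp_replace(text, '/', '-', 'g')."""
    # re.sub replaces every match by default, like the 'g' flag in SQL.
    return re.sub(r"/", "-", text)

print(slashes_to_hyphens("home/work"))
```

The result would then be passed as the document argument to to_tsvector. This rewrites every '/', including those in genuine file paths and URLs, so it only suits data where that is acceptable. The same pre-processing has to be applied to the search terms, or a query for 'home/work' would still be parsed as a single file-path token.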