Re: [GENERAL] changing text search treatment of punctuation

2008-07-03 Thread Teodor Sigaev



> In general there seem to be a lot of ways that people wish they
> could tweak the text search parser, and telling them to write
> their own parser isn't a very helpful response for most folk.
> I don't have an idea about how to improve the situation, but
> it seems like something that should be thought about.


Oleg and I have thought hard about it, and we have not found a solution yet.
A configurable parser should be:
- fast
- flexible
- not error-prone
- comfortable to use for non-programmers (or at least for non-C programmers)

It might be a table-driven state machine (just put TParserStateAction into
table(s), with some caching as a first step), but that is complex to operate,
and any change to the states would have to be proven correct before it goes
into use.
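
To make the table-driven idea concrete, here is a minimal sketch of what
transitions stored in a table could look like, together with one cheap sanity
check. Everything in it (parser_state, char_class, parser_transition, the
states and character classes) is hypothetical and invented for illustration;
it is not the layout of TParserStateAction and not an existing PostgreSQL
feature.

```sql
-- Hypothetical schema: none of these tables exist in PostgreSQL itself.
CREATE TABLE parser_state (
    state_name text PRIMARY KEY
);

CREATE TABLE char_class (
    class_name text PRIMARY KEY
);

CREATE TABLE parser_transition (
    from_state text NOT NULL REFERENCES parser_state,
    on_class   text NOT NULL REFERENCES char_class,
    to_state   text NOT NULL REFERENCES parser_state,
    emit_token text,   -- token type emitted on this transition, if any
    PRIMARY KEY (from_state, on_class)
);

INSERT INTO parser_state VALUES ('base'), ('in_word');
INSERT INTO char_class   VALUES ('alpha'), ('other'), ('eof');

-- The transitions on 'eof' are left out on purpose.
INSERT INTO parser_transition VALUES
    ('base',    'alpha', 'in_word', NULL),
    ('base',    'other', 'base',    NULL),
    ('in_word', 'alpha', 'in_word', NULL),
    ('in_word', 'other', 'base',    'word');

-- Completeness check: every (state, class) pair that has no transition.
SELECT s.state_name, c.class_name
FROM parser_state s
CROSS JOIN char_class c
LEFT JOIN parser_transition t
       ON t.from_state = s.state_name
      AND t.on_class   = c.class_name
WHERE t.from_state IS NULL
ORDER BY 1, 2;
```

The final query lists the two missing 'eof' transitions. A check like this
only catches gaps in the table; it says nothing about whether the automaton
tokenizes text the way its author intended, which is the harder part of the
correctness problem.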


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] changing text search treatment of punctuation

2008-07-03 Thread Oleg Bartunov

On Wed, 2 Jul 2008, Tom Lane wrote:


> John DeSoi [EMAIL PROTECTED] writes:
>> Is there an easy way to change '/' to be treated like '-' ? I've
>> looked over the documentation several times and could not find
>> anything. Even just a way to get the two tokens 'home' and 'work'
>> without the joined form would be helpful.
>
> Seems like the simplest solution is just to apply
> regexp_replace(text, '/', '-', 'g')
> before letting the text search stuff have the string.  If you're
> using a trigger to update a tsvector column, this would be pretty
> trivial to do within the trigger.
>
> In general there seem to be a lot of ways that people wish they
> could tweak the text search parser, and telling them to write
> their own parser isn't a very helpful response for most folk.
> I don't have an idea about how to improve the situation, but
> it seems like something that should be thought about.


Sure, we thought about this. The most difficult part of a user-configurable
parser (we considered a table-driven finite automaton) is making the design
foolproof. There should be algorithms for testing the validity of a finite
automaton, but we don't know of any efficient one.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



[GENERAL] changing text search treatment of punctuation

2008-07-02 Thread John DeSoi


Text with the '/' character gets treated as a file path, e.g.

select * from to_tsvector('english', 'home/work');

gives only the single token:

'home/work':1

Changing '/' to '-' gives

'home':2 'work':3 'home-work':1

which is much more desirable for this application.
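
One way to see how the parser classifies each form is ts_debug, which ships
with the built-in text search. A minimal sketch, assuming the default parser
and the 'english' configuration used above:

```sql
-- Show the token type (alias) the default parser assigns to each piece.
SELECT alias, token FROM ts_debug('english', 'home/work');
SELECT alias, token FROM ts_debug('english', 'home-work');
```

The first query should report a single token of type file, while the second
should report the hyphenated word together with its separate parts.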

Is there an easy way to change '/' to be treated like '-' ? I've  
looked over the documentation several times and could not find  
anything. Even just a way to get the two tokens 'home' and 'work'  
without the joined form would be helpful.


Thanks,


John DeSoi, Ph.D.







Re: [GENERAL] changing text search treatment of punctuation

2008-07-02 Thread Tom Lane
John DeSoi [EMAIL PROTECTED] writes:
> Is there an easy way to change '/' to be treated like '-' ? I've
> looked over the documentation several times and could not find
> anything. Even just a way to get the two tokens 'home' and 'work'
> without the joined form would be helpful.

Seems like the simplest solution is just to apply
regexp_replace(text, '/', '-', 'g')
before letting the text search stuff have the string.  If you're
using a trigger to update a tsvector column, this would be pretty
trivial to do within the trigger.
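
A minimal sketch of the trigger approach described above. The table notes, its
columns, and the function and trigger names are all hypothetical; only
regexp_replace and to_tsvector come from the discussion. It also assumes
PL/pgSQL is installed in the database.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE notes (
    id   serial PRIMARY KEY,
    body text,
    tsv  tsvector
);

CREATE FUNCTION notes_tsv_update() RETURNS trigger AS $$
BEGIN
    -- Turn '/' into '-' so the parser sees a hyphenated word
    -- instead of a file path.
    NEW.tsv := to_tsvector('english',
                   regexp_replace(coalesce(NEW.body, ''), '/', '-', 'g'));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notes_tsv_trigger
    BEFORE INSERT OR UPDATE ON notes
    FOR EACH ROW EXECUTE PROCEDURE notes_tsv_update();

-- Usage: the stored tsvector should match what to_tsvector gives
-- for 'home-work'.
INSERT INTO notes (body) VALUES ('home/work');
SELECT tsv FROM notes;
```

If only the separate tokens 'home' and 'work' are wanted, without the joined
form, replacing '/' with a space instead of '-' in the same call is one
option. Note that either replacement also rewrites genuine file paths in the
text, so it suits data where '/' is not used that way.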

In general there seem to be a lot of ways that people wish they
could tweak the text search parser, and telling them to write
their own parser isn't a very helpful response for most folk.
I don't have an idea about how to improve the situation, but
it seems like something that should be thought about.

regards, tom lane
