Don't guess, read the docs:
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY

12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to lower 
case and checking it against a file of stop words. If it is found in the file 
then an empty array is returned, causing the token to be discarded. If not, the 
lower-cased form of the word is returned as the normalized lexeme. 
Alternatively, the dictionary can be configured to report non-stop-words as 
unrecognized, allowing them to be passed on to the next dictionary in the list.
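To see the difference for yourself, here is a minimal sketch along the lines of the example on that same docs page. It assumes the stock english stop word file (english.stop in $SHAREDIR/tsearch_data) is installed; the name public.simple_dict is only an illustration, pick whatever you like.

```sql
-- A separate dictionary built on the simple template, this time WITH a stop word list.
-- Assumption: english.stop exists in $SHAREDIR/tsearch_data (it ships with PostgreSQL).
CREATE TEXT SEARCH DICTIONARY public.simple_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

SELECT ts_lexize('public.simple_dict', 'YeS');  -- {yes}  lower-cased lexeme
SELECT ts_lexize('public.simple_dict', 'The');  -- {}     stop word, discarded

-- Optionally report non-stop-words as unrecognized, so they are
-- passed on to the next dictionary in the list.
ALTER TEXT SEARCH DICTIONARY public.simple_dict ( Accept = false );

SELECT ts_lexize('public.simple_dict', 'YeS');  -- NULL   unrecognized
SELECT ts_lexize('public.simple_dict', 'The');  -- {}     still a stop word
```

Note that Accept = false only makes sense when at least one other dictionary follows this one in the configuration.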

d=# \dFd+ simple
                                   List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+-----------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword

By default it has no Init options, so it doesn't check for stopwords.
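A quick check against the built-in dictionary, as a sketch (the sample sentence is arbitrary, and the english configuration is shown only for comparison):

```sql
-- Built-in simple dictionary: no stop word file, so 'The' is just lower-cased.
SELECT ts_lexize('simple', 'The');
-- {the}

-- The simple configuration keeps every word, lower-cased, with duplicates merged.
SELECT to_tsvector('simple', 'The quick brown fox and THE dog');
-- 'and':5 'brown':3 'dog':7 'fox':4 'quick':2 'the':1,6

-- For comparison, the english configuration drops 'the' and 'and' as stop words.
SELECT to_tsvector('english', 'The quick brown fox and THE dog');
-- 'brown':3 'dog':7 'fox':4 'quick':2
```

Still, the empty Init options column above is the real answer; trying sample sentences only illustrates it.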

On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:

On 07/22/2010 06:27 PM, John Gage wrote:
The easiest way to look at this is to give the simple dictionary a document with to_tsvector() and see if stopwords pop out.

In my experience they do. In my experience, the simple dictionary just breaks the document down into the space etc. separated words in the document. It doesn't analyze further.

That's my experience too, I just want to make sure it doesn't actually have any stopwords which I've missed. Trying many phrases and checking for stopwords isn't really proving anything.

Can anybody confirm the "simple" dict. only lowercases the words and "uniques" them?



        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
