Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

Heikki Linnakangas Fri, 09 Nov 2007 04:01:47 -0800

Jan Urbański wrote:

The solution I came up with was simple: write a dictionary that does
only one thing: it looks up the lexeme in a stopwords file and either
discards it or returns NULL.

Doesn't the "simple" dictionary handle this?


I don't think so. The 'simple' dictionary discards stopwords, but
accepts any other lexemes. So if I use {'simple', 'pl_ispell'} for my
config, I'll get rid of the stopwords, but I won't get any lexemes
stemmed by ispell. Every lexeme that's not a stopword will produce the
very same lexeme (this is how I think the 'simple' dictionary works).

My dictionary does basically the same thing as the 'simple' dictionary,
but it returns NULL instead of the original lexeme in case the lexeme is
not found in the stopwords file.
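
To make the difference concrete, here is a minimal Python sketch of the
lookup chain. It is not the tsearch2 C code. It only models the
convention that a dictionary returns NULL (None) for "not recognized,
ask the next dictionary", an empty list for "stopword, discard", and a
list of lexemes for "accepted". All function names, the word lists and
the toy stemmer standing in for ispell are made up for illustration.

```python
def simple_dict(stopwords):
    """Models 'simple': drops stopwords, accepts everything else as-is."""
    def lexize(token):
        word = token.lower()
        if word in stopwords:
            return []          # stopword: discard
        return [word]          # any other lexeme is accepted unchanged
    return lexize


def stopword_filter_dict(stopwords):
    """Models the proposed dictionary: drops stopwords, otherwise NULL."""
    def lexize(token):
        word = token.lower()
        if word in stopwords:
            return []          # stopword: discard
        return None            # not recognized: let the next dictionary try
    return lexize


def toy_stemmer(table):
    """Hypothetical stand-in for an ispell dictionary."""
    def lexize(token):
        return [table[token.lower()]] if token.lower() in table else None
    return lexize


def run_chain(chain, token):
    """Ask each dictionary in order; the first non-None answer wins."""
    for lexize in chain:
        result = lexize(token)
        if result is not None:
            return result
    return []                  # no dictionary recognized the token


# Illustrative data only.
STOPWORDS = {"the"}
STEMS = {"cats": "cat"}

simple_chain = [simple_dict(STOPWORDS), toy_stemmer(STEMS)]
filter_chain = [stopword_filter_dict(STOPWORDS), toy_stemmer(STEMS)]

print(run_chain(simple_chain, "cats"))
print(run_chain(filter_chain, "cats"))
```

With the 'simple' model first, the stemmer is never consulted for a
non-stopword. With the filter model first, non-stopwords fall through
to the stemmer.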

In the long term, what we really need is a more flexible way to chain
dictionaries. In this case, you would first check against one stopword
list, eliminating 'od', then check the ispell dictionary, and then check
another stopword list without 'od'.
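
As a rough illustration of that idea, here is a small Python sketch of a
three-stage pipeline: a stopword check, a normalizer, and a second
stopword check applied to the normalizer's output. Nothing like this
exists in tsearch2 as far as this message describes. The function name
make_pipeline, the toy stem table standing in for ispell, and the
contents of the second stopword list are all assumptions made up for the
example. Only 'od' in the first list comes from the discussion above.

```python
def make_pipeline(pre_stopwords, stem_table, post_stopwords):
    """Build a token processor: pre-filter, normalize, post-filter."""
    def process(token):
        word = token.lower()
        # Stage 1: drop tokens found in the first stopword list.
        if word in pre_stopwords:
            return []
        # Stage 2: normalize via the toy stem table (ispell stand-in).
        lexemes = stem_table.get(word)
        if lexemes is None:
            return []  # unknown to the normalizer; dropped in this sketch
        # Stage 3: drop normalized lexemes found in the second list.
        return [lx for lx in lexemes if lx not in post_stopwords]
    return process


# Illustrative data only; the second list deliberately lacks 'od'.
process = make_pipeline(
    pre_stopwords={"od"},
    stem_table={"od": ["oda"], "ody": ["oda"], "jest": ["być"]},
    post_stopwords={"być"},
)

print(process("od"))
print(process("ody"))
print(process("jest"))
```

Here 'od' is removed before the normalizer ever sees it, while other
forms still reach the normalizer and are filtered only by the second
list.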

I suggested that a while ago
(http://archives.postgresql.org/pgsql-hackers/2007-08/msg01036.php).
Hopefully Oleg or someone else gets around to restructuring the
dictionaries in a future release.

I wonder if you could hack the ispell dictionary file to treat oda
specially?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
