Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

Jan Urbański Fri, 09 Nov 2007 04:34:40 -0800

> dictionaries. In this case, you would first check against one stopword
> list, eliminating 'od', then check the ispell dictionary, and then check
> another stopword list without 'od'.


My problem is basically solved using the patch I sent earlier. I use
'{stop, pl_ispell, simple}' which has the effect of:
a) eliminating words that are stopwords themselves but whose stemmed
forms are not (such as 'od', which gets stemmed to 'oda')
b) stemming non-stopwords properly (using an ispell dictionary)
c) indexing words that are not recognized by ispell as they are (for
instance, 'postgresql' gets indexed as 'postgresql')
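
To illustrate the three effects above, here is a minimal sketch of the
chain in Python (a real tsearch2 dictionary is written in C). It is only
a toy model: the stopword set, the ISPELL table and all function names
are made up for illustration, and the 'koty' entry is an invented
example; only the 'od' -> 'oda' stemming comes from the case described
above.

```python
from typing import Callable, List, Optional

# Model of a dictionary's lexize step:
#   None            -> word not recognized, pass it on to the next dictionary
#   []              -> word is a stopword, reject it (nothing gets indexed)
#   non-empty list  -> word accepted, these lexemes get indexed
Lexize = Callable[[str], Optional[List[str]]]

STOPWORDS = {"od"}                         # hypothetical stopword list
ISPELL = {"od": ["oda"], "koty": ["kot"]}  # toy stand-in for pl_ispell


def stop(word: str) -> Optional[List[str]]:
    """Only filters stopwords; everything else is passed on."""
    return [] if word in STOPWORDS else None


def pl_ispell(word: str) -> Optional[List[str]]:
    """Stems known words; unknown words are passed on."""
    return ISPELL.get(word)


def simple(word: str) -> Optional[List[str]]:
    """Accepts any word as-is (lowercased); assumes no stopword file."""
    return [word.lower()]


def lexize_chain(word: str, chain: List[Lexize]) -> Optional[List[str]]:
    """Ask each dictionary in turn; the first non-None answer wins."""
    for dictionary in chain:
        result = dictionary(word)
        if result is not None:
            return result
    return None


CHAIN = [stop, pl_ispell, simple]
```

With this model, lexize_chain("od", CHAIN) is rejected by the first
dictionary, whereas with the chain [pl_ispell, simple] the same word
would be stemmed and indexed as 'oda'.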

> I suggested that a while ago
> (http://archives.postgresql.org/pgsql-hackers/2007-08/msg01036.php).
> Hopefully Oleg or someone else gets around restructuring the
> dictionaries in a future release.

I'm glad to see I'm not the only one who needs a more sophisticated way
of dealing with dictionary chaining. I understand, however, the problems
that arise when one wants to extend the dictionary API beyond the
reject/accept/pass-on scheme. For these three we have an easy way of
passing the result from lexize: it returns an empty array, an array of
stemmed lexemes or NULL, respectively. If more complex actions were to
be taken, I'm afraid lexize would have to return something more complex
than just text[].
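
Put differently, the return value can tell the caller exactly three
things apart. A rough Python sketch of that mapping follows (the function
name and the outcome labels are mine, not part of the tsearch2 API;
Python's None stands in for SQL NULL):

```python
from typing import List, Optional


def outcome(result: Optional[List[str]]) -> str:
    """Classify a lexize-style result into one of the three actions."""
    if result is None:
        return "pass-on"   # NULL: word unknown, try the next dictionary
    if len(result) == 0:
        return "reject"    # empty array: stopword, index nothing
    return "accept"        # array of lexemes: index them, stop here
```

Any fourth action has no value left to be encoded in, which is why the
return type itself would have to change.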

> I wonder if you could hack the ispell dictionary file to treat oda
> specially?

I thought about it, but it turned out that writing a custom dictionary
was easier than figuring out how ispell works internally.

Regards,
-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

