Hi Oleg

Haha, understood!
Thanks for helping me on this one.

Cheers
Tim

On May 3, 2014 7:24:08 AM GMT+09:00, Oleg Bartunov <obartu...@gmail.com> wrote:
>Tim,
>
>you did answer yourself - don't use ispell :)
>
>On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden <t...@shisaa.jp> wrote:
>> On Fri, 2 May 2014 21:12:56 +0400
>> Oleg Bartunov <obartu...@gmail.com> wrote:
>>
>> Hi Oleg
>>
>> Thanks for the response!
>>
>>> Yes, it's normal for ispell dictionary, think about morphological dictionary.
>>
>> Hmm, I see, that makes sense. I thought the morphological aspect of the Ispell only dealt with splitting up compound words, but it also deals with deriving the word to a more "stem" like form, correct?
>>
>> As a last question on this, is there a way to disable this dictionary to emit multiple lexemes?
>>
>> The reason I am asking is because in my (fairly new) understanding of PostgreSQL's full text it is always best to have as few lexemes as possible saved in the vector. This to get smaller indexes and faster matching afterwards. Also, if you run a tsquery afterwards too, you can still employ the power of these multiple lexemes to find a match.
>>
>> Or...probably answering my own question...if I do not desire this behavior I should maybe not use Ispell and simply use another dictionary :)
>>
>> Thanks again.
>>
>> Cheers,
>> Tim
>>
>>> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <t...@shisaa.jp> wrote:
>>> > Good morning/afternoon all
>>> >
>>> > I am currently writing a few articles about PostgreSQL's full text capabilities and have a question about the Ispell dictionary which I cannot seem to find an answer to. It is probably a very simple issue, so forgive my ignorance.
>>> >
>>> > In one article I am explaining about dictionaries and I have set up a sample configuration which maps most token categories to only use an Ispell dictionary (timusan_ispell) which has a default configuration:
>>> >
>>> > CREATE TEXT SEARCH DICTIONARY timusan_ispell (
>>> >     TEMPLATE = ispell,
>>> >     DictFile = en_us,
>>> >     AffFile = en_us,
>>> >     StopWords = english
>>> > );
>>> >
>>> > When I run a simple query like "SELECT to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
>>> >
>>> > 'smile':1 'smiling':1
>>> >
>>> > As you can see I get two lexemes with the same pointer.
>>> > The question here is: why does this happen?
>>> >
>>> > Is it normal behavior for the Ispell dictionary to emit multiple lexemes for a single token? And if so, is this efficient? I mean, why could it not simply save one lexeme 'smile' which (same as the snowball dictionary) would match 'smiling' as well if later matched with the accompanying tsquery?
>>> >
>>> > Thanks!
>>> >
>>> > Cheers,
>>> > Tim
>>> >
>>> >
>>> > --
>>> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> > To make changes to your subscription:
>>> > http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>> --
>> Tim van der Linden <t...@shisaa.jp>
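
For comparison, here is a minimal sketch of the two options discussed in this thread. It assumes the timusan_ispell dictionary from the original message exists and that the en_us Ispell files are installed on the server. The configuration name timusan_snowball is hypothetical and does not come from the thread. english_stem is the Snowball stemmer for English that ships with PostgreSQL.

```sql
-- Inspect what each dictionary emits for a single token.
-- An Ispell dictionary may return several normalized forms.
SELECT ts_lexize('timusan_ispell', 'smiling');

-- The Snowball stemmer returns at most one lexeme per token.
SELECT ts_lexize('english_stem', 'smiling');

-- Hypothetical configuration (name is an assumption) that maps the
-- common word token types to the Snowball stemmer only.
CREATE TEXT SEARCH CONFIGURATION timusan_snowball (COPY = pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION timusan_snowball
    ALTER MAPPING FOR asciiword, word
    WITH english_stem;

-- Build a tsvector with the Snowball-only configuration.
SELECT to_tsvector('timusan_snowball', 'smiling');
```

Running the two ts_lexize calls side by side should show the difference directly. If a single lexeme per token is what you want, a Snowball-only mapping along these lines is probably the simplest route.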