Re: LucidWorks Solr

darren Mon, 19 Apr 2010 07:07:31 -0700

There have been some open source ones. I don't have the links handy at
this moment[1]. But I parsed through the electronic dictionary and
generated a database of each word and its morphologies. I got tired of
lame stemmers that were wrong half the time. Computers are fast enough to
do lookups on 150,000 words nowadays; there's no need for fuzzy
algorithms here, IMO.
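A minimal sketch of the lookup approach described above, in Python. The tiny sample table, the names `MORPHOLOGY_TABLE`, `lemmas` and `morphology_set`, and the choice to map each surface form to a *set* of lemmas are all illustrative assumptions; a real table would be generated from a full electronic dictionary.

```python
from collections import defaultdict

# Hypothetical sample data: each lemma (dictionary headword) lists its forms.
# A real table would be generated from an electronic dictionary.
MORPHOLOGY_TABLE = {
    "run": ["run", "runs", "running", "ran"],
    "go": ["go", "goes", "going", "went", "gone"],
    "mouse": ["mouse", "mice"],
    "see": ["see", "sees", "seeing", "saw", "seen"],
    "saw": ["saw", "saws", "sawing", "sawed"],
}


def build_index(table):
    """Map every surface form to the set of lemmas it can belong to."""
    index = defaultdict(set)
    for lemma, forms in table.items():
        for form in forms:
            index[form.lower()].add(lemma)
    return index


INDEX = build_index(MORPHOLOGY_TABLE)


def lemmas(word):
    """Return the lemmas for a word; unknown words fall back to themselves."""
    return INDEX.get(word.lower(), {word.lower()})


def morphology_set(word):
    """Return every known form that shares a lemma with the given word."""
    forms = set()
    for lemma in lemmas(word):
        forms.update(f.lower() for f in MORPHOLOGY_TABLE.get(lemma, [lemma]))
    return forms
```

Each lookup is a hash-table access, so the cost per token stays roughly constant as the vocabulary grows. Mapping to a set lets an ambiguous form such as `saw` keep both candidate lemmas (`see` and `saw`) instead of silently picking one.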


Good luck!

[1] Google will turn up some, I think.

> Thanks for the tip.
>
> Are there any publicly available dictionaries of morphologies that I
> could use? Or did you build your own?
>
>
> --- On Mon, 4/19/10, Darren Govoni <dar...@ontrenet.com> wrote:
>
>> From: Darren Govoni <dar...@ontrenet.com>
>> Subject: Re: LucidWorks Solr
>> To: solr-user@lucene.apache.org
>> Date: Monday, April 19, 2010, 7:39 AM
>> Regarding stemmers, I ditched them altogether a long time ago in favor
>> of a dictionary of morphologies of all known words (for any given
>> language). A simple lookup of any word's morphology thus produces the
>> set of forms, including the correct stem.
>>
>> Works great. 100% of the time.
>>
>> Just a tip from me.
>>
>>
>> On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote:
>>
>> > Andy, I think it is important to know what a stemmer really is.
>> >
>> > It reduces words to their infinitives. Those infinitives do not always
>> > match the real infinitive, but that does not matter: for the system,
>> > it is an infinitive, since all of its derivatives can be reduced to the
>> > same form. That's a stemmer.
>> >
>> > Accordingly, no single stemmer can work for every language, because
>> > every language has its own rules for reducing a word to its
>> > infinitive.
>> >
>> > If you apply an English stemmer to a German document, the results
>> > might be unexpected. However, sometimes it still works well enough.
>> >
>> > Keep in mind that this is an algorithm. It does not matter whether the
>> > created infinitive is the real infinitive. What matters is that most of
>> > the derived forms are reduced to the same basic form. Please ask if
>> > something is not clear.
>> >
>> > KStem:
>> > The wiki[1] says that KStem is less aggressive than the standard
>> > stemmer. I guess this means that there are more rules for how to reduce
>> > a word to its infinitive, and accordingly the results might be better.
>> >
>> >
>> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
>> >
>> > Kind regards
>> > - Mitch
>>
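To illustrate the quoted point that a stemmer's output need not be a real word, only a form shared by related words, here is a toy suffix-stripping stemmer in Python. The suffix list and `toy_stem` are hypothetical and far simpler than Porter or KStem; they only demonstrate the general technique of stripping the first matching suffix while keeping a minimum stem length.

```python
# Toy English suffix rules, longest first; the first matching suffix is stripped.
# These are NOT the Porter or KStem rules, only an illustration.
SUFFIXES = ["ions", "ion", "ing", "ed", "es", "e", "s"]


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["connect", "connects", "connected", "connecting", "connection"]:
    print(w, "->", toy_stem(w))  # every form maps to "connect"

for w in ["argue", "argues", "argued", "arguing"]:
    print(w, "->", toy_stem(w))  # every form maps to "argu", not a real word

for w in ["Haus", "Häuser"]:  # German input through English rules
    print(w, "->", toy_stem(w))  # "hau" and "häuser": not conflated
```

Here `argue`, `argues`, `argued` and `arguing` all collapse to `argu`, which is not an English word but still works as a shared index term. Applied to German, the same English rules reduce `Haus` to `hau` and leave `Häuser` untouched, so the two forms are not conflated, which is the kind of unexpected result described for cross-language use.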
