RE: Grammatical tenses Stemming in SOLR

2018-09-21 Thread Markus Jelsma
Hello Aishwarya,

KStem does a really bad job with the examples you have given; in some strange
cases it won't remove the -s and -ing suffixes. Porter/Snowball work just fine
for this example.

What won't work, of course, are irregular verbs and nouns (plural forms). They
always need to be hard-coded, either within the algorithm (which they are not)
or outside it, for example with a StemmerOverrideFilter.
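
As a rough sketch of the override approach (an illustration, not a tested configuration): Solr's StemmerOverrideFilterFactory reads a tab-separated dictionary of word/stem pairs and has to sit before the stemmer in the analyzer chain. The field type name `text_en_override` and the file name `stemdict.txt` are placeholders chosen for this example.

```xml
<!-- Hypothetical field type; the name and the dictionary file are placeholders. -->
<fieldType name="text_en_override" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemdict.txt holds one "word<TAB>stem" pair per line, e.g. "ran<TAB>run".
         Overridden tokens are marked as keywords, so the stemmer below skips them. -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" ignoreCase="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The override stem should be the same string the stemmer produces for the regular forms ("run" for run, runs, running), otherwise the irregular form still ends up as a different term.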

Regards,
Markus
 


Grammatical tenses Stemming in SOLR

2018-09-21 Thread aishwarya


I want to know which stemming filter factory can be used to fetch all the
possible tenses of a stem word.

Example: if "run" is the search word, it has to fetch results for all
files involving run, running, runs, ran.

And vice versa: whichever tense of a word is searched, it has to
retrieve all the results from the files.

I tried using PorterStemFilterFactory, Snowball, and KStem; none of these
seems to fetch the intended results.

Please help! Thanks in advance.

Thanks, Aishwarya



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Advice on Stemming in Solr

2017-11-04 Thread Zheng Lin Edwin Yeo
Hi Emir,

We are looking at the configuration to try to adjust the rules to suit our
use case.

Regards,
Edwin


Re: Advice on Stemming in Solr

2017-11-03 Thread Emir Arnautović
Hi Edwin,
Hunspell is a configurable, language-independent library, and you can define any
morphology rules. It’s been there for a while, and I would not be surprised if
someone has already adjusted the English rules to suit your case.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



Re: Advice on Stemming in Solr

2017-11-02 Thread Zheng Lin Edwin Yeo
Hi Emir,

We are looking to change to HunspellStemFilterFactory. This has a
dictionary file containing words and applicable flags, and an affix file
that specifies how these flags will control spell checking.
Presumably we can control the stemming through those two files?
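
For reference, a minimal sketch of how the two files are wired into an analyzer. The field type name is a placeholder, and the file names `en_US.dic` and `en_US.aff` are assumptions; use whatever dictionary and affix files are actually deployed in the core's conf directory.

```xml
<!-- Hypothetical field type; file names are assumptions. -->
<fieldType name="text_en_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- dictionary: word list with flags; affix: the rules those flags refer to. -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic"
            affix="en_US.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Since stemming here is driven by the dictionary, which words get stemmed (and to what) is changed by editing the entries and flags in those files rather than the schema.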

Regards,
Edwin


Re: Advice on Stemming in Solr

2017-11-02 Thread Emir Arnautović
Hi Edwin,
It seems that it would be best if you did not apply the *ing stemming rule at
all. The first idea is to trick the stemmer by replacing the "ing" ending of any
such word with some nonexistent character combination, e.g. ‘wqx’. You can use
solr.PatternReplaceFilterFactory to do that. You can switch it back after
stemming if you want to have the proper token in the index.
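
A sketch of that trick, assuming KStem stays as the stemmer (the field type name is a placeholder, and 'wqx' is just the example marker from above):

```xml
<!-- Hypothetical field type illustrating the hide-and-restore trick. -->
<fieldType name="text_en_no_ing" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Hide the "ing" ending from the stemmer. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ing$" replacement="wqx" replace="all"/>
    <filter class="solr.KStemFilterFactory"/>
    <!-- Restore the original ending after stemming. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="wqx$" replacement="ing" replace="all"/>
  </analyzer>
</fieldType>
```

The price is that English words are affected too: with this chain "walking" is no longer reduced to "walk".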

HTH,
Emir 



Re: Advice on Stemming in Solr

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi Emir,

We do have quite a lot of words that should not be stemmed. Currently, the
KStemFilterFactory is stemming all the non-English words that end with
"ing" as well. There are quite a lot of places and names which end in
"ing", and all of these are being stemmed as well, which leads to
inaccurate search results.

Regards,
Edwin


Re: Advice on Stemming in Solr

2017-11-01 Thread Emir Arnautović
Hi Edwin,
If the number of words that should not be stemmed is not high, you could use
KeywordMarkerFilterFactory to flag those words as keywords, which should
prevent the stemmer from changing them.
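
A minimal sketch of such a chain, with the field type name and the file name `protwords.txt` as placeholders:

```xml
<!-- Hypothetical field type; protwords.txt lists one protected word per line,
     e.g. "ximenting". -->
<fieldType name="text_en_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Must come before the stemmer so the keyword flag is already set. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the marker runs after lowercasing here, the entries in the protected-words file would need to be lowercase as well.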
Depending on what you want to achieve, you might not be able to avoid using a
stemmer at indexing time. If you want to find documents that contain only
“walking” with the search term “walk”, then you have to stem at index time. Cases
where you use stemming at query time only are rare and specific.
If you want to prefer exact matches over stemmed matches, you have to index the
same content with and without stemming and boost matches on the field without
stemming.
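
One way this could look, as a sketch only: all field and type names are hypothetical, the two field types are assumed to be defined elsewhere in the schema (one with a stemmer, one without), and the boost value of 2 is arbitrary.

```xml
<!-- Schema fragment: same content indexed twice. -->
<field name="content"       type="text_en_stemmed" indexed="true" stored="true"/>
<field name="content_exact" type="text_en_exact"   indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>

<!-- solrconfig.xml fragment: search both fields, weight the unstemmed one higher. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">content content_exact^2</str>
  </lst>
</requestHandler>
```

With this setup a query for "walking" matches documents containing "walk" through the stemmed field, while documents that contain "walking" itself also match the exact field and score higher.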

HTH,
Emir



Advice on Stemming in Solr

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi,

We are currently using KStemFilterFactory in Solr, but we found that it is
actually stemming non-English words like "ximenting", which it
stems to "ximent". This is not what we wanted.

Another option is to use the HunspellStemFilterFactory, but there are some
English words like "running" and "walking" that are not being stemmed.

We would like to check: is it advisable to use stemming at index time? Or should
we not stem at index time and instead, at query time, search for the
stemmed words as well? For example, if the user searches for "walking",
we would also search for "walk", with the actual word "walking"
given a higher weight.

I'm currently using Solr 6.5.1.

Regards,
Edwin


Re: Stemming with SOLR

2016-12-18 Thread Lasitha Wattaladeniya
Thank you all for the replies. I am considering the suggestions.


Re: Stemming with SOLR

2016-12-16 Thread Susheel Kumar
To handle irregular nouns
(http://www.ef.com/english-resources/english-grammar/singular-and-plural-nouns/),
the simplest way is to handle them using StemmerOverrideFilterFactory. The list is
not so long. Otherwise, go for commercial solutions like Basistech,
as Alex suggested, or customize Hunspell extensively to handle most
of them.

Thanks,
Susheel


Re: Stemming with SOLR

2016-12-15 Thread Alexandre Rafalovitch
If you need a full-fidelity solution that takes care of multiple
edge cases, it could be worth looking at commercial solutions.

http://www.basistech.com/ has one, including a free-level SaaS plan.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


Re: Stemming with SOLR

2016-12-15 Thread Lasitha Wattaladeniya
Hi all,

Thanks for the replies.

@eric, ahmet: since those are logical (rule-based) stemmers, they won't work on
words such as caught, ran and so on. So in our case they won't work.

@susheel: Yes, I thought about it, but the problem we have is that the documents we
index are somewhat large texts, so copying them into duplicate
fields will affect the index time (we have jobs to index data
periodically) and the query time. I wonder why there isn't a correct solution
to this.

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Re: Stemming with SOLR

2016-12-15 Thread Susheel Kumar
We did an extensive comparison of Snowball, KStem and Hunspell in the past,
and there are cases where one of them works better than the others, or
vice versa. You can use all three by defining 3 different fields
(fieldTypes) and searching across all of them at query time.

For the cases where none of them works (e.g. wolves, wolf, etc.), use
StemmerOverrideFilterFactory.
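
The override idea can be sketched roughly as follows, in Python rather than
Solr config. This is only an illustration of the concept (consult a
hand-maintained table first, fall back to the algorithmic stemmer otherwise);
OVERRIDES, toy_stem and stem_with_overrides are hypothetical names, and the
toy stemmer stands in for whichever real stemmer follows in the analysis chain.

```python
# Hypothetical override table: irregular surface form -> the stem we want.
# In a real setup these mappings would be maintained by hand per domain.
OVERRIDES = {"wolves": "wolf", "ran": "run", "caught": "catch"}

# Toy suffix stripper standing in for the real stemmer (assumption for this sketch).
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_with_overrides(word: str) -> str:
    """Return the override if one exists; otherwise fall back to the stemmer."""
    w = word.lower()
    if w in OVERRIDES:
        return OVERRIDES[w]  # overridden tokens bypass the algorithmic stemmer
    return toy_stem(w)


if __name__ == "__main__":
    for w in ["wolves", "wolf", "ran", "walking"]:
        print(w, "->", stem_with_overrides(w))
```

Without the table the toy stemmer would turn "wolves" into "wolve", which does
not match "wolf"; with the table both forms end up as the same term.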

HTH.

Thanks,
Susheel

On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan 
wrote:

> Hi,
>
> KStemFilter returns legitimate English words, please use it.
>
> Ahmet
>
>
>
> On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
> watt...@gmail.com> wrote:
> Hello devs,
>
> I'm trying to develop an indexing and querying flow that converts
> words to their original form (lemmatization). I have been doing a bit of
> research lately, but the information on the internet is very limited. I
> tried using HunspellStemFilterFactory, but it doesn't convert a word to its
> original form; instead it gives suggestions for some words (Hunspell works
> correctly for some English words, but for others it gives multiple
> suggestions or none; I used the en_us.dic provided by OpenOffice).
>
> I know this is a generic problem in search, so is there anyone who can
> point me in the right direction or to some information? :)
>
> Best regards,
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>


Re: Stemming with SOLR

2016-12-15 Thread Ahmet Arslan
Hi,

KStemFilter returns legitimate English words, please use it.

Ahmet



On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya 
 wrote:
Hello devs,

I'm trying to develop an indexing and querying flow that converts
words to their original form (lemmatization). I have been doing a bit of
research lately, but the information on the internet is very limited. I
tried using HunspellStemFilterFactory, but it doesn't convert a word to its
original form; instead it gives suggestions for some words (Hunspell works
correctly for some English words, but for others it gives multiple
suggestions or none; I used the en_us.dic provided by OpenOffice).

I know this is a generic problem in search, so is there anyone who can
point me in the right direction or to some information? :)

Best regards,
Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Re: Stemming with SOLR

2016-12-15 Thread Erick Erickson
What about things like PorterStemFilterFactory,
EnglishMinimalStemFilterFactory and the like?

Best,
Erick

On Thu, Dec 15, 2016 at 7:16 AM, Lasitha Wattaladeniya
 wrote:
> Hello devs,
>
> I'm trying to develop an indexing and querying flow that converts
> words to their original form (lemmatization). I have been doing a bit of
> research lately, but the information on the internet is very limited. I
> tried using HunspellStemFilterFactory, but it doesn't convert a word to its
> original form; instead it gives suggestions for some words (Hunspell works
> correctly for some English words, but for others it gives multiple
> suggestions or none; I used the en_us.dic provided by OpenOffice).
>
> I know this is a generic problem in search, so is there anyone who can
> point me in the right direction or to some information? :)
>
> Best regards,
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com


Stemming with SOLR

2016-12-15 Thread Lasitha Wattaladeniya
Hello devs,

I'm trying to develop an indexing and querying flow that converts
words to their original form (lemmatization). I have been doing a bit of
research lately, but the information on the internet is very limited. I
tried using HunspellStemFilterFactory, but it doesn't convert a word to its
original form; instead it gives suggestions for some words (Hunspell works
correctly for some English words, but for others it gives multiple
suggestions or none; I used the en_us.dic provided by OpenOffice).

I know this is a generic problem in search, so is there anyone who can
point me in the right direction or to some information? :)

Best regards,
Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Re: Stemming in Solr

2009-03-20 Thread Chris Hostetter


: Can someone please let me know how to implement stemming in Solr? I am
: particularly looking for the changes I might need to make in the config files,
: and also whether I need to use any already supplied libraries/factories, etc.

i would start by searching the wiki and email archives for stemming...

http://wiki.apache.org/solr/?action=fullsearch&context=180&value=stemming&fullsearch=Text


-Hoss



Stemming in Solr

2009-03-13 Thread dabboo

Hi,

Can someone please let me know how to implement stemming in Solr? I am
particularly looking for the changes I might need to make in the config files,
and also whether I need to use any already supplied libraries/factories, etc.

It would be a great help.

Thanks,
Amit Garg

-- 
View this message in context: 
http://www.nabble.com/Stemming-in-Solr-tp22495961p22495961.html
Sent from the Solr - User mailing list archive at Nabble.com.



Stemming in Solr

2009-03-04 Thread dabboo

Hi, 

I am trying to implement stemming in Solr. If a user searches for walk, then
all the records which contain walk, walking, walks, walked, etc. should be displayed.
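
The behaviour described here depends on running the same stemming step at
index time and at query time, so that every form ends up as the same indexed
term. A minimal sketch of that idea in Python follows; toy_stem, analyze and
the sample documents are all made up for illustration, and the toy stemmer is
only a stand-in for a real one.

```python
from collections import defaultdict

# Toy suffix stripper standing in for a real stemmer (assumption for this sketch).
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def analyze(text: str) -> list[str]:
    """Shared analysis chain: lowercase, split on whitespace, stem each token."""
    return [toy_stem(tok) for tok in text.lower().split()]


# Hypothetical sample records.
DOCS = {
    1: "she walks home",
    2: "they walked far",
    3: "walking is healthy",
    4: "he runs daily",
}

# Index time: store the stemmed terms, not the surface forms.
INDEX = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in analyze(text):
        INDEX[term].add(doc_id)


def search(query: str) -> list[int]:
    """Query time: analyze the query the same way, then look up each term."""
    ids = set()
    for term in analyze(query):
        ids |= INDEX.get(term, set())
    return sorted(ids)


if __name__ == "__main__":
    print(search("walk"))     # [1, 2, 3]
    print(search("walking"))  # [1, 2, 3]
```

Because the query is stemmed the same way as the documents, searching for any
of walk, walks, walked or walking returns the same three records.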

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Stemming-in-Solr-tp22328850p22328850.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Stemming in Solr

2009-03-04 Thread Lukáš Vlček
Hi,
did you check Snowball stemmers (http://snowball.tartarus.org/)?

Regards,
Lukas

On Wed, Mar 4, 2009 at 1:12 PM, dabboo ag...@sapient.com wrote:


> Hi,
>
> I am trying to implement stemming in Solr. If a user searches for walk, then
> all the records which contain walk, walking, walks, walked, etc. should be displayed.
>
> Please suggest.
>
> Thanks,
> Amit Garg
> --
> View this message in context:
> http://www.nabble.com/Stemming-in-Solr-tp22328850p22328850.html
> Sent from the Solr - User mailing list archive at Nabble.com.




-- 
http://blog.lukas-vlcek.com/


Re: Stemming in Solr

2009-03-04 Thread Lukáš Vlček
Maybe you can also check
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Stemming is discussed there...

On Wed, Mar 4, 2009 at 1:18 PM, Lukáš Vlček lukas.vl...@gmail.com wrote:

> Hi,
> did you check Snowball stemmers (http://snowball.tartarus.org/)?
>
> Regards,
> Lukas
>
>
> On Wed, Mar 4, 2009 at 1:12 PM, dabboo ag...@sapient.com wrote:
>
> > Hi,
> >
> > I am trying to implement stemming in Solr. If a user searches for walk, then
> > all the records which contain walk, walking, walks, walked, etc. should be
> > displayed.
> >
> > Please suggest.
> >
> > Thanks,
> > Amit Garg
> > --
> > View this message in context:
> > http://www.nabble.com/Stemming-in-Solr-tp22328850p22328850.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
> --
> http://blog.lukas-vlcek.com/




-- 
http://blog.lukas-vlcek.com/