Hi Emir,

We are looking at the configuration to try to adjust the rules to suit our use case.

Regards,
Edwin

On 3 November 2017 at 16:24, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Edwin,
> Hunspell is a configurable, language-independent library, and you can define
> any morphology rules. It has been around for a while, and I would not be
> surprised if someone has already adjusted the English rules to suit your case.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 3 Nov 2017, at 04:25, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Hi Emir,
> >
> > We are looking to change to HunspellStemFilterFactory. This has a
> > dictionary file containing words and applicable flags, and an affix file
> > that specifies how these flags will control spell checking.
> > Perhaps we can control the stemming from those files in
> > HunspellStemFilterFactory?
> >
> > Regards,
> > Edwin
> >
> > On 2 November 2017 at 17:46, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Edwin,
> >> It seems it would be best not to apply the *ing stemming rule at all.
> >> The first idea is to trick the stemmer by replacing the trailing "ing" of
> >> any word with some nonexistent character combination, e.g. 'wqx'. You can
> >> use solr.PatternReplaceFilterFactory to do that. You can switch it back
> >> after stemming if you want to have the proper token in the index.
> >>
> >> HTH,
> >> Emir
> >>
> >>> On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>
> >>> Hi Emir,
> >>>
> >>> We do have quite a lot of words that should not be stemmed. Currently,
> >>> the KStemFilterFactory is stemming all the non-English words that end
> >>> with "ing" as well. There are quite a lot of places and names which end
> >>> in "ing", and all of these are being stemmed as well, which leads to
> >>> inaccurate search results.
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On 1 November 2017 at 18:20, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >>>
> >>>> Hi Edwin,
> >>>> If the number of words that should not be stemmed is not high, you could
> >>>> use KeywordMarkerFilterFactory to flag those words as keywords, which
> >>>> should prevent the stemmer from changing them.
> >>>> Depending on what you want to achieve, you might not be able to avoid
> >>>> using a stemmer at index time. If you want to find documents that
> >>>> contain only “walking” with the search term “walk”, then you have to
> >>>> stem at index time. Cases where you use stemming at query time only are
> >>>> rare and specific.
> >>>> If you want to prefer exact matches over stemmed matches, you have to
> >>>> index the same content with and without stemming and boost matches on
> >>>> the field without stemming.
> >>>>
> >>>> HTH,
> >>>> Emir
> >>>>
> >>>>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> We are currently using KStemFilterFactory in Solr, but we found that it
> >>>>> is actually stemming non-English words like "ximenting", which it stems
> >>>>> to "ximent". This is not what we want.
> >>>>>
> >>>>> Another option is to use the HunspellStemFilterFactory, but there are
> >>>>> some English words like "running" and "walking" that are not being
> >>>>> stemmed.
> >>>>>
> >>>>> I would like to check: is it advisable to use stemming at index time?
> >>>>> Or should we not stem at index time, but at query time search for the
> >>>>> stemmed word as well? For example, if the user searches for "walking",
> >>>>> we would also search for "walk", with the actual word "walking" given
> >>>>> a higher weight.
> >>>>>
> >>>>> I'm currently using Solr 6.5.1.
> >>>>>
> >>>>> Regards,
> >>>>> Edwin
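
For reference, the suggestions made in this thread could be sketched as Solr schema fragments along the following lines. This is an untested sketch, not a configuration anyone in the thread posted: the field type names, the field names, the `protwords.txt` file name, the `text_general` type and the tokenizer/lowercase steps are all assumptions for illustration. Only the filter factories (`KeywordMarkerFilterFactory`, `PatternReplaceFilterFactory`, `KStemFilterFactory`) and the 'wqx' placeholder come from the messages above.

```xml
<!-- Sketch 1: protect a short list of words from the stemmer.
     protwords.txt (assumed file name) holds one word per line, e.g. ximenting. -->
<fieldType name="text_en_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Tokens found in the list are flagged as keywords and skipped by KStem. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Sketch 2: hide a trailing "ing" from the stemmer, then restore it.
     Note that this also stops English words such as "walking" from being stemmed. -->
<fieldType name="text_en_no_ing" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="ing$" replacement="wqx" replace="all"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="wqx$" replacement="ing" replace="all"/>
  </analyzer>
</fieldType>

<!-- Sketch 3: index the same content with and without stemming.
     "content" and "content_exact" are hypothetical field names. -->
<field name="content" type="text_en_protected" indexed="true" stored="true"/>
<field name="content_exact" type="text_general" indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>
```

With the third sketch, exact matches could then be preferred at query time by boosting the unstemmed field, for example with the eDisMax parser and `qf=content content_exact^2`; the boost value here is arbitrary and would need tuning.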