Thank you all for the replies. I am considering the suggestions.

On 17 Dec 2016 01:50, "Susheel Kumar" <[email protected]> wrote:
> To handle irregular nouns (
> http://www.ef.com/english-resources/english-grammar/singular-and-plural-nouns/),
> the simplest way is handle them using StemOverriderFactory. The list is
> not so long. Or otherwise go for commercial solutions like basistech etc.
> as Alex suggested oR you can customize Hunspell extensively to handle most
> of them.
>
> Thanks,
> Susheel
>
> On Thu, Dec 15, 2016 at 9:46 PM, Alexandre Rafalovitch <[email protected]>
> wrote:
>
> > If you need the full fidelity solution taking care of multiple
> > edge-cases, it could be worth looking at commercial solutions.
> >
> > http://www.basistech.com/ has one, including a free-level SAAS plan.
> >
> > Regards,
> > Alex.
> > ----
> > http://www.solr-start.com/ - Resources for Solr users, new and experienced
> >
> >
> > On 15 December 2016 at 21:28, Lasitha Wattaladeniya <[email protected]>
> > wrote:
> > > Hi all,
> > >
> > > Thanks for the replies,
> > >
> > > @eric, ahmet : since those stemmers are logical stemmers it won't work on
> > > words such as caught, ran and so on. So in our case it won't work
> > >
> > > @susheel : Yes I thought about it but problems we have is, the documents we
> > > index are some what large text, so copy fielding these into duplicate
> > > fields will affect on the index time ( we have jobs to index data
> > > periodically) and query time. I wonder why there isn't a correct solution
> > > to this
> > >
> > > Regards,
> > > Lasitha
> > >
> > > Lasitha Wattaladeniya
> > > Software Engineer
> > >
> > > Mobile : +6593896893
> > > Blog : techreadme.blogspot.com
> > >
> > > On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar <[email protected]>
> > > wrote:
> > >
> > >> We did extensive comparison in the past for Snowball, KStem and Hunspell
> > >> and there are cases where one of them works better but not other or
> > >> vice-versa. You may utilise all three of them by having 3 different fields
> > >> (fieldTypes) and during query, search in all of them.
> > >>
> > >> For some of the cases where none of them works (e.g wolves, wolf etc)., use
> > >> StemOverriderFactory.
> > >>
> > >> HTH.
> > >>
> > >> Thanks,
> > >> Susheel
> > >>
> > >> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > KStemFilter returns legitimate English words, please use it.
> > >> >
> > >> > Ahmet
> > >> >
> > >> >
> > >> >
> > >> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
> > >> > [email protected]> wrote:
> > >> > Hello devs,
> > >> >
> > >> > I'm trying to develop this indexing and querying flow where it converts the
> > >> > words to its original form (lemmatization). I was doing bit of research
> > >> > lately but the information on the internet is very limited. I tried using
> > >> > hunspellfactory but it doesn't convert the word to it's original form,
> > >> > instead it gives suggestions for some words (hunspell works for some
> > >> > english words correctly but for some it gives multiple suggestions or no
> > >> > suggestions, i used the en_us.dic provided by openoffice)
> > >> >
> > >> > I know this is a generic problem in searching, so is there anyone who can
> > >> > point me to correct direction or some information :)
> > >> >
> > >> > Best regards,
> > >> > Lasitha Wattaladeniya
> > >> > Software Engineer
> > >> >
> > >> > Mobile : +6593896893
> > >> > Blog : techreadme.blogspot.com
> > >> >
> > >>
> >
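
For reference, the "override first, stem otherwise" idea suggested in this thread can be sketched roughly as below. This is only an illustration in plain Python, not Solr code. The filter referred to above as StemOverriderFactory is presumably Solr's StemmerOverrideFilterFactory, which is placed before the stemmer in the analyzer chain. The override table, the function names and the toy suffix stripper are all hypothetical; the suffix stripper merely stands in for a rule-based stemmer and is not KStem, Snowball or Hunspell.

```python
# Hypothetical override table: irregular surface form -> lemma.
# In Solr this would live in a dictionary file for the override filter.
IRREGULAR_OVERRIDES = {
    "wolves": "wolf",
    "caught": "catch",
    "ran": "run",
    "mice": "mouse",
}


def naive_suffix_stem(token):
    """Toy rule-based stemmer (assumption: stands in for a real one)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(token):
    """Look the token up in the override table first, else fall back to stemming."""
    lowered = token.lower()
    override = IRREGULAR_OVERRIDES.get(lowered)
    if override is not None:
        return override
    return naive_suffix_stem(lowered)


if __name__ == "__main__":
    for word in ("Wolves", "caught", "ran", "jumped", "dogs"):
        print(word, "->", normalize(word))
```

The same function would be applied at both index time and query time, so a single field is enough and the irregular forms are handled by the (short) override list rather than by extra copy fields.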
