Do you, or any other Solr member, know of a good fuzzy string matching library to recommend?

On Thu, May 19, 2011 at 9:39 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Well.... the good news is FuzzyQuery is indeed much faster in Lucene/Solr
> 4.0.
>
> But the bad news is... FuzzyQuery won't do what you need here. You
> need some sort of FuzzyPhraseQuery, which is able to replace terms
> similar to one another (comp/company/corporation) by some metric. I
> don't know of such a query in Lucene/Solr... but it'd be a nice
> addition. Others have asked about this before.
>
> FuzzyQuery finds terms "close" to other terms, when measured by edit
> distance, e.g. fuzzy/wuzzy/muzzy are all edit distance one from each
> other.
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Wed, May 18, 2011 at 8:03 PM, Guilherme Aiolfi <grad...@gmail.com>
> wrote:
> > Hi,
> >
> > I want to do a fuzzy search that compares a phrase to a field in Solr.
> > For example:
> >
> > "abc company ltda" will be compared to "abc comp", "abc corporation",
> > "def company ltda", "nothing to match here".
> >
> > The thing is that it always has to return documents sorted by score.
> >
> > I've found some good algorithms to do that, like StrikeAMatch[1] and
> > JaroWinkler.
> >
> > Using JaroWinkler with strdist() I can do exactly that. But I would
> > rather use StrikeAMatch, which had a patch in the Lucene JIRA that was
> > never committed.
> >
> > So I contacted the author of that patch, and he told me that I should
> > use Solr 4.0, which now has some pretty good new fuzzy search
> > enhancements that make StrikeAMatch look like a toy for kids.
> >
> > Does anyone know how I can achieve that using Solr 4.0?
> >
> > [1] http://www.catalysoft.com/articles/StrikeAMatch.html
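
For reference, the ranking asked for in the quoted question can be sketched outside Solr. The snippet below is only an illustration in plain Python of the letter-pair idea behind StrikeAMatch [1] as I understand it: split each word into adjacent character pairs and score two strings as twice the number of shared pairs divided by the total number of pairs. It is not the JIRA patch and uses no Lucene/Solr API; the function names (letter_pairs, strike_a_match, rank) are made up for this example.

```python
from collections import Counter


def letter_pairs(text):
    """Count adjacent character pairs, word by word, ignoring case."""
    pairs = Counter()
    for word in text.upper().split():
        for i in range(len(word) - 1):
            pairs[word[i:i + 2]] += 1
    return pairs


def strike_a_match(a, b):
    """Similarity in [0, 1]: 2 * shared pairs / total pairs in both strings."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # No pairs at all (empty strings or single-letter words only).
        return 0.0
    # Counter & Counter keeps the minimum count of each pair, so a pair
    # that occurs once in one string cannot be matched twice.
    shared = sum((pairs_a & pairs_b).values())
    return 2.0 * shared / total


def rank(query, candidates):
    """Return (candidate, score) tuples, best match first."""
    scored = [(c, strike_a_match(query, c)) for c in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        "abc comp",
        "abc corporation",
        "def company ltda",
        "nothing to match here",
    ]
    for candidate, score in rank("abc company ltda", candidates):
        print(f"{score:.3f}  {candidate}")
```

With this scoring, "def company ltda" ranks above "abc comp" because it shares more letter pairs with the query. That may or may not be the ordering wanted here, so treat it as a way to experiment with the metric rather than as a recommendation.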