Well... the good news is that FuzzyQuery is indeed much faster in Lucene/Solr 4.0.

But the bad news is... FuzzyQuery won't do what you need here.  You
need some sort of FuzzyPhraseQuery, one that can treat terms that are
similar to one another (comp/company/corporation) as interchangeable
by some metric.  I don't know of such a query in Lucene/Solr... but
it'd be a nice addition.  Others have asked about this before.

FuzzyQuery works on single terms: it finds terms "close" to other
terms, as measured by edit distance, e.g. fuzzy/wuzzy/muzzy are all
edit distance one from each other.
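
To make "edit distance" concrete, here is a minimal sketch in Python
of the classic Levenshtein computation.  It is only an illustration,
not how Lucene implements FuzzyQuery, and the function name
edit_distance is made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Illustrative pairs taken from this thread.
for x, y in [("fuzzy", "wuzzy"), ("wuzzy", "muzzy"), ("comp", "company")]:
    print(x, y, edit_distance(x, y))
```

fuzzy/wuzzy and wuzzy/muzzy each come out at distance 1, while
comp/company is already 3 edits apart and company/corporation is
further still, so a per-term edit distance does not naturally pair up
the names in your example.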

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 8:03 PM, Guilherme Aiolfi <grad...@gmail.com> wrote:
> Hi,
>
> I want to do a fuzzy search that compares a phrase to a field in Solr. For
> example:
>
> "abc company ltda" will be compared to "abc comp", "abc corporation", "def
> company ltda", "nothing to match here".
>
> The thing is that it always has to return documents sorted by their score.
>
> I've found some good algorithms to do that, like StrikeAMatch[1] and
> JaroWinkler.
>
> Using JaroWinkler with strdist() I can do exactly that. But I would rather
> use StrikeAMatch, which had a patch in the Lucene JIRA that was
> never committed.
>
> So, I contacted the author of that patch and he told me that I should use
> Solr 4.0, which now has some pretty good new fuzzy search enhancements
> that make StrikeAMatch look like a kids' toy.
>
> Does anyone know how I can achieve that using Solr 4.0?
>
> [1] http://www.catalysoft.com/articles/StrikeAMatch.html
>
