Re: EdgeNGram relevancy

Andy Thu, 11 Nov 2010 12:42:05 -0800

Could anyone help me understand why "Clyde Phillips" appears in the
results for "Bill Cl"?


"Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so 
why is it even in the results?

Thanks.
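A simplified Python model of the quoted edgytext analyzer chain makes the matching concrete. It is a sketch, not Solr's implementation. The contents of stopwords.txt are unknown, so the hypothetical STOPWORDS set is assumed to be empty. The query is also assumed to reach the field as separate whitespace tokens, as with edgytext:(Bill Cl). Under those assumptions, the query analyzer yields the terms "bill" and "cl", and the index-time grams of "Clyde Phillips" include "cl", so the document matches on that single term:

```python
import re

# Hypothetical stand-in for stopwords.txt (its real contents are unknown).
STOPWORDS: set[str] = set()


def analyze(text: str, index_time: bool, min_gram: int = 1, max_gram: int = 25) -> list[str]:
    """Simplified model of the quoted edgytext chain (not Solr's implementation)."""
    tokens = text.split()                                # WhitespaceTokenizerFactory
    tokens = [t.lower() for t in tokens]                 # LowerCaseFilterFactory
    tokens = [t for t in tokens if t not in STOPWORDS]   # StopFilterFactory (ignoreCase)
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]  # PatternReplaceFilterFactory
    if not index_time:
        return tokens                                    # query analyzer stops here
    grams: list[str] = []
    for t in tokens:                                     # EdgeNGramFilterFactory (index only)
        grams.extend(t[:n] for n in range(min_gram, min(max_gram, len(t)) + 1))
    return grams


index_terms = set(analyze("Clyde Phillips", index_time=True))
query_terms = analyze("Bill Cl", index_time=False)
print(query_terms)                                   # ['bill', 'cl']
print([q for q in query_terms if q in index_terms])  # ['cl']
```

Because each query term is matched on its own, a document only has to contain a gram for one of the terms to show up. Whether it should outrank a document that matches both terms is a separate scoring question.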

--- On Thu, 11/11/10, Ahmet Arslan <iori...@yahoo.com> wrote:

> You can add an additional field that uses
> KeywordTokenizerFactory instead of
> WhitespaceTokenizerFactory, and query both fields with
> an OR operator. 
> 
> edgytext:(Bill Cl) OR edgytext2:"Bill Cl"
> 
> You can even apply a boost so that begins-with matches come
> first.
> 
> --- On Thu, 11/11/10, Robert Gründler <rob...@dubture.com> wrote:
> 
> > From: Robert Gründler <rob...@dubture.com>
> > Subject: EdgeNGram relevancy
> > To: solr-user@lucene.apache.org
> > Date: Thursday, November 11, 2010, 5:51 PM
> > Hi,
> > 
> > Consider the following fieldType (used for
> > autocompletion):
> > 
> >   <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="index">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.StopFilterFactory" ignoreCase="true"
> >               words="stopwords.txt" enablePositionIncrements="true"/>
> >       <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
> >               replacement="" replace="all"/>
> >       <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
> >     </analyzer>
> >     <analyzer type="query">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.StopFilterFactory" ignoreCase="true"
> >               words="stopwords.txt" enablePositionIncrements="true"/>
> >       <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
> >               replacement="" replace="all"/>
> >     </analyzer>
> >   </fieldType>
> > 
> > 
> > This works fine as long as the query string is a single
> > word. For multiple words, though, the ranking is weird.
> > 
> > Example:
> > 
> > Query String: "Bill Cl"
> > 
> > Result (in that order):
> > 
> > - Clyde Phillips
> > - Clay Rogers
> > - Roger Cloud
> > - Bill Clinton
> > 
> > "Bill Clinton" should have the highest rank in that case.
> > 
> > Does anyone have an idea how to configure this fieldType
> > so that documents matching both tokens rank higher than
> > those matching only one of them?
> > 
> > 
> > thanks!
> > 
> > 
> > -robert
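Ahmet's two-field suggestion can be sketched the same way. This is a hypothetical Python model, not Solr code. edgytext2 stands for the suggested second field (the quoted chain with KeywordTokenizerFactory instead of WhitespaceTokenizerFactory, stop filtering omitted), and STOPWORDS is an assumed empty stand-in for stopwords.txt. toy_score is an invented scoring rule: one point per matched edgytext term, plus a fixed boost when edgytext2 matches. It ignores Lucene's tf-idf weighting and length normalization, so it does not explain the original ranking. It only shows why the keyword field separates begins-with matches. Note that the quoted PatternReplaceFilterFactory pattern also strips the space inside the single keyword token, so "Bill Cl" becomes "billcl", which is a front gram of "billclinton" but not of "clydephillips":

```python
import re

# Hypothetical stand-in for stopwords.txt (its real contents are unknown).
STOPWORDS: set[str] = set()


def normalize(token: str) -> str:
    """LowerCaseFilter followed by PatternReplaceFilter ([^a-z] -> "")."""
    return re.sub(r"[^a-z]", "", token.lower())


def edge_grams(term: str, min_gram: int = 1, max_gram: int = 25) -> set[str]:
    """Front-edge n-grams, as the index-time EdgeNGram filter is modelled here."""
    return {term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)}


def edgytext_index(text: str) -> set[str]:
    # Whitespace tokenization: each word is expanded into its own edge grams.
    grams: set[str] = set()
    for token in text.split():
        if token.lower() not in STOPWORDS:
            grams |= edge_grams(normalize(token))
    return grams


def edgytext_query(text: str) -> list[str]:
    # Query side: same chain without EdgeNGram, one term per word.
    return [normalize(t) for t in text.split() if t.lower() not in STOPWORDS]


def edgytext2_index(text: str) -> set[str]:
    # Hypothetical edgytext2: keyword tokenization keeps the whole value as one token.
    return edge_grams(normalize(text))


def edgytext2_query(text: str) -> str:
    return normalize(text)


def toy_score(doc: str, query: str, boost: float = 2.0) -> float:
    """Invented scoring rule, not Lucene's: +1 per matched edgytext term,
    plus `boost` if the whole normalized query is a front gram in edgytext2."""
    grams = edgytext_index(doc)
    score = float(sum(1 for q in edgytext_query(query) if q in grams))
    if edgytext2_query(query) in edgytext2_index(doc):
        score += boost
    return score


docs = ["Clyde Phillips", "Clay Rogers", "Roger Cloud", "Bill Clinton"]
ranked = sorted(docs, key=lambda d: toy_score(d, "Bill Cl"), reverse=True)
print(ranked)  # ['Bill Clinton', 'Clyde Phillips', 'Clay Rogers', 'Roger Cloud']
```

In this model "Bill Clinton" scores 4.0 (both edgytext terms plus the edgytext2 boost) and each of the other names scores 1.0 on the "cl" gram alone. The ties keep their input order because Python's sort is stable even with reverse=True.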
