Re: edge ngram/find as you type sorting

matthew sporleder Wed, 25 Mar 2020 06:50:33 -0700

Where I landed:

  <fieldType name="string_ci" class="solr.TextField"
sortMissingLast="true" omitNorms="false">
     <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory" />
     </analyzer>
  </fieldType>


<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory" />
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
maxGramSize="25" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>


  <field name="slug" type="string_ci" indexed="true" stored="true"
multiValued="false" />
  <field name="fayt" type="edgytext" indexed="true" stored="false"
omitNorms="false" omitTermFreqAndPositions="false" multiValued="true"
/>
  <field name="qt_len" type="int" indexed="true" stored="true"
multiValued="false" />
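
As a rough illustration of what the two "edgytext" analyzer chains do,
here is a minimal Python sketch (plain Python, not Solr code). The helper
names edge_ngrams, query_token and matches are hypothetical, and the real
EdgeNGramFilterFactory may differ in details; the sketch only assumes one
keyword token per value, lowercasing, and prefixes of 1 to 25 characters.

```python
def edge_ngrams(value, min_gram=1, max_gram=25):
    """Approximate the index chain: keyword token -> lowercase -> edge n-grams."""
    token = value.lower()  # the keyword tokenizer keeps the whole value as one token
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def query_token(text):
    """Approximate the query chain: whole input, lowercased, no n-grams."""
    return text.lower()


def matches(slug, typed):
    """A document matches when the single query token equals an indexed gram."""
    return query_token(typed) in edge_ngrams(slug)


if __name__ == "__main__":
    # Prefixes indexed for one value:
    print(edge_ngrams("My_Slug"))
    # ['m', 'my', 'my_', 'my_s', 'my_sl', 'my_slu', 'my_slug']
    print(matches("my_article_slug", "my_article_slu"))  # True: typed text is a prefix
    print(matches("my_article_slug", "article"))         # False: not a prefix
```

In this model a typed string longer than maxGramSize (25 here) cannot
match, because no gram that long is ever indexed.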

---

I can then do a search for

q=fayt:my_article_slu&sort=qt_len asc

to get the shortest, closest find-as-you-type match first.  I couldn't
get around every result having the same score in the edge ngram search
(can I boost matches that end closer to the end of the string?), but I
am hoping this is the fastest way to do this type of search, since it
avoids wildcard queries like "my_article_slu*".
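
A minimal Python sketch of this request and of the ordering it is meant
to produce. The thread does not say how qt_len is populated, so the
sketch assumes it holds the length of the slug; the function names and
the "rows" value are illustrative only, and typed text containing query
syntax characters would need escaping that is not shown here.

```python
from urllib.parse import urlencode


def fayt_query(typed, rows=10):
    """Build the query string for the find-as-you-type search, URL-encoded."""
    params = {"q": f"fayt:{typed}", "sort": "qt_len asc", "rows": rows}
    return urlencode(params)


def rank_by_length(slugs, typed):
    """Local model of the result order: prefix matches, shortest slug first.

    Assumes qt_len equals len(slug); that is an assumption, not something
    stated in the thread.
    """
    prefix = typed.lower()
    hits = [s for s in slugs if s.lower().startswith(prefix)]
    return sorted(hits, key=len)


if __name__ == "__main__":
    print(fayt_query("my_article_slu"))
    # q=fayt%3Amy_article_slu&sort=qt_len+asc&rows=10
    slugs = ["my_article_slug_part_two", "other", "my_article_slug"]
    print(rank_by_length(slugs, "my_article_slu"))
    # ['my_article_slug', 'my_article_slug_part_two']
```

Because every prefix match gets the same score, the explicit sort is
what puts the slug with the fewest characters beyond the typed prefix
at the top.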

More suggestions welcome and thanks for the help.  I will re-index
with omitNorms=true again to see if I can save a little space.





On Tue, Mar 24, 2020 at 11:39 AM matthew sporleder <msporle...@gmail.com> wrote:
>
> Okay I appreciate you responding.
>
> Switching "slug" from "string_ci" to a class="solr.StrField" type
> accomplished about the same results, which makes sense to me now :)
>
> The previous definition of string_ci was:
>   <fieldType name="string_ci" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>      <analyzer>
>           <tokenizer class="solr.KeywordTokenizerFactory"/>
>           <filter class="solr.LowerCaseFilterFactory" />
>      </analyzer>
>   </fieldType>
>
> So lowercase + KeywordTokenizerFactory.
>
> I am trying again with omitNorms=false to see if I can get the more
> "exact" matches to score better this time around.
>
>
> On Tue, Mar 24, 2020 at 9:54 AM Erick Erickson <erickerick...@gmail.com> 
> wrote:
> >
> > Won’t work. String types are totally unanalyzed. Your string_ci fieldType 
> > is what I was looking for.
> >
> > No, you shouldn’t kill the lowercasefilter unless you want all of your 
> > searches to be case-sensitive.
> >
> > So you should try:
> >
> > q=edgy_text:whatever&sort=string_ci asc
> >
> > Please use the admin>>pick_core>>analysis page when thinking about changing 
> > your schema, it’ll answer a _lot_ of these questions immediately.
> >
> > Best,
> > Erick
> >
> > > On Mar 24, 2020, at 8:37 AM, matthew sporleder <msporle...@gmail.com> 
> > > wrote:
> > >
> > > Oh maybe a schema bug!
> > >
> > > my string_ci:
> > > <fieldType name="string_ci" class="solr.TextField"
> > > sortMissingLast="true" omitNorms="true">
> > >     <analyzer>
> > >          <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >          <filter class="solr.LowerCaseFilterFactory" />
> > >     </analyzer>
> > >  </fieldType>
> > >
> > > going to try this instead:
> > >  <fieldType name="string_lctoken" class="solr.StrField"
> > > sortMissingLast="true" omitNorms="true">
> > >     <analyzer>
> > >          <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >          <filter class="solr.LowerCaseFilterFactory" />
> > >     </analyzer>
> > >  </fieldType>
> > >
> > > Then I can probably kill the lowercasefilter on edgytext:
> > >
> > >
> > >
> > > On Tue, Mar 24, 2020 at 7:44 AM Erick Erickson <erickerick...@gmail.com> 
> > > wrote:
> > >>
> > >> Sort by the full field. You’ll need to copy to a field with 
> > >> keywordTokenizer and lowercaseFilter (string_ci? assuming it’s not 
> > >> really a “string” type).
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>> On Mar 24, 2020, at 7:10 AM, matthew sporleder <msporle...@gmail.com> 
> > >>> wrote:
> > >>>
> > >>> I have added an edge ngram field to my index and get decent results
> > >>> with partial words but the results appear randomly sorted and all
> > >>> contain the same score.  Ideally I would like to sort by shortest
> > >>> ngram match within my other qualifiers.
> > >>>
> > >>> Is there a canonical solution to this?
> > >>>
> > >>> Thanks,
> > >>> Matt
> > >>>
> > >>> p.s. I mostly followed
> > >>> https://lucidworks.com/post/auto-suggest-from-popular-queries-using-edgengrams/
> > >>>
> > >>> schema bits:
> > >>>
> > >>> <fieldType name="edgytext" class="solr.TextField" 
> > >>> positionIncrementGap="100">
> > >>> <analyzer type="index">
> > >>>  <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >>>  <filter class="solr.LowerCaseFilterFactory"/>
> > >>>  <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > >>> maxGramSize="25" />
> > >>> </analyzer>
> > >>>
> > >>> <field name="slug" type="string_ci" indexed="true" stored="true"
> > >>> multiValued="false" />
> > >>>
> > >>> <field name="fayt" type="edgytext" indexed="true" stored="false"
> > >>> omitNorms="false" omitTermFreqAndPositions="true" multiValued="true"
> > >>> />
> > >>>
> > >>>
> > >>> <copyField source="slug" dest="fayt" maxChars="65" />
> > >>
> >
