Where I landed:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="slug" type="string_ci" indexed="true" stored="true" multiValued="false" />
<field name="fayt" type="edgytext" indexed="true" stored="false" omitNorms="false" omitTermFreqAndPositions="false" multiValued="true" />
<field name="qt_len" type="int" indexed="true" stored="true" multiValued="false" />

---

I can then search with q=fayt:my_article_slu&sort=qt_len asc to get the shortest (most exact) find-as-you-type match. I couldn't get around all results having the same score in the edge ngram search (can I boost proximity to the end of a string?), but I am hoping this is the fastest way to do this type of search, since it avoids wildcard queries like "my_article_slu*".

More suggestions welcome, and thanks for the help. I will re-index with omitNorms=true again to see if I can save a little space.

On Tue, Mar 24, 2020 at 11:39 AM matthew sporleder <msporle...@gmail.com> wrote:
>
> Okay, I appreciate you responding.
>
> Switching "slug" from "string_ci" to a class="solr.StrField" type gave
> about the same results, which makes sense to me now :)
>
> The previous definition of string_ci was:
> <fieldType name="string_ci" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
>
> So lowercase + KeywordTokenizerFactory.
>
> I am trying again with omitNorms=false to see if I can get the more
> "exact" matches to score better this time around.
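One way to see what the fayt + sort=qt_len approach does is to model it outside Solr. The sketch below is a toy simulation, not Solr or Lucene code: it mimics the terms the edgytext index chain (KeywordTokenizer, LowerCaseFilter, EdgeNGramFilter with minGramSize=1 and maxGramSize=25) would produce, treats q=fayt:... as an exact lookup of the single lowercased query term among those grams, and orders hits shortest first. It assumes qt_len is simply the character length of slug, computed by the indexing client; the thread does not say how qt_len is filled in. The names edge_ngrams and find_as_you_type are hypothetical.

```python
# Toy model of the find-as-you-type setup: NOT Solr/Lucene code.
# It mimics the "edgytext" chain (KeywordTokenizer -> LowerCaseFilter ->
# EdgeNGramFilter, minGramSize=1, maxGramSize=25) plus sort=qt_len asc.

MIN_GRAM = 1
MAX_GRAM = 25


def edge_ngrams(value: str, min_gram: int = MIN_GRAM, max_gram: int = MAX_GRAM) -> set[str]:
    """Index-side terms: keep the whole value as one token, lowercase it,
    then emit its leading n-grams of length min_gram..max_gram."""
    token = value.lower()
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def find_as_you_type(slugs: list[str], typed: str) -> list[str]:
    """Mimic q=fayt:<typed> followed by sort=qt_len asc.

    Assumption (hypothetical): qt_len is len(slug), computed by the indexing
    client; the thread does not say how qt_len is populated.
    """
    # Query side: KeywordTokenizer + LowerCaseFilter, i.e. one lowercased term.
    query_term = typed.lower()
    docs = [{"slug": s, "fayt": edge_ngrams(s), "qt_len": len(s)} for s in slugs]
    hits = [d for d in docs if query_term in d["fayt"]]
    # Shortest slug first; Python's sort is stable, so equal lengths keep input order.
    hits.sort(key=lambda d: d["qt_len"])
    return [d["slug"] for d in hits]


slugs = ["my_article_slug_part_two", "My_Article_Slug", "other_slug", "my_article_slugs"]
print(find_as_you_type(slugs, "my_article_slu"))
# → ['My_Article_Slug', 'my_article_slugs', 'my_article_slug_part_two']
```

The model also makes one limit of maxGramSize="25" visible: a typed prefix longer than 25 characters has no matching gram, so it finds nothing. Whether the real field behaves the same way is worth confirming on the admin>>pick_core>>analysis page.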
>
> On Tue, Mar 24, 2020 at 9:54 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
> >
> > Won’t work. String types are totally unanalyzed. Your string_ci fieldType
> > is what I was looking for.
> >
> > No, you shouldn’t kill the lowercasefilter unless you want all of your
> > searches to be case-sensitive.
> >
> > So you should try:
> >
> > q=edgy_text:whatever&sort=string_ci asc
> >
> > Please use the admin>>pick_core>>analysis page when thinking about changing
> > your schema; it’ll answer a _lot_ of these questions immediately.
> >
> > Best,
> > Erick
> >
> > > On Mar 24, 2020, at 8:37 AM, matthew sporleder <msporle...@gmail.com>
> > > wrote:
> > >
> > > Oh, maybe a schema bug!
> > >
> > > My string_ci:
> > > <fieldType name="string_ci" class="solr.TextField"
> > > sortMissingLast="true" omitNorms="true">
> > >   <analyzer>
> > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory" />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Going to try this instead:
> > > <fieldType name="string_lctoken" class="solr.StrField"
> > > sortMissingLast="true" omitNorms="true">
> > >   <analyzer>
> > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >     <filter class="solr.LowerCaseFilterFactory" />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Then I can probably kill the lowercasefilter on edgytext.
> > >
> > > On Tue, Mar 24, 2020 at 7:44 AM Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >>
> > >> Sort by the full field. You’ll need to copy to a field with
> > >> keywordTokenizer and lowercaseFilter (string_ci? assuming it’s not
> > >> really a “string”) type.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>> On Mar 24, 2020, at 7:10 AM, matthew sporleder <msporle...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> I have added an edge ngram field to my index and get decent results
> > >>> with partial words, but the results appear randomly sorted and all
> > >>> have the same score.
> > >>> Ideally I would like to sort by the shortest
> > >>> ngram match within my other qualifiers.
> > >>>
> > >>> Is there a canonical solution to this?
> > >>>
> > >>> Thanks,
> > >>> Matt
> > >>>
> > >>> P.S. I mostly followed
> > >>> https://lucidworks.com/post/auto-suggest-from-popular-queries-using-edgengrams/
> > >>>
> > >>> Schema bits:
> > >>>
> > >>> <fieldType name="edgytext" class="solr.TextField"
> > >>> positionIncrementGap="100">
> > >>>   <analyzer type="index">
> > >>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >>>     <filter class="solr.LowerCaseFilterFactory"/>
> > >>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > >>>     maxGramSize="25" />
> > >>>   </analyzer>
> > >>>
> > >>> <field name="slug" type="string_ci" indexed="true" stored="true"
> > >>> multiValued="false" />
> > >>>
> > >>> <field name="fayt" type="edgytext" indexed="true" stored="false"
> > >>> omitNorms="false" omitTermFreqAndPositions="true" multiValued="true"
> > >>> />
> > >>>
> > >>> <copyField source="slug" dest="fayt" maxChars="65" />
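The thread offers two ways to order edge ngram matches that all score the same: Erick's suggestion to sort on the full value copied into a KeywordTokenizer + LowerCaseFilter field (string_ci), and the later sort on a length field (qt_len). The small Python comparison below assumes every slug listed has already matched the typed prefix, and shows that both orders put the exact slug first but can disagree after that. The helpers alphabetical and by_length are hypothetical stand-ins for the two sort orders, not Solr code; Python's code-point string comparison is assumed here to approximate how a single lowercased term sorts.

```python
# Toy comparison of two tie-breakers for equally scored prefix matches.
# NOT Solr code: "alphabetical" stands in for sorting on a string_ci-style field
# (KeywordTokenizer + LowerCaseFilter), "by_length" for sorting on qt_len.


def alphabetical(slugs: list[str]) -> list[str]:
    # One lowercased token per document; Python compares str by code point.
    return sorted(slugs, key=str.lower)


def by_length(slugs: list[str]) -> list[str]:
    # Shortest value first; equal lengths fall back to the lowercased value.
    return sorted(slugs, key=lambda s: (len(s), s.lower()))


# Assume all three slugs already matched the typed prefix "my_article_slu".
matches = ["my_article_slug_zzz", "my_article_slugs_a", "My_Article_Slug"]
print(alphabetical(matches))  # → ['My_Article_Slug', 'my_article_slug_zzz', 'my_article_slugs_a']
print(by_length(matches))     # → ['My_Article_Slug', 'my_article_slugs_a', 'my_article_slug_zzz']
```

Alphabetical order favors slugs whose next character sorts low ("_" before "s"), while length order favors the slug with the fewest extra characters, which is closer to the "shortest ngram match" the original question asked for. The alphabetical tie-break inside by_length is only an illustration; a compound Solr sort such as sort=qt_len asc,slug asc would be the analogous request, assuming slug sorts the way Erick describes.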