Re: Suggester not suggesting anything using DictionaryCompoundWordTokenFilterFactory

Thomas Michael Engelke Tue, 11 Nov 2014 03:55:58 -0800

I think I found the problem. The definition of the suggester component
has a "field" option that names the field the suggester builds its
suggestions from. After changing this to the field that uses the
DictionaryCompoundWordTokenFilterFactory, word parts are suggested as well.
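
For illustration, a minimal sketch of what such a definition might look
like in solrconfig.xml, assuming the spellcheck-based suggester of Solr 4.x.
Only the "field" value is taken from the schema quoted below; the component
name, the lookup implementation and the buildOnCommit setting are
assumptions and will differ between setups.

```xml
<!-- Sketch only: name, lookupImpl and buildOnCommit are assumed values. -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- The field whose indexed terms feed the suggester. Pointing it at
         the field analyzed with DictionaryCompoundWordTokenFilterFactory
         is the change described above. -->
    <str name="field">text_suggest_dictionary_ngram</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

With a definition like this, the suggester reads terms from that single
field only, so a term such as "wagen" can be suggested only if it is
present in the index of the configured field.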


On 11.11.2014 08:52, Thomas Michael Engelke wrote:

> I'm toying around with the suggester component, as described here: 
> http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx
>  [1]
> 
> So I made 4 fields:
> 
> <field name="text_suggest" type="text_suggest" indexed="true" stored="true" 
> multiValued="true" />
> <copyField source="name" dest="text_suggest" />
> <field name="text_suggest_edge" type="text_suggest_edge" indexed="true" 
> stored="true" multiValued="true" />
> <copyField source="name" dest="text_suggest_edge" />
> <field name="text_suggest_ngram" type="text_suggest_ngram" indexed="true" 
> stored="true" multiValued="true" />
> <copyField source="name" dest="text_suggest_ngram" />
> <field name="text_suggest_dictionary_ngram" 
> type="text_suggest_dictionary_ngram" indexed="true" stored="true" 
> multiValued="true" />
> <copyField source="name" dest="text_suggest_dictionary_ngram" />
> 
> with the corresponding definitions:
> 
> <fieldType name="text_suggest" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.KeywordTokenizerFactory" />
> <filter class="solr.LowerCaseFilterFactory" />
> </analyzer>
> </fieldType>
> <fieldType name="text_suggest_edge" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.KeywordTokenizerFactory" />
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" 
> side="front" />
> </analyzer>
> </fieldType>
> <fieldType name="text_suggest_ngram" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" 
> side="front" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldType>
> <fieldType name="text_suggest_dictionary_ngram" class="solr.TextField">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.DictionaryCompoundWordTokenFilterFactory" 
> dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" 
> maxSubwordSize="30" onlyLongestMatch="false"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" 
> side="front" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory" />
> </analyzer>
> </fieldType>
> 
> I'm calling the suggester component this way:
> 
> http://<address>:8983/solr/<core>/suggest?qf="text_suggest^6.0%20text_suggest_edge^3.0%20text_suggest_ngram^1.0%20text_suggest_dictionary_ngram^0.2"&q=wa
> 
> This seems to work fine:
> 
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> </lst>
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="wa">
> <int name="numFound">5</int>
> <int name="startOffset">0</int>
> <int name="endOffset">2</int>
> <arr name="suggestion">
> <str>wandelement aus gitter</str>
> <str>wandelement aus stahlblech</str>
> <str>wandelement</str>
> <str>wandhalter für prospekte</str>
> <str>wandascher, h 300 × b 230 × t 60 mm</str>
> </arr>
> </lst>
> <str name="collation">(wandelement aus gitter)</str>
> </lst>
> </lst>
> </response>
> 
> However, I added the fourth field so I could get low-boosted suggestions 
> using the aforementioned DictionaryCompoundWordTokenFilterFactory. A sample 
> analysis for the field(type) text_suggest_dictionary_ngram for the word 
> "Geländewagen":
> 
> g
> ge
> gel
> gelä
> gelän
> geländ
> gelände
> geländew
> geländewa
> geländewag
> geländewage
> geländewagen
> g
> ge
> gel
> gelä
> gelän
> geländ
> gelände
> w
> wa
> wag
> wage
> wagen
> 
> As we can see, the DictionaryCompoundWordTokenFilterFactory extracts the word 
> "wagen" and EdgeNGrams it. However, I cannot get results from these NGrams. 
> When I try "wag" as the search term for the suggester, there are no results.
> 
> However, doing an analysis of "Geländewagen" (as field value index) and "wag" 
> (as field value query), analysis shows a match.
> 
> I had the thought that it might be because the underlying component of the 
> suggester is a spellchecker, and a spellchecker wouldn't "correct" "wag" to 
> "wagen" because there was an NGram that spelled "wag", and so the word was 
> spelled correctly already. So I tried without the EdgeNGrams, but the result 
> stays the same.
 

Links:
------
[1]
http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx
