I think I found the problem. The definition of the suggester component has a "field" option, which names the field the suggester uses to generate suggestions. After changing this to the field that uses the DictionaryCompoundWordTokenFilterFactory, the suggester also returns suggestions for word parts.
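For illustration, the relevant part of solrconfig.xml might look roughly like the sketch below. This is an assumption based on the spellcheck-based suggester in Solr 4.x, not a copy of my actual configuration: the component name "suggest" and the buildOnCommit setting are placeholders, and the only part that matters here is the "field" entry.

```xml
<!-- Sketch only: component name and buildOnCommit are assumed values -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <!-- Point the suggester at the field whose index analyzer
         runs the DictionaryCompoundWordTokenFilterFactory -->
    <str name="field">text_suggest_dictionary_ngram</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

The suggester builds its dictionary from the indexed terms of that one field, so the qf parameter in the request presumably has no influence on which terms are available as suggestions.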
On 11.11.2014 08:52, Thomas Michael Engelke wrote:
> I'm toying around with the suggester component, as described here:
> http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx [1]
>
> So I made 4 fields:
>
> <field name="text_suggest" type="text_suggest" indexed="true" stored="true" multiValued="true" />
> <copyField source="name" dest="text_suggest" />
> <field name="text_suggest_edge" type="text_suggest_edge" indexed="true" stored="true" multiValued="true" />
> <copyField source="name" dest="text_suggest_edge" />
> <field name="text_suggest_ngram" type="text_suggest_ngram" indexed="true" stored="true" multiValued="true" />
> <copyField source="name" dest="text_suggest_ngram" />
> <field name="text_suggest_dictionary_ngram" type="text_suggest_dictionary_ngram" indexed="true" stored="true" multiValued="true" />
> <copyField source="name" dest="text_suggest_dictionary_ngram" />
>
> with the corresponding definitions:
>
> <fieldType name="text_suggest" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> <fieldType name="text_suggest_edge" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front" />
>   </analyzer>
> </fieldType>
> <fieldType name="text_suggest_ngram" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> <fieldType name="text_suggest_dictionary_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
>
> I'm calling the suggester component this way:
>
> http://<address>:8983/solr/<core>/suggest?qf="text_suggest^6.0%20test_suggest_edge^3.0%20text_suggest_ngram^1.0%20text_suggest_dictionary_ngram^0.2"&q=wa
>
> This seems to work fine:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <lst name="spellcheck">
>     <lst name="suggestions">
>       <lst name="wa">
>         <int name="numFound">5</int>
>         <int name="startOffset">0</int>
>         <int name="endOffset">2</int>
>         <arr name="suggestion">
>           <str>wandelement aus gitter</str>
>           <str>wandelement aus stahlblech</str>
>           <str>wandelement</str>
>           <str>wandhalter für prospekte</str>
>           <str>wandascher, h 300 × b 230 × t 60 mm</str>
>         </arr>
>       </lst>
>       <str name="collation">(wandelement aus gitter)</str>
>     </lst>
>   </lst>
> </response>
>
> However, I added the fourth field so I could get low-boosted suggestions
> using the aforementioned DictionaryCompoundWordTokenFilterFactory. A sample
> analysis of the field (type) text_suggest_dictionary_ngram for the word
> "Geländewagen":
>
> g ge gel gelä gelän geländ gelände geländew geländewa geländewag geländewage geländewagen
> g ge gel gelä gelän geländ gelände
> w wa wag wage wagen
>
> As we can see, the DictionaryCompoundWordTokenFilterFactory extracts the word
> "wagen" and the EdgeNGram filter then builds n-grams from it. However, I
> cannot get results from these NGrams. Trying "wag" as the search term for
> the suggester returns no results.
>
> However, when I run an analysis of "Geländewagen" (as the index field value)
> and "wag" (as the query field value), the analysis shows a match.
>
> I had the thought that it might be because the underlying component of the
> suggester is a spellchecker, and a spellchecker wouldn't "correct" "wag" to
> "wagen" because there was an NGram that spelled "wag", and so the word was
> spelled correctly already. So I tried without the EdgeNGrams, but the result
> stays the same.

Links:
------
[1] http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx