Hey, in German, you can string most nouns together using hyphens, like this:

  Industrie = industry
  Anhänger = trailer
  Industrie-Anhänger = trailer for industrial use

Here [1], you can see me querying "Industrieanhänger" from the "name" field (name:Industrieanhänger), to make sure the index actually contains the word. Our data is structured so that products are listed without the hyphen.

Now, customers can come along and use the hyphenated version as a search term (e.g. "industrie-anhänger"), and of course we want them to find what they are looking for. I've set it up so that the WordDelimiterFilterFactory uses catenateWords="1", so that these words are catenated. An analysis of "Industrieanhänger" as index and "industrie-anhänger" as query can be seen here [2]. You can see that both word parts are found.

However, querying for "industrie-anhänger" does not yield results; I only get results when the hyphen is removed, as you can see here [3]. I'm not sure how to proceed from here, as the results of the analysis have so far always lined up with what I could see when querying.

Here's the schema definition for "text", the field type for the "name" field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/> -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I also wondered whether the problem was the hyphen not being URL-encoded, but replacing it with %2D didn't change the outcome (and was probably wrong anyway).

Any help is greatly appreciated.

Links:
------
[1] http://imgur.com/2oEC5vz
[2] http://i.imgur.com/H0AhEsF.png
[3] http://imgur.com/dzmMe7t
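
P.S. On the URL-encoding point: here is a minimal Python sketch of the check I described. It only uses the standard library's urllib.parse. The helper name build_query_string and the bare "q" parameter are made up for illustration and are not taken from our actual client code. It suggests that a literal hyphen and %2D decode to the same parameter value, which would explain why swapping one for the other made no difference.

```python
from urllib.parse import parse_qs, urlencode

TERM = "industrie-anhänger"


def build_query_string(field: str, term: str) -> str:
    """Build the query-string part of a search request (hypothetical helper)."""
    # urlencode percent-encodes ':' and non-ASCII characters but leaves '-'
    # untouched, because '-' is an unreserved URL character.
    return urlencode({"q": f"{field}:{term}"})


plain = build_query_string("name", TERM)

# Manually force the hyphen into its percent-encoded form, as in my experiment.
forced = plain.replace("-", "%2D")

# Decode both variants the way a receiving server would.
decoded_plain = parse_qs(plain)["q"][0]
decoded_forced = parse_qs(forced)["q"][0]

print(plain)
print(forced)
print(decoded_plain == decoded_forced)
```

Since both forms decode to the same string, the hyphen should reach the query parser unchanged either way, so I assume the difference has to come from somewhere after URL decoding.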