Hello all,

I am currently trying to determine the cause of some odd behaviour when performing fuzzy queries in Solr 4.2.1. I have a field that is configured as follows:
    <field type="textSomeField" indexed="true" stored="false" multiValued="false" name="stuff" />

    <fieldType name="textSomeField" omitTermFreqAndPositions="false" omitNorms="true" termVectors="false" termPositions="false" termOffsets="false" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Fuzzy searches on this field (and others) get some darned weird results. For example, the names julie, julia, julian, julio, and juliar are indexed, and the following occurs:

    stuff:(julia~1)  - only finds julia
    stuff:(julie~1)  - finds julia and julie
    stuff:(julian~1) - only finds julian
    stuff:(julin~1)  - finds julian, julia, julie, etc.
    stuff:(juliz~1)  - finds julia, julio, julie, etc.

This is one of the simpler examples of the behaviour we are seeing; I will happily provide more if necessary.

My question is: why exactly am I getting these results from fuzzy queries? My understanding is that fuzzy matching is based on the Levenshtein distance from one word to the next. Therefore julia, julie, and julio, which are each an edit distance of 1 from one another, should all match each other, yet that is definitely not the behaviour I am observing. I am not sure whether I have done something wrong with the indexing or the querying, or am simply misunderstanding how fuzzy queries work.
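To make the expectation concrete, here is a small Python sketch (my own, not Solr's or Lucene's implementation) that computes plain Levenshtein distance (insertions, deletions and substitutions only) and lists which of the five indexed names fall within one edit of each query term. The helper names `levenshtein` and `expected_matches` are hypothetical; this only shows what I would expect `~1` to return if it were pure Levenshtein matching, not how Lucene actually evaluates the query.

```python
# Sketch only: plain Levenshtein distance, NOT Lucene's FuzzyQuery code.
# It just computes what "edit distance <= 1" would match in theory.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# The names from the example above, as indexed in the "stuff" field.
INDEXED = ["julie", "julia", "julian", "julio", "juliar"]


def expected_matches(term: str, max_edits: int = 1) -> list:
    """Hypothetical helper: indexed names within max_edits of term."""
    return [name for name in INDEXED if levenshtein(term, name) <= max_edits]


if __name__ == "__main__":
    for q in ["julia", "julie", "julian", "julin", "juliz"]:
        print(f"stuff:({q}~1) ->", expected_matches(q))
```

Running it prints, for example, `stuff:(julia~1) -> ['julie', 'julia', 'julian', 'julio', 'juliar']`, i.e. all five names, whereas Solr only returns julia for that query.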
Any help or clarification would be appreciated.

Regards,
Ryan Wilson
rpwils...@gmail.com