I've seen a couple of threads related to this subject (for example, http://www.mail-archive.com/solr-user@lucene.apache.org/msg33400.html), but I haven't found an answer that addresses the aspect of the problem that concerns me...
I have a field type set up like this:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

The important feature here is the use of WordDelimiterFilterFactory, which allows a search for "WiFi" to match an indexed term of "wi fi" (for example). The problem, of course, is that if a user accidentally introduces a case change in their query, the query analyzer chain breaks it into multiple words and no hits are found... so a search for "exaMple" will look for "exa mple" and fail.

I've found two solutions that resolve this problem in the admin panel's field analysis tool:

1.) Turn on catenateWords and catenateNumbers in the query analyzer. This reassembles the user's broken word and allows a match.
2.) Turn on preserveOriginal in the query analyzer. This passes through the user's original query, which then gets cleaned up by the ICUFoldingFilterFactory and allows a match.

The problem is that in my real-world application, which uses DisMax, neither of these solutions works. It appears that even though (if I understand correctly) the WordDelimiterFilterFactory is returning ALTERNATIVE tokens, the DisMax handler combines them in a way that effectively requires all of them to match. For example, here's partial debugQuery output for the "exaMple" search using DisMax and solution #2 above:

    "parsedquery":"+DisjunctionMaxQuery((genre:\"(exampl exa) mple\"^300.0 | title_new:\"(exampl exa) mple\"^100.0 | topic:\"(exampl exa) mple\"^500.0 | series:\"(exampl exa) mple\"^50.0 | title_full_unstemmed:\"(example exa) mple\"^600.0 | geographic:\"(exampl exa) mple\"^300.0 | contents:\"(exampl exa) mple\"^10.0 | fulltext_unstemmed:\"(example exa) mple\"^10.0 | allfields_unstemmed:\"(example exa) mple\"^10.0 | title_alt:\"(exampl exa) mple\"^200.0 | series2:\"(exampl exa) mple\"^30.0 | title_short:\"(exampl exa) mple\"^750.0 | author:\"(example exa) mple\"^300.0 | title:\"(exampl exa) mple\"^500.0 | topic_unstemmed:\"(example exa) mple\"^550.0 | allfields:\"(exampl exa) mple\" | author_fuller:\"(example exa) mple\"^150.0 | title_full:\"(exampl exa) mple\"^400.0 | fulltext:\"(exampl exa) mple\")) ()",

Obviously, that is not what I want; ideally it would be something like 'exampl OR "exa mple"'.

I also read about the autoGeneratePhraseQueries setting, but that seems to take things way too far in the opposite direction: if I set it to false, then I get matches for any individual token, i.e. example OR exa OR mple. Not good at all!
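For reference, here is a sketch of the three query-side variants discussed above, expressed against the field type posted earlier. Variants 1 and 2 each replace only the WordDelimiterFilterFactory line in the query analyzer, and variant 3 only adds an attribute to the fieldType; the rest of the chain is assumed unchanged. This just restates the settings as described; none of them fixes the DisMax behavior.

```xml
<!-- Variant 1 (query analyzer): re-join the case-split parts, so "exaMple"
     also yields the catenated token "exaMple" alongside "exa" and "mple" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

<!-- Variant 2 (query analyzer): additionally pass the original token through
     unchanged; ICUFoldingFilterFactory later lowercases it to "example" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
        preserveOriginal="1"/>

<!-- Variant 3 (field type): stop the query parser from turning multiple tokens
     produced from one query word into a phrase query -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <!-- index and query analyzers as in the schema above -->
</fieldType>
```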
I have a sinking suspicion that there is no easy solution to my problem, but this seems like a fairly basic need. splitOnCaseChange is a useful feature, but it's far more valuable as an ALTERNATIVE search than as a mandatory query munge.

Any thoughts?

Thanks,
Demian