Dear Solr Users,

I have been an enthusiastic Solr user since version 1.4. I am now working on a new Solr-based application that relies heavily on fuzzy searches for string matching.
Unfortunately, I'm facing a strange problem with fuzzy search, and I hope someone can help me get more information.

I indexed several company names in a field named ENTITY_NAME, using the following definitions in schema.xml:

    .
    <fieldType name="whitespace_tokenized" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    .
    <field name="ENTITY_NAME" type="whitespace_tokenized" indexed="true" stored="true" />
    .

One of these companies is "TS PUBLISHING INC".

Below is the list of queries, with the returned and the expected result for each:

1) ENTITY_NAME:(ts AND publising) => matches, OK
2) ENTITY_NAME:(ts AND publising~1) => matches, OK
3) ENTITY_NAME:(td~1 AND publishing) => doesn't match, KO (it was supposed to match)
4) ENTITY_NAME:(ts AND pablisin~3) => doesn't match, KO (it was supposed to match)

Why does td~1 not match ts?
Why does pablisin~3 not match publishing?

How can I investigate the problem? Is there any parameter I can set in solrconfig.xml? Is there any tool I can use to see how the automaton is built?

Thanks a lot in advance,
Matteo Diarena
Senior KM Developer - VOLO.com S.r.l.
Via Luigi Rizzo, 8/1 - 20151 MILANO
Fax +39 02 8945 3500
Tel +39 02 8945 3023
Cell +39 345 2129244
m.diar...@volocom.it
http://www.volocom.it
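
For reference, the edit distances behind the expected results can be checked independently of Solr. The following is a minimal sketch in Python, assuming that ~N is meant as a maximum plain Levenshtein distance between the query term and the indexed term. That reading is an assumption, and Solr's actual fuzzy matching may apply further limits. The helper name levenshtein is hypothetical and is not part of any Solr or Lucene API.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute), row by row."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


# Query term vs. lowercased indexed token, taken from queries 2, 3 and 4.
pairs = [
    ("publising", "publishing"),
    ("td", "ts"),
    ("pablisin", "publishing"),
]

for query_term, indexed_term in pairs:
    print(query_term, indexed_term, levenshtein(query_term, indexed_term))
```

Under this plain definition the distances are 1, 1 and 3 respectively, which is why queries 3 and 4 were expected to match with ~1 and ~3.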