Hi, I am trying to make sure that when I search for text, regardless of what that text is, I get an exact match. I'm *still* running into issues, and this last mile is becoming very painful. The Solr field type I'm setting this up on is pasted below my explanation. I appreciate any help.
Explanation: I'm crawling websites with Nutch and performing some mechanical-turk-like filtering and term matching. The problem is that Solr exhibits some very gnarly behavior due to any number of gotchas. If I want to find *all* Solr documents that match "[id]somejunk\hi[/id]", then life is instantly hell. Likewise, lots of whitespace in between words throws it off, as in " john says hello, how are you?". I would love to be able to search for these exact phrases.

If that's just not practical (I'm more than willing to live with a bloated search index), what would some other strategies be? There's no MapReduce in Solr; I could attempt Hadoop streaming, but that's not ideal for a variety of reasons.

Solr schema.xml, fieldType "text" (no, this is not used everywhere; only on 2 fields):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thank you,
Scott Gonyea