Wow, I tried with minGramSize=1 and maxGramSize=1000 (I want someone to be able to search on any substring, just like "grep"), and the index is multiple orders of magnitude larger than my data!
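A rough count shows why the index grows this fast. Assume the filter emits every substring of each token whose length falls between minGramSize and maxGramSize, which is how NGramFilterFactory behaves. With minGramSize=1, a token of length n, and maxGramSize of at least n, there are n - k + 1 grams of each length k. The first two lines below count grams and total gram characters for that case. The third line counts grams when maxGramSize is capped at some m ≤ n:

```latex
\text{grams}(n) = \sum_{k=1}^{n} (n-k+1) = \frac{n(n+1)}{2}
\qquad
\text{chars}(n) = \sum_{k=1}^{n} k\,(n-k+1) = \frac{n(n+1)(n+2)}{6}

\text{grams}_m(n) = \sum_{k=1}^{m} (n-k+1) = mn - \frac{m(m-1)}{2}, \quad m \le n
```

For example, a single 15-character token yields 120 grams totalling 680 characters. With maxGramSize uncapped, the gram count grows quadratically in token length. With a fixed cap m, it grows only linearly. These figures count emitted terms, not bytes on disk, because Lucene shares the term dictionary across documents. Treat them as an illustration of the growth rate, not a size prediction.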
There's got to be a better way to support full grep-like searching, right?

Thanks!
Pete

On Nov 4, 2011, at 1:20 AM, Ahmet Arslan wrote:

>> Example data:
>> 01/23/2011 05:12:34 [Test] a=1; hello_there=50; data=[1,5,30%];
>>
>> I would love to be able to just "grep" the data, i.e. if I search for "ello", it finds and returns "ello", and if I search for "hello_there=5", it would match too.
>>
>> Here's what I'm using now:
>>
>> <fieldType name="text_sy" class="solr.TextField">
>>   <analyzer>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>>   </analyzer>
>> </fieldType>
>>
>> The problem with this is that if I search for a substring, I don't get anything back. For example, searching for "ello" or "*ello*" returns nothing. Any ideas?
>>
>> http://localhost:8983/solr/select?q=*ello*&start=0&rows=50&hl.maxAnalyzedChars=2147483647&hl.useFastVectorHighlighter=true&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400
>
> For substring matching, NGramFilterFactory is required at index time.
>
> <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
>
> Plus, you may want to use WhitespaceTokenizerFactory instead of StandardTokenizerFactory. The Analysis admin page displays the behavior of each tokenizer.
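Here is a minimal sketch of the setup Ahmet describes, assuming Solr 3.x schema.xml syntax. N-grams are applied only in the index-time analyzer. The query-time analyzer leaves the search string whole, so it is looked up as a single term. The field type name `text_substr` is hypothetical. The maxGramSize of 15 is carried over from the suggestion above and is not a tuned value.

```xml
<!-- Hypothetical field type "text_substr": substrings are generated at index time only. -->
<fieldType name="text_substr" class="solr.TextField">
  <!-- Index time: split on whitespace, lowercase, then emit every substring of 1 to 15 characters. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <!-- Query time: no n-gramming, so "ello" stays one term and matches the indexed gram "ello". -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this sketch, a plain query such as `q=ello` should match without wildcards. The 13-character term `hello_there=5` is also an indexed gram of the whitespace token `hello_there=50;`. The tradeoff is that a query term longer than maxGramSize cannot equal any indexed gram, so it matches nothing. maxGramSize therefore trades index size against the longest searchable substring. The Analysis admin page is a good place to confirm which grams a given input actually produces.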