I have about 20k text files, some very small, but some up to 300MB, and would like to do text searching with highlighting.
Imagine the text is the contents of your syslog. I would like to type in some terms, such as "error" and "mail", and have Solr return the syslog lines containing those terms PLUS two lines of context, pretty much like Google's highlighting.

1) Can Solr handle this? I had extremely long query times when I tried this with Solr 1.4.1 (yes, I was using TermVectors, etc.). I tried breaking the files into 1MB pieces, but then searching returned the wrong number of documents (i.e. if one file had a term 5 times, and that was the only file that had the term, I want 1 result, not 5 results).

2) What sort of tokenizer would be best? Here's what I'm using:

    <field name="body" type="text_pl" indexed="true" stored="true" multiValued="false"
           termVectors="true" termPositions="true" termOffsets="true" />

    <fieldType name="text_pl" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="0"/>
      </analyzer>
    </fieldType>

Thanks!
Pete
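
To make the desired output concrete: the sketch below is plain Python, not Solr, and only illustrates the behaviour described above (matching lines plus two lines of context). The function name `lines_with_context`, the sample log lines, the case-insensitive "any term" matching, and the reading of "two lines of context" as two lines on each side are all assumptions made for illustration.

```python
def lines_with_context(lines, terms, context=2):
    """Return (index, line) pairs for every line that contains any of
    the terms, plus `context` lines before and after each match.
    Overlapping context windows are merged, so no line is repeated."""
    lowered = [t.lower() for t in terms]
    keep = set()
    for i, line in enumerate(lines):
        text = line.lower()
        if any(t in text for t in lowered):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    return [(i, lines[i]) for i in sorted(keep)]


# Hypothetical syslog-like sample; matches are at index 1 and index 9.
sample = [
    "boot ok",
    "mail queue flushed",
    "cron started",
    "disk check ok",
    "user login",
    "session opened",
    "session closed",
    "network up",
    "ntp synced",
    "kernel error: oops",
    "watchdog reset",
    "reboot",
]

hits = lines_with_context(sample, ["error", "mail"])
for index, line in hits:
    print(index, line)
```

On a 300MB file this kind of scan has to read the whole file for every query, which is exactly the cost an index with stored offsets is meant to avoid.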