I have about 20k text files, some very small, but some up to 300MB, and would 
like to do text searching with highlighting.

Imagine the text is the contents of your syslog.

I would like to type in some terms, such as "error" and "mail", and have Solr 
return the syslog lines with those terms PLUS two lines of context.  Pretty 
much just like Google's highlighting.
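
To make the desired output concrete, here is a rough Python sketch of the 
behaviour I'm after, done outside Solr on a single file.  The function name is 
made up for illustration, and it assumes a line matches when it contains every 
term (case-insensitive) and that "two lines of context" means two lines on 
each side of a match.

```python
def lines_with_context(lines, terms, context=2):
    """Return (line_number, text) pairs for matching lines plus context.

    Assumption: a line matches when it contains ALL terms, case-insensitively,
    and `context` lines are kept on each side of every match.
    """
    wanted = [t.lower() for t in terms]
    keep = set()
    for i, line in enumerate(lines):
        lowered = line.lower()
        if all(t in lowered for t in wanted):
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    # Line numbers are 1-based; overlapping context windows are merged.
    return [(i + 1, lines[i]) for i in sorted(keep)]
```

Calling it with the lines of a syslog file and the terms "error" and "mail" 
would give back each matching line together with its neighbours, which is the 
kind of snippet I would like Solr to produce.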

1) Can Solr handle this?  Queries took extremely long when I tried this with 
Solr 1.4.1 (yes, I was using term vectors, etc.).  I tried breaking the files 
into 1MB pieces, but then searches returned the wrong number of documents: if 
one file contains a term 5 times, and it is the only file that contains the 
term, I want 1 result, not 5 results.
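
As an illustration of the result counting I want, here is a small Python 
sketch that collapses hits on file pieces back into one result per file.  It 
assumes each piece is indexed with an id of the form "path#n", which is a 
hypothetical naming scheme, and that each hit arrives as an (id, score) pair; 
neither is something Solr does for me today.

```python
def collapse_by_file(chunk_hits):
    """Collapse (chunk_id, score) hits into one (path, best_score) per file.

    Assumption: chunk ids look like "path#n" (a made-up scheme), so the part
    before the last '#' identifies the original file.
    """
    best = {}
    for chunk_id, score in chunk_hits:
        path, _, _ = chunk_id.rpartition("#")
        # Keep the highest-scoring piece as the file's score.
        if path not in best or score > best[path]:
            best[path] = score
    # Dicts preserve insertion order, so files appear in first-hit order.
    return list(best.items())
```

With this, five hits that all come from pieces of the same file would count 
as a single result.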

2) What sort of tokenizer would be best?  Here's what I'm using:

    <field name="body" type="text_pl" indexed="true" stored="true"
           multiValued="false" termVectors="true" termPositions="true"
           termOffsets="true"/>

    <fieldType name="text_pl" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="0"/>
      </analyzer>
    </fieldType>


Thanks!
Pete
