On 13-Aug-07, at 6:18 PM, Benjamin Higgins wrote:

(using last night's Solr build)

Can't seem to get this to work. I am trying to use the regex
fragmenter for the highlighter. The docs suggest looking at the example
solrconfig.xml for a demonstration of a fragmenter that splits on
sentences. It looks like this:

<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>

This confuses me somewhat. I would have expected something that
splits on sentence punctuation, like [.!?], but this seems to be the
reverse (perhaps so that the punctuation is included?). Still, why
isn't it [^.!?]? I read the regex as: match between 20 and 200
characters, each of which is a dash, a word character (letter, digit,
or underscore), a space, a comma, a slash, a newline, or a double or
single quote.
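As a quick check of that reading, here is a small Python sketch that runs the example pattern over a made-up sample text. One assumption to note: Solr compiles the pattern with Java's java.util.regex, while this uses Python's re module. The two agree on this ASCII input, but \w is Unicode-aware in Python 3 and ASCII-only in Java by default.

```python
import re

# The example pattern from solrconfig.xml. '.', '!' and '?' are not in the
# character class, so a match can never run across sentence punctuation.
SENTENCE_LIKE = re.compile(r"""[-\w ,/\n\"']{20,200}""")

text = ("The quick brown fox jumps over dogs. Short one! "
        "Another sentence, with a comma, ends here?")

matches = SENTENCE_LIKE.findall(text)
for m in matches:
    print(repr(m))
# 'The quick brown fox jumps over dogs'
# ' Another sentence, with a comma, ends here'
# The run ' Short one' is only 10 characters, below the {20,200} minimum,
# so it produces no match at all.
```

So the pattern picks out sentence-length runs that stop just before sentence punctuation, and skips runs that are too short.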

The pattern is supposed to describe what you _want_ a fragment to look like, rather than what separates fragments. This is because the desired fragments are often not all that is present in the text (wheat from chaff), and because you don't necessarily want a fragment to start where the last one ended.
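A small Python sketch of the difference, under the same assumption as before (Python's re standing in for Java's regex engine). It compares a negated class such as [^.!?]{20,200} with the example pattern on a made-up text containing some markup "chaff":

```python
import re

SENTENCE_LIKE = re.compile(r"""[-\w ,/\n\"']{20,200}""")  # describes a fragment
NOT_PUNCT = re.compile(r"[^.!?]{20,200}")                 # only excludes separators

text = ("<b>Solr highlighting can use regular expressions.</b> ||| "
        "Fragments should look like prose, not markup.")

# The negated class keeps whatever surrounds each sentence, markup included.
chaffy = NOT_PUNCT.findall(text)
# The fragment-shaped pattern keeps only the prose runs.
clean = SENTENCE_LIKE.findall(text)

print(chaffy)
print(clean)
```

With [^.!?] the tags and separators end up inside the fragments; describing the fragment itself lets that chaff fall between matches.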

Anyway, I have tried many, many patterns, and I often can't tell how they are working. I certainly haven't been able to split on line boundaries.

What are your hl.fragsize/hl.regex.slop settings relative to the length of the lines you want to match?

Try something like:

hl.regex.pattern: [^\n]+
hl.regex.slop: 1.0
hl.fragsize: <maximum line length>
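To see why these settings matter, here is a deliberately simplified, character-level model in Python. It assumes hl.regex.slop works as the Solr docs describe it, as the fraction by which a fragment's length may stray from hl.fragsize to land on a regex match: a fragment may end only at a match boundary lying within fragsize * (1 - slop) to fragsize * (1 + slop) characters of its start, and is otherwise cut at fragsize. The function name regex_fragments is hypothetical; Solr's real fragmenter works on token offsets and differs in detail.

```python
import re

def regex_fragments(text, pattern, fragsize, slop):
    """Hypothetical, simplified model of a regex fragmenter (not Solr code).

    The end of every match of `pattern` is a preferred break point. A break
    is taken at the first such point lying between fragsize*(1 - slop) and
    fragsize*(1 + slop) characters after the fragment start; if none does,
    the text is cut at exactly `fragsize` characters.
    """
    if fragsize <= 0:
        raise ValueError("fragsize must be positive")
    breaks = [m.end() for m in re.finditer(pattern, text)]  # already sorted
    fragments = []
    start = 0
    while start < len(text):
        lo = start + max(1, int((1.0 - slop) * fragsize))
        hi = start + int((1.0 + slop) * fragsize)
        # First preferred break point inside the allowed window, else a hard cut.
        cut = next((b for b in breaks if lo <= b <= hi), start + fragsize)
        cut = min(cut, len(text))
        piece = text[start:cut].strip()  # trim whitespace left at the cut points
        if piece:
            fragments.append(piece)
        start = cut
    return fragments

text = "first line here\nsecond line\nthird one is longer\n"

# Suggested settings: slop 1.0 and fragsize >= the longest line (19 chars here).
print(regex_fragments(text, r"[^\n]+", fragsize=20, slop=1.0))
# ['first line here', 'second line', 'third one is longer']

# Tight settings: no line end falls inside the window, so lines are cut mid-word.
print(regex_fragments(text, r"[^\n]+", fragsize=10, slop=0.0))
# ['first line', 'here\nseco', 'nd line\nth', 'ird one is', 'longer']
```

In this model, slop 1.0 widens the window to anywhere between 1 and 2 * fragsize characters, so as long as fragsize is at least the longest line, the next line end always falls inside it.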

Let me know how that goes,
-Mike
