On 13-Aug-07, at 6:18 PM, Benjamin Higgins wrote:
(using last night's Solr build)
Can't seem to get this to work. I am trying to use the regex
highlighting fragmenter. The docs suggest looking at the example
solrconfig.xml for a demonstration of a fragmenter that splits on
sentences. It looks like this:
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
This confuses me somewhat. I would have expected something that splits
on sentence punctuation like [.!?], but this seems to be the reverse
(perhaps so that the punctuation is included?). Still, why isn't it
[^.!?]? I read the regex as matching between 20 and 200 characters,
each of which is a dash, word character (letter, digit, or underscore),
space, comma, slash, newline, or double or single quote.
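That reading is easy to check. The sketch below uses Python rather than Java, so treat it as an approximation: Python's re module agrees with java.util.regex on this character class once re.ASCII is set (Java's \w is ASCII-only by default). The helper in_class and the sample strings are made up for the example. It lists which characters fall outside the class, then shows that the {20,200} quantifier silently drops runs shorter than 20 characters.

```python
import re

# Example pattern from the thread; re.ASCII makes \w mean [A-Za-z0-9_],
# which is what java.util.regex uses by default.
FRAGMENT = re.compile(r"[-\w ,/\n\"']{20,200}", re.ASCII)
CHAR_CLASS = re.compile(r"[-\w ,/\n\"']", re.ASCII)

def in_class(ch: str) -> bool:
    """True if the single character ch is allowed inside a fragment."""
    return CHAR_CLASS.fullmatch(ch) is not None

sample = "-a_9 ,/\n\"'.!?;:"
excluded = [ch for ch in sample if not in_class(ch)]
print(excluded)  # ['.', '!', '?', ';', ':']

# Sentence punctuation ends a run; runs under 20 characters never match.
runs = FRAGMENT.findall("The quick brown fox jumps over the lazy dog. "
                        "Short one! Another sentence that is long enough?")
print(runs)  # ['The quick brown fox jumps over the lazy dog', ' Another sentence that is long enough']
```

So the punctuation is excluded from each match rather than included, and "Short one" is too short to be a match at all.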
The pattern is supposed to look like what you _want_ a fragment to
look like. This is because the desired fragments are often not all
that is present (wheat from chaff), and because you don't necessarily
want to start a fragment where the last one ended.
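To make the "wheat from chaff" point concrete, here is a small Python sketch. It illustrates plain regex matching only, not how Solr assembles highlighted fragments, and the sample text is invented. The example pattern picks out the sentence-like runs, and whatever lies between two matches is never covered by any of them.

```python
import re

# The example pattern describes what a good fragment looks like
# (re.ASCII mirrors Java's ASCII-only \w).
FRAGMENT = re.compile(r"[-\w ,/\n\"']{20,200}", re.ASCII)

text = "This is the first useful sentence. ##### ok. This is the second useful sentence."

# (start, end) offsets of every match.
spans = [m.span() for m in FRAGMENT.finditer(text)]
# Text between consecutive matches is "chaff": no match ever covers it.
gaps = [text[a_end:b_start] for (_, a_end), (b_start, _) in zip(spans, spans[1:])]
print(spans)  # [(0, 33), (44, 79)]
print(gaps)   # ['. ##### ok.']
```

Note that the second match starts at offset 44, not at 33 where the first one ended.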
Anyway, I have tried many, many patterns, and I often can't tell how
they are working. I certainly haven't been able to split on line
boundaries.
What are your hl.fragsize/hl.regex.slop settings relative to the
length of the lines you want to match?
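One way to think about how these interact is a deliberately simplified model, sketched in Python below. It is not Solr's fragmenter code. It assumes that slop defines a window of fragsize*(1-slop) to fragsize*(1+slop) characters past the start of a fragment, inside which the fragment may end wherever a pattern match ends, and that otherwise the fragment is simply cut at fragsize characters. The function simplified_fragments and the sample text are hypothetical.

```python
import re

def simplified_fragments(text, pattern, fragsize, slop):
    """Hypothetical, simplified fragmenter (NOT Solr's implementation).

    Assumption: a fragment may end at any regex-match end offset lying
    between fragsize*(1-slop) and fragsize*(1+slop) characters past the
    fragment's start; if none does, it is cut at exactly fragsize characters.
    """
    # Candidate cut points: where each match of the pattern ends.
    cuts = [m.end() for m in re.finditer(pattern, text)]
    frags, start = [], 0
    while start < len(text):
        lo = start + int((1.0 - slop) * fragsize)
        hi = start + int((1.0 + slop) * fragsize)
        inside = [c for c in cuts if c > start and lo <= c <= hi]
        end = min(inside[0] if inside else start + fragsize, len(text))
        piece = text[start:end].strip()  # drop whitespace left at the edges
        if piece:
            frags.append(piece)
        start = end
    return frags

text = "alpha\nbeta gamma\ndelta\n"
# No slop: cuts land wherever fragsize says, often mid-line.
print(simplified_fragments(text, r"[^\n]+", fragsize=8, slop=0.0))
# ['alpha\nbe', 'ta gamma', 'delta']
# Slop 1.0 with fragsize >= longest line: every cut lands on a line end.
print(simplified_fragments(text, r"[^\n]+", fragsize=12, slop=1.0))
# ['alpha', 'beta gamma', 'delta']
```

Under this model, a small slop only honours a line break that happens to fall close to fragsize, which would be consistent with fragments that seem to ignore the pattern.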
Try something like:
hl.regex.pattern: [^\n]+
hl.regex.slop: 1.0
hl.fragsize: <maximum line length>
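If it is easier to experiment per request than by editing solrconfig.xml, the same settings can be passed as query parameters. A Python sketch that builds such a request URL follows. The host and port (localhost:8983, the example Jetty default), the field name content, the query, and the fragsize of 120 are all placeholders, and hl.fragmenter=regex assumes the regex fragmenter is registered under that name as in the example solrconfig.xml.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder values: adjust host, field, query, and fragsize to your setup.
params = {
    "q": "solr",                    # hypothetical query
    "hl": "true",
    "hl.fl": "content",             # hypothetical field name
    "hl.fragmenter": "regex",       # assumes the name used in the example solrconfig.xml
    "hl.regex.pattern": r"[^\n]+",  # literal backslash-n; Java's regex engine reads it as newline
    "hl.regex.slop": "1.0",
    "hl.fragsize": "120",           # placeholder: roughly the longest line, in characters
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)

# Round-trip check: the pattern survives URL encoding unchanged.
decoded = parse_qs(urlparse(url).query)
print(decoded["hl.regex.pattern"])
```

urlencode percent-encodes the brackets, caret, backslash, and plus sign, so the pattern reaches Solr intact.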
Let me know how that goes,
-Mike