I wasn't sure how to word the subject line.  I'm getting some odd behavior out 
of the highlighter in Solr.  It looks like an edge case, but I'm curious to 
hear whether this is a known issue or something worth looking into further.

Background:

I'm using Solr's highlighting facility to tag words found in content crawled 
via Nutch. I then split the content on those tags, and the pieces are later 
fed into a moderation process.
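
To make the splitting step concrete, here is a minimal Python sketch. It is an illustration only, not the actual pipeline: the function name split_on_tags is made up, and it assumes the marker strings given further down in this message.

```python
import re

# Marker strings taken from the hl.simple.pre / hl.simple.post settings
# mentioned in this message.
PRE = "TEST_KEYWORD_START"
POST = "TEST_KEYWORD_END"

def split_on_tags(highlighted):
    """Split highlighter output into (text, is_keyword) pieces.

    Hypothetical helper: text between PRE and POST is flagged True,
    everything else is flagged False.
    """
    pattern = re.compile(re.escape(PRE) + "(.*?)" + re.escape(POST), re.DOTALL)
    pieces = []
    pos = 0
    for match in pattern.finditer(highlighted):
        if match.start() > pos:
            # Untagged text before this keyword
            pieces.append((highlighted[pos:match.start()], False))
        pieces.append((match.group(1), True))
        pos = match.end()
    if pos < len(highlighted):
        # Trailing untagged text
        pieces.append((highlighted[pos:], False))
    return pieces
```

With a split like this, any extra text that ends up inside a tagged span is treated as part of the keyword, which is why the exact placement of the tags matters for the moderation step.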

Sample Data (snippet from larger content):
[url=\"http://www.sampleurl.com/baffle_prices.html\"]baffle[/url]

(My "hl.simple.pre" is set to "TEST_KEYWORD_START" and my "hl.simple.post" is 
set to "TEST_KEYWORD_END")
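
For reference, a request with those settings could be built roughly as below. This is a sketch under assumptions: the host, port, and /solr/select path are a default local install, and the field name "content" is a guess at the field Nutch populates. Only the hl.simple.pre and hl.simple.post values come from my actual setup.

```python
from urllib.parse import urlencode

# Assumed values: base URL and the "content" field name are placeholders.
BASE_URL = "http://localhost:8983/solr/select"

params = {
    "q": "baffle",
    "hl": "true",                            # turn highlighting on
    "hl.fl": "content",                      # field to highlight (assumed)
    "hl.simple.pre": "TEST_KEYWORD_START",   # marker inserted before a match
    "hl.simple.post": "TEST_KEYWORD_END",    # marker inserted after a match
}

url = BASE_URL + "?" + urlencode(params)
print(url)
```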

When I query for "baffle", Solr highlights it like this:

TEST_KEYWORD_STARTbaffle_prices.html\"]baffleTEST_KEYWORD_END

What should be happening is this:

TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END_prices.html\"]TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END
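
To spell out the expectation, the following Python sketch wraps every literal occurrence of the term with the two markers. It is a plain regex stand-in with a made-up function name (naive_highlight), not how Solr's highlighter works internally; it only shows the output I'm hoping to get.

```python
import re

PRE = "TEST_KEYWORD_START"
POST = "TEST_KEYWORD_END"

def naive_highlight(text, term):
    """Wrap each case-insensitive occurrence of term in PRE/POST.

    No tokenization is involved; this is a literal substring match.
    """
    return re.sub(
        re.escape(term),
        lambda m: PRE + m.group(0) + POST,
        text,
        flags=re.IGNORECASE,
    )

# The part of the sample data that Solr wrapped as a single highlight
snippet = r'baffle_prices.html\"]baffle'
print(naive_highlight(snippet, "baffle"))
```

Run on that snippet, it tags the two occurrences of "baffle" separately and leaves the text between them untagged.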


Is there something about this data that keeps the highlighter from splitting 
it up? Do I need to have Solr tokenize the words on some character that I 
have somehow excluded?

Thank you,
Scott Gonyea
