[ http://issues.apache.org/jira/browse/SOLR-57?page=all ]
Yonik Seeley resolved SOLR-57.
------------------------------
Resolution: Duplicate
Known issue.
It probably wouldn't be too hard to fix for HTMLStripWhitespace*, but it
could be pretty difficult for HTMLStripStandard*.
> Highlighter does not work with HTML content that's passed through
> HTMLStrip*Tokenizer
> -------------------------------------------------------------------------------------
>
> Key: SOLR-57
> URL: http://issues.apache.org/jira/browse/SOLR-57
> Project: Solr
> Issue Type: Bug
> Components: search
> Environment: Red Hat Linux 9, Tomcat 5.5.20
> Reporter: Ho Yin Au
> Priority: Minor
>
> I have a fieldtype with the following definition:
> <fieldtype name="htmltext" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
>     <filter class="solr.StandardFilterFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.StopFilterFactory" />
>     <filter class="solr.EnglishPorterFilterFactory" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>     <filter class="solr.ISOLatin1AccentFilterFactory" />
>   </analyzer>
> </fieldtype>
> When fields of this type are included in the list of fields to be
> highlighted, the highlighted term is always misplaced, because its offset
> does not take into account the HTML tags before it. You end up with
> something like this for the highlighted snippet:
> Does your comptuer meet the <a
> href="http:/<em>/www.example</em>.com/system_requirements.shtml">minimum
> system requirements</a>?
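
The offset drift described above can be reproduced outside Solr. The following is a minimal Python sketch, not Solr's or Lucene's actual code: `strip_with_offsets` and `highlight` are hypothetical helpers, the sample string is made up, and the regex-based tag stripping is a simplification that ignores entities, comments, and scripts. It shows that offsets computed on the stripped text land inside the markup when applied to the original HTML, and that keeping a per-character offset map is one possible way to translate them back.

```python
import re

# Simplified tag matcher (assumption: no comments, entities, or scripts).
TAG = re.compile(r"<[^>]*>")

def strip_with_offsets(html):
    """Strip tags; return (text, offsets) where offsets[i] is the index
    in html of the character text[i]."""
    chars, offsets, pos = [], [], 0
    for m in TAG.finditer(html):
        for i in range(pos, m.start()):
            chars.append(html[i])
            offsets.append(i)
        pos = m.end()
    for i in range(pos, len(html)):
        chars.append(html[i])
        offsets.append(i)
    return "".join(chars), offsets

def highlight(original, start, end):
    """Wrap original[start:end] in <em> tags."""
    return original[:start] + "<em>" + original[start:end] + "</em>" + original[end:]

html = ('Does it meet the <a href="http://www.example.com/req.shtml">'
        'minimum requirements</a>?')
text, offsets = strip_with_offsets(html)

# Offsets of the match are relative to the stripped text.
m = re.search("minimum", text)
start, end = m.start(), m.end()

# Applying stripped-text offsets to the original HTML reproduces the bug.
wrong = highlight(html, start, end)

# Translating the offsets through the map places the markup correctly.
right = highlight(html, offsets[start], offsets[end - 1] + 1)

print(wrong)
print(right)
```

The map costs one integer per retained character. A real tokenizer would more likely record only the cumulative shift at each point where input was removed, but that variant is not shown here.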