HTMLStripReader improvement - padding corrected for hexadecimal entities,
option not to emit padding at all added
-----------------------------------------------------------------------------------------------------------------
Key: SOLR-882
URL: https://issues.apache.org/jira/browse/SOLR-882
Project: Solr
Issue Type: Improvement
Reporter: Dawid Weiss
Priority: Trivial
Attachments: patch
Improvements to HTMLStripReader:
- fix padding of hexadecimal entities (currently off by 1)
- add an option not to emit padding at all. In certain applications padding
emitted after entities such as ó may split words that are in fact single
terms.
- add entities that browsers also recognize when written entirely in
uppercase.
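
To illustrate the three points above, here is a minimal sketch. It is not the
attached patch and not the HTMLStripReader code. The class name
EntityPaddingSketch, the decode method, the pad flag and the tiny entity table
are all hypothetical. It assumes that padding means appending spaces after the
decoded character so that the output is as long as the input, and that the
width of a hexadecimal entity counts every character of it, including the "x".

```java
import java.util.Map;

// Hypothetical sketch only; none of these names come from Solr.
final class EntityPaddingSketch {

    // Tiny illustrative entity table with a few all-uppercase variants.
    // A real stripper would know many more names.
    private static final Map<String, Character> NAMED = Map.of(
            "oacute", '\u00F3',
            "amp", '&',
            "AMP", '&',
            "lt", '<',
            "LT", '<');

    // Decodes entities in the input. When pad is true, each decoded character
    // is followed by spaces so the output keeps the length of the input.
    static String decode(String in, boolean pad) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int semi = (c == '&') ? in.indexOf(';', i) : -1;
            Character decoded = (semi > i + 1) ? lookup(in.substring(i + 1, semi)) : null;
            if (decoded == null) {
                // Not a recognized entity: copy the character unchanged.
                out.append(c);
                i++;
                continue;
            }
            out.append(decoded.charValue());
            if (pad) {
                // Full entity width, from '&' to ';' inclusive. For "&#xF3;"
                // this is 6, so 5 spaces follow the decoded character.
                int entityLength = semi - i + 1;
                for (int k = 1; k < entityLength; k++) {
                    out.append(' ');
                }
            }
            i = semi + 1;
        }
        return out.toString();
    }

    // Resolves the text between '&' and ';', or returns null if unknown.
    private static Character lookup(String body) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                return toChar(Integer.parseInt(body.substring(2), 16));
            }
            if (body.startsWith("#")) {
                return toChar(Integer.parseInt(body.substring(1), 10));
            }
        } catch (NumberFormatException e) {
            return null;
        }
        return NAMED.get(body);
    }

    // The sketch handles only code points that fit in a single char.
    private static Character toChar(int codePoint) {
        return (codePoint >= 0 && codePoint <= 0xFFFF)
                ? Character.valueOf((char) codePoint)
                : null;
    }
}
```

With pad set to false, decode("caf&#xE9;s", false) returns "cafés", so the
word stays a single term. With pad set to true the result has the same length
as the input, which keeps character offsets aligned with the original text at
the cost of whitespace inside the word.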