Maybe HTMLStripTransformer is what you are looking for.
* http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
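
In case it helps, here is a minimal sketch of how that transformer is usually wired into a DIH data-config.xml. The data source settings, the table name (article) and the column names (id, body) are all made up for illustration, so substitute your own. It only shows where the transformer and the stripHTML attribute go; whether the output matches what you mean by "convert" is worth checking on a test row first.

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; replace with your own settings -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/exampledb" user="solr" password="secret"/>
  <document>
    <!-- transformer="HTMLStripTransformer" enables the transformer for this entity -->
    <entity name="article" transformer="HTMLStripTransformer"
            query="SELECT id, body FROM article">
      <field column="id" name="id"/>
      <!-- stripHTML="true" marks this column as one the transformer should process -->
      <field column="body" name="body" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>
```

Only columns flagged with stripHTML="true" are touched, so other fields pass through unchanged.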
On Tue, May 31, 2011 at 5:35 PM, Erick Erickson erickerick...@gmail.com wrote:
Convert them to what? Individual fields in your docs? Text?
If the former, you might get some joy from the XpathEntityProcessor.
If you want to just strip the markup and index all the content you
might get some joy from the various *html* analyzers listed here:
Sorry, my question was not clear.
When I get data from the database, some fields contain HTML special characters,
and what I want to do is just convert them automatically.
On Fri, May 27, 2011 at 1:00 PM, Gora Mohanty g...@mimirtech.com wrote:
On Fri, May 27, 2011 at 3:50 PM, anass talby