Re: Filtering HTML content in Solr 4.0.0
Hello!

You are trying to put the HTML into the XML sent to Solr, right? To do that, you need to escape the markup properly (and send the file as UTF-8). For example, look at the utf8-example.xml file in the exampledocs directory that comes with Solr and you'll see something like this:

    <field name="features">tag with escaped chars: &lt;nicetag/&gt;</field>

As you can see, the < and > are properly encoded as &lt; and &gt;.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi,

I am using Solr 4.0.0. I have HTML content as the description of a product. If I index it without any filtering, it gives errors on search. How can I filter HTML content?

Pratyul
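As a rough illustration of the escaping described above, an update message carrying an HTML description might look like the sketch below. The field names `id` and `description` and the sample text are assumptions made for the example, not taken from the original poster's schema. The second document shows a CDATA section, which is a standard XML alternative to escaping each character by hand.

```xml
<add>
  <doc>
    <!-- Hypothetical field names; adjust to your own schema. -->
    <field name="id">product-1</field>
    <!-- Every < becomes &lt;, every > becomes &gt;, and a literal & becomes &amp; -->
    <field name="description">&lt;p&gt;A &lt;b&gt;sturdy&lt;/b&gt; backpack&lt;/p&gt;</field>
  </doc>
  <doc>
    <field name="id">product-2</field>
    <!-- Alternative: wrap the raw HTML in a CDATA section instead of escaping it. -->
    <field name="description"><![CDATA[<p>A <b>sturdy</b> backpack</p>]]></field>
  </doc>
</add>
```

Either way, Solr receives the original HTML string as the field value. Stripping the tags is a separate step.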
Re: Filtering HTML content in Solr 4.0.0
I think you will have to write an UpdateProcessor to strip out HTML tags.

http://wiki.apache.org/solr/UpdateRequestProcessor

As of Solr 4.0 you can also use scripting languages like Python, Ruby and JavaScript to write scripts for use as update processors.

-----Original Message-----
From: Pratyul Kapoor
Sent: Friday, October 26, 2012 3:56 AM
To: solr-user@lucene.apache.org
Subject: Filtering HTML content in Solr 4.0.0

Hi,

I am using Solr 4.0.0. I have HTML content as the description of a product. If I index it without any filtering, it gives errors on search. How can I filter HTML content?

Pratyul
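For the update-processor route, a hand-written processor may not be necessary: as far as I know, Solr 4.0 ships an HTMLStripFieldUpdateProcessorFactory that can be wired into a chain in solrconfig.xml. The fragment below is only a sketch under that assumption. The chain name `strip-html` and the field name `description` are made up for the example.

```xml
<!-- solrconfig.xml: hypothetical chain name and field name -->
<updateRequestProcessorChain name="strip-html">
  <!-- Removes HTML markup from the listed field before the document is indexed -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">description</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the document is actually indexed -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per request with the `update.chain=strip-html` parameter. Unlike an analysis-time filter, a processor changes the field value itself, so the stored text also comes back without tags.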
Re: Filtering HTML content in Solr 4.0.0
Hello!

You don't need a custom update request processor. There is a char filter dedicated to stripping HTML tags from your content so that only the relevant parts of it are indexed:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

However, you first need to send the content to Solr properly for indexing.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
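A minimal sketch of how the char filter mentioned above could be declared in schema.xml follows. The type name `text_html`, the field name `description`, and the choice of tokenizer and lowercase filter are assumptions for illustration, not part of the original advice.

```xml
<!-- schema.xml: hypothetical field type for HTML descriptions -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run before the tokenizer and remove the markup -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="description" type="text_html" indexed="true" stored="true"/>
```

Note that a char filter only affects the indexed tokens. The stored value returned in search results still contains the original HTML.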