
Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

Jack Krupansky Mon, 18 Feb 2013 23:19:07 -0800

Use the standard update handler and pass the entire HTML page as literal text in a Solr XML document, in the field that has the HTML strip filter. Be sure to escape the HTML syntax (angle brackets, ampersands, etc.) so that the page is valid XML character data.
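
As a rough sketch of the escaping step, the following Python wraps a raw HTML page in a Solr XML add message. The field names "id" and "content" and the helper name build_solr_add_xml are assumptions made for illustration; substitute the field names from your own schema. The snippet only builds the XML string and does not send it to Solr.

```python
from xml.sax.saxutils import escape


def build_solr_add_xml(doc_id, html, id_field="id", html_field="content"):
    """Return a Solr <add> message with the HTML page as escaped literal text.

    The field names are hypothetical defaults, not taken from any real schema.
    """
    return (
        "<add><doc>"
        # escape() converts &, < and > into XML entities.
        f'<field name="{id_field}">{escape(doc_id)}</field>'
        f'<field name="{html_field}">{escape(html)}</field>'
        "</doc></add>"
    )


page = "<html><head><title>A & B</title></head><body><p>Hi</p></body></html>"
solr_xml = build_solr_add_xml("doc1", page)
print(solr_xml)
```

The resulting string can then be posted to the standard update handler with a text/xml content type. The stored value of the field should come back as the original HTML once the XML is parsed, while the HTML strip filter applies only to the indexed terms.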


You'll have to process meta information yourself.
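
One possible way to do that on the client side, sketched with Python's standard html.parser module: collect the name/content pairs from the meta tags before sending the document, then add each one as its own field. The class name MetaExtractor and the helper extract_meta are hypothetical names for this example, and how you map the resulting keys to Solr fields is up to your schema.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Some pages use property="..." (e.g. Open Graph) instead of name="...".
        name = attr.get("name") or attr.get("property")
        content = attr.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict mapping lower-cased meta names to their content values."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta


sample = (
    '<html><head>'
    '<meta name="Description" content="Solr &amp; HTML">'
    '<meta name="keywords" content="solr,html">'
    '</head><body></body></html>'
)
meta = extract_meta(sample)
print(meta)
```

Each entry in the returned dict could then be written as an additional field element in the same Solr XML document.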


-- Jack Krupansky

-----Original Message-----
From: Divyanand Tiwari

Sent: Monday, February 18, 2013 10:52 PM
To: solr-user@lucene.apache.org

Subject: Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?


Thank you for replying sir !!!

I have two queries related to this:

1) In this case, which request handler do I have to use? The
'ExtractingRequestHandler' strips the HTML content by default, and the
default handler 'UpdateRequestHandler' does not accept HTML content.

2) How can I 'Extract' & 'Index' the META information in the HTML document
separately?

Awaiting your reply....

Thank you!!!