Use the standard update handler and pass the entire HTML page as literal text in a Solr XML document, in the field that has the HTML strip filter. Be sure to escape the HTML syntax (angle brackets, ampersands, etc.) so that the XML stays well-formed.
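
As a rough sketch of the escaping step, something like the following Python would build the XML update message. The field names "id" and "content_html" and the helper build_solr_add_xml are made up for illustration; use whatever your schema defines for the field with the HTML strip filter.

```python
from xml.sax.saxutils import escape, quoteattr


def build_solr_add_xml(doc_id, html, html_field="content_html"):
    """Return a Solr <add> message with the raw HTML escaped as field text.

    The field names are assumptions for this sketch, not part of any schema.
    """
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        # escape() turns &, < and > into entities so the page is plain text in XML
        f"<field name={quoteattr(html_field)}>{escape(html)}</field>"
        "</doc></add>"
    )


page = "<html><head><title>A & B</title></head><body><p>Hi</p></body></html>"
payload = build_solr_add_xml("doc1", page)
print(payload)
```

The resulting string would then be POSTed to the update handler as XML. Solr unescapes the field value on receipt, so the field sees the original HTML.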

You'll have to process meta information yourself.
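
One way to do that on the client side, sketched here with Python's standard html.parser module, is to collect the name/content pairs from the meta tags before you build the Solr document. The MetaExtractor class and extract_meta helper are hypothetical names, and how you map the results to Solr fields is up to your schema.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Some pages use property= (e.g. Open Graph) instead of name=
        name = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta


page = (
    '<html><head><meta name="Description" content="Test page">'
    '<meta name="keywords" content="solr, html"/></head><body></body></html>'
)
meta = extract_meta(page)
print(meta)
```

Each entry in the resulting dictionary can then be sent as its own field in the same Solr XML document as the escaped HTML.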

-- Jack Krupansky

-----Original Message-----
From: Divyanand Tiwari
Sent: Monday, February 18, 2013 10:52 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

Thank you for replying sir !!!

I have two questions related to this:

1) In this case, which request handler do I have to use? The
'ExtractingRequestHandler' strips the HTML content by default, and the
default handler 'UpdateRequestHandler' does not accept HTML content.

2) How can I 'Extract' and 'Index' the META information in the HTML document
separately?

Awaiting your reply....
Thank you!!!
