Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?
Hi Chris,

Thank you for replying.

My content field in the schema is stored=true and indexed=false, because I copy the content field into the text field, which is indexed=true by default.

My question was this: I am able to search the HTML documents I fed to Solr, but because the result returned by Tika/ExtractingRequestHandler is a stripped-down version of the HTML document, I cannot present the document in its original format on my site. :(

Based on Jack's reply, I got the idea of writing my own request handler, and I am working on it. I will post an update if I come up with a solution. Any help is most welcome!

Thank you all for your support!

On Fri, Feb 22, 2013 at 6:42 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Hi everyone, I am new to Solr and cannot find a way to get back
: the original HTML document with hits highlighted in it. What
: configuration can I use, and where, to instruct SolrCell/Tika so that it does
: not strip the tags of the HTML document in the content field?

I _think_ what you want is simply to ensure that you have a content field in your schema which is stored=true (and indexed=true if you want to search on it directly) ... and then ExtractingRequestHandler will put the entire XHTML it generates from the documents you index into that field.

http://wiki.apache.org/solr/ExtractingRequestHandler

If that isn't what you had in mind, then you need to provide us with more details about what you've tried, what results you get, and how exactly those results differ from what you want to get.

-Hoss

--
Regards,
Divyanand Tiwari
Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?
Thank you for your help, Jack.

I just wanted to know whether there is a ready-made solution for this, because I really don't know how to extract meta information.

Awaiting your reply.

Thank you

On Tue, Feb 19, 2013 at 12:48 PM, Jack Krupansky j...@basetechnology.com wrote:

Use the standard update handler and pass the entire HTML page as literal text in a Solr XML document for the field that has the HTML strip filter, but be sure to escape the HTML syntax (angle brackets, ampersands, etc.). You'll have to process meta information yourself.

-- Jack Krupansky

-Original Message-
From: Divyanand Tiwari
Sent: Monday, February 18, 2013 10:52 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

Thank you for replying, sir!

I have two questions related to this:

1) Which request handler should I use in this case? 'ExtractingRequestHandler' strips the HTML content by default, and the default handler 'UpdateRequestHandler' does not accept HTML content.

2) How can I extract and index the META information in the HTML document separately?

Awaiting your reply.

Thank you!

--
Regards,
Divyanand Tiwari
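
Jack's suggestion above (send the page as escaped literal text in a Solr XML document) can be sketched roughly as follows. This is only an illustration, not a tested Solr client: the helper name `build_add_xml` and the field names `id` and `content` are assumptions and must match whatever the schema actually defines, and the step of posting the result to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, raw_html, id_field="id", html_field="content"):
    """Build a Solr <add> message that carries raw_html as literal text.

    The field names are hypothetical defaults; adjust them to the schema.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name=id_field).text = doc_id
    # ElementTree escapes <, > and & in element text when serializing,
    # so the markup travels as plain text, not as nested XML elements.
    ET.SubElement(doc, "field", name=html_field).text = raw_html
    return ET.tostring(add, encoding="unicode")


page = "<html><body><p>Tom & Jerry</p></body></html>"
payload = build_add_xml("doc1", page)
print(payload)
```

Because the serializer does the escaping, the stored field value comes back as the original markup when the XML is parsed again, which is what is needed to show the page in its original form.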
Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?
Thank you for replying, sir!

I have two questions related to this:

1) Which request handler should I use in this case? 'ExtractingRequestHandler' strips the HTML content by default, and the default handler 'UpdateRequestHandler' does not accept HTML content.

2) How can I extract and index the META information in the HTML document separately?

Awaiting your reply.

Thank you!
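
For question 2, Jack's reply in this thread says the meta information has to be processed on the client side. One possible way to do that, sketched here with Python's standard `html.parser` module, is to collect the `<meta name=... content=...>` pairs before building the Solr document. The class and function names are made up for illustration, and how the resulting values map onto schema fields is an assumption left to the reader.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta ... /> tags, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(raw_html):
    """Return a dict of lower-cased meta names to their content values."""
    collector = MetaCollector()
    collector.feed(raw_html)
    collector.close()
    return collector.meta


sample = (
    '<html><head><meta name="Author" content="Example Author">'
    '<meta name="keywords" content="solr, tika"/></head>'
    "<body>Hi</body></html>"
)
meta = extract_meta(sample)
print(meta)
```

Each entry in the returned dict could then be sent as its own field next to the escaped HTML, so that the meta values are indexed separately from the page body.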