Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-03-05 Thread Divyanand Tiwari
Hi Chris, thank you for replying. My content field in the schema is
stored=true and indexed=false, because I am copying the content field
into the text field, which is indexed=true by default.

My problem is this: I am able to search the HTML documents I fed to
Solr, but because the result returned by Tika/ExtractingRequestHandler
is a stripped-down version of the HTML document, I am not able to
present the document in its original format on my site. :(

Based on Jack's reply, I got the idea of writing my own request handler,
and I am working on it.
I'll post an update if I come up with a solution. Any help is most
welcome!

Thank you all for your support!


On Fri, Feb 22, 2013 at 6:42 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Hi everyone, I am new to Solr and cannot find a way to get back the
 : original HTML document with the hits highlighted in it. What
 : configuration do I need, and where, to instruct SolrCell/Tika not to
 : strip the tags of the HTML document in the content field?

 I _think_ what you want is simply to ensure that you have a content
 field in your schema which is stored=true (and indexed=true if you
 want to search on it directly) ... and then ExtractingRequestHandler will
 put the entire XHTML it generates from the documents you index into that
 field.

 http://wiki.apache.org/solr/ExtractingRequestHandler
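
One way to check what Solr Cell actually produces for a given page is to ask the handler to return Tika's output without indexing it. The sketch below only builds such a request URL. The host, port, and handler path are assumptions about a default local install, and the helper name is made up for illustration. The extractOnly and extractFormat parameters are the ones described on the wiki page above. Note that the XHTML returned this way is generated by Tika, so it is not expected to match the original HTML byte for byte.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of the extracting handler; adjust for your install.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_only_url(base_url=SOLR_EXTRACT_URL, extract_format="xml"):
    """Build a URL that asks Solr Cell to return Tika's output
    instead of indexing the posted document (hypothetical helper)."""
    params = {
        "extractOnly": "true",            # return the extraction, do not index
        "extractFormat": extract_format,  # "xml" for XHTML, "text" for plain text
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_only_url())
```

The HTML file itself would then be sent as the body of a POST to that URL, for example with curl or any HTTP client.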

 If that isn't what you had in mind, then you need to provide us with more
 details about what you've tried, what results you get, and how exactly
 those results differ from what you want to get.


 -Hoss




-- 
Regards,
Divyanand Tiwari


Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-19 Thread Divyanand Tiwari
Thank you for your help, Jack. I just wanted to know if there is any
ready-made solution for this, because I really don't know how to extract
meta information.

Awaiting your reply.
Thank you


On Tue, Feb 19, 2013 at 12:48 PM, Jack Krupansky j...@basetechnology.com wrote:

 Use the standard update handler and pass the entire HTML page as literal
 text in a Solr XML document for the field that has the HTML strip filter,
 but be sure to escape the HTML (angle brackets, ampersands, etc.) syntax.
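
A minimal sketch of the escaping step, assuming the schema has fields named "id" and "content" (both names are assumptions, as is the helper name). It wraps a raw HTML page in a Solr XML add message using only the Python standard library. The stored value keeps the original markup, because an XML parser turns the escaped entities back into angle brackets and ampersands when it reads the field.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, html, id_field="id", html_field="content"):
    """Wrap a raw HTML page in a Solr <add><doc> message.

    escape() replaces &, < and > so the HTML travels as literal text
    inside the XML field instead of being parsed as XML elements.
    Field names are assumed to be plain schema identifiers.
    """
    return (
        "<add><doc>"
        f'<field name="{id_field}">{escape(doc_id)}</field>'
        f'<field name="{html_field}">{escape(html)}</field>'
        "</doc></add>"
    )


if __name__ == "__main__":
    page = "<html><body><p>Tom & Jerry</p></body></html>"
    print(build_add_xml("doc1", page))
```

The resulting string would be posted to the standard update handler with an XML content type, followed by a commit.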

 You'll have to process meta information yourself.
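
As a rough illustration of processing the meta information yourself, the sketch below collects name/content pairs from meta tags with Python's standard html.parser module. The class and function names are made up for this example, and how the collected values map onto schema fields is left open. It is not something Solr provides.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict of meta names (lowercased) to their content values."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    sample = '<head><meta name="keywords" content="solr, tika"></head>'
    print(extract_meta(sample))
```

Each collected value could then be added as its own field in the same Solr XML document that carries the escaped HTML.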


 -- Jack Krupansky

 -----Original Message-----
 From: Divyanand Tiwari
 Sent: Monday, February 18, 2013 10:52 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How can i instruct the Solr/ Solr Cell to output the original
 HTML document which was fed to it.?


 Thank you for replying, sir!

 I have two questions related to this:

 1) Which request handler do I have to use in this case? By default,
 'ExtractingRequestHandler' strips the HTML content, and the default
 handler 'UpdateRequestHandler' does not accept HTML content.

 2) How can I extract and index the META information in the HTML document
 separately?

 Awaiting your reply.
 Thank you!




-- 
Regards,
Divyanand Tiwari


Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-18 Thread Divyanand Tiwari
Thank you for replying, sir!

I have two questions related to this:

1) Which request handler do I have to use in this case? By default,
'ExtractingRequestHandler' strips the HTML content, and the default
handler 'UpdateRequestHandler' does not accept HTML content.

2) How can I extract and index the META information in the HTML document
separately?

Awaiting your reply.
Thank you!