-textual-info-from-pd-doc-excel-for-indexing-the-actual-content-tp4036379p4038642.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional
it, and Solr is based on Lucene.
-- Jack Krupansky
-----Original Message-----
From: saisantoshi
Sent: Sunday, January 27, 2013 2:09 PM
To: java-user@lucene.apache.org
Subject: Re: Readers for extracting textual info from pd/doc/excel for
indexing the actual content
We are not using Solr and
framework good enough, or is there a better library? Any
issues or experiences with using the Tika framework?
Thanks,
Sai.
y Solr itself to see how it works:
http://wiki.apache.org/solr/ExtractingRequestHandler
-- Jack Krupansky
-----Original Message-----
From: Adrien Grand
Sent: Sunday, January 27, 2013 12:53 PM
To: java-user@lucene.apache.org
Subject: Re: Readers for extracting textual info from pd/doc/excel for
indexing the actual content
Have you tried using the PDFParser [1] and the OfficeParser [2]
classes from Tika?
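A minimal sketch of that suggestion follows. It assumes Tika 1.x with the tika-parsers artifact on the classpath. It uses the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method from the javadoc linked in [1], with a BodyContentHandler to collect the plain-text body. The class name TikaTextExtractor and the helpers parserFor/extractText are hypothetical. Choosing the parser by file extension is a simplification for illustration. Anything that is not .pdf or a legacy OLE2 Office file (.doc/.xls/.ppt) is handed to Tika's AutoDetectParser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.microsoft.OfficeParser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class: a sketch, not an official Tika or Lucene API.
public class TikaTextExtractor {

    // Hypothetical helper: pick a Tika parser from the file extension.
    // This is a simplification; AutoDetectParser can detect the type itself.
    static Parser parserFor(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return new PDFParser();
        }
        if (name.endsWith(".doc") || name.endsWith(".xls") || name.endsWith(".ppt")) {
            // OfficeParser covers the legacy OLE2 formats (Word/Excel/PowerPoint 97-2003)
            return new OfficeParser();
        }
        // Everything else (e.g. .docx/.xlsx): let Tika detect the media type
        return new AutoDetectParser();
    }

    // Hypothetical helper: return the plain-text body of a document,
    // e.g. to be added to a Lucene field afterwards.
    static String extractText(Path file) throws IOException, SAXException, TikaException {
        Parser parser = parserFor(file.getFileName().toString());
        // -1 disables BodyContentHandler's default write limit on extracted characters
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        try (InputStream in = Files.newInputStream(file)) {
            parser.parse(in, handler, metadata, context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaTextExtractor file1.pdf file2.xls ...
        for (String arg : args) {
            String text = extractText(Paths.get(arg));
            System.out.println(arg + ": " + text.length() + " characters extracted");
        }
    }
}
```

The returned string can then be indexed like any other text. With Lucene 4.x, for example, you could add it to a Document as a TextField (field name of your choosing) before handing it to an IndexWriter.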
This question seems to be more appropriate for the Tika user mailing list [3]?
[1]
http://tika.apache.org/1.3/api/org/apache/tika/parser/pdf/PDFParser.html#parse(java.io.InputStream,
org.xml.sax.ContentHandler, or