Re: Indexing part of Binary Documents and not the entire contents

Shawn Heisey Tue, 26 Jun 2018 06:28:17 -0700

On 6/26/2018 7:13 AM, neotorand wrote:

Don't you think the method below is very expensive?


autoParser.parse(input, textHandler, metadata, context);

If the document is large, it will need enough memory to hold the whole
document (i.e. in the ContentHandler).
Is there any other alternative?


I did find this:

https://stackoverflow.com/questions/25043720/using-poi-or-tika-to-extract-text-stream-to-stream-without-loading-the-entire-f

But I have no actual experience with Tika. If you want a definitive answer, you will need to go to a Tika support resource. Although Solr does incorporate Tika, we are not experts in its use.
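
For what it's worth, the general idea of not buffering the whole document can be sketched with nothing but the JDK, because the handler passed to parse() is an ordinary SAX ContentHandler. The sketch below forwards text to a Writer as it arrives and aborts once a character limit is reached, which would also cover indexing only part of a document. The class name CappedTextHandler and the limit of 10 are made up for illustration, and the demo feeds it a small XML string through the JDK's SAX parser rather than Tika, so treat it as an assumption-laden starting point and not as the way Tika recommends doing it. I believe Tika ships its own handlers for this purpose (BodyContentHandler can take a Writer or a write limit), but the Tika documentation is the place to confirm that.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: streams character data to a Writer instead of
// collecting it in memory, and stops the parse after `limit` characters.
class CappedTextHandler extends DefaultHandler {
    private final Writer out;
    private final int limit;
    private int written = 0;

    CappedTextHandler(Writer out, int limit) {
        this.out = out;
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Write only as much of this chunk as still fits under the limit.
        int n = Math.min(length, limit - written);
        try {
            out.write(ch, start, n);
        } catch (IOException e) {
            throw new SAXException(e);
        }
        written += n;
        if (written >= limit) {
            // Throwing is the usual way for a SAX handler to abort a parse early.
            throw new SAXException("write limit reached");
        }
    }

    public static void main(String[] args) throws Exception {
        // A StringWriter keeps the demo simple; a FileWriter or similar
        // would keep the extracted text out of memory entirely.
        StringWriter sink = new StringWriter();
        CappedTextHandler handler = new CappedTextHandler(sink, 10);
        String xml = "<doc><p>first paragraph</p><p>second paragraph</p></doc>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected here: the handler stopped the parse at the limit.
        }
        System.out.println(sink);
    }
}
```

The caller has to treat the exception raised at the limit as a normal stop and not as a failure, which is why the demo catches it and carries on.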


Thanks,
Shawn
