Unable to extract content from chunked portion of large file

raghu vittal Fri, 19 Feb 2016 01:38:31 -0800

Hi all,

We have very large PDF, .docx and .xlsx files. We use Tika to extract their
content and index it in Elasticsearch for full-text search.
Sending very large files to Tika causes an out-of-memory exception.


We want to split each file into chunks and send the chunks to Tika for content
extraction, but when we pass a chunk of a file to Tika it returns empty text.
I assume Tika relies on the file structure, and that is why it returns no content.
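A small Python sketch is consistent with that assumption (illustrative only; the
archive contents are made up). .docx and .xlsx files are ZIP containers whose
central directory, the index a reader needs to locate each part, sits at the end
of the file, so an arbitrary leading chunk cannot even be opened as an archive.
PDF is similar: a reader finds the cross-reference table through the `startxref`
pointer and trailer at the end of the file.

```python
import io
import zipfile


def make_zip_container() -> bytes:
    """Build a small ZIP archive in memory as a stand-in for a .docx/.xlsx file.

    The part names mimic OOXML, but the contents are made up for illustration.
    """
    buf = io.BytesIO()
    # ZIP_STORED keeps the part data uncompressed, so the archive is large
    # enough that cutting it in half lands in the middle of a part.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document>" + "hello " * 2000 + "</w:document>")
    return buf.getvalue()


def can_open_as_zip(data: bytes) -> bool:
    """Return True if the bytes can be opened as a ZIP archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


whole = make_zip_container()
first_chunk = whole[: len(whole) // 2]

print(can_open_as_zip(whole))        # True
print(can_open_as_zip(first_chunk))  # False: no end-of-central-directory record in this chunk
```

By the same reasoning, byte-level chunks of these formats will generally not
parse on their own, whichever tool receives them.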

We are using Tika Server (REST API) from our .NET application.
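For reference, a minimal Python sketch of sending the whole file (not a chunk)
to Tika Server's `/tika` endpoint with an HTTP PUT and `Accept: text/plain`.
It assumes a server on the default port 9998; `extract_text` is a hypothetical
helper name. Passing an open file object lets `urllib` stream the upload in
blocks instead of reading the whole file into client memory, but it does not
change how much memory Tika itself uses while parsing. From .NET, the equivalent
would be `HttpClient.PutAsync` with a `StreamContent` body.

```python
import os
import urllib.request

# Assumption: Tika Server runs locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def extract_text(path: str, url: str = TIKA_URL, timeout: float = 300.0) -> str:
    """PUT the complete file to Tika Server and return the extracted plain text."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        req = urllib.request.Request(
            url,
            data=f,  # file object: http.client sends it in blocks, not all at once
            method="PUT",
            headers={
                "Accept": "text/plain",  # ask Tika for plain text output
                "Content-Length": str(size),
                # Explicit type, otherwise urllib adds a form-encoded default.
                "Content-Type": "application/octet-stream",
            },
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
```

Usage (hypothetical path): `text = extract_text("big-report.pdf")`.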

Please suggest a better approach for this scenario.

Regards,
Raghu.