Tika fails to extract text from very large files

Alec Swan Wed, 16 May 2012 14:41:33 -0700

Hello,

Our tests indicate that while Tika can extract text from average-sized
files, it fails to extract text from large files of certain types. In our
tests, Tika extracted 0 characters from a 100 MB PPTX, a 60 MB DOCX, and a
113 MB PDF file. However, it extracted the correct text from a 94 MB TXT file.


Is this a limitation of Tika? How can we troubleshoot this?

Thanks,

Alec
