Tika will not extract all the data of an old Word file

Steven White Sat, 14 Sep 2019 14:29:45 -0700

Hi everyone,

I'm using Tika <> to extract raw text from a Microsoft Word 9.0 file.
Tika is giving me back only 1/3 of the data. If I save the file as DOCX
using MS Word 2017, I still see the problem. However, if I save the file
as PDF using MS Word 2017, the PDF file gets processed just fine (I get
all the raw text).


How can I debug this to find out what the issue is?
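One way to narrow it down is to compare the two extractions and see which passages are missing from the Word output. For example, the missing text might all come from tables, text boxes, or headers. The sketch below is a minimal illustration using only Python's standard library. It assumes the Tika output for the PDF and for the .doc has already been saved as two UTF-8 text files. The script, its file arguments, and the function name `missing_passages` are hypothetical.

```python
import sys
import difflib
from pathlib import Path


def missing_passages(reference: str, candidate: str, min_words: int = 5) -> list[str]:
    """Return runs of at least `min_words` words that appear in `reference`
    but are missing (or different) in `candidate`.

    The comparison is word by word, so differences in line wrapping between
    the two extractions are not reported as changes.
    """
    ref_words = reference.split()
    cand_words = candidate.split()
    # autojunk=False so that common words in long documents are not ignored
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    gaps = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'delete' and 'replace' spans are reference words the candidate lacks
        if tag in ("delete", "replace") and i2 - i1 >= min_words:
            gaps.append(" ".join(ref_words[i1:i2]))
    return gaps


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare.py tika_from_pdf.txt tika_from_doc.txt
    pdf_text = Path(sys.argv[1]).read_text(encoding="utf-8")
    doc_text = Path(sys.argv[2]).read_text(encoding="utf-8")
    for n, gap in enumerate(missing_passages(pdf_text, doc_text), start=1):
        print(f"--- missing passage {n} ---")
        print(gap[:300])  # show only the start of long gaps
```

If the missing passages share some structural feature in the original document, that points to the part of the Word parsing that is dropping text.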

Thanks

Steven
