Hi everyone,

I'm using Tika to extract raw text from a Microsoft Word 9.0 file.
Tika is giving me back 1/3 of the data.  If I save the file as DOCX using
MS Word 2017, I still see the problem.  However, if I save the file as PDF
using MS Word 2017, the PDF file gets processed just fine (I get all the
raw text).

How can I debug this to find out what the issue is?

Thanks

Steven
