Hi everyone, I'm using Tika to extract raw text from a Microsoft Word 9.0 file. Tika is giving me back only about 1/3 of the text. If I save the file as DOCX using MS Word 2017, I still see the problem. However, if I save the file as PDF using MS Word 2017, the PDF file gets processed just fine (I get all the raw text).
How can I debug this to find out what the issue is?

Thanks,
Steven
