I forgot to mention that I'm using Tika 1.22. I see the same issue even when I run Tika in GUI mode.
Steven

On Sat, Sep 14, 2019 at 5:29 PM Steven White <[email protected]> wrote:

> Hi everyone,
>
> I'm using Tika to extract raw text from a Microsoft Word 9.0 file.
> Tika is giving me back only 1/3 of the data. If I save the file as DOCX using
> MS Word 2017, I still see the problem. However, if I save the file as PDF
> using MS Word 2017, the PDF file gets processed just fine (I get all the
> raw text).
>
> How can I debug this to find out what the issue is?
>
> Thanks
>
> Steven
