I forgot to mention.  I'm using Tika 1.22.  I see the same issue even when
I use Tika as a GUI.

Steven

On Sat, Sep 14, 2019 at 5:29 PM Steven White <[email protected]> wrote:

> Hi everyone,
>
> I'm using Tika to extract raw text from a Microsoft Word 9.0 file.
> Tika is giving me back 1/3 of the data.  If I save the file as DOCX using
> MS Word 2017, I still see the problem.  However, if I save the file as PDF
> using MS Word 2017, the PDF file gets processed just fine (I get all the
> raw text).
>
> How can I debug this to find out what the issue is?
>
> Thanks
>
> Steven
>
