Can you create a JIRA issue and provide a sample of the file?
Does the file have any embedded objects, such as Excel or PPT? Or text
inserted as a text box?
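
Both questions can be checked on the DOCX copy of the file, since a .docx is
a ZIP package. The following is only a minimal sketch using the Python
standard library, not part of Tika: the function name inspect_docx is made up
for illustration, and it assumes that embedded objects live under
word/embeddings/ and that text boxes appear as w:txbxContent elements with
the usual "w:" prefix, which is what Word normally writes.

```python
import zipfile


def inspect_docx(path):
    """Return (embedded part names, text-box element count) for a .docx file.

    Hypothetical helper for illustration; it only inspects the package and
    does not extract any text.
    """
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        # Assumption: embedded Excel/PPT/OLE parts are stored under this folder.
        embedded = [n for n in names if n.startswith("word/embeddings/")]
        # Assumption: text-box content is wrapped in <w:txbxContent> elements
        # in the main document part, using the conventional "w:" prefix.
        body = package.read("word/document.xml")
        text_boxes = body.count(b"<w:txbxContent")
    return embedded, text_boxes
```

Calling inspect_docx on the saved DOCX returns the list of embedded parts and
a rough text-box count. The count can be higher than the number of visible
text boxes, because Word may store the same box twice (a drawing plus a
fallback copy). Non-zero results would suggest the missing text sits in
embedded objects or text boxes rather than in the main body.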

Steven White at "Sat, 14 Sep 2019 17:29:19 -0400" wrote:
 SW> Hi everyone,

 SW> I'm using Tika to extract raw text from a Microsoft Word 9.0 file.
 SW> Tika is giving me back 1/3 of the data.  If I save the file as DOCX
 SW> using MS Word 2017, I still see the problem.  However, if I save the
 SW> file as PDF using MS Word 2017, the PDF file gets processed just fine
 SW> (I get all the raw text).

 SW> How can I debug this to find out what's the issue?

 SW> Thanks

 SW> Steven



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
