Just to add one further wrinkle: if you do not need the XML for any further
processing, it is possible to convert a .doc file into Rich Text Format
(RTF), leave the .doc extension in place, and then have Word silently open it
just as if it were still in the Word 97-2003 binary file format (BIFF8,
strictly speaking, is Excel's binary format, not Word's). I do not know
whether that would sidestep the malformed XML issues you are experiencing,
or even whether it would be any quicker. I am guessing you are using
something like JODConverter to leverage LibreOffice's functionality and, if
that is the case, it would be a trivial task to ask it to convert one of the
'problem' documents into RTF to see what happens.
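
For anyone wanting to try the experiment without touching an existing JODConverter setup, here is a rough sketch that shells out to LibreOffice directly. It assumes a `soffice` executable is on the PATH and that LibreOffice names its output after the input file with an .rtf extension; the class and method names (`RtfProbe`, `buildCommand`, `rtfPathFor`, `convertAndDisguise`) are made up for illustration and are not part of POI or JODConverter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper: converts one document to RTF with LibreOffice and
// then gives the result a .doc extension so Word will open it as usual.
class RtfProbe {

    // Builds the headless LibreOffice command line.
    // Assumption: "soffice" is on the PATH; adjust for your installation.
    static List<String> buildCommand(Path input, Path outDir) {
        return List.of("soffice", "--headless", "--convert-to", "rtf",
                "--outdir", outDir.toString(), input.toString());
    }

    // Returns the input's file name without its last extension.
    static String baseName(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Where we expect LibreOffice to write the converted file.
    static Path rtfPathFor(Path input, Path outDir) {
        return outDir.resolve(baseName(input) + ".rtf");
    }

    // Runs the conversion, then renames <base>.rtf to <base>.doc.
    // Use an outDir different from the input's directory, otherwise the
    // rename could overwrite the original document.
    static Path convertAndDisguise(Path input, Path outDir)
            throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("soffice exited with code " + exitCode);
        }
        Path disguised = outDir.resolve(baseName(input) + ".doc");
        return Files.move(rtfPathFor(input, outDir), disguised,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: RtfProbe <problem-document> <output-dir>");
            return;
        }
        Path result = convertAndDisguise(Path.of(args[0]), Path.of(args[1]));
        System.out.println("RTF content written to " + result);
    }
}
```

Run it against one of the 'problem' documents and then open the resulting file in Word. Whether this is any faster, or avoids the malformed XML at all, is exactly what the experiment would tell you.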



--
View this message in context: 
http://apache-poi.1045710.n5.nabble.com/Remove-Invalid-XML-in-DOCX-tp5718602p5718628.html
Sent from the POI - User mailing list archive at Nabble.com.