Just to add one further wrinkle: if you do not need the XML for any further processing, it is possible to convert a .doc file into Rich Text Format (RTF), leave the .doc extension in place, and have Word silently open it just as if it were still in the Word binary file format. I do not know whether that would 'sidestep' the malformed XML issues you are experiencing, or even whether it would be any quicker. I am guessing you are using something like JODConverter to leverage LibreOffice's functionality; if that is the case, it would be a trivial task to ask it to convert one of the 'problem' documents into RTF and see what happens.
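For anyone who wants to try the experiment without going through JODConverter, here is a rough sketch that shells out to LibreOffice directly. It assumes the `soffice` executable is on the PATH and uses LibreOffice's `--headless --convert-to rtf --outdir` command-line options; the class name `RtfProbe`, its helper methods, and the file names are all made up for illustration and are not part of POI or JODConverter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of POI or JODConverter.
final class RtfProbe {

    // Builds the LibreOffice command line. Assumption: "soffice" is on the PATH.
    static List<String> buildCommand(Path input, Path outDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "rtf",
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Strips the last extension from the input's file name.
    static String baseName(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // LibreOffice writes <basename>.rtf into the output directory.
    static Path rtfPathFor(Path input, Path outDir) {
        return outDir.resolve(baseName(input) + ".rtf");
    }

    // Name of the RTF file once it has been given a .doc extension.
    static Path disguisedPathFor(Path input, Path outDir) {
        return outDir.resolve(baseName(input) + ".doc");
    }

    // Runs the conversion, then renames the .rtf result to .doc.
    static Path convertAndDisguise(Path input, Path outDir)
            throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO()
                .start();
        int exit = process.waitFor();
        Path rtf = rtfPathFor(input, outDir);
        if (exit != 0 || !Files.exists(rtf)) {
            throw new IOException("Conversion failed, exit code " + exit);
        }
        // No REPLACE_EXISTING: the move fails rather than overwrite a file,
        // so use an output directory different from the input's directory.
        return Files.move(rtf, disguisedPathFor(input, outDir));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths; substitute one of the 'problem' documents.
        Path result = convertAndDisguise(
                Paths.get("problem.doc"), Paths.get("rtf-out"));
        System.out.println("Wrote " + result);
    }
}
```

The result in `rtf-out` carries a .doc extension but contains RTF, so it can be opened in Word to see whether the problem content survives the round trip. Writing to a separate output directory keeps the original file untouched.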
-- View this message in context: http://apache-poi.1045710.n5.nabble.com/Remove-Invalid-XML-in-DOCX-tp5718602p5718628.html Sent from the POI - User mailing list archive at Nabble.com.
