So here's the scenario: we convert MS Word documents to DOCX using LibreOffice (clunky, I know), but some of the resulting files are unreadable because the XML inside them contains invalid characters that neither XML 1.0 nor XML 1.1 accepts.
LibreOffice does not care, but we need to read these documents into POI. Short of unpacking the archive and editing the relevant XML files in the container by hand, is there a way to edit the PackagePart data for the affected parts? (It happens most often in word/document.xml.) It is not clear to me from the PackagePart API how to read a part's XML into memory, edit it, and then write it back to the part. Any recommendations on how to approach this are welcome. Thanks.
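
To make the goal concrete, here is a rough sketch of the kind of filtering meant above. It does not use PackagePart at all: it rewrites the DOCX as a ZIP in memory with plain java.util.zip, so nothing is unpacked to disk. The class and method names (DocxXmlSanitizer, stripInvalidXmlChars, sanitizeDocx) are made up for illustration, and it assumes that the offending part is UTF-8 encoded and that the problem is raw characters outside the XML 1.0 Char range (for example stray control characters). It needs Java 9 or later for readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of POI: cleans one XML part of a DOCX in memory.
public class DocxXmlSanitizer {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    public static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point that XML 1.0 does not allow (unpaired surrogates included).
    public static String stripInvalidXmlChars(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        xml.codePoints()
           .filter(DocxXmlSanitizer::isValidXml10)
           .forEach(out::appendCodePoint);
        return out.toString();
    }

    // Copies the ZIP entry by entry, cleaning only the entry named partName
    // (e.g. "word/document.xml"). Assumption: that entry is UTF-8 text.
    public static byte[] sanitizeDocx(byte[] docx, String partName) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads to the end of the current entry
                if (entry.getName().equals(partName)) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = stripInvalidXmlChars(xml).getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
        return result.toByteArray();
    }
}
```

The returned bytes could then be handed to POI through a ByteArrayInputStream, for example via the XWPFDocument(InputStream) constructor. Two caveats: this only removes raw characters, so an illegal numeric character reference such as &#x1; would still get through, and bytes that are not valid UTF-8 end up as U+FFFD rather than being removed. I suspect the same filter could be applied to a single part using PackagePart.getInputStream() and getOutputStream() before constructing the document, but I have not verified that, which is really the question.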
