I want to remove Word metadata from .doc files. My .docx files work fine
with XWPFDocument, but the following code for removing metadata fails for
large (> 1 MB) .doc files. For example, a 6 MB .doc file with images
produces a 4.5 MB output file in which some of the images are missing.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class MetadataRemover {

    public static InputStream removeMetaData(InputStream inputStream) throws IOException {
        POIFSFileSystem fss = new POIFSFileSystem(inputStream);
        HWPFDocument doc = new HWPFDocument(fss);

        // Large files fail even if everything from here to 'until' below is removed
        SummaryInformation si = doc.getSummaryInformation();
        si.removeAuthor();
        si.removeComments();
        si.removeLastAuthor();
        si.removeKeywords();
        si.removeSubject();
        si.removeTitle();

        DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
        dsi.removeCategory();
        dsi.removeCompany();
        dsi.removeManager();
        try {
            dsi.removeCustomProperties();
        } catch (Exception e) {
            // Removing the custom properties throws, so the exception is ignored
        }
        // until

        // Write the modified document to memory and hand it back as a stream
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        doc.write(os);
        os.flush();
        os.close();
        return new ByteArrayInputStream(os.toByteArray());
    }
}
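To quantify how many images go missing without opening both files in Word, a rough check is to count JPEG and PNG start signatures in the raw bytes of the original and the rewritten file. This is a sketch under stated assumptions, not part of POI: the class name ImageSignatureCount is hypothetical, and it does not parse the OLE2 container. Pictures in other formats (e.g. WMF/EMF metafiles, which are often stored compressed) are not counted, a signature that happens to straddle a 512-byte sector boundary is missed, and unrelated byte runs can give false positives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of Apache POI.
// Heuristic only: scans raw file bytes, does not parse the .doc structure.
public class ImageSignatureCount {

    // JPEG data starts with SOI (FF D8) followed by another marker byte (FF).
    private static final byte[] JPEG_SIG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    // Every PNG image begins with this fixed 8-byte signature.
    private static final byte[] PNG_SIG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // Counts non-overlapping occurrences of sig in data.
    public static int countSignature(byte[] data, byte[] sig) {
        int count = 0;
        outer:
        for (int i = 0; i <= data.length - sig.length; i++) {
            for (int j = 0; j < sig.length; j++) {
                if (data[i + j] != sig[j]) {
                    continue outer;
                }
            }
            count++;
            i += sig.length - 1; // jump past this match
        }
        return count;
    }

    // Total of JPEG and PNG start signatures found in data.
    public static int countImages(byte[] data) {
        return countSignature(data, JPEG_SIG) + countSignature(data, PNG_SIG);
    }

    public static void main(String[] args) throws IOException {
        byte[] before = Files.readAllBytes(Paths.get(args[0]));
        byte[] after = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("JPEG/PNG signatures before: " + countImages(before));
        System.out.println("JPEG/PNG signatures after:  " + countImages(after));
    }
}
```

Usage would be along the lines of `java ImageSignatureCount original.doc cleaned.doc`; a lower count for the second file is consistent with pictures being dropped by doc.write().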
--
View this message in context:
http://apache-poi.1045710.n5.nabble.com/Apache-POI-fails-to-save-HWPFDocument-write-large-word-doc-files-tp5711411.html
Sent from the POI - User mailing list archive at Nabble.com.