I want to remove Word metadata from .doc files. My .docx files work fine
with XWPFDocument, but the following code for removing metadata fails for
large (> 1 MB) .doc files. For example, given a 6 MB .doc file with images,
it outputs a 4.5 MB file in which some images are missing.

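import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public static InputStream removeMetaData(InputStream inputStream) throws IOException {
    POIFSFileSystem fss = new POIFSFileSystem(inputStream);
    HWPFDocument doc = new HWPFDocument(fss);

    // The output is still broken for large files even if everything from
    // here down to the "until" comment is removed.
    SummaryInformation si = doc.getSummaryInformation();
    si.removeAuthor();
    si.removeComments();
    si.removeLastAuthor();
    si.removeKeywords();
    si.removeSubject();
    si.removeTitle();

    DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
    dsi.removeCategory();
    dsi.removeCompany();
    dsi.removeManager();
    try {
        dsi.removeCustomProperties();
    } catch (Exception e) {
        // custom properties could not be removed; ignored
    }
    // until

    // Write the document to memory and hand it back as a stream.
    // (flush/close are no-ops on ByteArrayOutputStream, so they are omitted.)
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    doc.write(os);
    return new ByteArrayInputStream(os.toByteArray());
}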
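
For reference, the size difference described above (6 MB in, 4.5 MB out) can be measured independently of POI with a small helper like the one below. This is only a sketch using the standard library: the class name StreamSizeCheck and the methods readAll and bytesLost are hypothetical names chosen for illustration, not part of POI.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of Apache POI.
class StreamSizeCheck {

    // Drain a stream completely into a byte array so its size can be measured.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Number of bytes by which the output is smaller than the input
    // (negative if the output grew).
    public static long bytesLost(byte[] before, byte[] after) {
        return (long) before.length - after.length;
    }
}
```

Because removeMetaData consumes its input stream, the original file would need to be buffered first with readAll and then passed in wrapped in a ByteArrayInputStream; calling readAll on the returned stream and bytesLost on the two arrays then gives the gap, roughly 1.5 MB for the file mentioned above.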
