Hi all,

I'm trying to extract the thumbnail of an MS Word document using HPSF (the file 
has an embedded thumbnail). Following the documentation at 
http://poi.apache.org/hpsf/thumbnails.html, I wrote the following code:
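import java.io.File;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hpsf.Thumbnail;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.AbstractWordUtils;

static byte[] process(File docFile) throws Exception {
    final HWPFDocumentCore wordDocument = AbstractWordUtils.loadDoc(docFile);
    SummaryInformation summaryInformation = wordDocument.getSummaryInformation();
    System.out.println(summaryInformation.getAuthor());
    System.out.println(summaryInformation.getApplicationName() + ":" + summaryInformation.getTitle());
    // Wrap the raw bytes of the thumbnail property.
    Thumbnail thumbnail = new Thumbnail(summaryInformation.getThumbnail());
    // This call throws the HPSFException shown below.
    System.out.println(thumbnail.getClipboardFormat());
    System.out.println(thumbnail.getClipboardFormatTag());
    return thumbnail.getThumbnailAsWMF();
}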

Unfortunately, the extraction raises an exception:
Converting E:\test.doc
Saving output to E:\test.wmf
org.apache.poi.hpsf.HPSFException: Clipboard Format Tag of Thumbnail must be CFTAG_WINDOWS.
       at org.apache.poi.hpsf.Thumbnail.getClipboardFormat(Thumbnail.java:234)
       at DOC2JPG.process(DOC2JPG.java:52)
       at DOC2JPG.main(DOC2JPG.java:33)
Michel ARNOULD
Microsoft Word 9.0:GROUPE DE PAIRS DE VILLIERS-ST-GEORGES

I dumped the content returned by summaryInformation.getThumbnail() to a file and 
looked at it in a hex viewer. The 4-byte clipboard format tag (bytes 4 to 7, 
FF FF FF FF) never comes out as -1 (CFTAG_WINDOWS), but as 4294967295:
18 33 00 00 FF FF FF FF 03 00 00 00 08 00 05 52
01 74 E2 18 01 00 09 00 00 03 7C 19 00 00 0A 00
...

I tested some other Word documents, and the format tag value is always 
4294967295.
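
To show what I mean, here is a small standalone sketch (plain JDK, no POI) that decodes the first 12 bytes of the dump above. The class and method names are my own, chosen for illustration only. It reads the same 4 tag bytes once as a signed 32-bit integer and once as an unsigned value widened to long. My guess, and it is only a guess, is that the tag gets read as unsigned somewhere and is then compared with the int constant -1, which would never match.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ThumbnailHeaderDemo {
    // First 12 bytes of the dump: size, clipboard format tag, clipboard format.
    static final byte[] HEADER = {
        0x18, 0x33, 0x00, 0x00,
        (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
        0x03, 0x00, 0x00, 0x00
    };

    /** Reads the 4 bytes at offset as a signed little-endian 32-bit integer. */
    static int readSigned(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Reads the same 4 bytes as an unsigned value, widened to long. */
    static long readUnsigned(byte[] data, int offset) {
        return readSigned(data, offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println("size             = " + readSigned(HEADER, 0));
        System.out.println("tag (signed)     = " + readSigned(HEADER, 4));
        System.out.println("tag (unsigned)   = " + readUnsigned(HEADER, 4));
        System.out.println("clipboard format = " + readSigned(HEADER, 8));
    }
}
```

So the bytes in the file look fine to me: the same FF FF FF FF is -1 as a signed int and 4294967295 as an unsigned value, and the two are not equal once both are compared as long.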

Thanks a lot for your help.

Hong-Thai
