Hi all,
I'm trying to extract the thumbnail of an MS Word document using HPSF (the file
has an embedded thumbnail). Following the documentation at
http://poi.apache.org/hpsf/thumbnails.html, I wrote the following code:
static byte[] process(File docFile) throws Exception {
    final HWPFDocumentCore wordDocument = AbstractWordUtils.loadDoc(docFile);
    SummaryInformation summaryInformation =
            wordDocument.getSummaryInformation();
    System.out.println(summaryInformation.getAuthor());
    System.out.println(summaryInformation.getApplicationName() + ":" +
            summaryInformation.getTitle());
    Thumbnail thumbnail = new Thumbnail(summaryInformation.getThumbnail());
    System.out.println(thumbnail.getClipboardFormat());
    System.out.println(thumbnail.getClipboardFormatTag());
    return thumbnail.getThumbnailAsWMF();
}
Unfortunately, the extraction raises an exception:
Converting E:\test.doc
Saving output to E:\test.wmf
org.apache.poi.hpsf.HPSFException: Clipboard Format Tag of Thumbnail must be
CFTAG_WINDOWS.
at org.apache.poi.hpsf.Thumbnail.getClipboardFormat(Thumbnail.java:234)
at DOC2JPG.process(DOC2JPG.java:52)
at DOC2JPG.main(DOC2JPG.java:33)
Michel ARNOULD
Microsoft Word 9.0:GROUPE DE PAIRS DE VILLIERS-ST-GEORGES
I dumped the bytes returned by summaryInformation.getThumbnail() to a file and
inspected them in a hex viewer. The 4-byte clipboard format tag (FF FF FF FF
below) is never reported as -1 (CFTAG_WINDOWS), but as '4294967295':
18 33 00 00 FF FF FF FF 03 00 00 00 08 00 05 52
01 74 E2 18 01 00 09 00 00 03 7C 19 00 00 0A 00
...
I tested some other Word documents, and the format tag value is always
'4294967295'.
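To illustrate what I suspect is going on, here is a small standalone sketch. It
is not POI code: the class and method names are my own, and the tag offset of 4
is only assumed from the dump above. It shows that the bytes FF FF FF FF are -1
when read as a signed 32-bit integer but 4294967295 when read as unsigned, so a
comparison against -1 would fail if the tag is read as an unsigned value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ClipboardTagDemo {
    // Offset of the clipboard format tag, assumed from the hex dump
    // (the first 4 bytes there look like a size field).
    static final int TAG_OFFSET = 4;

    /** Reads 4 little-endian bytes at the given offset as a signed 32-bit int. */
    static int readSigned(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Reads the same 4 bytes as an unsigned value, widened to a long. */
    static long readUnsigned(byte[] data, int offset) {
        return readSigned(data, offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // First 12 bytes of the thumbnail dump.
        byte[] head = {
            0x18, 0x33, 0x00, 0x00,
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
            0x03, 0x00, 0x00, 0x00
        };
        System.out.println(readSigned(head, TAG_OFFSET));         // -1
        System.out.println(readUnsigned(head, TAG_OFFSET));       // 4294967295
        System.out.println(readUnsigned(head, TAG_OFFSET) == -1); // false
    }
}
```

If that is what happens inside Thumbnail, casting the tag to int before
comparing it with CFTAG_WINDOWS would make the check pass.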
Thanks a lot for your help.
Hong-Thai