Hi there,
Apologies for the screenshots, but I think they are the easiest way to explain
my problem.
I need to extract embedded OLE10Native files from CDF Word docs. I thought my
code was working, but someone recently reported it broken when .bin files are
embedded (PBrush files?).
My approach is this:
Open the Word doc as a stream, then:
npoifsFileSystem = new NPOIFSFileSystem(bis);
scanForEmbeddedOleDocs(npoifsFileSystem.getRoot());
In scanForEmbeddedOleDocs() I iterate through the structure (recursing when
other DirectoryNodes are found), looking for entries named
"\u0001Ole10Native". When one is found, I call
byte[] imageData = Ole10Native.createFromEmbeddedOleObject(dirNode).getDataBuffer();
to get the image data.
Now, this works in some cases (embedded MP3s for example) but fails for others
(BIN files). The 2 screenshots below taken from the debugger show the state of
2 DirectoryNodes at the point of extraction.
The first one (Embedded_MP3_OK.png) shows the success case with an MP3:
[cid:CD9D5A3B-665D-45B2-943F-5F8E176972A3]
The second (Embedded_BIN_Fails.png) shows the problem case with the BIN file:
[cid:C5CEF6D2-E1FC-4CC3-9236-899D2426AC3F]
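In case it helps to compare the two cases outside the debugger, a small helper along these lines could dump the start of each stream's raw bytes. This is only a sketch using the standard library: the class and method names (StreamPeek, sizePrefix, hexHead) are made up for illustration, how the bytes are read out of POI is not shown, and the only layout assumption is that the stream starts with a 4-byte little-endian length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class StreamPeek {

    /** Reads the first four bytes as a little-endian int (the assumed size prefix). */
    public static int sizePrefix(byte[] stream) {
        if (stream.length < 4) {
            throw new IllegalArgumentException("stream shorter than 4 bytes");
        }
        return ByteBuffer.wrap(stream, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Returns a space-separated hex dump of at most 'limit' leading bytes. */
    public static String hexHead(byte[] stream, int limit) {
        int n = Math.min(limit, stream.length);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02x", stream[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in bytes; in practice these would be the raw stream contents.
        byte[] sample = {0x05, 0x00, 0x00, 0x00, 0x68, 0x69};
        System.out.println("size prefix: " + sizePrefix(sample));
        System.out.println("head: " + hexHead(sample, 16));
    }
}
```

Running this over the bytes of the MP3 and BIN streams side by side should at least show whether the two start with the same kind of header.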
As a further check, I converted the doc containing the BIN to docx and
unzipped it, and that does yield a .bin file (so I know it can be done!):
find . -ls
1807158 0 drwxr-xr-x 6 cbamford 790807719 204 16 Jul 09:28 .
1807163 8 -rw-r--r-- 1 cbamford 790807719 1701 1 Jan 1980 ./[Content_Types].xml
…..
1807170 0 drwxr-xr-x 3 cbamford 790807719 102 16 Jul 09:28 ./word/embeddings
1807171 8 -rw-r--r-- 1 cbamford 790807719 3072 1 Jan 1980 ./word/embeddings/oleObject1.bin
….
Can anyone tell me what I am doing wrong in my code? I am using POI-3.10-FINAL.
Thanks!
- Chris
Chris Bamford
Senior Developer