Hi there,

Apologies for the screenshots, but I think they are the easiest way to explain 
my problem.
I need to extract embedded OLE10Native files from CDF Word docs.  I thought my 
code was working, but someone recently reported it broken when .bin files are 
embedded (PBrush files?).
My approach is this:

Open the Word doc as a stream, then:

        npoifsFileSystem = new NPOIFSFileSystem(bis);
        scanForEmbeddedOleDocs(npoifsFileSystem.getRoot());

In scanForEmbeddedOleDocs() I iterate through the structure (recursing when 
other DirectoryNodes are found), looking for entry names of 
"\u0001Ole10Native".  When found, I call

        byte[] imageData = Ole10Native.createFromEmbeddedOleObject(dirNode).getDataBuffer();

to get the image data.
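
Roughly, the traversal has the shape below. This is a simplified, self-contained sketch rather than the real code: the Node class is a hypothetical stand-in for POI's DirectoryNode/Entry types, the entry names in main() (such as "_1234") are made up for illustration, and the extraction call is only marked by a comment.

```java
import java.util.ArrayList;
import java.util.List;

class OleScanSketch {
    // Name of the stream that marks an embedded Ole10Native object.
    static final String OLE10_NATIVE = "\u0001Ole10Native";

    /** Hypothetical stand-in for a POIFS entry: a name plus children (none for a stream). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Records the path of every directory that directly holds an Ole10Native stream. */
    static List<String> scan(Node dir, String path, List<String> hits) {
        for (Node child : dir.children) {
            if (OLE10_NATIVE.equals(child.name)) {
                // The real code would extract the payload from this directory here.
                hits.add(path);
            } else if (!child.children.isEmpty()) {
                // Recurse into nested directories.
                scan(child, path + "/" + child.name, hits);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative tree only; not taken from the failing document.
        Node root = new Node("Root Entry",
                new Node("WordDocument"),
                new Node("ObjectPool",
                        new Node("_1234", new Node(OLE10_NATIVE), new Node("\u0001CompObj")),
                        new Node("_5678", new Node("\u0001Ole"))));
        System.out.println(scan(root, "", new ArrayList<>()));
    }
}
```

Note that in this sketch a directory without an entry of exactly that name is skipped silently, so nothing is extracted from it.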

Now, this works in some cases (embedded MP3s for example) but fails for others 
(BIN files).  The 2 screenshots below taken from the debugger show the state of 
2 DirectoryNodes at the point of extraction.

The first one (Embedded_MP3_OK.png) shows the success case with an MP3:

[cid:CD9D5A3B-665D-45B2-943F-5F8E176972A3]

The second (Embedded_BIN_Fails.png) shows the problem case with the BIN file:

[cid:C5CEF6D2-E1FC-4CC3-9236-899D2426AC3F]

For further validation, I converted the doc containing the BIN to docx and 
unzipped it, which successfully yields a .bin file (so I know it can be 
done!):

find . -ls
1807158        0 drwxr-xr-x    6 cbamford         790807719             204 16 Jul 09:28 .
1807163        8 -rw-r--r--    1 cbamford         790807719            1701  1 Jan  1980 ./[Content_Types].xml
…..
1807170        0 drwxr-xr-x    3 cbamford         790807719             102 16 Jul 09:28 ./word/embeddings
1807171        8 -rw-r--r--    1 cbamford         790807719            3072  1 Jan  1980 ./word/embeddings/oleObject1.bin
….

Can anyone tell me what I am doing wrong in my code?  I am using POI-3.10-FINAL.

Thanks!

- Chris

Chris Bamford