Rainer,
You were right: the .bin file in /embeddings is an OLE file and can be read
with POIFS. The gotcha is that POIFS currently has no API for extracting the
embedded file from the OLE structures.
HSLF has an API to enumerate OLE objects within slides, but what I need is a
generic API that would let me do the following:
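    // Wished-for API; neither Embedding nor getEmbeddings() exists in POIFS today
    List<Embedding> embeddings = poifs.getEmbeddings();
    for (Embedding embedding : embeddings) {
        System.out.println(">> Embedding: " + embedding.getName());
        // FileOutputStream has no (dir, name) constructor, so go through File
        File target = new File(outputDir,
                Utils.getCleanFileName(embedding.getName()));
        embedding.extractTo(new FileOutputStream(target));
    }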
getEmbeddings() could be getOleObjects() or whatever, but that's the gist of
it.
- Dmitry
-----Original Message-----
From: Rainer Schwarze [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 28, 2008 6:49 PM
To: POI Users List
Subject: Re: How to extract embedded files from Office 07
Dmitry Goldenberg wrote:
> Yegor,
>
> The first 8 bytes contain the standard MS Office magic number stuff - d0 cf
> 11 e0 a1 b1 1a e1.
>
> Seems like they compress data in a proprietary way. I've read one post where
> someone recommended the .NET Packaging API to crack these ... Not a good
> option ...
Hi Dmitry,
this may be interesting (unless you already found it):
http://www.nabble.com/Can-POIFS-convert-PDF-to-OLE-td18568081.html
Looking at such things I suspect this:
The data is inside "Ole10Native". This could be extracted using POIFS.
The structures there look like this:
[4 bytes] = size of structure including data
[???]     = a few flags and strings (zero terminated)
[4 bytes] = size of actually embedded binary data
[???]     = the actual binary data
If you know that it is a ZIP file, you could search for a byte sequence
[size]"PK", where [size] depends on the search position. Assume you
start immediately after the first 4 bytes for total length; then the
expected size value is length-4. Step forward by one byte and check for
the sequence with size set to length-5, and so on. When the 6 bytes match
the expected [size]PK sequence, you can be somewhat sure that "PK"
represents the start of the ZIP file and [size] is its size.
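
The search described above could be sketched roughly as follows. This is
only an illustration under assumptions that are not confirmed anywhere in
this thread: the size fields are little-endian 32-bit integers, the total
length counts the bytes that follow the length field, and the embedded data
runs to the end of the structure. The class and method names
(Ole10NativeZipScan, findZipStart, extractZip) are made up for the example
and are not part of POI.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of POI: scans the raw bytes of an
// "Ole10Native" stream for the [size]"PK" pattern.
final class Ole10NativeZipScan {

    /** Returns the offset of the "PK" signature, or -1 if nothing matches. */
    static int findZipStart(byte[] stream) {
        if (stream.length < 10) {
            return -1; // too short for length field + size field + "PK"
        }
        // Assumption: all integers are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int totalLength = buf.getInt(0);

        // pos is where a candidate 4-byte size field would start.
        for (int pos = 4; pos + 6 <= stream.length; pos++) {
            // At pos 4 this is length-4, at pos 5 it is length-5, and so on.
            int expectedSize = totalLength - pos;
            if (expectedSize <= 0) {
                break; // past the end of the declared structure
            }
            if (buf.getInt(pos) == expectedSize
                    && stream[pos + 4] == 'P'
                    && stream[pos + 5] == 'K') {
                return pos + 4;
            }
        }
        return -1;
    }

    /** Returns the embedded bytes starting at "PK", or null if not found. */
    static byte[] extractZip(byte[] stream) {
        int start = findZipStart(stream);
        if (start < 0) {
            return null;
        }
        int size = ByteBuffer.wrap(stream)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(start - 4);
        // Clamp in case the declared size runs past the bytes we actually have.
        int end = (int) Math.min((long) start + size, stream.length);
        return Arrays.copyOfRange(stream, start, end);
    }
}
```

You would pass it the complete contents of the "Ole10Native" stream as a
byte array, however you read that out of POIFS. If the real layout carries
extra data after the embedded file, the size check will not line up and the
method simply returns -1, so treat a miss as "layout differs" rather than
"no ZIP inside".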
Of course nothing beats the analysis of the actual binary data structure
:-) (Would this be worth the effort for your purpose?)
Best wishes, Rainer
--
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]