Would it be good to devise the generic _Ole10Native reader to do the following?

    List<String> magicStrings = getMagicStrings();
    for (String ms : magicStrings) {
        if (found ms within the first N bytes) {
            // this must be a file of type <X>
            // read from ms onward, extract to disk
            break;
        }
    }

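The loop above could be fleshed out roughly as follows. This is only a sketch: the class name MagicScan, the method names, and the choice of signatures are made up for illustration and are not part of POI. The OLE2 signature is the one quoted further down in this thread; "%PDF" and "PK\3\4" are the usual PDF and ZIP signatures. The input is assumed to be the raw bytes of the Ole10Native stream, already read into memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class MagicScan {
    // Hypothetical signature table: type name -> magic bytes.
    static final Map<String, byte[]> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("pdf", "%PDF".getBytes(StandardCharsets.US_ASCII));
        MAGIC.put("zip", new byte[] {'P', 'K', 3, 4});
        MAGIC.put("ole2", new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                      (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1});
    }

    // Offset of magic if it lies entirely within the first 'limit' bytes, else -1.
    static int indexOf(byte[] data, byte[] magic, int limit) {
        int last = Math.min(limit, data.length) - magic.length;
        for (int i = 0; i <= last; i++) {
            int j = 0;
            while (j < magic.length && data[i + j] == magic[j]) {
                j++;
            }
            if (j == magic.length) {
                return i;
            }
        }
        return -1;
    }

    // Returns (type name, bytes from the signature onward), or null if nothing matched.
    // Signatures are tried in table order, so the first table entry that matches wins.
    static Map.Entry<String, byte[]> detect(byte[] data, int limit) {
        for (Map.Entry<String, byte[]> e : MAGIC.entrySet()) {
            int pos = indexOf(data, e.getValue(), limit);
            if (pos >= 0) {
                byte[] payload = Arrays.copyOfRange(data, pos, data.length);
                return new AbstractMap.SimpleImmutableEntry<>(e.getKey(), payload);
            }
        }
        return null;
    }
}
```

Like the pseudocode, this takes the first signature that matches and copies everything from there to the end of the stream, so a short signature that happens to occur in the header strings would give a false hit, and any trailing bytes after the embedded file would be copied along with it.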
-----Original Message-----
From: Rainer Schwarze [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 28, 2008 6:49 PM
To: POI Users List
Subject: Re: How to extract embedded files from Office 07
Dmitry Goldenberg wrote:
> Yegor,
>
> The first 8 bytes contain the standard MS Office magic number stuff - d0 cf
> 11 e0 a1 b1 1a e1.
>
> Seems like they compress data in a proprietary way. I've read one post where
> someone recommended the .NET Packaging API to crack these ... Not a good
> option ...
Hi Dmitry,
this may be interesting (unless you already found it):
http://www.nabble.com/Can-POIFS-convert-PDF-to-OLE-td18568081.html
Looking at such things I suspect this:
The data is inside "Ole10Native". This could be extracted using POIFS.
The structures there look like this:
[4 bytes] = size of structure including data
[???]     = a few flags and strings (zero terminated)
[4 bytes] = size of actually embedded binary data
[???]     = the actual binary data
If you know that it is a ZIP file, you could search for the byte sequence
[size]"PK", where the expected [size] depends on the search position.
Assume you start immediately after the first 4 bytes (the total length);
then the expected size value is length-4. Step forward by one byte and
check for the sequence with size set to length-5, and so on. When the
6 bytes match the expected [size]"PK" sequence, you can be reasonably
sure that "PK" marks the start of the ZIP file and [size] is its size.
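A rough Java sketch of that search, under assumptions that are mine and not verified against real files: the size fields are little-endian, the embedded data runs to the very end of the stream, and the expected size at each position is therefore "bytes remaining after the candidate size field". The exact offset arithmetic may need adjusting by a few bytes once compared with an actual Ole10Native stream. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ZipInOle10Native {
    // Returns the offset of "PK" if a matching [size]"PK" sequence is found, else -1.
    static int findZipStart(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        // Skip the leading 4-byte total length, then slide forward one byte at a time.
        for (int pos = 4; pos + 6 <= stream.length; pos++) {
            // Assumption: the data fills the rest of the stream after the size field.
            int expectedSize = stream.length - (pos + 4);
            if (buf.getInt(pos) == expectedSize
                    && stream[pos + 4] == 'P'
                    && stream[pos + 5] == 'K') {
                return pos + 4;
            }
        }
        return -1;
    }

    // Copies the suspected ZIP data out of the stream, or returns null if not found.
    static byte[] extractZip(byte[] stream) {
        int start = findZipStart(stream);
        return start < 0 ? null : Arrays.copyOfRange(stream, start, stream.length);
    }
}
```

Because the 4-byte size has to agree with the position as well as being followed by "PK", a chance match is much less likely than when searching for "PK" alone.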
Of course nothing beats the analysis of the actual binary data structure
:-) (Would this be worth the effort for your purpose?)
Best wishes, Rainer
--
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]