Would it be a good idea to have the generic _Ole10Native reader do the following?

List<String> magicStrings = getMagicStrings();
for (String ms : magicStrings) {
   if (found ms within the first N bytes) {
      // this must be a file of type <X>
      // read from ms onward, extract to disk
      break;
   }
}
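
A minimal runnable sketch of the loop above, for illustration only. The class name MagicScanner, the MAGICS table and the limit parameter (the "first N bytes") are hypothetical names, not part of POI; only two well-known signatures ("%PDF-" and the ZIP local-header bytes "PK\3\4") are listed. It returns everything from the matched signature to the end of the input, which assumes nothing else trails the embedded file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class MagicScanner {

    /** Hypothetical table of magic byte sequences, checked in insertion order. */
    static final Map<String, byte[]> MAGICS = new LinkedHashMap<>();
    static {
        MAGICS.put("pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        MAGICS.put("zip", new byte[] {'P', 'K', 3, 4});
    }

    /** Returns the first index of pattern that starts within the first `limit` bytes, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int limit) {
        int last = Math.min(limit, data.length - pattern.length + 1);
        for (int i = 0; i < last; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the bytes from the first matching magic onward, or null if none matched. */
    static byte[] extract(byte[] data, int limit) {
        for (Map.Entry<String, byte[]> e : MAGICS.entrySet()) {
            int pos = indexOf(data, e.getValue(), limit);
            if (pos >= 0) {
                // e.getKey() names the detected type; here we only return the payload
                return Arrays.copyOfRange(data, pos, data.length);
            }
        }
        return null;
    }
}
```

The caller would pass the raw bytes of the Ole10Native stream and write the returned array to disk. Short signatures such as "PK" can also occur by chance inside the header strings, so a small limit is advisable.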

-----Original Message-----
From: Rainer Schwarze [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 28, 2008 6:49 PM
To: POI Users List
Subject: Re: How to extract embedded files from Office 07

Dmitry Goldenberg wrote:
> Yegor,
>
> The first 8 bytes contain the standard MS Office magic number stuff - d0 cf 
> 11 e0 a1 b1 1a e1.
>
> Seems like they compress data in a proprietary way. I've read one post where 
> someone recommended the .NET Packaging API to crack these ...  Not a good 
> option ...

Hi Dmitry,

this may be interesting (unless you already found it):

http://www.nabble.com/Can-POIFS-convert-PDF-to-OLE-td18568081.html


Looking at such things I suspect this:

The data is inside "Ole10Native". This could be extracted using POIFS.
The structures there look like this:

[4 bytes] = size of structure including data
[???] a few flags and strings (zero terminated)
[4 bytes] = size of actually embedded binary data
[???] = the actual binary data
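
As a rough sketch of reading the parts of this layout that are known, the helpers below decode the leading 4-byte length and a zero-terminated string at a given offset. The class and method names are hypothetical. Little-endian byte order, ISO-8859-1 for the strings, and the length field counting only the bytes after itself are assumptions; the offsets of the individual flags and strings are not known here, so the caller has to supply them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class Ole10NativeHeader {

    /** Reads the leading 4-byte length field, assuming little-endian byte order. */
    static int totalSize(byte[] stream) {
        if (stream.length < 4) {
            throw new IllegalArgumentException("stream shorter than 4 bytes");
        }
        return ByteBuffer.wrap(stream, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** True if the length field fits within the bytes that follow it (assumed convention). */
    static boolean looksPlausible(byte[] stream) {
        int size = totalSize(stream);
        return size >= 0 && size <= stream.length - 4;
    }

    /** Reads a zero-terminated string starting at offset; the charset is an assumption. */
    static String readCString(byte[] stream, int offset) {
        int end = offset;
        while (end < stream.length && stream[end] != 0) {
            end++;
        }
        return new String(stream, offset, end - offset, StandardCharsets.ISO_8859_1);
    }
}
```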

If you know that it is a ZIP file, you could search for the byte sequence
[size]"PK", where the expected [size] depends on the search position.
Assume you start immediately after the first 4 bytes (the total length):
there the size value would be length-4. Step forward by one byte and check
for the sequence with size set to length-5, and so on. When the 6 bytes
match the expected [size]PK sequence, you can be reasonably sure that "PK"
marks the start of the ZIP file and that [size] is its size.
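
The search described above could be sketched roughly as follows. ZipLocator and findZipStart are hypothetical names. The sketch assumes little-endian integers, that the leading length counts the bytes after itself, and that the embedded data runs to the end of the structure, so the size expected at position p is length - p.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ZipLocator {

    /** Reads a 4-byte little-endian integer at the given offset (byte order is an assumption). */
    static int readIntLE(byte[] b, int off) {
        return ByteBuffer.wrap(b, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /**
     * Slides over the stream looking for [size]"PK", where the expected size
     * shrinks by one for every byte stepped forward.
     * Returns the offset of the 'P', or -1 if no position matches.
     */
    static int findZipStart(byte[] stream) {
        if (stream.length < 10) {
            return -1; // too short for length field + size field + "PK"
        }
        int total = readIntLE(stream, 0);
        for (int p = 4; p + 6 <= stream.length; p++) {
            int expected = total - p; // length-4 at p=4, length-5 at p=5, ...
            if (readIntLE(stream, p) == expected
                    && stream[p + 4] == 'P'
                    && stream[p + 5] == 'K') {
                return p + 4;
            }
        }
        return -1;
    }
}
```

Requiring the size field and "PK" to agree makes a false match much less likely than searching for "PK" alone, but it is still a heuristic.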

Of course nothing beats the analysis of the actual binary data structure
:-) (Would this be worth the effort for your purpose?)

Best wishes, Rainer
--

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

