Dmitry Goldenberg wrote:
Yegor,

The first 8 bytes contain the standard MS Office magic number stuff - d0 cf 11 
e0 a1 b1 1a e1.

Seems like they compress data in a proprietary way. I've read one post where 
someone recommended the .NET Packaging API to crack these ...  Not a good 
option ...

Hi Dmitry,

this may be of interest (unless you have already found it):

http://www.nabble.com/Can-POIFS-convert-PDF-to-OLE-td18568081.html


Having looked at similar cases, I suspect the following:

The data is inside the "Ole10Native" stream, which can be extracted using POIFS. Its structure looks like this:

[4 bytes] = size of the structure, including the data
[???]     = a few flags and zero-terminated strings
[4 bytes] = size of the actual embedded binary data
[???]     = the actual binary data
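For illustration, here is a minimal Java sketch of the primitives needed to walk that layout, once the stream's bytes have been read out with POIFS (e.g. via a `DocumentInputStream` on the entry, whose name starts with a 0x01 byte). It assumes the integers are 4-byte little-endian and the strings are single-byte and zero-terminated; the flag fields are not modelled, and the class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helpers for walking the Ole10Native layout.
 * ASSUMPTION: 4-byte little-endian integers, single-byte zero-terminated
 * strings; the flag fields in between are not modelled.
 */
final class Ole10NativeFields {

    /** Reads a 4-byte little-endian int starting at {@code off}. */
    static int readInt32LE(byte[] buf, int off) {
        return ByteBuffer.wrap(buf, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Returns the zero-terminated string starting at {@code off} (terminator excluded). */
    static String readCString(byte[] buf, int off) {
        int end = off;
        while (end < buf.length && buf[end] != 0) {
            end++;
        }
        if (end == buf.length) {
            throw new IllegalArgumentException("unterminated string at offset " + off);
        }
        // ISO-8859-1 maps each byte to exactly one char, so length() == byte count.
        return new String(buf, off, end - off, StandardCharsets.ISO_8859_1);
    }

    /** Reads a [4-byte size][data] pair at {@code off} and returns a copy of the data. */
    static byte[] readSizedBlock(byte[] buf, int off) {
        int size = readInt32LE(buf, off);
        int start = off + 4;
        if (size < 0 || size > buf.length - start) {
            throw new IllegalArgumentException("size " + size + " does not fit at offset " + off);
        }
        return Arrays.copyOfRange(buf, start, start + size);
    }
}
```

In practice you would read the outer size with `readInt32LE(buf, 0)`, step over the flags and strings (advancing by `s.length() + 1` after each string), and then call `readSizedBlock` at the data-size field. Since the exact flag layout isn't given above, those offsets would have to come from inspecting real files.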

If you know that the payload is a ZIP file, you could search for the byte sequence [size]"PK", where the expected value of [size] depends on the search position. Starting immediately after the first 4 bytes (the total length), the size value would be length-4; one byte further on, check for the sequence with size set to length-5, and so on. When the 6 bytes match the expected [size]"PK" sequence, you can be reasonably sure that "PK" marks the start of the ZIP file and that [size] is its size.
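The search could look roughly like the Java sketch below. It rests on assumptions not stated in the post: integers are little-endian, the first field counts the bytes that follow it, and the ZIP data runs to the end of that structure. Under those assumptions the expected [size] at offset p is total - p, which gives length-4 at the first step, as described. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/**
 * Hypothetical heuristic: find [size]"PK" where size equals the number of
 * bytes left in the structure after the size field.
 * ASSUMPTIONS: little-endian ints; the first 4 bytes hold the size of
 * everything after them; the ZIP data runs to the end of that structure.
 */
final class ZipInOle10Native {

    private static int readInt32LE(byte[] buf, int off) {
        return ByteBuffer.wrap(buf, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Returns the offset of the [size] field in front of "PK", or -1 if none matches. */
    static int findZipSizeField(byte[] buf) {
        // Need at least: total field (4) + size field (4) + "PK" (2).
        if (buf.length < 10) {
            return -1;
        }
        int total = readInt32LE(buf, 0);
        if (total < 0) {
            return -1;
        }
        // Start right after the total-size field; each step shrinks the expected size by one.
        for (int p = 4; p + 6 <= buf.length; p++) {
            int expected = total - p;
            if (expected < 2) {
                break; // no room left for even the "PK" signature
            }
            if (readInt32LE(buf, p) == expected && buf[p + 4] == 'P' && buf[p + 5] == 'K') {
                return p;
            }
        }
        return -1;
    }

    /** Copies out the ZIP bytes, or returns null if the heuristic finds nothing. */
    static byte[] extractZip(byte[] buf) {
        int p = findZipSizeField(buf);
        if (p < 0) {
            return null;
        }
        int start = p + 4;
        int size = readInt32LE(buf, p);
        if (size > buf.length - start) {
            return null; // size claims more bytes than the buffer holds
        }
        return Arrays.copyOfRange(buf, start, start + size);
    }
}
```

If anything follows the data in a given file, the exact-size comparison will not match and would have to be loosened (for example to "size fits in the remaining bytes"). Checking the full ZIP local file header signature `PK\x03\x04` instead of just "PK" would further reduce false positives.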

Of course, nothing beats analyzing the actual binary data structure :-) (Would that be worth the effort for your purposes?)

Best wishes, Rainer
--
