Re: How to extract embedded files from Office 07

Rainer Schwarze Thu, 28 Aug 2008 15:53:19 -0700

Dmitry Goldenberg wrote:

Yegor,


The first 8 bytes contain the standard MS Office magic number stuff - d0 cf 11 
e0 a1 b1 1a e1.

Seems like they compress data in a proprietary way. I've read one post where 
someone recommended the .NET Packaging API to crack these ...  Not a good 
option ...


Hi Dmitry,

this may be interesting (unless you already found it):

http://www.nabble.com/Can-POIFS-convert-PDF-to-OLE-td18568081.html


Looking at such things I suspect this:

The data is inside "Ole10Native". This could be extracted using POIFS. The structures there look like this:


[4 bytes] = size of structure including data
[???] = a few flags and strings (zero-terminated)
[4 bytes] = size of actually embedded binary data
[???] = the actual binary data

If you know that it is a ZIP file, you could search for a byte sequence [size]"PK", where [size] depends on the search position. Assume you start immediately after the first 4 bytes for total length; then the size value is length-4. Step further by one byte and check for the sequence with size set to length-5, and so on. When the 6 bytes match the expected [size]"PK" sequence, you can be fairly sure that "PK" represents the start of the ZIP file and [size] is its size.
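A rough sketch of that search in Java, in case it helps. It assumes the "Ole10Native" stream has already been read into a byte array, that the 4-byte values are little-endian, and that the embedded data runs to the end of the structure (which is what the length-4, length-5 arithmetic implies). The class and method names are made up for illustration and are not part of POI.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of POI.
class Ole10NativeZipScan {

    /**
     * Scans for the [size]"PK" sequence described above.
     * Returns the embedded ZIP bytes, or null if nothing matches.
     */
    static byte[] findZip(byte[] stream) {
        if (stream.length < 10) {
            return null; // too short for length + size + "PK"
        }
        // Assumption: all 4-byte values are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int length = buf.getInt(0); // first 4 bytes: total length

        // Start right after the total length and step one byte at a time.
        for (int pos = 4; pos + 6 <= stream.length; pos++) {
            // Size expected at this position: length-4, length-5, ...
            int expected = length - pos;
            if (expected < 2) {
                break; // no room left for even the "PK" marker
            }
            if (buf.getInt(pos) == expected
                    && stream[pos + 4] == 'P'
                    && stream[pos + 5] == 'K') {
                int start = pos + 4;
                // Clamp in case the stream is shorter than announced.
                int end = Math.min(start + expected, stream.length);
                return Arrays.copyOfRange(stream, start, end);
            }
        }
        return null;
    }
}
```

A 6-byte match on arbitrary binary data is unlikely to be a coincidence, but it is still a heuristic: if the structure carries extra bytes after the embedded data, the expected size will not line up and the scan returns null.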

Of course nothing beats the analysis of the actual binary data structure :-) (Would this be worth the effort for your purpose?)


Best wishes, Rainer
