Hi Helmut,

As far as I remember, this is an OLE header. I decoded embedded OLE objects from RTF and they looked similar to yours. The Microsoft RTF spec says: "When the object is an OLE embedded or linked object, the data part of the object is the structure produced by the OLESaveToStream function". I tried to reverse-engineer the format and read Wine's source for OLESaveToStream and OLELoadFromStream, but gave up early because this feature wasn't mandatory in our product.
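As a quick sanity check on the <w:binData> sample quoted below: its first characters should decode to the 8-byte signature D0 CF 11 E0 A1 B1 1A E1 that OLE2 compound files (the container format POIFS reads) start with. A minimal Python sketch of that check; the function name is only illustrative, and the sample string is the truncated prefix from the quoted message, not the full object:

```python
import base64

# Signature found at offset 0 of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Truncated base64 prefix copied from the quoted <w:binData> element.
SAMPLE_PREFIX = "0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/"


def starts_with_ole2_magic(b64_text: str) -> bool:
    """Return True if the base64 text decodes to data starting with the OLE2 signature."""
    # The sample may be cut off mid-group, so only decode whole 4-character groups.
    usable = len(b64_text) - len(b64_text) % 4
    decoded = base64.b64decode(b64_text[:usable])
    return decoded.startswith(OLE2_MAGIC)


if __name__ == "__main__":
    print(starts_with_ole2_magic(SAMPLE_PREFIX))
```

If the check passes, the decoded oledata.mso can be handed to POIFS directly, which matches what Helmut describes doing with OleObject.bin.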
I hope this helps you somehow. Good luck, and please keep this mailing list posted on any progress.

On 7/24/08, Helmut Ziegler <[EMAIL PROTECTED]> wrote:
> Hi Nick,
>
> thanks for your response!
> I didn't use POIFSViewer, but I (now) know the structure of my PDF OLE
> object. Unfortunately this isn't enough ...
>
> Here is what I did:
>
> First of all I created a Word 2003 XML file with Word and imported a PDF
> file. The PDF is recognized as a package (not as a PDF file) because there
> was no program to handle PDF files on that computer.
> These are the important parts:
> <w:docOleData>
> <w:binData w:name="oledata.mso">
> 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/
> ...
> </w:binData></w:docOleData>
>
> <o:OLEObject Type="Embed" ProgID="Package" ShapeID="_x0000_i1025"
> DrawAspect="Content" ObjectID="_1277043057"/>
>
> In the Word XML file the OLE object is base64 encoded.
> I decoded it and wrote a binary file (OleObject.bin) that I inspected (first
> with 7-Zip, later with POIFS).
>
> The structure of OleObject.bin is the following:
> + Root entry
> ++ _1277043057
> +++[3]OleObjectInfo
> +++[1]Ole10Native
> +++[1]Ole
> +++[1]CompObj
>
> Ole10Native represents my PDF with a custom header that Word attached.
> To get to this content I had to:
> 1. Create a POIFSFileSystem based on OleObject.bin
> 2. Get the entry "_1277043057" and write it to the hard disk (as
> "_1277043057").
> 3. Strip the first 4 bytes of "_1277043057"
> 4. Use the inflate algorithm to decompress it as "_1277043057_decompressed"
> 5. Create a POIFSFileSystem again, based on the decompressed file
> ("_1277043057_decompressed")
> 6. Write the contents listed above to the hard disk.
> ==> I could then open my PDF file.
>
> So far, so good. Now I tried it the other way round.
> After packaging the content again, I tried to open the file in Word, but
> Word complained that it can't open the file because
> "The server application, the source file, or the element wasn't found"
> (this is only a translation).
>
> Then I looked for the step that fails.
> Steps 1 to 4 also worked in the other direction, but creating
> "_1277043057_decompressed" did not seem to work.
> When I compared the original "_1277043057_decompressed" to the generated
> one, there were many similarities (file size and most of the content). But
> the first part of the original file contains more information.
> I had a look at it in a text editor. The information is some kind of
> metadata:
> 1. The alphabet
> 2. The structure of the OLE object: "R.o.o.t. .E.n.t.r.y .... O.l.e. ...
> C.o.m.p.O.b.j...."
> 3. The kind of OLE object: "P.a.c.k.a.g.e"
>
> Does anyone know how I get this information into my file?
>
> Cheers,
> Helmut
>
> P.S. The reverse engineering is based on this excellent article:
> http://www.trustedsource.org/download/research_publications/CAlme_VBOct06.pdf
>
> -------- Original Message --------
>> Date: Thu, 24 Jul 2008 11:42:10 +0100 (BST)
>> From: Nick Burch <[EMAIL PROTECTED]>
>> To: POI Users List <[email protected]>
>> Subject: Re: Can POIFS convert PDF to OLE
>
>> On Thu, 24 Jul 2008, Helmut Ziegler wrote:
>> > Actually the Word document should also carry other documents like other
>> > word files.
>>
>> I'd suggest dumping out the stream(s), and looking at them with things
>> like org.apache.poi.poifs.dev.POIFSViewer
>>
>> Start by seeing if you can change one bit of one file in the poifs stream,
>> and have the change noticed. If that works, but adding a new poifs stream
>> doesn't, then there are extra things in the poifs stream that need to be
>> set up.
>> I think you're probably going to need to run diff quite a bit,
>> across two files (one that works, one that doesn't), and see what's
>> different.
>>
>> Nick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
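
Steps 3 and 4 of Helmut's list, their reverse, and Nick's diff suggestion can be sketched roughly as follows. This is a minimal Python sketch, not what either poster ran. It assumes the data after the 4-byte prefix is a zlib-wrapped deflate stream (the thread only says "inflate"), it leaves the meaning of the 4 prefix bytes open by taking them as a parameter, and all function names are hypothetical:

```python
import zlib
from typing import Optional

PREFIX_LEN = 4  # step 3 in the thread: the first 4 bytes are stripped before inflating


def unpack_entry(raw: bytes) -> bytes:
    """Strip the 4-byte prefix and inflate the rest (assumes a zlib-wrapped stream)."""
    return zlib.decompress(raw[PREFIX_LEN:])


def pack_entry(payload: bytes, prefix: bytes) -> bytes:
    """Reverse direction: deflate the payload and put a caller-supplied prefix in front.

    What the prefix encodes is not stated in the thread, so it is passed in
    (for example, copied from the original entry) rather than computed here.
    """
    if len(prefix) != PREFIX_LEN:
        raise ValueError("prefix must be exactly 4 bytes")
    return prefix + zlib.compress(payload)


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if both are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None
```

Running first_difference on the original "_1277043057_decompressed" and the regenerated one gives the offset at which the two start to diverge, which narrows down where the missing metadata sits.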
