I'm finding similar issues to those Helmut identified, so I'm adding to his
thread.
My goal is to be able to embed a PDF document in a WordprocessingML file.
To start with, I'm looking at a Word document which already has a PDF
embedded in it.
I created this Word document by dragging an existing PDF onto a new document
open in Word 2007, on a computer on which Acrobat (Adobe Reader 8) was
installed.
I'm able to use POIFS to read the resulting oleObject1.bin part.
It looks like this:
Root Entry
CompObj <(0x01)CompObj>
CONTENTS
ObjInfo <(0x03)ObjInfo>
Ole <(0x01)Ole>
EPRINT <(0x03)EPRINT>
This is different to the structure Helmut observed. His structure more or
less conforms to the [MS-OLEDS] specification, in that he has \1Ole,
\1Ole10Native, and \1CompObj.
My part is missing Ole10Native, and instead has CONTENTS, and \3EPRINT.
Anyway, CONTENTS contains the pdf verbatim. I can get it with
createDocumentInputStream("CONTENTS"), save the stream to disk, and open it
with a PDF viewer.
So far so good.
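For anyone following along, here is a minimal sketch of the "save the stream to disk" step. StreamSaver is a hypothetical helper of mine, not part of POI. It takes any java.io.InputStream, copies it to a file, and reports whether the data starts with the "%PDF-" marker that PDF files begin with. I'm assuming the stream returned by createDocumentInputStream("CONTENTS") is passed in as that InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not a POI class): saves a stream and sanity-checks it.
class StreamSaver {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Copies 'in' to 'target'; returns true if the data begins with "%PDF-". */
    static boolean saveAndCheckPdf(InputStream in, Path target) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int headLen = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                // Remember the first few bytes so the marker can be checked afterwards.
                for (int i = 0; i < n && headLen < head.length; i++) {
                    head[headLen++] = buf[i];
                }
                out.write(buf, 0, n);
            }
        }
        return headLen == head.length && Arrays.equals(head, PDF_MAGIC);
    }
}
```

Usage would be something like StreamSaver.saveAndCheckPdf(fs.createDocumentInputStream("CONTENTS"), Paths.get("contents.pdf")), where fs is the POIFSFileSystem opened on oleObject1.bin and the output file name is just an example.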
Helmut Ziegler wrote:
>
> The result is the same as with the POIFS generated file: Word says "The
> server application.... was not found"
> :-(
>
Same here: if I call writeFilesystem without changing anything (i.e. a basic
round-trip test), and open the resulting WordprocessingML file in Word, then
double click on the object (which should open it in Reader 8), Word says:
"The server application, source file, or item cannot be found. [blagh
blagh]".
The original and re-written oleObject1.bin parts are quite different at a
byte level.
Helmut Ziegler wrote:
>
> Yesterday evening we came to the conclusion that POIFS creates something
> that isn't compatible with Word (because it's not easy to build the
> interior of a black box).
> Then a colleague (with c++ knowledge) wrote a program, that's based on
> ole32.dll. Using it there was nearly(!) no difference to the original file
> created by word (same structure, etc.) except of two small differences in
> the directory structure and the first content part which holds (Ole,
> CompObj, ObjInfo).
>
> I think that the problem might be in the directory structure. The "Root
> Entry"-entry from the generated file ist different to the one of the
> original file.
> It's incredible how minimal the differences are.
>
I wrote a little program to compare the original against the re-written
POIFS at the DocumentInputStream level.
Each DocumentInputStream is identical:
(0x01)CompObj
.. SAME (93 bytes matched)
CONTENTS
.. SAME (20664 bytes matched)
(0x03)ObjInfo
.. SAME (6 bytes matched)
(0x01)Ole
.. SAME (20 bytes matched)
(0x03)EPRINT
.. SAME (1454420 bytes matched)
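The comparison itself is nothing clever. A minimal sketch of the idea (not my exact program; StreamCompare is a hypothetical name) works on any two java.io.InputStream objects, so the two DocumentInputStreams for the same entry name can be passed straight in:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a POI class): compares two streams byte by byte.
class StreamCompare {
    /** Returns "SAME (n bytes matched)" or a note describing the first difference. */
    static String compare(InputStream a, InputStream b) throws IOException {
        long offset = 0;
        while (true) {
            int x = a.read();
            int y = b.read();
            if (x != y) {
                if (x == -1 || y == -1) {
                    // One stream ended before the other.
                    return "DIFFERENT (lengths differ after " + offset + " matching bytes)";
                }
                return "DIFFERENT (first mismatch at offset " + offset + ")";
            }
            if (x == -1) {
                // Both streams ended at the same point with no mismatch.
                return "SAME (" + offset + " bytes matched)";
            }
            offset++;
        }
    }
}
```

Reading one byte at a time is slow on unbuffered streams, so wrap file-backed streams in a BufferedInputStream first.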
The documents actually appear in the PropertyTable in different orders
(ascertained from some debug output I added to POIFSFileSystem).
EPRINT
ObjInfo
CONTENTS
CompObj
Ole
versus:
CompObj
ObjInfo
CONTENTS
EPRINT
Ole
And, it is likely that the documents themselves are stored in different
sectors.
This means that there is likely to be a lot of noise comparing the
respective oleObject1.bin parts.
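One way to cut through that noise might be to compare decoded directory entries rather than raw bytes. Below is a rough sketch of decoding a single 128-byte directory entry, based on my reading of the compound file format ([MS-CFB]). DirEntry is a hypothetical class, not part of POI, and it assumes the caller has already pulled the raw directory sector bytes out of the file (locating them via the header and FAT is not shown).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for one compound-file directory entry (not a POI class).
class DirEntry {
    static final int ENTRY_SIZE = 128;

    final String name;      // entry name, e.g. "\u0001Ole"
    final int type;         // 1 = storage, 2 = stream, 5 = root, 0 = unused
    final int color;        // red-black tree colour: 0 = red, 1 = black
    final int left, right;  // sibling entry indexes, -1 if none
    final int child;        // child entry index, -1 if none
    final int startSector;  // first sector of the stream data
    final long size;        // stream size in bytes

    private DirEntry(String name, int type, int color, int left, int right,
                     int child, int startSector, long size) {
        this.name = name; this.type = type; this.color = color;
        this.left = left; this.right = right; this.child = child;
        this.startSector = startSector; this.size = size;
    }

    /** Decodes entry number 'index' from the concatenated directory sector bytes. */
    static DirEntry parse(byte[] directoryBytes, int index) {
        int base = index * ENTRY_SIZE;
        ByteBuffer b = ByteBuffer.wrap(directoryBytes, base, ENTRY_SIZE)
                .slice().order(ByteOrder.LITTLE_ENDIAN);
        // The name length is in bytes and includes the terminating null character.
        int nameBytes = b.getShort(64) & 0xFFFF;
        int nameChars = Math.min(31, Math.max(0, nameBytes / 2 - 1));
        String name = new String(directoryBytes, base, nameChars * 2,
                StandardCharsets.UTF_16LE);
        return new DirEntry(name, b.get(66) & 0xFF, b.get(67) & 0xFF,
                b.getInt(68), b.getInt(72), b.getInt(76),
                b.getInt(116), b.getLong(120));
    }

    /** Renders control characters the same way as the listings above, e.g. (0x01)Ole. */
    String displayName() {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 0x20) sb.append(String.format("(0x%02X)", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return displayName() + " type=" + type + " color=" + color
                + " left=" + left + " right=" + right + " child=" + child
                + " start=" + startSector + " size=" + size;
    }
}
```

Printing every entry from both files and diffing the text would show whether the tree links, colours, types or sizes differ, independent of which sectors the data happens to live in. Note that in version 3 files only the low 32 bits of the size field are meaningful, as far as I understand the spec.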
So, any suggestions as to what to do next? Are there any tools around with
which to compare the parts at some level lower than the
DocumentInputStreams, but higher than a byte[] level?
cheers
Jason
--
View this message in context:
http://www.nabble.com/Can-POIFS-convert-PDF-to-OLE-tp18568081p21324210.html
Sent from the POI - User mailing list archive at Nabble.com.