Eric van der Vlist wrote:

No, as far as I can tell, the names of the documents are fixed, except
for pictures.

The structure of an OpenOffice document archive is described in a
manifest file that looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" 
"Manifest.dtd">
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
 <manifest:file-entry manifest:media-type="application/vnd.sun.xml.writer" 
manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/png" 
manifest:full-path="Pictures/10000000000001DA0000005BF70F3350.png"/>
 <manifest:file-entry manifest:media-type="image/png" 
manifest:full-path="Pictures/1000020000000055000000255789B5EB.png"/>
 <manifest:file-entry manifest:media-type="" manifest:full-path="Pictures/"/>
 <manifest:file-entry manifest:media-type="appication/binary" 
manifest:full-path="layout-cache"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="content.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="styles.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="meta.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="settings.xml"/>
</manifest:manifest>

If you take this document and include the content of each document in
the manifest:file-entry elements except for "/" and maybe layout-cache
(either as plain XML, CDATA or base64 based on the media type like you
are doing for streams), you get something that is easy to process in a
pipeline, easy to split through XPointer and easy to reconstitute
through XSLT.
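As a rough illustration of that embedding step, here is a minimal
Python sketch. It assumes the manifest sits at META-INF/manifest.xml
inside the archive (not stated above), embeds "text/xml" entries as
parsed XML and everything else as base64, and leaves out the CDATA
option. The name build_compound and its skip parameter are made up for
the example.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "http://openoffice.org/2001/manifest"
ET.register_namespace("manifest", MANIFEST_NS)


def q(name):
    """Qualify a local name with the manifest namespace."""
    return "{%s}%s" % (MANIFEST_NS, name)


def build_compound(archive_bytes, skip=("/", "layout-cache")):
    """Return the manifest element with each entry's content embedded.

    Hypothetical helper: the manifest location is an assumption.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        manifest = ET.fromstring(zf.read("META-INF/manifest.xml"))
        for entry in manifest.findall(q("file-entry")):
            path = entry.get(q("full-path"))
            media_type = entry.get(q("media-type"))
            # Skip the root entry, layout-cache and directory entries.
            if path in skip or path.endswith("/"):
                continue
            data = zf.read(path)
            if media_type == "text/xml":
                # Embed XML documents as child elements.
                entry.append(ET.fromstring(data))
            else:
                # Embed anything else (e.g. pictures) as base64 text.
                entry.text = base64.b64encode(data).decode("ascii")
    return manifest
```

The result is a single XML tree, so it can be serialized with
ET.tostring() and handed to whatever comes next in the pipeline.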

And, if you want to attach external documents, you can use XInclude.

Looks like that solves a lot of issues and that's probably what I am
going to implement!

The frequent use case where you want to transform content.xml to replace
content while keeping the same presentation would become:

      * Use a converter to convert an OOo model into this XML format or
        retrieve this XML format directly from a conversion previously
        stored in an XML database.
      * Apply a transformation either on the content.xml fragment using
        an XPointer expression for the data entry of the transformation
        (that allows you to use transformations defined as OOo filters)
        or on the compound document.
      * Use a converter to convert either directly the result of the
        transformation or the result of an XSLT transformation
        reconstructing the compound document.
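The last two steps could look roughly like the Python sketch below. It
is only an illustration: a plain lookup stands in for the XPointer
selection, the edit itself would normally be an XSLT transformation,
and the names find_entry and rebuild_archive are invented here. It
assumes XML entries are embedded as child elements, binary entries as
base64 text, and that the manifest goes back to META-INF/manifest.xml.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "http://openoffice.org/2001/manifest"
ET.register_namespace("manifest", MANIFEST_NS)
FILE_ENTRY = "{%s}file-entry" % MANIFEST_NS
FULL_PATH = "{%s}full-path" % MANIFEST_NS


def find_entry(compound, path):
    """Stand-in for the XPointer step: select one embedded document."""
    for entry in compound.findall(FILE_ENTRY):
        if entry.get(FULL_PATH) == path:
            return entry
    raise KeyError(path)


def rebuild_archive(compound):
    """Split the compound document back into a zip archive (as bytes)."""
    manifest = ET.Element(compound.tag)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for entry in compound.findall(FILE_ENTRY):
            path = entry.get(FULL_PATH)
            # The rebuilt manifest keeps the attributes but not the content.
            ET.SubElement(manifest, FILE_ENTRY, dict(entry.attrib))
            if len(entry):
                # Embedded XML document: serialize the child element.
                zf.writestr(path, ET.tostring(entry[0], encoding="utf-8"))
            elif entry.text and entry.text.strip():
                # Embedded binary: decode the base64 text.
                zf.writestr(path, base64.b64decode(entry.text))
        zf.writestr("META-INF/manifest.xml",
                    ET.tostring(manifest, encoding="utf-8"))
    return buf.getvalue()
```

A transformation that only touches the content.xml fragment would then
modify find_entry(compound, "content.xml")[0] and pass the compound
document to rebuild_archive(); styles.xml and the pictures go through
unchanged.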

I think that this is the option I'll be implementing!

As a variation on this, you could provide a configuration document that lets you filter which files the processor embeds in the manifest:file-entry elements. This would make sense for extraction scenarios, where you have a large OOo document but just want to extract a small part of it for processing.
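
To make the idea concrete, here is a small Python sketch of such a
filter. The configuration format (an <embed> element with <include
path="..."/> children holding glob patterns) is purely hypothetical, as
are the function names.

```python
import fnmatch
import xml.etree.ElementTree as ET


def parse_filter(config_xml):
    """Read glob patterns from a hypothetical <embed> configuration."""
    root = ET.fromstring(config_xml)
    return [inc.get("path") for inc in root.findall("include")]


def should_embed(path, patterns):
    """True if the manifest path matches at least one include pattern."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


CONFIG = """
<embed>
  <include path="content.xml"/>
  <include path="Pictures/*"/>
</embed>
"""

patterns = parse_filter(CONFIG)
```

The processor would call should_embed() on each manifest:full-path
value and embed only the entries that match.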


If the scenario is most of the time to extract and then recombine, then you don't really need such configuration.

Now isn't the OOo format just a zip file? Could we just implement a processor that unzips archives in a generic way? Then of course the content of the individual files would have to be decoded / parsed in a separate step, but you could do that with a p:for-each. Maybe that would just be less user-friendly.
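
For comparison, the generic approach could be sketched in Python as
follows. The generator knows nothing about OOo, and the loop after it
plays the role of the p:for-each that decodes each file separately. The
function names and the rule "parse anything ending in .xml" are
assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def unzip_entries(archive_bytes):
    """Yield (name, raw_bytes) for every file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            yield info.filename, zf.read(info)


def decode_entries(archive_bytes):
    """Separate decoding step: parse XML files, keep the rest as bytes."""
    decoded = {}
    for name, data in unzip_entries(archive_bytes):
        if name.endswith(".xml"):
            decoded[name] = ET.fromstring(data)
        else:
            decoded[name] = data
    return decoded
```

This keeps the unzip step reusable for any archive, at the cost of the
caller having to know which entries are XML and which are binary.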

-Erik

