RE: FOP -> POI
Hi all,

> On 28 May 2015, at 21:53, Andreas Delmelle wrote:
>
> > On 28 May 2015, at 21:29, Jan Tosovsky wrote:
> >
> > > On 2015-05-27 Matthias Reischenbacher wrote:
> > >
> > > I know pptx a bit, because I had to implement an output channel based
> > > mainly on XSLT and a little bit of java (used mainly for zip
> > > compression).
> >
> > ... My idea was rather a dedicated project dependent on both 'libraries', utilizing
> > the best of both worlds.
>
> In the meantime, having given Jan's replies from yesterday and today some further thought,
> I was thinking it may be possible to take the route of adding a SAX ContentHandler to the
> processing chain that translates the events from the AreaTreeHandler or the IFHandler into
> POI API calls... Something like that?

After years of hiatus I finally dedicated some time to my old idea (converting FOP IF to PPTX output, without modifying the FOP codebase). I found that the result is almost useless if the PPTX needs to be edited (e.g. translated), because all the text is split into separate lines, and even these are broken further into smaller chunks wherever inline styles are used.

So I am leaving this idea.

https://github.com/doctribute/fop-to-powerpoint-converter

Jan
Re: FOP -> POI
Hi Jan, Matthias,

> On 28 May 2015, at 21:29, Jan Tosovsky wrote:
>
> Hi Matthias,
>
> On 2015-05-27 Matthias Reischenbacher wrote:
>>
>> I know pptx a bit, because I had to implement an output channel based
>> mainly on XSLT and a little bit of java (used mainly for zip
>> compression).
>
> Just for curiosity, what was the source format? Could this conversion be
> open-sourced?
>
>> +1 for adding a pptx output to FOP, but I'd recommend doing it without
>> POI.
>
> I am afraid it would lead to lots of code duplication. My idea was rather a
> dedicated project dependent on both 'libraries', utilizing the best of both
> worlds.

Matthias does make a good point about the dependency.

In the meantime, having given Jan's replies from yesterday and today some further thought, I was thinking it may be possible to take the route of adding a SAX ContentHandler to the processing chain that translates the events from the AreaTreeHandler or the IFHandler into POI API calls... Something like that?

I have to agree with the position that adding a hard dependency on POI would be too much, but maybe it could have merit as a plugin / optional runtime dependency (?) Sort of like how one currently needs the PDFBox plugin to include PDFs via fox:external-document? If one wanted to use it, it would be technically possible and not too hard to achieve, but it would not come out of the box, i.e. POI would not be required to compile and build FOP.

Ideas...? Plenty of those over here.

KR

Andreas
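To make the ContentHandler idea above a little more concrete, here is a minimal sketch using only the JDK's SAX classes. It is not FOP or POI code: `IfToSlides`, `IfHandler` and the `SlideSink` interface are hypothetical names, and the assumption that IF output contains `page` elements with `text` children carrying `x`/`y` attributes is based on my reading of IF files, so check it against real output. A POI-backed `SlideSink` would presumably call something like `XMLSlideShow.createSlide()` and `XSLFSlide.createTextBox()`; that part is left out so the sketch runs without extra dependencies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IfToSlides {

    /** Hypothetical sink; a real implementation would wrap the POI calls. */
    public interface SlideSink {
        void startSlide();
        void addText(int x, int y, String text);
    }

    /** Forwards page and text events from IF-like XML to the sink. */
    static class IfHandler extends DefaultHandler {
        private final SlideSink sink;
        private StringBuilder buf; // non-null only while inside a text element
        private int x;
        private int y;

        IfHandler(SlideSink sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("page".equals(local)) {
                sink.startSlide(); // assumption: one IF page maps to one slide
            } else if ("text".equals(local)) {
                x = Integer.parseInt(atts.getValue("x"));
                y = Integer.parseInt(atts.getValue("y"));
                buf = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (buf != null) {
                buf.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("text".equals(local)) {
                sink.addText(x, y, buf.toString());
                buf = null;
            }
        }
    }

    /** Parses IF-like XML and returns the text runs grouped per page. */
    public static List<List<String>> parse(String xml) throws Exception {
        List<List<String>> slides = new ArrayList<>();
        SlideSink sink = new SlideSink() {
            @Override
            public void startSlide() {
                slides.add(new ArrayList<>());
            }

            @Override
            public void addText(int x, int y, String text) {
                slides.get(slides.size() - 1).add(x + "," + y + ":" + text);
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // match on local names, whatever the prefix
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new IfHandler(sink));
        return slides;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document xmlns=\"http://xmlgraphics.apache.org/fop/intermediate\">"
                + "<page><content><text x=\"1000\" y=\"2000\">Hello</text></content></page>"
                + "</document>";
        System.out.println(parse(xml));
    }
}
```

Each `text` event arrives as a separate positioned run, so a sink that creates one text box per event would produce one box per run.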
RE: FOP -> POI
Hi Matthias,

On 2015-05-27 Matthias Reischenbacher wrote:
>
> I know pptx a bit, because I had to implement an output channel based
> mainly on XSLT and a little bit of java (used mainly for zip
> compression).

Just for curiosity, what was the source format? Could this conversion be open-sourced?

> +1 for adding a pptx output to FOP, but I'd recommend doing it without POI.

I am afraid it would lead to lots of code duplication. My idea was rather a dedicated project dependent on both 'libraries', utilizing the best of both worlds.

Jan
Re: FOP -> POI
Hi Jan,

I know pptx a bit, because I had to implement an output channel based mainly on XSLT and a little bit of java (used mainly for zip compression). Pptx is hard to understand because of all the cross references between the different files, but that wouldn't justify adding another dependency to FOP.

So +1 for adding a pptx output to FOP, but I'd recommend doing it without POI.

BR,
Matthias

On 27.05.2015 17:26, Jan Tosovsky wrote:
> Hi Andreas,
>
> On 2015-05-27 Andreas Delmelle wrote:
>> On 2015-05-27 Jan Tosovsky wrote:
>>> On 2015-05-25 Andreas Delmelle wrote:
>>>>
>>>> it seems like it may just be possible to achieve something like
>>>> this by means of FOP's Intermediate Formats[*], which can already
>>>> be utilised to split up the basic formatting and rendering
>>>> processes.
>>>
>>> This approach could theoretically eliminate POI completely as most
>>> of IF -> PPTX could be done via XSLT ;-)
>>> But it is too low level for me.
>>
>> Can you clarify? Not sure I am completely following here... Is POI
>> *not* low level, then?
>
> I meant PPTX side, mainly various 'dictionaries'. When e.g. a new slide is
> added, its ID is registered in the main file. The slide is derived from a
> default template, which must be linked via its ID. Slides can have
> annotations, there is also a special template for them, it has to be linked
> to the slide as well, and so on...
>
> This cross referencing in POI is out of the box. You just add a new slide
> and all references are updated automatically.
>
> Yes, it is doable in pure XSLT, but with additional effort. Another thing is
> building the final PPTX file.
>
>> What you rather vaguely describe as "the required info somewhere in the
>> memory before serializing it into PDF" for FOP basically *is* the Area
>> Tree. The AT and IF XML formats are just XML representations of said
>> info, so seems like you would not get around it either way...?
>
> I simply didn't know that.
>
> Thanks for explanation, Jan
RE: FOP -> POI
Hi Andreas,

On 2015-05-27 Andreas Delmelle wrote:
> On 2015-05-27 Jan Tosovsky wrote:
> > On 2015-05-25 Andreas Delmelle wrote:
> > >
> > > it seems like it may just be possible to achieve something like
> > > this by means of FOP's Intermediate Formats[*], which can already
> > > be utilised to split up the basic formatting and rendering
> > > processes.
> >
> > This approach could theoretically eliminate POI completely as most
> > of IF -> PPTX could be done via XSLT ;-)
> > But it is too low level for me.
>
> Can you clarify? Not sure I am completely following here... Is POI
> *not* low level, then?

I meant the PPTX side, mainly the various 'dictionaries'. When e.g. a new slide is added, its ID is registered in the main file. The slide is derived from a default template, which must be linked via its ID. Slides can have annotations; there is also a special template for them, which has to be linked to the slide as well, and so on...

In POI this cross-referencing works out of the box. You just add a new slide and all references are updated automatically.

Yes, it is doable in pure XSLT, but with additional effort. Another thing is building the final PPTX file.

> What you rather vaguely describe as "the required info somewhere in the
> memory before serializing it into PDF" for FOP basically *is* the Area
> Tree. The AT and IF XML formats are just XML representations of said
> info, so seems like you would not get around it either way...?

I simply didn't know that.

Thanks for the explanation,
Jan
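As a rough illustration of the 'dictionaries' mentioned above, the sketch below lists the entries that, as far as I understand the PPTX package layout, have to be kept in sync by hand when a slide is added without a library: a content-type override, a relationship from the presentation part, an entry in the slide ID list, and the slide's own relationship to its layout. `PptxBookkeeping` is a hypothetical helper, not part of POI or FOP; the part names follow the conventional layout of PowerPoint-produced files, the starting relationship number is an assumption, and notes slides are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects the entries a new slide needs in other package parts. */
public class PptxBookkeeping {
    static final String REL_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String SLIDE_CT =
            "application/vnd.openxmlformats-officedocument.presentationml.slide+xml";

    /** Entries for [Content_Types].xml. */
    public final List<String> contentTypes = new ArrayList<>();
    /** Entries for ppt/_rels/presentation.xml.rels. */
    public final List<String> presentationRels = new ArrayList<>();
    /** Entries for the p:sldIdLst element in ppt/presentation.xml. */
    public final List<String> slideIdList = new ArrayList<>();
    /** One entry per slide for ppt/slides/_rels/slideN.xml.rels. */
    public final List<String> slideRels = new ArrayList<>();

    private int slideCount = 0;
    private int nextRelId = 2;     // assumption: rId1 is already used by the slide master
    private int nextSlideId = 256; // slide IDs start at 256 in PresentationML

    /** Registers a new slide everywhere it must be referenced; returns its part name. */
    public String addSlide(String layoutTarget) {
        slideCount++;
        String fileName = "slide" + slideCount + ".xml";
        String partName = "/ppt/slides/" + fileName;
        String relId = "rId" + nextRelId++;

        contentTypes.add("<Override PartName=\"" + partName
                + "\" ContentType=\"" + SLIDE_CT + "\"/>");
        presentationRels.add("<Relationship Id=\"" + relId
                + "\" Type=\"" + REL_NS + "/slide\" Target=\"slides/" + fileName + "\"/>");
        slideIdList.add("<p:sldId id=\"" + nextSlideId++ + "\" r:id=\"" + relId + "\"/>");
        slideRels.add("<Relationship Id=\"rId1\" Type=\"" + REL_NS
                + "/slideLayout\" Target=\"" + layoutTarget + "\"/>");
        return partName;
    }

    public static void main(String[] args) {
        PptxBookkeeping pkg = new PptxBookkeeping();
        pkg.addSlide("../slideLayouts/slideLayout1.xml");
        pkg.contentTypes.forEach(System.out::println);
        pkg.presentationRels.forEach(System.out::println);
        pkg.slideIdList.forEach(System.out::println);
        pkg.slideRels.forEach(System.out::println);
    }
}
```

Four separate parts change for a single new slide, which is the bookkeeping a pure XSLT pipeline would have to reproduce, along with zipping the parts into the final file.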
Re: FOP -> POI
Hi Jan

> On 27 May 2015, at 21:22, Jan Tosovsky wrote:
>
> On 2015-05-25 Andreas Delmelle wrote:
>>
>> it seems like it may just be possible to achieve something like
>> this by means of FOP's Intermediate Formats[*], which can already
>> be utilised to split up the basic formatting and rendering processes.
>
> This approach could theoretically eliminate POI completely as most of IF ->
> PPTX could be done via XSLT ;-)

Right... I got to thinking that as well. Of course, that happened only after I had already sent this. :)

> But it is too low level for me.

Can you clarify? Not sure I am completely following here... Is POI *not* low level, then? I mean: it is basically an API to read/write MS Office document formats, so would require some additional code as well, albeit Java instead of XSLT?

What you rather vaguely describe as "the required info somewhere in the memory before serializing it into PDF" for FOP basically *is* the Area Tree. The AT and IF XML formats are just XML representations of said info, so seems like you would not get around it either way...?

That said, I feel like I may be missing a crucial piece of info here.

> Anyway, I'll investigate also POI end.

OK, cool. If you do see a way that the fop-dev team can be of assistance, feel free to report back here.

KR

Andreas
RE: FOP -> POI
On 2015-05-25 Andreas Delmelle wrote:
>
> On 2015-05-25 Jan Tosovsky wrote:
> >
> > can you hypothetically imagine any way how to convert virtual page
> > objects to the office document structure? I actually think of
> > 'Slides' to PPTX (XSLF) conversion.
> >
> > There is not an easy way to produce paginated PPTX content using
> > pure XSLT. But FOP has all the required info somewhere in the
> > memory before serializing it into PDF, which could be somehow
> > pushed to POI.
>
> it seems like it may just be possible to achieve something like
> this by means of FOP's Intermediate Formats[*], which can already
> be utilised to split up the basic formatting and rendering processes.

This approach could theoretically eliminate POI completely as most of IF -> PPTX could be done via XSLT ;-)
But it is too low level for me.

Anyway, I'll also investigate the POI end.

Jan
Re: FOP -> POI
> On 25 May 2015, at 21:16, Jan Tosovsky wrote:

Hi Jan

> can you hypothetically imagine any way how to convert virtual page objects
> to the office document structure? I actually think of 'Slides' to PPTX
> (XSLF) conversion.

Very interesting question...
Somewhat related, as I recall, a suggestion/feature request has been raised in the past to add OpenOffice's document format as a potential new output format to FOP.

> There is not an easy way to produce paginated PPTX content using pure XSLT.
> But FOP has all the required info somewhere in the memory before serializing
> it into PDF, which could be somehow pushed to POI.

I must admit that I am unfamiliar with the most recent Apache POI API. The last time I looked at POI must have been almost 10 years ago.

That said, it seems like it may just be possible to achieve something like this by means of FOP's Intermediate Formats[*], which can already be utilised to split up the basic formatting and rendering processes.

[*] see: http://xmlgraphics.apache.org/fop/trunk/intermediate.html

While it is still an XML format, the benefit would be that it is already paginated, which may make it easier to generate PPTX slide-decks from.

Basically, you would use FOP to create an IF file (or stream) from XSL-FO input, as a basis for PDF rendering on the one hand, and then somehow feed that same intermediate file to POI for creation of the PPTX. Basic formatting and pagination would be done once, through FOP's layout engine. Not sure what POI can handle as input, though, or how difficult it would be to make it handle FOP's IF...

Not sure if that goes in the direction of what you were looking for, but hope this helps!

Andreas