RE: FOP - POI

2015-05-28 Thread Jan Tosovsky
Hi Matthias,

On 2015-05-27 Matthias Reischenbacher wrote:
 
 I know pptx a bit, because I had to implement an output channel based
 mainly on XSLT and a little bit of java (used mainly for zip
 compression). 

Just out of curiosity, what was the source format? Could this conversion be
open-sourced?

 +1 for adding a pptx output to FOP, but I'd recommend doing it without
 POI.

I am afraid it would lead to lots of code duplication. My idea was rather a
dedicated project dependent on both 'libraries', utilizing the best of both
worlds.

Jan



Re: FOP - POI

2015-05-28 Thread Andreas Delmelle
Hi Jan, Matthias,

 On 28 May 2015, at 21:29, Jan Tosovsky j.tosov...@email.cz wrote:
 
 Hi Matthias,
 
 On 2015-05-27 Matthias Reischenbacher wrote:
 
 I know pptx a bit, because I had to implement an output channel based
 mainly on XSLT and a little bit of java (used mainly for zip
 compression). 
 
 Just out of curiosity, what was the source format? Could this conversion be
 open-sourced?
 
 +1 for adding a pptx output to FOP, but I'd recommend doing it without
 POI.
 
 I am afraid it would lead to lots of code duplication. My idea was rather a
 dedicated project dependent on both 'libraries', utilizing the best of both
 worlds.

Matthias does make a good point about the dependency.

In the meantime, having given Jan's replies from yesterday and today some 
further thought, I was thinking it may be possible to take the route of adding 
a SAX ContentHandler to the processing chain that translates the events from 
the AreaTreeHandler or the IFHandler into POI API calls... Something like that?
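A minimal sketch of that idea, using only the JDK's SAX API. The SlideSink interface is a hypothetical stand-in for whatever would issue the POI calls, and the element names "page" and "text" as well as the sample namespace are assumptions made for illustration; the real names would have to be taken from FOP's IF documentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IfToSlidesSketch {

    /** Hypothetical stand-in for the POI side; not part of POI or FOP. */
    interface SlideSink {
        void startSlide();
        void addText(String text);
    }

    /** Translates SAX events from a paginated XML stream into SlideSink calls. */
    static class PageHandler extends DefaultHandler {
        private final SlideSink sink;
        private StringBuilder text; // non-null only while inside a <text> element

        PageHandler(SlideSink sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("page".equals(localName)) {
                sink.startSlide(); // assumption: one page maps to one slide
            } else if ("text".equals(localName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("text".equals(localName) && text != null) {
                sink.addText(text.toString());
                text = null;
            }
        }
    }

    /** Parses the XML and returns the text runs collected per page. */
    static List<List<String>> collect(String xml) throws Exception {
        List<List<String>> slides = new ArrayList<>();
        SlideSink sink = new SlideSink() {
            @Override
            public void startSlide() {
                slides.add(new ArrayList<>());
            }

            @Override
            public void addText(String t) {
                if (!slides.isEmpty()) { // ignore text outside any page
                    slides.get(slides.size() - 1).add(t);
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that localName is populated
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new PageHandler(sink));
        return slides;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document xmlns='urn:example:if'>"
                + "<page><text>Title</text></page>"
                + "<page><text>First</text><text>Second</text></page>"
                + "</document>";
        System.out.println(collect(xml));
    }
}
```

In a real plugin the sink would be backed by POI rather than by in-memory lists, and the handler would also need positions, fonts and images, which this sketch ignores.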

I have to agree with the position that adding a hard dependency on POI would be 
too much, but maybe it could have merit as a plugin / optional runtime 
dependency (?) Sort of like how one currently needs the PDFBox plugin to 
include PDFs via fox:external-document ?

If one wanted to use it, it would be technically possible and not too hard to 
achieve, but it would not come out of the box, i.e. POI would not be required 
to compile and build FOP.

Ideas...? Plenty of those over here.


KR

Andreas


Re: FOP - POI

2015-05-27 Thread Andreas Delmelle
Hi Jan

 On 27 May 2015, at 21:22, Jan Tosovsky j.tosov...@email.cz wrote:
 
 On 2015-05-25 Andreas Delmelle wrote:
 <snip />
 it seems like it may just be possible to achieve something like 
 this by means of FOP's Intermediate Formats[*], which can already
 be utilised to split up the basic formatting and rendering processes.
 
 This approach could theoretically eliminate POI completely as most of IF ->
 PPTX could be done via XSLT ;-)

Right... I got to thinking that as well. 
Of course, that happened only after I had already sent this. :)

 But it is too low level for me.

Can you clarify? Not sure I am completely following here... Is POI *not* low 
level, then? 
I mean: it is basically an API to read/write MS Office document formats, so 
it would require some additional code as well, albeit Java instead of XSLT?

What you rather vaguely describe as "the required info somewhere in the memory 
before serializing it into PDF" basically *is*, for FOP, the Area Tree. The AT 
and IF XML formats are just XML representations of said info, so it seems like 
you would not get around it either way...?

That said, I feel like I may be missing a crucial piece of info here.

 Anyway, I'll also investigate the POI end.

OK, cool. If you do see a way that the fop-dev team can be of assistance, feel 
free to report back here.


KR

Andreas

RE: FOP - POI

2015-05-27 Thread Jan Tosovsky
On 2015-05-25 Andreas Delmelle wrote:
  On 2015-05-25 Jan Tosovsky wrote:
 
  can you hypothetically imagine any way to convert virtual page
  objects to the office document structure? I am actually thinking of
  'Slides' to PPTX (XSLF) conversion.
  
  There is not an easy way to produce paginated PPTX content using 
  pure XSLT. But FOP has all the required info somewhere in the 
  memory before serializing it into PDF, which could be somehow 
  pushed to POI.

 it seems like it may just be possible to achieve something like 
 this by means of FOP's Intermediate Formats[*], which can already
 be utilised to split up the basic formatting and rendering processes.

This approach could theoretically eliminate POI completely as most of IF ->
PPTX could be done via XSLT ;-) But it is too low level for me.
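As a rough illustration of the XSLT route, the sketch below applies a small stylesheet through the JDK's JAXP API. The namespace, the "page" and "text" element names and the slides/slide/run output vocabulary are all placeholders chosen for this example; they are neither FOP's actual IF schema nor real PresentationML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IfXsltSketch {

    // Placeholder namespace; the real one should be taken from FOP's IF documentation.
    static final String NS = "urn:example:if";

    // One <slide> per page, one <run> per text element (illustrative vocabulary).
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:if='" + NS + "' exclude-result-prefixes='if'>"
            + "<xsl:template match='/'>"
            + "<slides><xsl:apply-templates select='//if:page'/></slides>"
            + "</xsl:template>"
            + "<xsl:template match='if:page'>"
            + "<slide number='{position()}'>"
            + "<xsl:for-each select='.//if:text'>"
            + "<run><xsl:value-of select='.'/></run>"
            + "</xsl:for-each>"
            + "</slide>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Runs the stylesheet over the given paginated XML and returns the result. */
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document xmlns='" + NS + "'>"
                + "<page><text>Title</text></page>"
                + "<page><text>First</text><text>Second</text></page>"
                + "</document>";
        System.out.println(transform(xml));
    }
}
```

A real conversion would emit one slide part per page plus the cross-referencing files, which is where most of the extra effort would go.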

Anyway, I'll also investigate the POI end.

Jan

Re: FOP - POI

2015-05-27 Thread Matthias Reischenbacher

Hi Jan,

I know pptx a bit, because I had to implement an output channel based 
mainly on XSLT and a little bit of java (used mainly for zip 
compression). Pptx is hard to understand because of all the 
cross-references between the different files, but that wouldn't justify 
adding another dependency to FOP. So +1 for adding a pptx output to FOP, 
but I'd recommend doing it without POI.
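For readers wondering what the "little bit of java" for zip compression might look like, here is a minimal sketch using java.util.zip from the JDK. It is not the code referred to above; the class and method names are made up, and the XML bodies are placeholders, so the resulting archive is not a valid presentation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PptxZipSketch {

    /** Writes each part (path -> XML text) as one entry of a zip archive. */
    static byte[] zip(Map<String, String> parts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Lists the entry names, to check what ended up in the archive. */
    static List<String> entryNames(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bodies only; real parts would come from the XSLT output.
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("[Content_Types].xml", "<Types/>");
        parts.put("_rels/.rels", "<Relationships/>");
        parts.put("ppt/presentation.xml", "<presentation/>");
        System.out.println(entryNames(zip(parts)));
    }
}
```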


BR,
Matthias

On 27.05.2015 17:26, Jan Tosovsky wrote:

Hi Andreas,

On 2015-05-27 Andreas Delmelle wrote:

On 2015-05-27 Jan Tosovsky wrote:

On 2015-05-25 Andreas Delmelle wrote:

<snip />
it seems like it may just be possible to achieve something like
this by means of FOP's Intermediate Formats[*], which can already
be utilised to split up the basic formatting and rendering
processes.

This approach could theoretically eliminate POI completely as most
of IF -> PPTX could be done via XSLT ;-)
But it is too low level for me.

Can you clarify? Not sure I am completely following here... Is POI
*not* low level, then?


I meant the PPTX side, mainly its various 'dictionaries'. When e.g. a new
slide is added, its ID is registered in the main file. The slide is derived
from a default template, which must be linked via its ID. Slides can have
annotations; there is also a special template for them, which has to be
linked to the slide as well, and so on...

POI handles this cross-referencing out of the box: you just add a new slide
and all references are updated automatically. Yes, it is doable in pure
XSLT, but with additional effort.

Another thing is building the final PPTX file.
  

What you rather vaguely describe as "the required info somewhere in the
memory before serializing it into PDF" basically *is*, for FOP, the Area
Tree. The AT and IF XML formats are just XML representations of said
info, so it seems like you would not get around it either way...?

I simply didn't know that.

Thanks for the explanation,

Jan





RE: FOP - POI

2015-05-27 Thread Jan Tosovsky
Hi Andreas,

On 2015-05-27 Andreas Delmelle wrote:
 On 2015-05-27 Jan Tosovsky wrote:
  On 2015-05-25 Andreas Delmelle wrote:
   <snip />
   it seems like it may just be possible to achieve something like
   this by means of FOP's Intermediate Formats[*], which can already
   be utilised to split up the basic formatting and rendering
   processes.
 
  This approach could theoretically eliminate POI completely as most
  of IF -> PPTX could be done via XSLT ;-)
  But it is too low level for me.
 
 Can you clarify? Not sure I am completely following here... Is POI
 *not* low level, then?


I meant the PPTX side, mainly its various 'dictionaries'. When e.g. a new
slide is added, its ID is registered in the main file. The slide is derived
from a default template, which must be linked via its ID. Slides can have
annotations; there is also a special template for them, which has to be
linked to the slide as well, and so on...

POI handles this cross-referencing out of the box: you just add a new slide
and all references are updated automatically. Yes, it is doable in pure
XSLT, but with additional effort.
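To give an idea of the bookkeeping involved without POI, the sketch below tracks the three places that have to stay in sync when a slide is added to a PPTX package: the slide ID list in ppt/presentation.xml, the relationships part next to it, and [Content_Types].xml. The class is hypothetical and only produces XML fragments as strings; the assumption that rId1 is already taken by the slide master is made for this example only. With POI, this is presumably what XMLSlideShow.createSlide() does internally.

```java
import java.util.ArrayList;
import java.util.List;

class SlideRegistrySketch {

    static final String REL_BASE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";
    static final String SLIDE_TYPE =
            "application/vnd.openxmlformats-officedocument.presentationml.slide+xml";

    final List<String> slideIdEntries = new ArrayList<>();      // for ppt/presentation.xml
    final List<String> relationshipEntries = new ArrayList<>(); // for ppt/_rels/presentation.xml.rels
    final List<String> contentTypeEntries = new ArrayList<>();  // for [Content_Types].xml
    private int slideCount = 0;

    /** Registers one more slide in all three places and returns its part name. */
    String addSlide() {
        slideCount++;
        String partName = "ppt/slides/slide" + slideCount + ".xml";
        String relId = "rId" + (slideCount + 1); // assumption: rId1 belongs to the slide master
        int slideId = 255 + slideCount;          // slide ids start at 256
        slideIdEntries.add("<p:sldId id=\"" + slideId + "\" r:id=\"" + relId + "\"/>");
        relationshipEntries.add("<Relationship Id=\"" + relId + "\" Type=\"" + REL_BASE
                + "slide\" Target=\"slides/slide" + slideCount + ".xml\"/>");
        contentTypeEntries.add("<Override PartName=\"/" + partName
                + "\" ContentType=\"" + SLIDE_TYPE + "\"/>");
        return partName;
    }

    /** Relationship each slide needs to the layout (template) it is derived from. */
    static String layoutRelationship(String layoutFile) {
        return "<Relationship Id=\"rId1\" Type=\"" + REL_BASE
                + "slideLayout\" Target=\"../slideLayouts/" + layoutFile + "\"/>";
    }

    public static void main(String[] args) {
        SlideRegistrySketch registry = new SlideRegistrySketch();
        registry.addSlide();
        registry.addSlide();
        registry.slideIdEntries.forEach(System.out::println);
        registry.relationshipEntries.forEach(System.out::println);
        registry.contentTypeEntries.forEach(System.out::println);
        System.out.println(layoutRelationship("slideLayout1.xml"));
    }
}
```

Notes pages and their master would add further entries of the same kind, which is the "and so on" part.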

Another thing is building the final PPTX file.
 
 What you rather vaguely describe as "the required info somewhere in the
 memory before serializing it into PDF" basically *is*, for FOP, the Area
 Tree. The AT and IF XML formats are just XML representations of said
 info, so it seems like you would not get around it either way...?

I simply didn't know that.

Thanks for the explanation,

Jan



Re: FOP - POI

2015-05-25 Thread Andreas Delmelle
 On 25 May 2015, at 21:16, Jan Tosovsky j.tosov...@email.cz wrote:
 

Hi Jan

 can you hypothetically imagine any way to convert virtual page objects
 to the office document structure? I am actually thinking of 'Slides' to PPTX
 (XSLF) conversion.

Very interesting question...
Somewhat related, as I recall, a suggestion/feature request has been raised in 
the past to add OpenOffice's document format as a potential new output format 
to FOP.

 There is not an easy way to produce paginated PPTX content using pure XSLT.
 But FOP has all the required info somewhere in the memory before serializing
 it into PDF, which could be somehow pushed to POI.

I must admit that I am unfamiliar with the most recent Apache POI API. Last 
time I looked at POI must have been almost 10 years ago.

That said, it seems like it may just be possible to achieve something like this 
by means of FOP's Intermediate Formats[*], which can already be utilised to 
split up the basic formatting and rendering processes.

[*] see: http://xmlgraphics.apache.org/fop/trunk/intermediate.html

While it is still an XML format, the benefit would be that it is already 
paginated, which may make it easier to generate PPTX slide-decks from. 
Basically, you would use FOP to create an IF file (or stream) from XSL-FO 
input, as a basis for PDF rendering on the one hand, and then somehow feed that 
same intermediate file to POI for creation of the PPTX. Basic formatting and 
pagination would be done once, through FOP's layout engine.

Not sure what POI can handle as input, though, or how difficult it would be to 
make it handle FOP's IF...


Not sure if that goes in the direction of what you were looking for, but hope 
this helps!



Andreas