RE: FOP -> POI

2024-07-11 Thread Jan Tosovsky
Hi all,

> On 28 May 2015, at 21:53, Andreas Delmelle wrote:
> > On 28 May 2015, at 21:29, Jan Tosovsky wrote:
> > > On 2015-05-27 Matthias Reischenbacher wrote:
> > > 
> > > I know pptx a bit, because I had to implement an output channel based 
> > > mainly on XSLT and a little bit of java (used mainly for zip 
> > > compression).
> >
> > ... My idea was rather a dedicated project dependent on both
> > 'libraries', utilizing the best of both worlds.
> >
> In the meantime, having given Jan's replies from yesterday and today some
> further thought, I was thinking it may be possible to take the route of
> adding a SAX ContentHandler to the processing chain that translates the
> events from the AreaTreeHandler or the IFHandler into POI API calls...
> Something like that?

After a hiatus of several years I finally dedicated some time to my old idea
(converting FOP IF to PPTX output, without modifying the FOP codebase), and
found that the result is almost useless if the PPTX needs to be edited (e.g.
translated): all the text is split into separate lines, and where inline
styles are used even these are broken further into smaller chunks. So I am
dropping this idea.
https://github.com/doctribute/fop-to-powerpoint-converter

Jan



Re: FOP -> POI

2015-05-28 Thread Andreas Delmelle
Hi Jan, Matthias,

> On 28 May 2015, at 21:29, Jan Tosovsky wrote:
> 
> Hi Matthias,
> 
> On 2015-05-27 Matthias Reischenbacher wrote:
>> 
>> I know pptx a bit, because I had to implement an output channel based
>> mainly on XSLT and a little bit of java (used mainly for zip
>> compression). 
> 
> Just out of curiosity, what was the source format? Could this conversion be
> open-sourced?
> 
>> +1 for adding a pptx output to FOP, but I'd recommend doing it without
>> POI.
> 
> I am afraid it would lead to lots of code duplication. My idea was rather a
> dedicated project dependent on both 'libraries', utilizing the best of both
> worlds.

Matthias does make a good point about the dependency.

In the meantime, having given Jan's replies from yesterday and today some 
further thought, I was thinking it may be possible to take the route of adding 
a SAX ContentHandler to the processing chain that translates the events from 
the AreaTreeHandler or the IFHandler into POI API calls... Something like that?
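
To make the ContentHandler idea a little more concrete, here is a minimal sketch in plain Java. It is not FOP or POI code: SlideSink, IfToSlidesHandler and IfToSlidesDemo are hypothetical names, and the element and attribute names it reacts to ("page", "text", "x", "y") are assumptions about what the IF XML looks like. A real version would implement SlideSink on top of POI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sink; a real implementation might delegate to POI. */
interface SlideSink {
    void startSlide(int index);
    void addText(int x, int y, String text);
}

/** Translates IF-like SAX events into SlideSink calls (element names are assumed). */
class IfToSlidesHandler extends DefaultHandler {
    private final SlideSink sink;
    private final StringBuilder chars = new StringBuilder();
    private int pageCount = 0;
    private int x;
    private int y;
    private boolean inText = false;

    IfToSlidesHandler(SlideSink sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("page".equals(localName)) {
            // One paginated page becomes one slide.
            sink.startSlide(pageCount++);
        } else if ("text".equals(localName)) {
            x = Integer.parseInt(atts.getValue("x"));
            y = Integer.parseInt(atts.getValue("y"));
            chars.setLength(0);
            inText = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) {
            chars.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("text".equals(localName)) {
            // Each text element is forwarded as its own positioned run.
            sink.addText(x, y, chars.toString());
            inText = false;
        }
    }
}

class IfToSlidesDemo {
    /** Parses the given XML and returns a log of the sink calls it produced. */
    static List<String> convert(String ifXml) throws Exception {
        List<String> log = new ArrayList<>();
        SlideSink sink = new SlideSink() {
            public void startSlide(int index) { log.add("slide " + index); }
            public void addText(int x, int y, String text) { log.add(x + "," + y + ": " + text); }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(ifXml)),
                new IfToSlidesHandler(sink));
        return log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><page-sequence><page><content>"
                + "<text x='1000' y='2000'>Hello</text>"
                + "</content></page></page-sequence></document>";
        convert(xml).forEach(System.out::println);
    }
}
```

The sink only ever sees text in the pieces the input delivers, so any regrouping into paragraphs would have to happen on the sink side.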

I have to agree with the position that adding a hard dependency on POI would be 
too much, but maybe it could have merit as a plugin / optional runtime 
dependency (?) Sort of like how one currently needs the PDFBox plugin to 
include PDFs via fox:external-document ?

If one wanted to use it, it would be technically possible and not too hard to 
achieve, but it would not come out of the box, i.e. POI would not be required 
to compile and build FOP.

Ideas...? Plenty of those over here.


KR

Andreas


RE: FOP -> POI

2015-05-28 Thread Jan Tosovsky
Hi Matthias,

On 2015-05-27 Matthias Reischenbacher wrote:
> 
> I know pptx a bit, because I had to implement an output channel based
> mainly on XSLT and a little bit of java (used mainly for zip
> compression). 

Just out of curiosity, what was the source format? Could this conversion be
open-sourced?

> +1 for adding a pptx output to FOP, but I'd recommend doing it without
> POI.

I am afraid it would lead to lots of code duplication. My idea was rather a
dedicated project dependent on both 'libraries', utilizing the best of both
worlds.

Jan



Re: FOP -> POI

2015-05-27 Thread Matthias Reischenbacher

Hi Jan,

I know pptx a bit, because I had to implement an output channel based 
mainly on XSLT and a little bit of java (used mainly for zip 
compression). Pptx is hard to understand because of all the cross 
references between the different files, but that wouldn't justify adding 
another dependency to FOP. So +1 for adding a pptx output to FOP, but 
I'd recommend doing it without POI.


BR,
Matthias

On 27.05.2015 17:26, Jan Tosovsky wrote:
> Hi Andreas,
>
> On 2015-05-27 Andreas Delmelle wrote:
> > On 2015-05-27 Jan Tosovsky wrote:
> > > On 2015-05-25 Andreas Delmelle wrote:
> > > >
> > > > it seems like it may just be possible to achieve something like
> > > > this by means of FOP's Intermediate Formats[*], which can already
> > > > be utilised to split up the basic formatting and rendering
> > > > processes.
> > >
> > > This approach could theoretically eliminate POI completely as most
> > > of IF -> PPTX could be done via XSLT ;-)
> > > But it is too low level for me.
> >
> > Can you clarify? Not sure I am completely following here... Is POI
> > *not* low level, then?
>
> I meant PPTX side, mainly various 'dictionaries'. When e.g. a new slide is
> added, its ID is registered in the main file. The slide is derived from a
> default template, which must be linked via its ID. Slides can have
> annotations, there is also a special template for them, it has to be linked
> to the slide as well, and so on...
>
> This cross referencing in POI is out of the box. You just add a new slide
> and all references are updated automatically. Yes, it is doable in pure
> XSLT, but with additional effort.
>
> Another thing is building the final PPTX file.
>
> > What you rather vaguely describe as "the required info somewhere in the
> > memory before serializing it into PDF" for FOP basically *is* the Area
> > Tree. The AT and IF XML formats are just XML representations of said
> > info, so seems like you would not get around it either way...?
>
> I simply didn't know that.
>
> Thanks for the explanation,
>
> Jan





RE: FOP -> POI

2015-05-27 Thread Jan Tosovsky
Hi Andreas,

On 2015-05-27 Andreas Delmelle wrote:
> On 2015-05-27 Jan Tosovsky wrote:
> > On 2015-05-25 Andreas Delmelle wrote:
> > > 
> > > it seems like it may just be possible to achieve something like
> > > this by means of FOP's Intermediate Formats[*], which can already
> > > be utilised to split up the basic formatting and rendering
> > > processes.
> >
> > This approach could theoretically eliminate POI completely as most
> > of IF -> PPTX could be done via XSLT ;-)
> > But it is too low level for me.
> 
> Can you clarify? Not sure I am completely following here... Is POI
> *not* low level, then?
>

I meant the PPTX side, mainly its various 'dictionaries'. When e.g. a new slide
is added, its ID is registered in the main file. The slide is derived from a
default template, which must be linked via its ID. Slides can have annotations;
there is also a special template for them, which has to be linked to the slide
as well, and so on...

With POI, this cross-referencing comes out of the box. You just add a new slide
and all references are updated automatically. Yes, it is doable in pure
XSLT, but with additional effort.

Another thing is building the final PPTX file.
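
To illustrate the kind of bookkeeping meant here, below is a rough sketch in plain Java (no POI, no FOP). The class PptxBookkeepingSketch and its methods are hypothetical and heavily simplified. Every added slide has to show up in the slide ID list of the main presentation part and in its relationships file, and each slide needs its own relationships file pointing at a layout. The sketch leaves out [Content_Types].xml, the presentation part itself, masters, layouts and notes, so the archive it writes is NOT a valid PPTX. The layout target path is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical, simplified model of the per-slide cross references in a PPTX package. */
class PptxBookkeepingSketch {
    private static final String PKG_RELS =
            "http://schemas.openxmlformats.org/package/2006/relationships";
    private static final String DOC_RELS =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    private final List<String> slides = new ArrayList<>();

    /** Stores the slide XML and returns its 1-based slide number. */
    int addSlide(String slideXml) {
        slides.add(slideXml);
        return slides.size();
    }

    /** Fragment for the main presentation part: one sldId per slide, ids starting at 256. */
    String slideIdList() {
        StringBuilder sb = new StringBuilder("<p:sldIdLst>");
        for (int n = 1; n <= slides.size(); n++) {
            sb.append("<p:sldId id=\"").append(255 + n)
              .append("\" r:id=\"rId").append(n).append("\"/>");
        }
        return sb.append("</p:sldIdLst>").toString();
    }

    /** Relationships of the main part; the slide master and others are omitted here. */
    String presentationRels() {
        StringBuilder sb = new StringBuilder("<Relationships xmlns=\"" + PKG_RELS + "\">");
        for (int n = 1; n <= slides.size(); n++) {
            sb.append(rel("rId" + n, "slide", "slides/slide" + n + ".xml"));
        }
        return sb.append("</Relationships>").toString();
    }

    /** Relationships of a single slide: the link to its layout (assumed path). */
    String slideRels() {
        return "<Relationships xmlns=\"" + PKG_RELS + "\">"
                + rel("rId1", "slideLayout", "../slideLayouts/slideLayout1.xml")
                + "</Relationships>";
    }

    private static String rel(String id, String type, String target) {
        return "<Relationship Id=\"" + id + "\" Type=\"" + DOC_RELS + "/" + type
                + "\" Target=\"" + target + "\"/>";
    }

    /** Writes the parts tracked above into a zip archive (incomplete as a PPTX). */
    byte[] toZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "ppt/_rels/presentation.xml.rels", presentationRels());
            for (int n = 1; n <= slides.size(); n++) {
                put(zip, "ppt/slides/slide" + n + ".xml", slides.get(n - 1));
                put(zip, "ppt/slides/_rels/slide" + n + ".xml.rels", slideRels());
            }
        }
        return bytes.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, String content)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        PptxBookkeepingSketch pkg = new PptxBookkeepingSketch();
        pkg.addSlide("<p:sld/>");
        System.out.println(pkg.slideIdList());
        System.out.println(pkg.toZip().length + " bytes");
    }
}
```

Even in this reduced form, adding one slide touches three places, which is the effort a library hides behind a single call.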
 
> What you rather vaguely describe as "the required info somewhere in the
> memory before serializing it into PDF" for FOP basically *is* the Area
> Tree. The AT and IF XML formats are just XML representations of said
> info, so seems like you would not get around it either way...?

I simply didn't know that.

Thanks for the explanation,

Jan



Re: FOP -> POI

2015-05-27 Thread Andreas Delmelle
Hi Jan

> On 27 May 2015, at 21:22, Jan Tosovsky wrote:
> 
> On 2015-05-25 Andreas Delmelle wrote:
>> 
>> it seems like it may just be possible to achieve something like 
>> this by means of FOP's Intermediate Formats[*], which can already
>> be utilised to split up the basic formatting and rendering processes.
> 
> This approach could theoretically eliminate POI completely as most of IF ->
> PPTX could be done via XSLT ;-)

Right... I got to thinking that as well. 
Of course, that happened only after I had already sent this. :)

> But it is too low level for me.

Can you clarify? Not sure I am completely following here... Is POI *not* low 
level, then? 
I mean: it is basically an API to read/write MS Office document formats, so 
would require some additional code as well, albeit Java instead of XSLT?

What you rather vaguely describe as "the required info somewhere in the memory 
before serializing it into PDF" for FOP basically *is* the Area Tree. The AT 
and IF XML formats are just XML representations of said info, so seems like you 
would not get around it either way...?

That said, I feel like I may be missing a crucial piece of info here.

> Anyway, I'll also investigate the POI end.

OK, cool. If you do see a way that the fop-dev team can be of assistance, feel 
free to report back here.


KR

Andreas

RE: FOP -> POI

2015-05-27 Thread Jan Tosovsky
On 2015-05-25 Andreas Delmelle wrote:
> > On 2015-05-25 Jan Tosovsky wrote:
> >
> > can you hypothetically imagine any way how to convert virtual page
> > objects to the office document structure? I actually I think of 
> > 'Slides' to PPTX (XSLF) conversion.
> > 
> > There is not an easy way to produce paginated PPTX content using 
> > pure XSLT. But FOP has all the required info somewhere in the 
> > memory before serializing it into PDF, which could be somehow 
> > pushed to POI.
>
> it seems like it may just be possible to achieve something like 
> this by means of FOP's Intermediate Formats[*], which can already
> be utilised to split up the basic formatting and rendering processes.

This approach could theoretically eliminate POI completely as most of IF ->
PPTX could be done via XSLT ;-) But it is too low level for me.

Anyway, I'll also investigate the POI end.

Jan

Re: FOP -> POI

2015-05-25 Thread Andreas Delmelle
> On 25 May 2015, at 21:16, Jan Tosovsky wrote:
> 

Hi Jan

> can you hypothetically imagine any way how to convert virtual page objects
> to the office document structure? I actually think of 'Slides' to PPTX
> (XSLF) conversion.

Very interesting question...
Somewhat related, as I recall, a suggestion/feature request has been raised in 
the past to add OpenOffice's document format as a potential new output format 
to FOP.

> There is not an easy way to produce paginated PPTX content using pure XSLT.
> But FOP has all the required info somewhere in the memory before serializing
> it into PDF, which could be somehow pushed to POI.

I must admit that I am unfamiliar with the most recent Apache POI API. Last 
time I looked at POI must have been almost 10 years ago.

That said, it seems like it may just be possible to achieve something like this 
by means of FOP's Intermediate Formats[*], which can already be utilised to 
split up the basic formatting and rendering processes.

[*] see: http://xmlgraphics.apache.org/fop/trunk/intermediate.html

While it is still an XML format, the benefit would be that it is already 
paginated, which may make it easier to generate PPTX slide-decks from. 
Basically, you would use FOP to create an IF file (or stream) from XSL-FO 
input, as a basis for PDF rendering on the one hand, and then somehow feed that 
same intermediate file to POI for creation of the PPTX. Basic formatting and 
pagination would be done once, through FOP's layout engine.

Not sure what POI can handle as input, though, or how difficult it would be to 
make it handle FOP's IF...


Not sure if that goes in the direction of what you were looking for, but hope 
this helps!



Andreas