page-breaking strategies and performance

2005-03-01 Thread Jeremias Maerki
I finally have Knuth's Digital Typography and have let myself be enlightened by
his well-written words. In [1] Simon outlined different strategies for
page-breaking, obviously closely following the different approaches
defined by Knuth. At first glance, I'd say that best-fit is probably
the obvious strategy to select, especially if TeX is happy with it.
Obviously, it can't find the optimal solution like this but the
additional overhead (memory and CPU power) of a look-ahead/total-fit
strategy is simply too much and unnecessary for things like invoices and
insurance policies which are surely some of the most popular use cases
of XSL-FO. Here, speed is extremely important. People writing
documentation (maybe using DocBook) or glossy stock reports have
additional requirements and don't mind the longer processing time and
additional memory requirements. This leads me to the question if we
shouldn't actually implement two page-breaking strategies (in the end,
not both right now). For a speed-optimized algorithm, we could even
think about ignoring side-floats.
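
To make the best-fit idea above a bit more concrete, here is a minimal sketch, not FOP code. It walks a simplified box/glue/penalty list and, for each page in turn, picks the feasible break with the lowest cost without looking ahead to later pages. The names Element, BestFitSketch and breakPages are hypothetical, the cost (100 * |r|^3 plus the penalty value) is a simplified stand-in for Knuth's badness/demerits, and side-floats, footnotes and keeps are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified Knuth-style element (not a FOP class). */
final class Element {
    enum Kind { BOX, GLUE, PENALTY }
    static final int INFINITE = 10000;

    final Kind kind;
    final int width;        // natural height in block-progression direction
    final int stretch;
    final int shrink;
    final int penaltyValue;

    private Element(Kind kind, int width, int stretch, int shrink, int penaltyValue) {
        this.kind = kind;
        this.width = width;
        this.stretch = stretch;
        this.shrink = shrink;
        this.penaltyValue = penaltyValue;
    }

    static Element box(int w) { return new Element(Kind.BOX, w, 0, 0, 0); }
    static Element glue(int w, int y, int z) { return new Element(Kind.GLUE, w, y, z, 0); }
    static Element penalty(int p) { return new Element(Kind.PENALTY, 0, 0, 0, p); }
}

final class BestFitSketch {
    /** A break is legal at a finite penalty or at glue directly preceded by a box. */
    static boolean isLegalBreak(List<Element> els, int i) {
        Element e = els.get(i);
        if (e.kind == Element.Kind.PENALTY) {
            return e.penaltyValue < Element.INFINITE;
        }
        return e.kind == Element.Kind.GLUE && i > 0 && els.get(i - 1).kind == Element.Kind.BOX;
    }

    /** Adjustment ratio: how much of the available stretch (or shrink) is needed. */
    static double ratio(int diff, int stretch, int shrink) {
        if (diff == 0) return 0;
        if (diff > 0) return stretch > 0 ? (double) diff / stretch : Double.POSITIVE_INFINITY;
        return shrink > 0 ? (double) diff / shrink : Double.NEGATIVE_INFINITY;
    }

    /** Returns the index of the break element ending each page (list size for the last page). */
    static List<Integer> breakPages(List<Element> els, int pageHeight) {
        List<Integer> breaks = new ArrayList<>();
        int start = 0;
        while (start < els.size()) {
            // Glue and penalties at the top of a page are discarded.
            while (start < els.size() && els.get(start).kind != Element.Kind.BOX) start++;
            if (start >= els.size()) break;
            int natural = 0, stretch = 0, shrink = 0, best = -1, i;
            double bestCost = Double.POSITIVE_INFINITY;
            boolean overfull = false;
            for (i = start; i < els.size(); i++) {
                Element e = els.get(i);
                if (isLegalBreak(els, i)) {
                    double r = ratio(pageHeight - natural, stretch, shrink);
                    if (r < -1) { overfull = true; break; } // too tall even when fully shrunk
                    double badness = Double.isInfinite(r)
                            ? Double.MAX_VALUE : 100 * Math.pow(Math.abs(r), 3);
                    double cost = badness + e.penaltyValue;
                    if (cost <= bestCost) { best = i; bestCost = cost; }
                }
                natural += e.width;
                stretch += e.stretch;
                shrink += e.shrink;
            }
            if (!overfull) { breaks.add(els.size()); break; } // rest goes on the last page
            int chosen = best >= 0 ? best : i; // no feasible break: accept an overfull page
            breaks.add(chosen);
            start = chosen + 1;
        }
        return breaks;
    }

    public static void main(String[] args) {
        List<Element> els = Arrays.asList(
                Element.box(40), Element.glue(10, 5, 2), Element.box(40), Element.glue(10, 5, 2),
                Element.box(40), Element.glue(10, 5, 2), Element.box(40));
        System.out.println(breakPages(els, 100)); // prints [3, 7]
    }
}
```

A total-fit variant would keep several candidate breaks alive per page instead of committing to one, which is where the extra memory and CPU cost mentioned above comes from.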

Obviously, in this model we would have to make sure that we use a common
model for both strategies. For example, we still have to make sure that
the line layout gets information on the available IPD on each line, but
probably this will not be a big problem to include later.

An enhanced/adjusted box/glue/penalty model sounds like a good idea to
me especially since Knuth hints at that in his book, too. There's also a
question if part of the infrastructure from line breaking can be reused
for page breaking, but I guess rather not.

As for the plan to implement a new page-breaking mechanism: I've got to
do it now. :-) I'm sorry if this may put some pressure on some of you.
I'm also not sure if I'm fit already to tackle it, but I've got to
do it anyway. Since I don't want to work with a series of patches like
you guys did earlier, I'd like to create a branch to do that on as soon
as we've agreed on a strategy. Any objections to that?

[1] http://wiki.apache.org/xmlgraphics-fop/PageLayout 

Jeremias Maerki



Skype-conference on page-breaking?

2005-03-01 Thread Jeremias Maerki
To speed things up could we hold a conference (using Skype, for example)
to discuss further details on page-breaking? I'd volunteer to sum up any
results during that discussion for the archives. I have Finn on my Skype
radar already.

Jeremias Maerki



Re: DO NOT REPLY [Bug 33760] New: - [Patch] current AWTRenderer

2005-03-01 Thread Renaud Richardet
Victor and Jeremias, thanks for your input.

Victor, I've checked out your aXSL. I'll study it and come back to you
if I have questions.

Jeremias wrote:

  Speaking of startVParea(), could we rename it to something more meaningful?
  Proposition: TransformPosition, or something like this.
  Deleted the methods moved to AbstractRenderer.
 Actually, I like startVParea() (or rather startViewportArea like I would
 rather call it) because only for viewport a new transformation matrix is
 necessary. I think when you port the matrix concatenation from the PDF
 renderer over to Java2D in startVParea() you will start to understand
 what's going on here. 
OK,  thanks. That makes sense.

  fop.area.CTM: added two getters for e and f. If there's another way to get 
  those
  values, please let me know.
 Normally, we use toArray() but I guess these two getters are ok and
 don't hurt although I think they are not necessary because you need to
 use all other values in the CTM, too, to get the reference orientation
 stuff right. See above.
OK, I'll use the available toArray() instead.
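
For the Java2D side, one possible way to use all six values at once is to feed the array into a java.awt.geom.AffineTransform. This is only a sketch: it assumes the array is ordered [a b c d e f] as in a PDF matrix, which should be checked against CTM.toArray(), and CtmArraySketch and toAffineTransform are hypothetical names.

```java
import java.awt.geom.AffineTransform;

final class CtmArraySketch {
    /**
     * Builds an AffineTransform from six values assumed to be ordered
     * [a b c d e f]; AffineTransform(double[]) reads them as
     * {m00 m10 m01 m11 m02 m12}, which is the same layout.
     */
    static AffineTransform toAffineTransform(double[] ctm) {
        if (ctm.length != 6) {
            throw new IllegalArgumentException("expected 6 matrix values");
        }
        return new AffineTransform(ctm);
    }

    public static void main(String[] args) {
        // Identity scale/rotation with a translation of (72, 36).
        AffineTransform at = toAffineTransform(new double[] {1, 0, 0, 1, 72, 36});
        System.out.println(at.getTranslateX() + ", " + at.getTranslateY());
    }
}
```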

  The enclosed image doesn't have ipd/bpd
  either. Again: is this normal? I have a workaround in mind (getting those
  values through the FopImage), but it doesn't sound right.
 In this case it is probably better to fix the LMs. I've started doing
 that but haven't finished. ATM this is lower priority for me. I can send
 you my current code if you want to try to fix it. Shouldn't be so
 difficult.
I would also prefer to fix the LM's. I don't want to go into it now
(too complex for me ATM), but I'll come back to you later.
 
  renderTextDecoration(InlineArea) seems to work, even if it's not 
  implemented??
 Huh? It was you who moved the implementation up from PDFRenderer to
 AbstractRenderer. That's how you implemented it. Inheritance!
I mean renderTextDecoration(InlineArea) from AbstractRenderer, which
is empty ATM. Did you mean renderTextDecoration(Font fs,
InlineArea Inline, int baseline, int startx) instead?
But I think I got it now: when I run examples/fo/basic/textdeko.fo,
the underline of the sentence "This is a whole block wrapped in
fo:inline with the property text-decoration=underline. Some more
Text to get at least two lines." works ok. This is because the
TextArea handles the underline (via renderTextDecoration(Font fs,
InlineArea Inline, int baseline, int startx)) and
renderTextDecoration(InlineArea) doesn't do anything.

 BTW, using Graphics.create() you should be able to create a copy of the
 current Graphics2D object. By pushing the old one on a stack and
 overwriting the graphics member variable you should be able to create
 the same effect as with currentState.push()/saveGraphicsState() in
 PDFRenderer.startVParea() and currentState.pop()/restoreGraphicsState()
 in endVParea(). When leaving a VP area you can simply restore an older
 Graphics2D object from the stack and continue painting. This will undo
 any transformations and state changes done in the copy used within the VP
 area. See second paragraph in javadocs of java.awt.Graphics.
Sounds very good. Why haven't I thought of it ? ;)
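
A minimal sketch of that save/restore idea, assuming a renderer that keeps its current Graphics2D in a member variable. ViewportStateSketch, startViewport and endViewport are hypothetical names, not the actual renderer methods; only Graphics.create(), dispose() and translate() are real Java2D calls.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.ArrayDeque;
import java.util.Deque;

final class ViewportStateSketch {
    private final Deque<Graphics2D> saved = new ArrayDeque<>();
    private Graphics2D graphics;

    ViewportStateSketch(Graphics2D initial) {
        this.graphics = initial;
    }

    /** Saves the current Graphics2D and continues painting on a copy. */
    void startViewport(double tx, double ty) {
        saved.push(graphics);
        graphics = (Graphics2D) graphics.create();
        graphics.translate(tx, ty); // only the copy is affected
    }

    /** Drops the copy and restores the Graphics2D saved by startViewport(). */
    void endViewport() {
        graphics.dispose();
        graphics = saved.pop();
    }

    Graphics2D current() {
        return graphics;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        ViewportStateSketch state = new ViewportStateSketch(img.createGraphics());
        state.startViewport(10, 20);
        System.out.println(state.current().getTransform().getTranslateX()); // 10.0
        state.endViewport();
        System.out.println(state.current().getTransform().getTranslateX()); // 0.0
    }
}
```

Because the copy carries its own transform, clip and paint settings, nothing has to be undone by hand when the viewport ends.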

 Another thought: One of my low-priority tasks is to create a little
 application that renders a test suite with all of FOP's renderers
 creating bitmap images for each generated document and ultimately
 creating a little website that lets us compare the output. PDFs and PS
 files can be converted to bitmaps using GhostScript. Maybe you might
 want to write such a thingy. I won't get to it before I get to updating
 the PS renderer to full quality.
That would be good. Do you mean something like the Bitmap production
you documented on FopAndJava2D [1]? This is what I intend to work on
after the basic Java2DRenderer works.

Thanks for your valuable comments. I'll work them out carefully and
post an improved patch.

Regards, 
Renaud

[1] http://wiki.apache.org/xmlgraphics-fop/FopAndJava2D


Re: Skype-conference on page-breaking?

2005-03-01 Thread Renaud Richardet
I would be pleased to listen.

Renaud


RE: page-breaking strategies and performance

2005-03-01 Thread Victor Mote
Jeremias Maerki wrote:

 processing time and additional memory requirements. This 
 leads me to the question if we shouldn't actually implement 
 two page-breaking strategies (in the end, not both right 
 now). For a speed-optimized algorithm, we could even think 
 about ignoring side-floats.
 
 Obviously, in this model we would have to make sure that we 
 use a common model for both strategies. For example, we still 
 have to make sure that the line layout gets information on 
 the available IPD on each line, but probably this will not be 
 a big problem to include later.

This is an excellent idea. It has from time to time gone under the moniker
LayoutStrategy or pluggable layout. To do it without duplicating everything
requires that the other pieces of the system be modularized, the concerns
separated so that they can be reused. The upside is tremendous and the cost
pays for itself in developer productivity.

Victor Mote



Re: [XML Graphics - FOP Wiki] Updated: PageLayout

2005-03-01 Thread Jeremias Maerki
Simon, I've tried to think your example through. If I read the spec
right about space resolution then I get the impression that we may need
to do more in this area than find a suitable box/glue/penalty
combination. There may be several spaces which need to be taken into
account during resolution. There's the precedence and the conditionality
that need to be evaluated. I think we may need to create special
elements that can hold this information (or reference it). They need to
be distinguishable so we can apply the resolution rules properly.

I believe your example should then look like this:

- box
- penalty (w=0, p=infinite)
- space
- glue (w=0, y=0, z=0)
- space
- penalty (w=0, p=infinite)
- box


A more complex example would look like this:

<fo:block space-after="5pt">
  <fo:block>a line</fo:block>
  <fo:block space-after="3pt">
    blah blah
  </fo:block>
</fo:block>
<fo:block space-before="10pt">
  blah bla
</fo:block>

- box (a line)
- box (blah blah)
- penalty (w=0, p=infinite)
- space (w=3pt, ref to the space property)
- penalty (w=0, p=infinite)
- space (w=5pt, ref to the space property)
- glue (w=0, y=0, z=0)
- space (w=10pt, ref to the space property)
- penalty (w=0, p=infinite)
- box

The algorithm would have to track down the space element before and
after the break and then apply the space resolution rules. The space
elements would behave much like glue elements.

What do you think?
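
A rough sketch of how such space elements might be resolved once the algorithm has collected the spaces on either side of a possible break. This is a deliberate simplification of the XSL-FO space-resolution rules: it only looks at precedence, the optimum width and a discard flag, and ignores forcing precedence and min/max values. Space, SpaceResolutionSketch, resolve and resolveAtEdge are hypothetical names; widths are in millipoints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical space element holding the values needed for resolution. */
final class Space {
    final int width;             // optimum, in millipoints
    final int precedence;
    final boolean discardAtEdge; // models conditionality="discard"

    Space(int width, int precedence, boolean discardAtEdge) {
        this.width = width;
        this.precedence = precedence;
        this.discardAtEdge = discardAtEdge;
    }
}

final class SpaceResolutionSketch {
    /** No break in between: highest precedence wins, then the largest width. */
    static int resolve(List<Space> run) {
        int bestPrecedence = Integer.MIN_VALUE;
        int width = 0;
        for (Space s : run) {
            if (s.precedence > bestPrecedence
                    || (s.precedence == bestPrecedence && s.width > width)) {
                bestPrecedence = s.precedence;
                width = s.width;
            }
        }
        return width;
    }

    /** Run touching a page edge: discardable spaces are dropped before resolving. */
    static int resolveAtEdge(List<Space> run) {
        List<Space> kept = new ArrayList<>();
        for (Space s : run) {
            if (!s.discardAtEdge) {
                kept.add(s);
            }
        }
        return resolve(kept);
    }

    public static void main(String[] args) {
        Space after3 = new Space(3000, 0, true);
        Space after5 = new Space(5000, 0, true);
        Space before10 = new Space(10000, 0, true);
        // No page break: the three spaces collapse into one.
        System.out.println(resolve(Arrays.asList(after3, after5, before10))); // 10000
        // Page break at the glue: the spaces on both sides are discarded.
        System.out.println(resolveAtEdge(Arrays.asList(after3, after5)));     // 0
        System.out.println(resolveAtEdge(Arrays.asList(before10)));           // 0
    }
}
```

If the space-before were retained instead of discarded, resolveAtEdge would keep its 10pt at the top of the next page, which is why the elements need to carry a reference to the original space property.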

On 25.02.2005 22:50:17 SimonPepping wrote:
 +=== Space specifiers ===
 +
 +When the space specifiers resolve to zero around a page break, we are
 +in the same situation as that of a word space in line breaking. It is
 +represented by the sequence `box - glue - box`.
 +
 +When the space specifiers do not resolve to zero around a page break,
 +we are in the same situation as that of a word space in line breaking
 +in the case of centered lines. It is represented by the sequence 
 +{{{
 +box - infinite penalty - glue(ha) - zero penalty - glue(hn-ha-hb) - zero width box - infinite penalty - glue(hb) - box
 +}}}
 +where ha is the bpd of
 +the space-after before the page break, hb is the bpd of the
 +space-before after the page-break, hw is the space when there is no
 +page break.


Jeremias Maerki



DO NOT REPLY [Bug 33597] - [Patch] for xdocs Design and Implementation

2005-03-01 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=33597.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=33597


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Additional Comments From [EMAIL PROTECTED]  2005-03-01 21:48 ---
Renaud,

Applied. Thanks.

There is a patch with a refactoring of the implementation of Knuth's
algorithm, in bug 32612. It does not change the basic algorithm. You
may wish to wait with any documentation until that is applied.

Simon




Re: Skype-conference on page-breaking?

2005-03-01 Thread Simon Pepping
On Tue, Mar 01, 2005 at 03:09:46PM +0100, Jeremias Maerki wrote:
 To speed things up could we hold a conference (using Skype, for example)
 to discuss further details on page-breaking? I'd volunteer to sum up any
 results during that discussion for the archives. I have Finn on my Skype
 radar already.

I do not have a broadband connection, and therefore no Skype or other
VoIP.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: page-breaking strategies and performance

2005-03-01 Thread Simon Pepping
On Tue, Mar 01, 2005 at 03:02:38PM +0100, Jeremias Maerki wrote:
 As for the plan to implement a new page-breaking mechanism: I've got to
 do it now. :-) I'm sorry if this may put some pressure on some of you.
 I'm also not sure if I'm fit already to tackle it, but I've got to
 do it anyway. Since I don't want to work with a series of patches like
 you guys did earlier, I'd like to create a branch to do that on as soon
 as we've agreed on a strategy. Any objections to that?

That is a good idea.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: page-breaking strategies and performance

2005-03-01 Thread Simon Pepping
On Tue, Mar 01, 2005 at 07:52:27AM -0700, Victor Mote wrote:
 Jeremias Maerki wrote:
 
  processing time and additional memory requirements. This 
  leads me to the question if we shouldn't actually implement 
  two page-breaking strategies (in the end, not both right 
  now). For a speed-optimized algorithm, we could even think 
  about ignoring side-floats.
  
  Obviously, in this model we would have to make sure that we 
  use a common model for both strategies. For example, we still 
  have to make sure that the line layout gets information on 
  the available IPD on each line, but probably this will not be 
  a big problem to include later.
 
 This is an excellent idea. It has from time to time gone under the moniker
 LayoutStrategy or pluggable layout. To do it without duplicating everything
 requires that the other pieces of the system be modularized, the concerns
 separated so that they can be reused. The upside is tremendous and the cost
 pays for itself in developer productivity.

The idea of having two page breaking strategies is OK. But it is also
a goal that is yet far over the horizon.

I hope this is smaller than having pluggable layout. We should try to
express the layout constraints in a simple language, which can be used
by the algorithms of both strategies. Knuth's model is an effort to
achieve that, and a PageLayoutManager which receives the Knuth
elements and invokes the appropriate algorithm goes with it.

Such a setup should not only enable multiple page breaking strategies,
but also help us implement a simple strategy to start with, and
gradually evolve it to a higher level of sophistication.
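
As a sketch of what such a setup could look like, here are two interchangeable strategies behind one small interface. It is reduced to fixed-height blocks so that it stays short; a real version would consume the Knuth elements instead. PageBreaker, GreedyBreaker and OptimizingBreaker are hypothetical names, and the optimizing variant is a small dynamic program standing in for a total-fit strategy (it minimizes the sum of squared leftover space on all pages but the last).

```java
import java.util.ArrayList;
import java.util.List;

/** Common contract: returns the number of blocks placed on each page. */
interface PageBreaker {
    List<Integer> paginate(int[] blockHeights, int pageHeight);
}

/** Fast strategy: fill each page as far as possible, never look ahead. */
final class GreedyBreaker implements PageBreaker {
    public List<Integer> paginate(int[] h, int pageHeight) {
        List<Integer> pages = new ArrayList<>();
        int used = 0, count = 0;
        for (int height : h) {
            if (count > 0 && used + height > pageHeight) {
                pages.add(count);
                used = 0;
                count = 0;
            }
            used += height;
            count++;
        }
        if (count > 0) pages.add(count);
        return pages;
    }
}

/** Slower strategy: considers all break combinations via dynamic programming. */
final class OptimizingBreaker implements PageBreaker {
    public List<Integer> paginate(int[] h, int pageHeight) {
        int n = h.length;
        long[] cost = new long[n + 1]; // cost[i]: best cost for blocks i..n-1
        int[] next = new int[n + 1];   // next[i]: first block of the following page
        for (int i = n - 1; i >= 0; i--) {
            cost[i] = Long.MAX_VALUE;
            int used = 0;
            for (int j = i; j < n; j++) {
                used += h[j];
                if (used > pageHeight && j > i) break; // blocks i..j do not fit
                long slack = Math.max(0, pageHeight - used);
                long pageCost = (j == n - 1) ? 0 : slack * slack; // last page is free
                if (pageCost + cost[j + 1] < cost[i]) {
                    cost[i] = pageCost + cost[j + 1];
                    next[i] = j + 1;
                }
                if (used > pageHeight) break; // oversized single block gets its own page
            }
        }
        List<Integer> pages = new ArrayList<>();
        for (int i = 0; i < n; i = next[i]) pages.add(next[i] - i);
        return pages;
    }
}

final class StrategyDemo {
    public static void main(String[] args) {
        int[] heights = {40, 40, 20, 60, 90};
        System.out.println(new GreedyBreaker().paginate(heights, 100));     // [3, 1, 1]
        System.out.println(new OptimizingBreaker().paginate(heights, 100)); // [2, 2, 1]
    }
}
```

The caller only depends on the interface, so a simple strategy can ship first and a more sophisticated one can be added later without touching the rest of the layout code.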

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: page-breaking strategies and performance

2005-03-01 Thread Jeremias Maerki

On 01.03.2005 22:25:12 Simon Pepping wrote:
 On Tue, Mar 01, 2005 at 07:52:27AM -0700, Victor Mote wrote:
  Jeremias Maerki wrote:
  
   processing time and additional memory requirements. This 
   leads me to the question if we shouldn't actually implement 
   two page-breaking strategies (in the end, not both right 
   now). For a speed-optimized algorithm, we could even think 
   about ignoring side-floats.
   
   Obviously, in this model we would have to make sure that we 
   use a common model for both strategies. For example, we still 
   have to make sure that the line layout gets information on 
   the available IPD on each line, but probably this will not be 
   a big problem to include later.
  
  This is an excellent idea. It has from time to time gone under the moniker
  LayoutStrategy or pluggable layout. To do it without duplicating everything
  requires that the other pieces of the system be modularized, the concerns
  separated so that they can be reused. The upside is tremendous and the cost
  pays for itself in developer productivity.
 
 The idea of having two page breaking strategies is OK. But it is also
 a goal that is yet far over the horizon.

Right. What I'd like to achieve is having a usable layout engine with
the minimum of effort for most use cases but without blocking our way in
terms of full compliance like what happened with the old code base. I
also don't want to invest too much time in an infrastructure to support
pluggable strategies, only that we keep it in mind while we build the
first one.

 I hope this is smaller than having pluggable layout.

My hope, too. The critical part is to determine the model that helps us
express all the elements of the XSL-FO standard.

 We should try to
 express the layout constraints in a simple language, which can be used
 by the algorithms of both strategies. Knuth's model is an effort to
 achieve that, and a PageLayoutManager which receives the Knuth
 elements and invokes the appropriate algorithm goes with it.

That's what I'm currently trying to figure out. I guess we'll need to
sketch all the different layout elements that we need to support like
you started.

 Such a setup should not only enable multiple page breaking strategies,
 but also help us implement a simple strategy to start with, and
 gradually evolve it to a higher level of sophistication.

That's the idea.


Jeremias Maerki



Re: Skype-conference on page-breaking?

2005-03-01 Thread The Web Maestro
I'd be happy to 'participate' although I don't have a skype acct yet. I 
don't know what I can offer, but I'm here to help!

Cheers!
On Mar 1, 2005, at 2:31 PM, Jeremias Maerki wrote:
Maybe I could hook you into a Skype conference by using SkypeOut. It's
pretty cheap to call to the Netherlands. According to the FAQ this is
possible.

On 01.03.2005 22:26:50 Simon Pepping wrote:
On Tue, Mar 01, 2005 at 03:09:46PM +0100, Jeremias Maerki wrote:
To speed things up could we hold a conference (using Skype, for example)
to discuss further details on page-breaking? I'd volunteer to sum up any
results during that discussion for the archives. I have Finn on my Skype
radar already.
I do not have a broadband connection, and therefore no Skype or other
VoIP.
Jeremias Maerki
Web Maestro Clay
--
[EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/flow TableBody.java

2005-03-01 Thread Glen Mazza
OH!!!  <lightBulb state="on" wattage="25"/>

Yes, you're right, Chris--now I see the issue.  I
implemented validation for about 80% of the FOs, but
80% is not 100%.  fo:table-body never had any
validation implemented, hence the NPE's that were
occurring.  

Sorry, Jeremias, I thought you had just gratuitously
*removed* the validation from fo:table-body -- I
should have researched that it wasn't there to begin
with.

Thanks,
Glen


--- Chris Bowditch [EMAIL PROTECTED] wrote:
 
 Glen:
 
 All Jeremias was doing was changing the code to
 prevent a rather nasty NPE in 
 the event of an empty fo:table-body. Surely you
 cannot be arguing that the 
 NPE be restored?!?
 
 Chris
 
 <snip/>