Re: [Xmlgraphics-fop Wiki] Update of PageLayout/PageAndLineBreaking by SimonPepping

2006-09-18 Thread Simon Pepping
Vincent,

Thanks for your reaction.

On Tue, Sep 12, 2006 at 08:26:49PM +0200, Vincent Hennebert wrote:
 According to that representation paragraphs with inline text have
 legal linebreak points. I consider those legal linebreak points also
 as legal pagebreak points. In addition, there are legal pagebreak
 points between the vertical elements such as paragraphs and blocks.
 
 One issue that will have to be addressed is that widow or
 keep-with-previous settings may invalidate some previously believed
 legal breakpoints. In such cases, active nodes which contain those
 breakpoints in their chains will have to be deactivated; if they were
 the chosen best nodes, some other nodes will somehow have to be
 retrieved to replace them. I hope this won't be a too great difficulty.

Indeed, you mentioned it before. It certainly is a complicating factor.
 
 In page independent linebreaking, for each feasible breakpoint the
 best node is retained, which represents the best layout of the
 paragraph up to that point, and which, due to the dynamic principle,
 is part of the best layout of the whole paragraph if that layout uses
 this breakpoint. If line numbers matter, the best node for each line
 number is retained. In page dependent linebreaking, even that is not
 enough. We must retain the best node for each vertical offset on the
 page, because that is the quantity that influences page breaking. This
 
 Good point. This led me to the following thoughts:
 
 Currently the iteration over the active nodes is broken into two loops:
 one loop for iterating over the line numbers, one for iterating over the
 active nodes associated to each line number. Why? Because if line widths
 aren't the same they have an influence on the computation of best line
 breaks. Because when considering a given legal breakpoint, we must know
 the width of the line it would end in order to be able to compute the
 shrinking/stretching for that line. In fact we make a distinction
 between line numbers because they determine the /context/ in which
 linebreaks are computed. If all the lines had the same widths such a
 distinction wouldn't be necessary.
 
 The merging of line- and page-breakings generalizes this problem of
 differing contexts. This time, not only the line number counts, but also
 the page number, the offset from the beginning of the page, the
 out-of-lines to be placed, etc. I think the greatest challenge will be
 to identify all the elements which determine that context, and to be
 able to compare two contexts and say if they are equivalent or not.
 Considering the case 1 you describe on the wiki page, there are only two
 different contexts: page number even or odd. In this case the offset
 from the beginning of the page doesn't count. In other more complex
 documents this may be much more complicated.

Yeah, difficult point. I do not think I understand this aspect well
enough.

 In most cases the whole paragraph will fit on the page, and the legal
 pagebreak point after it is not feasible. So we do another paragraph,
 and another. Finally we will reach a feasible pagebreak point, which
 ends page 1. The corresponding linebreak point ends a line which is
 able to fill the page height with maximally tolerable stretching. In
 the next steps we will find more feasible pagebreak points. They
 correspond to different layouts of the last paragraph that starts on
 this page, with the same number of lines, and to layouts with more
 lines, which also fit on the page, until the maximal shrink is
 exceeded. At this point the starting node is deactivated, and the
 active page node list only contains nodes that end page 2.
 
 Unless I've missed something: that end page *1*. Or?

Yes. Thanks.

 It should be noted that for each feasible pagebreak node, we restart
 the linebreaking calculations of that page. Of course we must optimize
 this. We could attach a data structure to the starting page node which
 records the results of the calculations done. Those results consist of
 the active line node list, the last linebreak point considered, and
 the widths calculated up to that point. During uninterrupted linebreak
 calculation these data are held in loop variables. Now we must store
 them so that they can be retrieved when we resume the calculation in
 the consideration of the next current pagebreak point. On the first
 
 If the main loop were iterating over line breaks, we wouldn't have to do
 this. But perhaps we would have to for pagebreaks then? Which would be
 equivalent, but I can't figure out yet.

I have not thought about the problem in this way. It seems unnatural
to me and also rather equivalent. 

I will update the wiki page as soon as I can reach it again.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu


Re: [Xmlgraphics-fop Wiki] Update of PageLayout/PageAndLineBreaking by SimonPepping

2006-09-12 Thread Vincent Hennebert

Hi Simon,

I finally took the time to read and digest your Wiki page. This is an
interesting reading. A few comments:



According to that representation paragraphs with inline text have
legal linebreak points. I consider those legal linebreak points also
as legal pagebreak points. In addition, there are legal pagebreak
points between the vertical elements such as paragraphs and blocks.


One issue that will have to be addressed is that widow or
keep-with-previous settings may invalidate some previously believed
legal breakpoints. In such cases, active nodes which contain those
breakpoints in their chains will have to be deactivated; if they were
the chosen best nodes, some other nodes will somehow have to be
retrieved to replace them. I hope this won't be a too great difficulty.



Within the inner loop, we consider the page and paragraph layout
between the active and the current pagebreak point. If the active
breakpoint is within a paragraph, we calculate the best line breaks
from that breakpoint to the end of the paragraph. For all complete


Unless the current breakpoint lies in the same paragraph.



In page independent linebreaking, for each feasible breakpoint the
best node is retained, which represents the best layout of the
paragraph up to that point, and which, due to the dynamic principle,
is part of the best layout of the whole paragraph if that layout uses
this breakpoint. If line numbers matter, the best node for each line
number is retained. In page dependent linebreaking, even that is not
enough. We must retain the best node for each vertical offset on the
page, because that is the quantity that influences page breaking. This


Good point. This led me to the following thoughts:

Currently the iteration over the active nodes is broken into two loops:
one loop for iterating over the line numbers, one for iterating over the
active nodes associated to each line number. Why? Because if line widths
aren't the same they have an influence on the computation of best line
breaks. Because when considering a given legal breakpoint, we must know
the width of the line it would end in order to be able to compute the
shrinking/stretching for that line. In fact we make a distinction
between line numbers because they determine the /context/ in which
linebreaks are computed. If all the lines had the same widths such a
distinction wouldn't be necessary.

The merging of line- and page-breakings generalizes this problem of
differing contexts. This time, not only the line number counts, but also
the page number, the offset from the beginning of the page, the
out-of-lines to be placed, etc. I think the greatest challenge will be
to identify all the elements which determine that context, and to be
able to compare two contexts and say if they are equivalent or not.
Considering the case 1 you describe on the wiki page, there are only two
different contexts: page number even or odd. In this case the offset
from the beginning of the page doesn't count. In other more complex
documents this may be much more complicated.



When the linewidths depend on the page number, we need to remember the
best pagebreak node for each feasible pagebreak point for each page
number. Otherwise, we only need to remember the best pagebreak node
for each feasible pagebreak point. Note that the latter condition is
true in the presence of out-of-line elements, because those are
related to the content of the page, not the the page number.


Small typo: not the the page number



Optimization opportunity 1: We may need to reuse many times the best
layout from a breakpoint to the end of the paragraph, and the best
layout from the start of the paragraph to a breakpoint. Therefore we
need to store a reference to the best end node for either case in the
active node. If we wish to take into account the different possible
heights of the part of the paragraph, we need to store references to a
set of best end nodes. Especially for long page sequences, a page
breakpoint may be feasible both for an odd and an even page. In that
case, we need to store different end points for each, due to the
different line widths on odd and even pages. This optimization is
certainly true for the start of the paragraph. There will be many page
layouts on which the whole paragraph fits.


So, this might be handled automatically by the dynamic algorithm if we
were able to identify the different contexts.



Optimization opportunity 2: Do we need to consider each active node?
Or can we already determine that some active nodes will never give a
better layout than others? Suppose that a paragraph has two feasible
breakpoints A and B, which have an equal number of lines, or even the
same height, before and after the page breakpoint. Suppose that B has
a higher amount of demerits than A. Can we then conclude that B will
never be part of the best layout, because a better layout can always
be achieved with A? Yes, we can.


Same here, we would just have to detect that linebreak