Re: Markers added to the wrong page

2005-02-15 Thread Jeremias Maerki
Simon, thanks for looking into this. I'm not quite sure I got your 
instructions for a markers5c right. I'd appreciate if you'd check in the 
test case you created.

I've added a Bugzilla item [1] for this. I intend to revisit this later.
It would really be good to have test cases for all those nasty effects.
If you find anything wrong (not just fpr markers) PLEASE write a test
case and check it in even if it doesn't work ATM and you have to add it
to the disabled-testcases.txt. You don't have to write the XPath checks
yourself. We can always do that later.

[1] http://issues.apache.org/bugzilla/show_bug.cgi?id=33555

On 13.02.2005 12:26:27 Simon Pepping wrote:
 Jeremias,
 
 I have looked into the problem in more depth. The wrong retrieved
 marker is not only due to the bogus area, but also to the flawed logic
 of dealing with retrieve-position for the markers in
 PageViewport.addMarkers. Even if block 5 adds a bogus area to page 1,
 block 5 would not qualify for last-ending-within-page.
 
 A correct implementation of the retrieve-position logic requires that
 the traits is-first and is-last are correctly set. These find a place
 in BreakPoss, but not all LMs set the traits correctly. A bogus area
 would again upset the markers, because it would falsely take the trait
 is-first.
 
 If you create markers5c such that block 4 takes two lines (and change
 the region before to: retrieve-position=first-including-carryover),
 you have another failing test case.
 
 BPs for bogus areas can be recognized by their position:
 BP.getPosition().getPosition() == -1. Probably it is a good idea to
 suppress BPs for bogus areas altogether: if (over 
 childBreaks.size() == 0) return null.
 
 Note that for empty blocks also a BP without areas is returned, with
 position = -2. I am not sure whether they make sense. They do not
 generate an area due to a special condition in addAreas. If it is
 possible to add markers to an empty block, such BPs are necessary,
 although the position of the markers around a page break is undefined.
 
 This is a nasty case:
 
 fo:blockfo:block/Several lines of text/fo:block
 
 when the text 'Several...' starts on the new page. There would again
 be an empty area on the previous page. It would again be recognized by
 !BP.generatesAreas(), but it would again falsely take the trait
 is-first.
 
 On Thu, Feb 03, 2005 at 06:19:29PM +0100, Jeremias Maerki wrote:
  Team, 
  
  I've just checked in markers5a and markers5b which look very closely
  which marker is added to which page for every block.
  
  As I'm still somewhat in the process of getting to know the layout
  engine I don't have a quick answer to that although I'm making progress
  in understanding. Here's what I found out so far:
  
  The reason for the bad markers is the addMarkers call, for example in
  BlockLayoutManager.addAreas(). When the LM finds out that the next area
  won't fit on the current page it creates a BreakPoss signalling that
  state. The problem now is that addAreas() also gets these BreakPoss
  instances which aren't visible in the generated document but are still
  generating an (empty) Area (on the page preceeding the one where the
  Area will really come to rest). That causes one marker too many to be
  added to a page.
  
  The same BTW applies to addID() calls which also remembers IDs on the
  wrong pages.
  
  What I'm currently doing is trying to really (!) understand the whole
  BreakPoss mechanism so I can figure out a reliable way to find out how
  to avoid this bug. Two possibilities:
  1. Don't generate bogus areas at all.
  2. Just suppress addMarkers() and addID().
  
  I'm currently wondering if the generated BreakPoss instances should get
  an additional flag (or two) which controls the generation of the area
  and the addMarkers()/addID() calls. addMarkers()/addID() may be
  necessary in certain circumstances while on the other side no area is
  generated (empty block with only markers).
  
  You can easily see these bogus areas when you output to the area tree
  renderer or in build/test-results/layoutengine when running the Ant
  build.
  
  I'll continue investigating but would appreciate any ideas you might
  have.
  
  Jeremias Maerki
  
 
 -- 
 Simon Pepping
 home page: http://www.leverkruid.nl



Jeremias Maerki



Re: Markers added to the wrong page

2005-02-13 Thread Simon Pepping
Jeremias,

I have looked into the problem in more depth. The wrong retrieved
marker is not only due to the bogus area, but also to the flawed logic
of dealing with retrieve-position for the markers in
PageViewport.addMarkers. Even if block 5 adds a bogus area to page 1,
block 5 would not qualify for last-ending-within-page.

A correct implementation of the retrieve-position logic requires that
the traits is-first and is-last are correctly set. These find a place
in BreakPoss, but not all LMs set the traits correctly. A bogus area
would again upset the markers, because it would falsely take the trait
is-first.

If you create markers5c such that block 4 takes two lines (and change
the region before to: retrieve-position=first-including-carryover),
you have another failing test case.

BPs for bogus areas can be recognized by their position:
BP.getPosition().getPosition() == -1. Probably it is a good idea to
suppress BPs for bogus areas altogether: if (over 
childBreaks.size() == 0) return null.

Note that for empty blocks also a BP without areas is returned, with
position = -2. I am not sure whether they make sense. They do not
generate an area due to a special condition in addAreas. If it is
possible to add markers to an empty block, such BPs are necessary,
although the position of the markers around a page break is undefined.

This is a nasty case:

fo:blockfo:block/Several lines of text/fo:block

when the text 'Several...' starts on the new page. There would again
be an empty area on the previous page. It would again be recognized by
!BP.generatesAreas(), but it would again falsely take the trait
is-first.

On Thu, Feb 03, 2005 at 06:19:29PM +0100, Jeremias Maerki wrote:
 Team, 
 
 I've just checked in markers5a and markers5b which look very closely
 which marker is added to which page for every block.
 
 As I'm still somewhat in the process of getting to know the layout
 engine I don't have a quick answer to that although I'm making progress
 in understanding. Here's what I found out so far:
 
 The reason for the bad markers is the addMarkers call, for example in
 BlockLayoutManager.addAreas(). When the LM finds out that the next area
 won't fit on the current page it creates a BreakPoss signalling that
 state. The problem now is that addAreas() also gets these BreakPoss
 instances which aren't visible in the generated document but are still
 generating an (empty) Area (on the page preceeding the one where the
 Area will really come to rest). That causes one marker too many to be
 added to a page.
 
 The same BTW applies to addID() calls which also remembers IDs on the
 wrong pages.
 
 What I'm currently doing is trying to really (!) understand the whole
 BreakPoss mechanism so I can figure out a reliable way to find out how
 to avoid this bug. Two possibilities:
 1. Don't generate bogus areas at all.
 2. Just suppress addMarkers() and addID().
 
 I'm currently wondering if the generated BreakPoss instances should get
 an additional flag (or two) which controls the generation of the area
 and the addMarkers()/addID() calls. addMarkers()/addID() may be
 necessary in certain circumstances while on the other side no area is
 generated (empty block with only markers).
 
 You can easily see these bogus areas when you output to the area tree
 renderer or in build/test-results/layoutengine when running the Ant
 build.
 
 I'll continue investigating but would appreciate any ideas you might
 have.
 
 Jeremias Maerki
 

-- 
Simon Pepping
home page: http://www.leverkruid.nl


Re: Page breaking [was: Markers added to the wrong page]

2005-02-08 Thread Finn Bock
[The Web Maestro]
I don't know how the spec deals with this, but I doubt the 
spec cares which algorithm is used. That said, would it be a 
good idea to determine which algorithm to use based on 
something in userconfig.xml or something? If the Knuth system 
is more 'expensive' in terms of resources, we could make 
non-Knuth the default, and enable Knuth via a config file.
Yes indeed. But without measurements I would guess that knuth page 
breaking would be far less expensive than the knuth line breaking. The 
line breaking has to deal with far more elements and because of that a 
far larger set of active nodes.

But clearly a choice will be good, both for page and for line breaking.
[Victor]
The good news is that this is an excellent idea. The bad news is that FOP's
determination to not even tolerate more than one algorithm is what drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, which
might explain the absence of responses (so far).
BTW, since it is relevant here, thanks for your kind comments last week in
another thread. You are ever the peacemaker, a very useful role. As long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are also.
To my perhaps still naive mind, layout consists of exactly two tasks: 1)
finding line-breaks within logical blocks (nested blocks consisting of a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style algorithm
is by definition a total-fit algorithm. The scope of such an algorithm for
the first task is the logical block itself. The scope of the second task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a good
idea) for item 2, requires the ability to see an entire page-sequence in the
AreaTree at one time.
Not at all. It is a rather trivial change to knuth to pick a page break 
when there is N pages of lookahead.

If we assume that finished page knuth element arrive one at time to the 
KnuthPage algorithm, the main loop becomes:

addKnuthElement(element) {
if element.isBox():
totalWidth += element.width
else if element.isGlue():
if (previous element is box):
considerLegalBreak()
totalWidth += element.width
totalStretch += element.stretch
totalShrink += element.shrink
else if element.isPenalty():
if element.penalty  INFINITE:
considerLegalBreak()
if activeList.startLine  currentPage + lookahead:
// In the activeList, find the node with the best demerit.
bestNode = findBestDemerit()
// Remove all nodes in activeList which does not have bestNode
// in their parentage.
clearActiveList(bestNode)
makePageBreak(bestNode)
Here it is only clearActiveList() which does not have a fast 
implementation when using the usual implementation of ActiveNode. It 
will require a scan over all the active node and for each node a scan 
back thru the previous links to the node at currentPage+1.

clearActiveList(bestNode):
for node in activeList:
// jump back thru previous (node.page-currentPage) times.
while i++  node.page - currentPage
previous = node.previous
if previous != bestNode:
activeList.remove(node)

The rest of the methods called are similar to the methods in my knuth 
line breaking patch (which is quite similar the current implementation).

My own insecurity comes from figuring out which penalty values and 
demerit algorihtm to use to get keeps and orphans right.

regards,
finn


RE: Page breaking [was: Markers added to the wrong page]

2005-02-08 Thread Victor Mote
Finn Bock wrote:

 Not at all. It is a rather trivial change to knuth to pick a 
 page break when there is N pages of lookahead.
 
 If we assume that finished page knuth element arrive one at 
 time to the KnuthPage algorithm, the main loop becomes:
 
 addKnuthElement(element) {
  if element.isBox():
  totalWidth += element.width
  else if element.isGlue():
  if (previous element is box):
  considerLegalBreak()
  totalWidth += element.width
  totalStretch += element.stretch
  totalShrink += element.shrink
  else if element.isPenalty():
  if element.penalty  INFINITE:
  considerLegalBreak()
  if activeList.startLine  currentPage + lookahead:
  // In the activeList, find the node with the best demerit.
  bestNode = findBestDemerit()
  // Remove all nodes in activeList which does not 
 have bestNode
  // in their parentage.
  clearActiveList(bestNode)
   makePageBreak(bestNode)
 
 Here it is only clearActiveList() which does not have a fast 
 implementation when using the usual implementation of 
 ActiveNode. It will require a scan over all the active node 
 and for each node a scan back thru the previous links to the 
 node at currentPage+1.
 
 clearActiveList(bestNode):
  for node in activeList:
  // jump back thru previous (node.page-currentPage) times.
  while i++  node.page - currentPage
  previous = node.previous
  if previous != bestNode:
  activeList.remove(node)
   
 
 The rest of the methods called are similar to the methods in 
 my knuth line breaking patch (which is quite similar the 
 current implementation).

Hmmm. I guess it depends on where the N in N pages of lookahead comes
from. If that can be set to include the entire page-sequence, then you can
call it a Knuth-style algorithm. Otherwise, I think you have something less,
which is OK for those who don't want the performance hit.

 My own insecurity comes from figuring out which penalty values and 
 demerit algorihtm to use to get keeps and orphans right.

Yes. Fortunately, the XSL-FO standard provides some guidance here, but I
agree that this part will be a challenge.

Good luck. I am glad to see you guys heading down this path. Let me know if
there is anything I can do to help.

Victor Mote



Re: Page breaking [was: Markers added to the wrong page]

2005-02-08 Thread Luca Furini
Finn Bock wrote:

 I would pass the element on the immidiate parent, which recursively
 passes them on the top-level LM in the direction. For inline, the
 toplevel would be LineLM and for blocks it would be the PageLM.

Ok, I misunderstood what you wrote, now I think we were saying the same
thing using different words! :-)

Regards
Luca





Re: Markers added to the wrong page

2005-02-07 Thread Jeremias Maerki
Simon,

I'm sorry that I missed your plan. I didn't remember your posting this.

I think your plan would be a good start into refactoring the breaking in
block progression direction. I think it'll need to be clarified WRT
resets/backtracking in case of final decisions on page break, column
handling and intrusion. Your approach seems to create BPs in advance of
which some might be discarded because of later decisions. These will
then be rebuilt based on different circumstances.

I'm somewhat torn between a project plan I have to follow and doing
things right the first time. So I'll apply my changes for now and I'll
keep the whole thing on my todo list with a big exclamation mark. It
could very well be that because of my next tasks (tables and keeps) I
will get back to this sooner rather than later. Another aspect here: I'd
like to have some well visible improvements to the code quickly so it
shows to the outside that the project's not dead like many users think.

We'll see what happens. Thanks for your input and I hope you'll find
some time soon to do some coding yourself.

On 06.02.2005 22:08:40 Simon Pepping wrote:
 Jeremias,
 
 I do not have much time to look into your problem, so I will just try
 to give a quick answer.
 
 In my view the current BP setup is not able to generate good page
 break decisions. It only can do a first-fit algorithm. From your
 account, BPs are also overloaded to signal the completion of a page
 while they do not really end an area. Your hack is a hack indeed, but
 from a quick inspection I would say that it properly marks the
 overloaded nature of BPs.
 
 I have written down a proposal for a different strategy of page break
 decision. I put my description on the wiki,
 http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves
 two goals:
 1. Enabling smarter page break algorithms.
 2. Simplifying the addAreas calls, and esp. its iteration over the
 collected BPs.
 
 I have not had time to implement this, and therefore also no time to
 detect the flaws in my proposal. I would not mind if someone else
 would implement it.
 
 Regards, Simon
 
 On Thu, Feb 03, 2005 at 06:19:29PM +0100, Jeremias Maerki wrote:
  Team, 
  
  I've just checked in markers5a and markers5b which look very closely
  which marker is added to which page for every block.
  
  As I'm still somewhat in the process of getting to know the layout
  engine I don't have a quick answer to that although I'm making progress
  in understanding. Here's what I found out so far:
  
  The reason for the bad markers is the addMarkers call, for example in
  BlockLayoutManager.addAreas(). When the LM finds out that the next area
  won't fit on the current page it creates a BreakPoss signalling that
  state. The problem now is that addAreas() also gets these BreakPoss
  instances which aren't visible in the generated document but are still
  generating an (empty) Area (on the page preceeding the one where the
  Area will really come to rest). That causes one marker too many to be
  added to a page.
  
  The same BTW applies to addID() calls which also remembers IDs on the
  wrong pages.
  
  What I'm currently doing is trying to really (!) understand the whole
  BreakPoss mechanism so I can figure out a reliable way to find out how
  to avoid this bug. Two possibilities:
  1. Don't generate bogus areas at all.
  2. Just suppress addMarkers() and addID().
  
  I'm currently wondering if the generated BreakPoss instances should get
  an additional flag (or two) which controls the generation of the area
  and the addMarkers()/addID() calls. addMarkers()/addID() may be
  necessary in certain circumstances while on the other side no area is
  generated (empty block with only markers).
  
  You can easily see these bogus areas when you output to the area tree
  renderer or in build/test-results/layoutengine when running the Ant
  build.
  
  I'll continue investigating but would appreciate any ideas you might
  have.
  
  Jeremias Maerki
  
 
 -- 
 Simon Pepping
 home page: http://www.leverkruid.nl



Jeremias Maerki



Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Finn Bock
Simon Pepping wrote:
I do not have much time to look into your problem, so I will just try
to give a quick answer.
In my view the current BP setup is not able to generate good page
break decisions. It only can do a first-fit algorithm. From your
account, BPs are also overloaded to signal the completion of a page
while they do not really end an area. Your hack is a hack indeed, but
from a quick inspection I would say that it properly marks the
overloaded nature of BPs.
I have written down a proposal for a different strategy of page break
decision. I put my description on the wiki,
http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves
two goals:
1. Enabling smarter page break algorithms.
2. Simplifying the addAreas calls, and esp. its iteration over the
collected BPs.
I have not had time to implement this, and therefore also no time to
detect the flaws in my proposal. I would not mind if someone else
would implement it.
Hi Simon,
Using your description as a jumping point, here is my ideas for page 
breaking. I suppose it is even more pie-in-the-sky since I haven't yet 
written anything about it.

The algorithm that the PageLM uses are a slightly modified knuth (no 
need to maintain fitnessclass, and with the ability to decide on a break 
when there is N pages of lookahead). The elements return from the LMs 
are boxes (for lines), spaces and penalties. The elements are not 
returned from the LMs but pushed from the LM into the pageLM:

   parent.generateElement(new Space(resolveBefore());
   parent.generateElement(new Box(lineHeigth);
The LMs also push start and end markers so that the order list of 
elements in the pageLM is actually a flattened tree and can be used 
directly for creation of areas (so no more Position tree). The exact 
same flattened tree is applied to inline.

The element are pushed to the pageLM during a non-recursive traversal of 
the LM-tree. The areas are created during a non-recursive traversal of 
the flattened elements tree. During area creation the parent LMs is kept 
in a external stack, at the end of page the stack represent the areas 
that is continued on the next page and stack is used as the starting 
point for creatin

A significant drawback is that a knuth based page break algorithm is 
difficult to explain and justify, just like it is difficult to explain 
why line breaking knuth do as it do by looking at individual lines in 
the output.

regards,
finn


Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Luca Furini

Finn Bock wrote:

Using your description as a jumping point, here is my ideas for page
breaking. I suppose it is even more pie-in-the-sky since I haven't yet
written anything about it.

As I have been doing a few experiments about page breaking, I'm happy to
join in this conversation.

The algorithm that the PageLM uses are a slightly modified knuth (no
need to maintain fitnessclass, and with the ability to decide on a break
when there is N pages of lookahead). The elements return from the LMs
are boxes (for lines), spaces and penalties.

Note that using the box - penalty - glue representation does not
necessarily involve using Knuth's algorithm.
A first-fit (or best-fit) algorithm could be enough in most situations;
and if there are no adjustable elements in the fo document (for exaple,
lots of paragraphs without adjustable spaces between them) Knuth's
algorithm becomes a best-fit.
Given its higher cost, maybe it could be used only when it could really
produce a better output.

The elements are not
returned from the LMs but pushed from the LM into the pageLM:

parent.generateElement(new Space(resolveBefore());
parent.generateElement(new Box(lineHeigth);


I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:

 outer block
 |
 -
 |   |
  innerinner
  blockblock
12

In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements created
by its child, looks at them and adds / removes / corrects; could this be
done if the inner BlockLMs directly give their element to the top-level
LM?

BTW, what about your great refactoring of all the knuth code?

Regards,
Luca



Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Finn Bock
[Luca]
I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:
 outer block
 |
 -
 |   |
  innerinner
  blockblock
12
In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements created
by its child, looks at them and adds / removes / corrects; could this be
done if the inner BlockLMs directly give their element to the top-level
LM?
I would pass the element on the immidiate parent, which recursively 
passes them on the top-level LM in the direction. For inline, the 
toplevel would be LineLM and for blocks it would be the PageLM.

BTW, what about your great refactoring of all the knuth code?
I do not have the time atm to do the followup and bugfixes, so it will 
have to wait. If anyone also wants to commit it, feel free.

regards,
finn


Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread The Web Maestro
On Feb 7, 2005, at 7:21 AM, Luca Furini wrote:
Finn Bock wrote:
Using your description as a jumping point, here is my ideas for page
breaking. I suppose it is even more pie-in-the-sky since I haven't yet
written anything about it.
As I have been doing a few experiments about page breaking, I'm happy 
to
join in this conversation.
I rarely find myself feeling the urge to enter these conversations 
(it's not really my forte), but every now and then, I find I might 
offer an idea or two I think (hope) will be constructive...

The algorithm that the PageLM uses are a slightly modified knuth (no
need to maintain fitnessclass, and with the ability to decide on a 
break
when there is N pages of lookahead). The elements return from the LMs
are boxes (for lines), spaces and penalties.
Note that using the box - penalty - glue representation does not
necessarily involve using Knuth's algorithm.
A first-fit (or best-fit) algorithm could be enough in most situations;
and if there are no adjustable elements in the fo document (for exaple,
lots of paragraphs without adjustable spaces between them) Knuth's
algorithm becomes a best-fit.
Given its higher cost, maybe it could be used only when it could really
produce a better output.
I don't know how the spec deals with this, but I doubt the spec cares 
which algorithm is used. That said, would it be a good idea to 
determine which algorithm to use based on something in userconfig.xml 
or something? If the Knuth system is more 'expensive' in terms of 
resources, we could make non-Knuth the default, and enable Knuth via a 
config file.

The elements are not
returned from the LMs but pushed from the LM into the pageLM:
   parent.generateElement(new Space(resolveBefore());
   parent.generateElement(new Box(lineHeigth);
I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:
 outer block
 |
 -
 |   |
  innerinner
  blockblock
12
In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements 
created
by its child, looks at them and adds / removes / corrects; could this 
be
done if the inner BlockLMs directly give their element to the top-level
LM?

BTW, what about your great refactoring of all the knuth code?
Regards,
Luca
Cheers!
Web Maestro Clay
--
[EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


RE: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Victor Mote
The Web Maestro wrote:

 I don't know how the spec deals with this, but I doubt the 
 spec cares which algorithm is used. That said, would it be a 
 good idea to determine which algorithm to use based on 
 something in userconfig.xml or something? If the Knuth system 
 is more 'expensive' in terms of resources, we could make 
 non-Knuth the default, and enable Knuth via a config file.

Hi Clay:

The good news is that this is an excellent idea. The bad news is that FOP's
determination to not even tolerate more than one algorithm is what drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, which
might explain the absence of responses (so far).

BTW, since it is relevant here, thanks for your kind comments last week in
another thread. You are ever the peacemaker, a very useful role. As long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are also.

To my perhaps still naive mind, layout consists of exactly two tasks: 1)
finding line-breaks within logical blocks (nested blocks consisting of a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style algorithm
is by definition a total-fit algorithm. The scope of such an algorithm for
the first task is the logical block itself. The scope of the second task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a good
idea) for item 2, requires the ability to see an entire page-sequence in the
AreaTree at one time. I am a little confused about whether FOP's design will
even allow that to happen right now. I understood both Keiron's and Finn's
design ideas (and Peter West's also) to be releasing pages from the AreaTree
long before a page-sequence was even completely parsed. However, I may have
misunderstood, especially Finn's, which I am especially fuzzy on.

I read somewhere that Peter Karow was going to try to implement a
Knuth-style algorithm for URW, and I have tried desperately and
unsuccessfully to get a copy of his book on the topic. If anybody knows that
status of that effort or how to find his out-of-print books, please let us
know.

The general approach that FOray hopes to take eventually is a first-fit
algorithm for the initial pass through a page-sequence, then a second
optimization look that I hope to make do a Knuth-style evaluation. That may
be sub-optimal, but it is my best guess right now about a reasonable way to
proceed.

Victor Mote



Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread The Web Maestro
On Feb 7, 2005, at 11:57 AM, Victor Mote wrote:
The Web Maestro wrote:
I don't know how the spec deals with this, but I doubt the
spec cares which algorithm is used. That said, would it be a
good idea to determine which algorithm to use based on
something in userconfig.xml or something? If the Knuth system
is more 'expensive' in terms of resources, we could make
non-Knuth the default, and enable Knuth via a config file.
Hi Clay:
The good news is that this is an excellent idea. The bad news is that 
FOP's
determination to not even tolerate more than one algorithm is what 
drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, 
which
might explain the absence of responses (so far).
Thanks for the heads up, and also for the 'clue'... I never completely 
understood what happened, as it occurred before I was committed. ;-) I 
understand a bit more, thanks to this e-mail... I suspect there may be 
another 'side' to the story--there  always is--and that there's may be 
other contributing factors... but this helps me understand much more 
than I understood before. Your explanation below also helps! I probably 
just need to ask for help when I don't understand something... but I've 
got enough going in my head anyway...

FOP-DEV... Please don't take this as griping (it's not!), but I get 
lost in thee BP/BPD/IPD/AT speak now and again, which is one reason I 
tend to stay out of the fray on that stuff. If I get confused--and 
choose to attempt to contribute to a discussion--I'll speak up about 
anything that strikes my fancy. If there's something I don't get... 
I'll search Google first.

BTW, since it is relevant here, thanks for your kind comments last 
week in
another thread. You are ever the peacemaker, a very useful role. As 
long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are 
also.

To my perhaps still naive mind, layout consists of exactly two tasks: 
1)
finding line-breaks within logical blocks (nested blocks consisting of 
a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style 
algorithm
is by definition a total-fit algorithm. The scope of such an algorithm 
for
the first task is the logical block itself. The scope of the second 
task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a 
good
idea) for item 2, requires the ability to see an entire page-sequence 
in the
AreaTree at one time. I am a little confused about whether FOP's 
design will
even allow that to happen right now. I understood both Keiron's and 
Finn's
design ideas (and Peter West's also) to be releasing pages from the 
AreaTree
long before a page-sequence was even completely parsed. However, I may 
have
misunderstood, especially Finn's, which I am especially fuzzy on.

I read somewhere that Peter Karow was going to try to implement a
Knuth-style algorithm for URW, and I have tried desperately and
unsuccessfully to get a copy of his book on the topic. If anybody 
knows that
status of that effort or how to find his out-of-print books, please 
let us
know.
Might these links help:
http://www.google.com/search?hl=enq=Peter+Karow+bookspell=1
http://www.amazon.com/exec/obidos/ISBN%3D0387572236/102-3161708-3958541
http://www.amazon.com/exec/obidos/ISBN%3D0387565094/102-3161708-3958541
The general approach that FOray hopes to take eventually is a first-fit
algorithm for the initial pass through a page-sequence, then a second
optimization look that I hope to make do a Knuth-style evaluation. 
That may
be sub-optimal, but it is my best guess right now about a reasonable 
way to
proceed.

Victor Mote
Thanks again for the explanation! Cheers!
Web Maestro Clay
--
[EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


RE: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Victor Mote
The Web Maestro wrote:

 e-mail... I suspect there may be another 'side' to the 
 story--there  always is--and that there's may be other 
 contributing factors... but this helps me understand much 
 more than I understood before. Your explanation below also 

Yes, there is another side, and I respect that. The point is that the
differences are hard-wired at a pretty low-level (think silicon instead of
Java). I am constitutionally unable to make myself work on a project that
*can* be broken up into smaller pieces and isn't. Others appear to be
constitutionally unable to allow that to happen. Most can probably live with
it either way. OK, not every problem is solvable, and not every difference
is reconcilable. We go on. You guys pull east, I'll pull west, and as long
as we're pulling *different* wagons, we each have a shot at making some
progress.

 Might these links help:

I think those are the other books, but, to tell the truth, after looking
at the links, I can't tell. I'll have to go dig out my old notes to see
which book I was wanting. I'll check into it further. Thanks.

Victor Mote



Re: Markers added to the wrong page

2005-02-06 Thread Jeremias Maerki
Glen,

no, I'm afraid the isBogus() method is necessary because it checks the
parent LM. In the markers6a.xml example (for which I still have to write
the checks, I just found out), the block in the third cell of the fifth
row has an effective BPD of 0 and therefore fits nicely into the
available stacking dimension. Therefore its BreakPoss doesn't have the
nextBreakOverflows() return true. But the row in which that block and
its parent cell are found has an overflow because of the first two cells
which don't fit. If isBogus() doesn't check the parentLM the markers for
row 5 are added to page 1 which is wrong.

I think I'll commit my patch as is tomorrow. This will solve the problem
for now, but I have a feeling that I might have to do some additional
work on the BreakPoss part quite soon when I get to tasks such as keeps
which is quite soon now. The isBogus() method is easily removed if it
turns up later that it can be handled differently and in a better way.

On 06.02.2005 04:43:48 Glen Mazza wrote:
 Jeremias,
 
 I don't see the need for the bBogus variable/isBogus()
 method, because in the three or four places where the
 value of this variable is actually *being used*, it is
 just set to :
 
 bBogus = !bp1.generatesAreas(); 
 
 So it appears you can just rely on
 !bp1.generatesAreas() in these places
 instead--perhaps just commenting the conditional that
 bogus areas are being ignored--and we can worry about
 getting rid of the actual bogus areas later (when
 overall team grokkage in this part of the code
 improves).
 
 Glen
 
 --- Jeremias Maerki [EMAIL PROTECTED] wrote:
 
  I've got a possible fix for the problem. But I don't
  know if it's not
  too much of a hack. At least it somehow feels like a
  hack. Any comments
  about the attached patch? Obviously, some javadocs
  are missing and the
  naming could probably be improved but this is just
  an idea how this
  could be fixed.
  
  IMO it would be nicer if these bogus areas as I
  call them wouldn't be
  constructed at all. I experimented with such an
  approach but the code
  got far too complicated with too much
  copy/paste/modify. And I also
  didn't manage to make it work. On the other side
  these special areas
  are not a big problem since there are relatively few
  of them.
  
  Still interested in opinions and ideas Thanks!
  



Jeremias Maerki



Re: Markers added to the wrong page

2005-02-06 Thread Glen Mazza
OK...I see its purpose now.  But please put in a
descriptive comment for isBogus() in
AbstractLayoutManager so others down the road
understand what it means and what it is for.

Glen

--- Jeremias Maerki [EMAIL PROTECTED] wrote:

 Glen,
 
 no, I'm afraid the isBogus() method is necessary
 because it checks the
 parent LM. 


Re: Markers added to the wrong page

2005-02-06 Thread Simon Pepping
Jeremias,

I do not have much time to look into your problem, so I will just try
to give a quick answer.

In my view the current BP setup is not able to generate good page
break decisions. It only can do a first-fit algorithm. From your
account, BPs are also overloaded to signal the completion of a page
while they do not really end an area. Your hack is a hack indeed, but
from a quick inspection I would say that it properly marks the
overloaded nature of BPs.

I have written down a proposal for a different strategy of page break
decision. I put my description on the wiki,
http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves
two goals:
1. Enabling smarter page break algorithms.
2. Simplifying the addAreas calls, and esp. its iteration over the
collected BPs.

I have not had time to implement this, and therefore also no time to
detect the flaws in my proposal. I would not mind if someone else
would implement it.

Regards, Simon

On Thu, Feb 03, 2005 at 06:19:29PM +0100, Jeremias Maerki wrote:
 Team, 
 
 I've just checked in markers5a and markers5b which look very closely
 which marker is added to which page for every block.
 
 As I'm still somewhat in the process of getting to know the layout
 engine I don't have a quick answer to that although I'm making progress
 in understanding. Here's what I found out so far:
 
 The reason for the bad markers is the addMarkers call, for example in
 BlockLayoutManager.addAreas(). When the LM finds out that the next area
 won't fit on the current page it creates a BreakPoss signalling that
 state. The problem now is that addAreas() also gets these BreakPoss
 instances which aren't visible in the generated document but are still
 generating an (empty) Area (on the page preceeding the one where the
 Area will really come to rest). That causes one marker too many to be
 added to a page.
 
 The same BTW applies to addID() calls which also remembers IDs on the
 wrong pages.
 
 What I'm currently doing is trying to really (!) understand the whole
 BreakPoss mechanism so I can figure out a reliable way to find out how
 to avoid this bug. Two possibilities:
 1. Don't generate bogus areas at all.
 2. Just suppress addMarkers() and addID().
 
 I'm currently wondering if the generated BreakPoss instances should get
 an additional flag (or two) which controls the generation of the area
 and the addMarkers()/addID() calls. addMarkers()/addID() may be
 necessary in certain circumstances while on the other side no area is
 generated (empty block with only markers).
 
 You can easily see these bogus areas when you output to the area tree
 renderer or in build/test-results/layoutengine when running the Ant
 build.
 
 I'll continue investigating but would appreciate any ideas you might
 have.
 
 Jeremias Maerki
 

-- 
Simon Pepping
home page: http://www.leverkruid.nl


Re: Markers added to the wrong page

2005-02-05 Thread Glen Mazza
Jeremias,

I don't see the need for the bBogus variable/isBogus()
method, because in the three or four places where the
value of this variable is actually *being used*, it is
just set to :

bBogus = !bp1.generatesAreas(); 

So it appears you can just rely on
!bp1.generatesAreas() in these places
instead--perhaps just commenting the conditional that
bogus areas are being ignored--and we can worry about
getting rid of the actual bogus areas later (when
overall team grokkage in this part of the code
improves).

Glen

--- Jeremias Maerki [EMAIL PROTECTED] wrote:

 I've got a possible fix for the problem. But I don't
 know if it's not
 too much of a hack. At least it somehow feels like a
 hack. Any comments
 about the attached patch? Obviously, some javadocs
 are missing and the
 naming could probably be improved but this is just
 an idea how this
 could be fixed.
 
 IMO it would be nicer if these bogus areas as I
 call them wouldn't be
 constructed at all. I experimented with such an
 approach but the code
 got far too complicated with too much
 copy/paste/modify. And I also
 didn't manage to make it work. On the other side
 these special areas
 are not a big problem since there are relatively few
 of them.
 
 Still interested in opinions and ideas Thanks!
 



Markers added to the wrong page

2005-02-03 Thread Jeremias Maerki
Team, 

I've just checked in markers5a and markers5b which look very closely
which marker is added to which page for every block.

As I'm still somewhat in the process of getting to know the layout
engine I don't have a quick answer to that although I'm making progress
in understanding. Here's what I found out so far:

The reason for the bad markers is the addMarkers call, for example in
BlockLayoutManager.addAreas(). When the LM finds out that the next area
won't fit on the current page it creates a BreakPoss signalling that
state. The problem now is that addAreas() also gets these BreakPoss
instances which aren't visible in the generated document but are still
generating an (empty) Area (on the page preceeding the one where the
Area will really come to rest). That causes one marker too many to be
added to a page.

The same BTW applies to addID() calls which also remembers IDs on the
wrong pages.

What I'm currently doing is trying to really (!) understand the whole
BreakPoss mechanism so I can figure out a reliable way to find out how
to avoid this bug. Two possibilities:
1. Don't generate bogus areas at all.
2. Just suppress addMarkers() and addID().

I'm currently wondering if the generated BreakPoss instances should get
an additional flag (or two) which controls the generation of the area
and the addMarkers()/addID() calls. addMarkers()/addID() may be
necessary in certain circumstances while on the other side no area is
generated (empty block with only markers).

You can easily see these bogus areas when you output to the area tree
renderer or in build/test-results/layoutengine when running the Ant
build.

I'll continue investigating but would appreciate any ideas you might
have.

Jeremias Maerki



Re: Markers added to the wrong page

2005-02-03 Thread Jeremias Maerki
I've got a possible fix for the problem. But I don't know if it's not
too much of a hack. At least it somehow feels like a hack. Any comments
about the attached patch? Obviously, some javadocs are missing and the
naming could probably be improved but this is just an idea how this
could be fixed.

IMO it would be nicer if these bogus areas as I call them wouldn't be
constructed at all. I experimented with such an approach but the code
got far too complicated with too much copy/paste/modify. And I also
didn't manage to make it work. On the other side these special areas
are not a big problem since there are relatively few of them.

Still interested in opinions and ideas Thanks!


On 03.02.2005 18:19:29 Jeremias Maerki wrote:
 Team, 
 
 I've just checked in markers5a and markers5b which look very closely
 which marker is added to which page for every block.
 
 As I'm still somewhat in the process of getting to know the layout
 engine I don't have a quick answer to that although I'm making progress
 in understanding. Here's what I found out so far:
 
 The reason for the bad markers is the addMarkers call, for example in
 BlockLayoutManager.addAreas(). When the LM finds out that the next area
 won't fit on the current page it creates a BreakPoss signalling that
 state. The problem now is that addAreas() also gets these BreakPoss
 instances which aren't visible in the generated document but are still
 generating an (empty) Area (on the page preceeding the one where the
 Area will really come to rest). That causes one marker too many to be
 added to a page.
 
 The same BTW applies to addID() calls which also remembers IDs on the
 wrong pages.
 
 What I'm currently doing is trying to really (!) understand the whole
 BreakPoss mechanism so I can figure out a reliable way to find out how
 to avoid this bug. Two possibilities:
 1. Don't generate bogus areas at all.
 2. Just suppress addMarkers() and addID().
 
 I'm currently wondering if the generated BreakPoss instances should get
 an additional flag (or two) which controls the generation of the area
 and the addMarkers()/addID() calls. addMarkers()/addID() may be
 necessary in certain circumstances while on the other side no area is
 generated (empty block with only markers).
 
 You can easily see these bogus areas when you output to the area tree
 renderer or in build/test-results/layoutengine when running the Ant
 build.
 
 I'll continue investigating but would appreciate any ideas you might
 have.
 
 Jeremias Maerki



Jeremias Maerki


MarkerFixProposal.diff.txt
Description: Binary data


Re: Markers added to the wrong page

2005-02-03 Thread Glen Mazza
Can't add much here, but I do remember noticing the
bogus areas being created when I was trying to fix
spacing about a year ago.  And, yes, I do agree it
would be better if our algorithms did not need them to
be created.

Thanks,
Glen

--- Jeremias Maerki [EMAIL PROTECTED] wrote:

 I've got a possible fix for the problem. But I don't
 know if it's not
 too much of a hack. At least it somehow feels like a
 hack. Any comments
 about the attached patch? Obviously, some javadocs
 are missing and the
 naming could probably be improved but this is just
 an idea how this
 could be fixed.