RE: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Victor Mote
The Web Maestro wrote:

> e-mail... I suspect there may be another 'side' to the 
> story--there  always is--and that there's may be other 
> contributing factors... but this helps me understand much 
> more than I understood before. Your explanation below also 

Yes, there is another side, and I respect that. The point is that the
differences are hard-wired at a pretty low-level (think silicon instead of
Java). I am constitutionally unable to make myself work on a project that
*can* be broken up into smaller pieces and isn't. Others appear to be
constitutionally unable to allow that to happen. Most can probably live with
it either way. OK, not every problem is solvable, and not every difference
is reconcilable. We go on. You guys pull east, I'll pull west, and as long
as we're pulling *different* wagons, we each have a shot at making some
progress.

> Might these links help:

I think those are the "other" books, but, to tell the truth, after looking
at the links, I can't tell. I'll have to go dig out my old notes to see
which book I was wanting. I'll check into it further. Thanks.

Victor Mote



Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread The Web Maestro
On Feb 7, 2005, at 11:57 AM, Victor Mote wrote:
The Web Maestro wrote:
I don't know how the spec deals with this, but I doubt the
spec cares which algorithm is used. That said, would it be a
good idea to determine which algorithm to use based on
something in userconfig.xml or something? If the Knuth system
is more 'expensive' in terms of resources, we could make
non-Knuth the default, and enable Knuth via a config file.
Hi Clay:
The good news is that this is an excellent idea. The bad news is that 
FOP's
determination to not even tolerate more than one algorithm is what 
drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, 
which
might explain the absence of responses (so far).
Thanks for the heads up, and also for the 'clue'... I never completely 
understood what happened, as it occurred before I was committed. ;-) I 
understand a bit more, thanks to this e-mail... I suspect there may be 
another 'side' to the story--there  always is--and that there's may be 
other contributing factors... but this helps me understand much more 
than I understood before. Your explanation below also helps! I probably 
just need to ask for help when I don't understand something... but I've 
got enough going in my head anyway...

FOP-DEV... Please don't take this as griping (it's not!), but I get 
lost in thee BP/BPD/IPD/AT speak now and again, which is one reason I 
tend to stay out of the fray on that stuff. If I get confused--and 
choose to attempt to contribute to a discussion--I'll speak up about 
anything that strikes my fancy. If there's something I don't get... 
I'll search Google first.

BTW, since it is relevant here, thanks for your kind comments last 
week in
another thread. You are ever the peacemaker, a very useful role. As 
long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are 
also.

To my perhaps still naive mind, layout consists of exactly two tasks: 
1)
finding line-breaks within logical blocks (nested blocks consisting of 
a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style 
algorithm
is by definition a total-fit algorithm. The scope of such an algorithm 
for
the first task is the logical block itself. The scope of the second 
task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a 
good
idea) for item 2, requires the ability to see an entire page-sequence 
in the
AreaTree at one time. I am a little confused about whether FOP's 
design will
even allow that to happen right now. I understood both Keiron's and 
Finn's
design ideas (and Peter West's also) to be releasing pages from the 
AreaTree
long before a page-sequence was even completely parsed. However, I may 
have
misunderstood, especially Finn's, which I am especially fuzzy on.

I read somewhere that Peter Karow was going to try to implement a
Knuth-style algorithm for URW, and I have tried desperately and
unsuccessfully to get a copy of his book on the topic. If anybody 
knows that
status of that effort or how to find his out-of-print books, please 
let us
know.
Might these links help:
http://www.google.com/search?hl=en&q=Peter+Karow+book&spell=1
http://www.amazon.com/exec/obidos/ISBN%3D0387572236/102-3161708-3958541
http://www.amazon.com/exec/obidos/ISBN%3D0387565094/102-3161708-3958541
The general approach that FOray hopes to take eventually is a first-fit
algorithm for the initial pass through a page-sequence, then a second
optimization look that I hope to make do a Knuth-style evaluation. 
That may
be sub-optimal, but it is my best guess right now about a reasonable 
way to
proceed.

Victor Mote
Thanks again for the explanation! Cheers!
Web Maestro Clay
--
<[EMAIL PROTECTED]> - 
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


RE: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Victor Mote
The Web Maestro wrote:

> I don't know how the spec deals with this, but I doubt the 
> spec cares which algorithm is used. That said, would it be a 
> good idea to determine which algorithm to use based on 
> something in userconfig.xml or something? If the Knuth system 
> is more 'expensive' in terms of resources, we could make 
> non-Knuth the default, and enable Knuth via a config file.

Hi Clay:

The good news is that this is an excellent idea. The bad news is that FOP's
determination to not even tolerate more than one algorithm is what drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, which
might explain the absence of responses (so far).

BTW, since it is relevant here, thanks for your kind comments last week in
another thread. You are ever the peacemaker, a very useful role. As long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are also.

To my perhaps still naive mind, layout consists of exactly two tasks: 1)
finding line-breaks within logical blocks (nested blocks consisting of a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style algorithm
is by definition a total-fit algorithm. The scope of such an algorithm for
the first task is the logical block itself. The scope of the second task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a good
idea) for item 2, requires the ability to see an entire page-sequence in the
AreaTree at one time. I am a little confused about whether FOP's design will
even allow that to happen right now. I understood both Keiron's and Finn's
design ideas (and Peter West's also) to be releasing pages from the AreaTree
long before a page-sequence was even completely parsed. However, I may have
misunderstood, especially Finn's, which I am especially fuzzy on.

I read somewhere that Peter Karow was going to try to implement a
Knuth-style algorithm for URW, and I have tried desperately and
unsuccessfully to get a copy of his book on the topic. If anybody knows that
status of that effort or how to find his out-of-print books, please let us
know.

The general approach that FOray hopes to take eventually is a first-fit
algorithm for the initial pass through a page-sequence, then a second
optimization look that I hope to make do a Knuth-style evaluation. That may
be sub-optimal, but it is my best guess right now about a reasonable way to
proceed.

Victor Mote



Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread The Web Maestro
On Feb 7, 2005, at 7:21 AM, Luca Furini wrote:
Finn Bock wrote:
Using your description as a jumping point, here is my ideas for page
breaking. I suppose it is even more pie-in-the-sky since I haven't yet
written anything about it.
As I have been doing a few experiments about page breaking, I'm happy 
to
join in this conversation.
I rarely find myself feeling the urge to enter these conversations 
(it's not really my forte), but every now and then, I find I might 
offer an idea or two I think (hope) will be constructive...

The algorithm that the PageLM uses are a slightly modified knuth (no
need to maintain fitnessclass, and with the ability to decide on a 
break
when there is N pages of lookahead). The elements return from the LMs
are boxes (for lines), spaces and penalties.
Note that using the box - penalty - glue representation does not
necessarily involve using Knuth's algorithm.
A first-fit (or best-fit) algorithm could be enough in most situations;
and if there are no adjustable elements in the fo document (for exaple,
lots of paragraphs without adjustable spaces between them) Knuth's
algorithm becomes a best-fit.
Given its higher cost, maybe it could be used only when it could really
produce a better output.
I don't know how the spec deals with this, but I doubt the spec cares 
which algorithm is used. That said, would it be a good idea to 
determine which algorithm to use based on something in userconfig.xml 
or something? If the Knuth system is more 'expensive' in terms of 
resources, we could make non-Knuth the default, and enable Knuth via a 
config file.

The elements are not
returned from the LMs but pushed from the LM into the pageLM:
   parent.generateElement(new Space(resolveBefore());
   parent.generateElement(new Box(lineHeigth);
I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:
 outer block
 |
 -
 |   |
  innerinner
  blockblock
12
In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements 
created
by its child, looks at them and adds / removes / corrects; could this 
be
done if the inner BlockLMs directly give their element to the top-level
LM?

BTW, what about your great refactoring of all the knuth code?
Regards,
Luca
Cheers!
Web Maestro Clay
--
<[EMAIL PROTECTED]> - 
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Finn Bock
[Luca]
I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:
 outer block
 |
 -
 |   |
  innerinner
  blockblock
12
In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements created
by its child, looks at them and adds / removes / corrects; could this be
done if the inner BlockLMs directly give their element to the top-level
LM?
I would pass the element on the immidiate parent, which recursively 
passes them on the top-level LM in the direction. For inline, the 
toplevel would be LineLM and for blocks it would be the PageLM.

BTW, what about your great refactoring of all the knuth code?
I do not have the time atm to do the followup and bugfixes, so it will 
have to wait. If anyone also wants to commit it, feel free.

regards,
finn


Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Luca Furini

Finn Bock wrote:

>Using your description as a jumping point, here is my ideas for page
>breaking. I suppose it is even more pie-in-the-sky since I haven't yet
>written anything about it.

As I have been doing a few experiments about page breaking, I'm happy to
join in this conversation.

>The algorithm that the PageLM uses are a slightly modified knuth (no
>need to maintain fitnessclass, and with the ability to decide on a break
>when there is N pages of lookahead). The elements return from the LMs
>are boxes (for lines), spaces and penalties.

Note that using the box - penalty - glue representation does not
necessarily involve using Knuth's algorithm.
A first-fit (or best-fit) algorithm could be enough in most situations;
and if there are no adjustable elements in the fo document (for exaple,
lots of paragraphs without adjustable spaces between them) Knuth's
algorithm becomes a best-fit.
Given its higher cost, maybe it could be used only when it could really
produce a better output.

>The elements are not
>returned from the LMs but pushed from the LM into the pageLM:
>
>parent.generateElement(new Space(resolveBefore());
>parent.generateElement(new Box(lineHeigth);
>

I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:

 outer block
 |
 -
 |   |
  innerinner
  blockblock
12

In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements created
by its child, looks at them and adds / removes / corrects; could this be
done if the inner BlockLMs directly give their element to the top-level
LM?

BTW, what about your great refactoring of all the knuth code?

Regards,
Luca



Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Finn Bock
Simon Pepping wrote:
I do not have much time to look into your problem, so I will just try
to give a quick answer.
In my view the current BP setup is not able to generate good page
break decisions. It only can do a first-fit algorithm. From your
account, BPs are also overloaded to signal the completion of a page
while they do not really end an area. Your hack is a hack indeed, but
from a quick inspection I would say that it properly marks the
overloaded nature of BPs.
I have written down a proposal for a different strategy of page break
decision. I put my description on the wiki,
http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves
two goals:
1. Enabling smarter page break algorithms.
2. Simplifying the addAreas calls, and esp. its iteration over the
collected BPs.
I have not had time to implement this, and therefore also no time to
detect the flaws in my proposal. I would not mind if someone else
would implement it.
Hi Simon,
Using your description as a jumping point, here is my ideas for page 
breaking. I suppose it is even more pie-in-the-sky since I haven't yet 
written anything about it.

The algorithm that the PageLM uses are a slightly modified knuth (no 
need to maintain fitnessclass, and with the ability to decide on a break 
when there is N pages of lookahead). The elements return from the LMs 
are boxes (for lines), spaces and penalties. The elements are not 
returned from the LMs but pushed from the LM into the pageLM:

   parent.generateElement(new Space(resolveBefore());
   parent.generateElement(new Box(lineHeigth);
The LMs also push start and end markers so that the order list of 
elements in the pageLM is actually a flattened tree and can be used 
directly for creation of areas (so no more Position tree). The exact 
same flattened tree is applied to inline.

The element are pushed to the pageLM during a non-recursive traversal of 
the LM-tree. The areas are created during a non-recursive traversal of 
the flattened elements tree. During area creation the parent LMs is kept 
in a external stack, at the end of page the stack represent the areas 
that is continued on the next page and stack is used as the starting 
point for creatin

A significant drawback is that a knuth based page break algorithm is 
difficult to explain and justify, just like it is difficult to explain 
why line breaking knuth do as it do by looking at individual lines in 
the output.

regards,
finn


Re: Problems with layoutengine test files

2005-02-07 Thread Jeremias Maerki
Simon,

can you tell me which parser and XSLT processor combination gave you
these problems? I'd like to reproduce the problem so I can be sure I fix
them correctly. Thanks.

On 06.02.2005 13:54:54 Simon Pepping wrote:
> Hi Jeremias,
> 
> I have errors with the layoutengine test files, for example:
> 
> text-decoration1.xml(org.apache.fop.layoutengine.LayoutEngineTestSuite$1)java.lang.RuntimeException:
> Expected XPath expression to evaluate to 'line-through', but got 'li'
> (XPath: //flow/block[3]/lineArea/inlineparent[1]/text)
> 
> This is due to the fact that the text may be split over more than one
> text area. Indeed, whether I get the error depends on which parser
> version I use; different parsers may produce different arrangements of
> text over text areas.
> 
> If you change the test to:
> 
>  xpath="//flow/block[3]/lineArea/inlineparent[1]"/>
> 
> the error disappears. I think XObject.str() applies the XPath string
> function, which returns the string-value of the first node. If you
> evaluate the parent node 'inlineparent', you get the string-value of
> all of its children, which is usually what you want to have.
> 
> There are many such cases in the test files. I think you should modify
> all cases where you test on the content of a text area.


Jeremias Maerki



Re: Markers added to the wrong page

2005-02-07 Thread Jeremias Maerki
Simon,

I'm sorry that I missed your plan. I didn't remember your posting this.

I think your plan would be a good start into refactoring the breaking in
block progression direction. I think it'll need to be clarified WRT
resets/backtracking in case of final decisions on page break, column
handling and intrusion. Your approach seems to create BPs in advance of
which some might be discarded because of later decisions. These will
then be rebuilt based on different circumstances.

I'm somewhat torn between a project plan I have to follow and doing
things right the first time. So I'll apply my changes for now and I'll
keep the whole thing on my todo list with a big exclamation mark. It
could very well be that because of my next tasks (tables and keeps) I
will get back to this sooner rather than later. Another aspect here: I'd
like to have some well visible improvements to the code quickly so it
shows to the outside that the project's not dead like many users think.

We'll see what happens. Thanks for your input and I hope you'll find
some time soon to do some coding yourself.

On 06.02.2005 22:08:40 Simon Pepping wrote:
> Jeremias,
> 
> I do not have much time to look into your problem, so I will just try
> to give a quick answer.
> 
> In my view the current BP setup is not able to generate good page
> break decisions. It only can do a first-fit algorithm. From your
> account, BPs are also overloaded to signal the completion of a page
> while they do not really end an area. Your hack is a hack indeed, but
> from a quick inspection I would say that it properly marks the
> overloaded nature of BPs.
> 
> I have written down a proposal for a different strategy of page break
> decision. I put my description on the wiki,
> http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves
> two goals:
> 1. Enabling smarter page break algorithms.
> 2. Simplifying the addAreas calls, and esp. its iteration over the
> collected BPs.
> 
> I have not had time to implement this, and therefore also no time to
> detect the flaws in my proposal. I would not mind if someone else
> would implement it.
> 
> Regards, Simon
> 
> On Thu, Feb 03, 2005 at 06:19:29PM +0100, Jeremias Maerki wrote:
> > Team, 
> > 
> > I've just checked in markers5a and markers5b which look very closely
> > which marker is added to which page for every block.
> > 
> > As I'm still somewhat in the process of getting to know the layout
> > engine I don't have a quick answer to that although I'm making progress
> > in understanding. Here's what I found out so far:
> > 
> > The reason for the bad markers is the addMarkers call, for example in
> > BlockLayoutManager.addAreas(). When the LM finds out that the next area
> > won't fit on the current page it creates a BreakPoss signalling that
> > state. The problem now is that addAreas() also gets these BreakPoss
> > instances which aren't visible in the generated document but are still
> > generating an (empty) Area (on the page preceeding the one where the
> > Area will really come to rest). That causes one marker too many to be
> > added to a page.
> > 
> > The same BTW applies to addID() calls which also remembers IDs on the
> > wrong pages.
> > 
> > What I'm currently doing is trying to really (!) understand the whole
> > BreakPoss mechanism so I can figure out a reliable way to find out how
> > to avoid this bug. Two possibilities:
> > 1. Don't generate bogus areas at all.
> > 2. Just suppress addMarkers() and addID().
> > 
> > I'm currently wondering if the generated BreakPoss instances should get
> > an additional flag (or two) which controls the generation of the area
> > and the addMarkers()/addID() calls. addMarkers()/addID() may be
> > necessary in certain circumstances while on the other side no area is
> > generated (empty block with only markers).
> > 
> > You can easily see these bogus areas when you output to the area tree
> > renderer or in build/test-results/layoutengine when running the Ant
> > build.
> > 
> > I'll continue investigating but would appreciate any ideas you might
> > have.
> > 
> > Jeremias Maerki
> > 
> 
> -- 
> Simon Pepping
> home page: http://www.leverkruid.nl



Jeremias Maerki