Re: Markers added to the wrong page

2005-02-07 Thread Jeremias Maerki
Simon,

I'm sorry that I missed your plan. I didn't remember your posting this.

I think your plan would be a good start into refactoring the breaking in
block progression direction. I think it'll need to be clarified WRT
resets/backtracking in case of final decisions on page break, column
handling and intrusion. Your approach seems to create BPs in advance of
which some might be discarded because of later decisions. These will
then be rebuilt based on different circumstances.

I'm somewhat torn between a project plan I have to follow and doing
things right the first time. So I'll apply my changes for now and I'll
keep the whole thing on my todo list with a big exclamation mark. It
could very well be that because of my next tasks (tables and keeps) I
will get back to this sooner rather than later. Another aspect here: I'd
like to have some well visible improvements to the code quickly so it
shows to the outside that the project's not dead like many users think.

We'll see what happens. Thanks for your input and I hope you'll find
some time soon to do some coding yourself.

On 06.02.2005 22:08:40 Simon Pepping wrote:
 Jeremias,
 
 I do not have much time to look into your problem, so I will just try
 to give a quick answer.
 
 In my view the current BP setup is not able to generate good page
 break decisions. It only can do a first-fit algorithm. From your
 account, BPs are also overloaded to signal the completion of a page
 while they do not really end an area. Your hack is a hack indeed, but
 from a quick inspection I would say that it properly marks the
 overloaded nature of BPs.
 
 I have written down a proposal for a different strategy of page break
 decision. I put my description on the wiki,
 http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves
 two goals:
 1. Enabling smarter page break algorithms.
 2. Simplifying the addAreas calls, and esp. its iteration over the
 collected BPs.
 
 I have not had time to implement this, and therefore also no time to
 detect the flaws in my proposal. I would not mind if someone else
 would implement it.
 
 Regards, Simon
 
 On Thu, Feb 03, 2005 at 06:19:29PM +0100, Jeremias Maerki wrote:
  Team, 
  
  I've just checked in markers5a and markers5b which look very closely
  which marker is added to which page for every block.
  
  As I'm still somewhat in the process of getting to know the layout
  engine I don't have a quick answer to that although I'm making progress
  in understanding. Here's what I found out so far:
  
  The reason for the bad markers is the addMarkers call, for example in
  BlockLayoutManager.addAreas(). When the LM finds out that the next area
  won't fit on the current page it creates a BreakPoss signalling that
  state. The problem now is that addAreas() also gets these BreakPoss
  instances which aren't visible in the generated document but are still
  generating an (empty) Area (on the page preceeding the one where the
  Area will really come to rest). That causes one marker too many to be
  added to a page.
  
  The same BTW applies to addID() calls which also remembers IDs on the
  wrong pages.
  
  What I'm currently doing is trying to really (!) understand the whole
  BreakPoss mechanism so I can figure out a reliable way to find out how
  to avoid this bug. Two possibilities:
  1. Don't generate bogus areas at all.
  2. Just suppress addMarkers() and addID().
  
  I'm currently wondering if the generated BreakPoss instances should get
  an additional flag (or two) which controls the generation of the area
  and the addMarkers()/addID() calls. addMarkers()/addID() may be
  necessary in certain circumstances while on the other side no area is
  generated (empty block with only markers).
  
  You can easily see these bogus areas when you output to the area tree
  renderer or in build/test-results/layoutengine when running the Ant
  build.
  
  I'll continue investigating but would appreciate any ideas you might
  have.
  
  Jeremias Maerki
  
 
 -- 
 Simon Pepping
 home page: http://www.leverkruid.nl



Jeremias Maerki



Re: Problems with layoutengine test files

2005-02-07 Thread Jeremias Maerki
Simon,

can you tell me which parser and XSLT processor combination gave you
these problems? I'd like to reproduce the problem so I can be sure I fix
them correctly. Thanks.

On 06.02.2005 13:54:54 Simon Pepping wrote:
 Hi Jeremias,
 
 I have errors with the layoutengine test files, for example:
 
 text-decoration1.xml(org.apache.fop.layoutengine.LayoutEngineTestSuite$1)java.lang.RuntimeException:
 Expected XPath expression to evaluate to 'line-through', but got 'li'
 (XPath: //flow/block[3]/lineArea/inlineparent[1]/text)
 
 This is due to the fact that the text may be split over more than one
 text area. Indeed, whether I get the error depends on which parser
 version I use; different parsers may produce different arrangements of
 text over text areas.
 
 If you change the test to:
 
 eval expected=line-through 
 xpath=//flow/block[3]/lineArea/inlineparent[1]/
 
 the error disappears. I think XObject.str() applies the XPath string
 function, which returns the string-value of the first node. If you
 evaluate the parent node 'inlineparent', you get the string-value of
 all of its children, which is usually what you want to have.
 
 There are many such cases in the test files. I think you should modify
 all cases where you test on the content of a text area.


Jeremias Maerki



Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Finn Bock
[Luca]
I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:
 outer block
 |
 -
 |   |
  innerinner
  blockblock
12
In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements created
by its child, looks at them and adds / removes / corrects; could this be
done if the inner BlockLMs directly give their element to the top-level
LM?
I would pass the element on the immidiate parent, which recursively 
passes them on the top-level LM in the direction. For inline, the 
toplevel would be LineLM and for blocks it would be the PageLM.

BTW, what about your great refactoring of all the knuth code?
I do not have the time atm to do the followup and bugfixes, so it will 
have to wait. If anyone also wants to commit it, feel free.

regards,
finn


Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread The Web Maestro
On Feb 7, 2005, at 7:21 AM, Luca Furini wrote:
Finn Bock wrote:
Using your description as a jumping point, here is my ideas for page
breaking. I suppose it is even more pie-in-the-sky since I haven't yet
written anything about it.
As I have been doing a few experiments about page breaking, I'm happy 
to
join in this conversation.
I rarely find myself feeling the urge to enter these conversations 
(it's not really my forte), but every now and then, I find I might 
offer an idea or two I think (hope) will be constructive...

The algorithm that the PageLM uses are a slightly modified knuth (no
need to maintain fitnessclass, and with the ability to decide on a 
break
when there is N pages of lookahead). The elements return from the LMs
are boxes (for lines), spaces and penalties.
Note that using the box - penalty - glue representation does not
necessarily involve using Knuth's algorithm.
A first-fit (or best-fit) algorithm could be enough in most situations;
and if there are no adjustable elements in the fo document (for exaple,
lots of paragraphs without adjustable spaces between them) Knuth's
algorithm becomes a best-fit.
Given its higher cost, maybe it could be used only when it could really
produce a better output.
I don't know how the spec deals with this, but I doubt the spec cares 
which algorithm is used. That said, would it be a good idea to 
determine which algorithm to use based on something in userconfig.xml 
or something? If the Knuth system is more 'expensive' in terms of 
resources, we could make non-Knuth the default, and enable Knuth via a 
config file.

The elements are not
returned from the LMs but pushed from the LM into the pageLM:
   parent.generateElement(new Space(resolveBefore());
   parent.generateElement(new Box(lineHeigth);
I'm not sure it is always possible to do this: sometimes the
representation of a block depends on the properties of a higher level
block. For example:
 outer block
 |
 -
 |   |
  innerinner
  blockblock
12
In order to decide whether there can be a page break between the lines
generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM calls receives the elements 
created
by its child, looks at them and adds / removes / corrects; could this 
be
done if the inner BlockLMs directly give their element to the top-level
LM?

BTW, what about your great refactoring of all the knuth code?
Regards,
Luca
Cheers!
Web Maestro Clay
--
[EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


RE: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Victor Mote
The Web Maestro wrote:

 I don't know how the spec deals with this, but I doubt the 
 spec cares which algorithm is used. That said, would it be a 
 good idea to determine which algorithm to use based on 
 something in userconfig.xml or something? If the Knuth system 
 is more 'expensive' in terms of resources, we could make 
 non-Knuth the default, and enable Knuth via a config file.

Hi Clay:

The good news is that this is an excellent idea. The bad news is that FOP's
determination to not even tolerate more than one algorithm is what drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, which
might explain the absence of responses (so far).

BTW, since it is relevant here, thanks for your kind comments last week in
another thread. You are ever the peacemaker, a very useful role. As long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are also.

To my perhaps still naive mind, layout consists of exactly two tasks: 1)
finding line-breaks within logical blocks (nested blocks consisting of a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style algorithm
is by definition a total-fit algorithm. The scope of such an algorithm for
the first task is the logical block itself. The scope of the second task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a good
idea) for item 2, requires the ability to see an entire page-sequence in the
AreaTree at one time. I am a little confused about whether FOP's design will
even allow that to happen right now. I understood both Keiron's and Finn's
design ideas (and Peter West's also) to be releasing pages from the AreaTree
long before a page-sequence was even completely parsed. However, I may have
misunderstood, especially Finn's, which I am especially fuzzy on.

I read somewhere that Peter Karow was going to try to implement a
Knuth-style algorithm for URW, and I have tried desperately and
unsuccessfully to get a copy of his book on the topic. If anybody knows that
status of that effort or how to find his out-of-print books, please let us
know.

The general approach that FOray hopes to take eventually is a first-fit
algorithm for the initial pass through a page-sequence, then a second
optimization look that I hope to make do a Knuth-style evaluation. That may
be sub-optimal, but it is my best guess right now about a reasonable way to
proceed.

Victor Mote



Re: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread The Web Maestro
On Feb 7, 2005, at 11:57 AM, Victor Mote wrote:
The Web Maestro wrote:
I don't know how the spec deals with this, but I doubt the
spec cares which algorithm is used. That said, would it be a
good idea to determine which algorithm to use based on
something in userconfig.xml or something? If the Knuth system
is more 'expensive' in terms of resources, we could make
non-Knuth the default, and enable Knuth via a config file.
Hi Clay:
The good news is that this is an excellent idea. The bad news is that 
FOP's
determination to not even tolerate more than one algorithm is what 
drove FOP
and FOray apart. So you have stumbled in to a somewhat sensitive area, 
which
might explain the absence of responses (so far).
Thanks for the heads up, and also for the 'clue'... I never completely 
understood what happened, as it occurred before I was committed. ;-) I 
understand a bit more, thanks to this e-mail... I suspect there may be 
another 'side' to the story--there  always is--and that there's may be 
other contributing factors... but this helps me understand much more 
than I understood before. Your explanation below also helps! I probably 
just need to ask for help when I don't understand something... but I've 
got enough going in my head anyway...

FOP-DEV... Please don't take this as griping (it's not!), but I get 
lost in thee BP/BPD/IPD/AT speak now and again, which is one reason I 
tend to stay out of the fray on that stuff. If I get confused--and 
choose to attempt to contribute to a discussion--I'll speak up about 
anything that strikes my fancy. If there's something I don't get... 
I'll search Google first.

BTW, since it is relevant here, thanks for your kind comments last 
week in
another thread. You are ever the peacemaker, a very useful role. As 
long as
the differences remain as great as they are, we *need* more than one FO
implementation. In the meantime, I am much more useful and productive
outside of FOP than I am in it, and, the FOP developers probably are 
also.

To my perhaps still naive mind, layout consists of exactly two tasks: 
1)
finding line-breaks within logical blocks (nested blocks consisting of 
a new
logical block), and 2) finding page-breaks between and within logical
blocks. All other tasks can be conveniently handled within the two data
structures, not being dependent on any strategy. The Knuth-style 
algorithm
is by definition a total-fit algorithm. The scope of such an algorithm 
for
the first task is the logical block itself. The scope of the second 
task is
the page-sequence. So, to implement a Knuth-style algorithm (itself a 
good
idea) for item 2, requires the ability to see an entire page-sequence 
in the
AreaTree at one time. I am a little confused about whether FOP's 
design will
even allow that to happen right now. I understood both Keiron's and 
Finn's
design ideas (and Peter West's also) to be releasing pages from the 
AreaTree
long before a page-sequence was even completely parsed. However, I may 
have
misunderstood, especially Finn's, which I am especially fuzzy on.

I read somewhere that Peter Karow was going to try to implement a
Knuth-style algorithm for URW, and I have tried desperately and
unsuccessfully to get a copy of his book on the topic. If anybody 
knows that
status of that effort or how to find his out-of-print books, please 
let us
know.
Might these links help:
http://www.google.com/search?hl=enq=Peter+Karow+bookspell=1
http://www.amazon.com/exec/obidos/ISBN%3D0387572236/102-3161708-3958541
http://www.amazon.com/exec/obidos/ISBN%3D0387565094/102-3161708-3958541
The general approach that FOray hopes to take eventually is a first-fit
algorithm for the initial pass through a page-sequence, then a second
optimization look that I hope to make do a Knuth-style evaluation. 
That may
be sub-optimal, but it is my best guess right now about a reasonable 
way to
proceed.

Victor Mote
Thanks again for the explanation! Cheers!
Web Maestro Clay
--
[EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet


RE: Page breaking [was: Markers added to the wrong page]

2005-02-07 Thread Victor Mote
The Web Maestro wrote:

 e-mail... I suspect there may be another 'side' to the 
 story--there  always is--and that there's may be other 
 contributing factors... but this helps me understand much 
 more than I understood before. Your explanation below also 

Yes, there is another side, and I respect that. The point is that the
differences are hard-wired at a pretty low-level (think silicon instead of
Java). I am constitutionally unable to make myself work on a project that
*can* be broken up into smaller pieces and isn't. Others appear to be
constitutionally unable to allow that to happen. Most can probably live with
it either way. OK, not every problem is solvable, and not every difference
is reconcilable. We go on. You guys pull east, I'll pull west, and as long
as we're pulling *different* wagons, we each have a shot at making some
progress.

 Might these links help:

I think those are the other books, but, to tell the truth, after looking
at the links, I can't tell. I'll have to go dig out my old notes to see
which book I was wanting. I'll check into it further. Thanks.

Victor Mote