Re: Skype-conference on page-breaking?

2005-03-08 Thread Luca Furini
Jeremias Maerki wrote:

 Luca, do you think your total-fit approach may be written in a way to
 handle changing available IPDs and that look-ahead can be disabled to
 improve processing speed at the cost of optimal break decisions?

I think that a first fit algorithm could be implemented in two different ways:
1) wait until the list of elements representing a whole page-sequence is
   collected, and call findBreakingPoints(); this method will call a
   different considerLegalBreak() method, much simpler and faster than
   the knuth's one.
2) start building pages little by little: the FlowLM returns elements to
   the PageLM as soon as one of its own child returns them

Alternative 1) is much like the total fit algorithm: breaks are computed
at the end of each page-sequence; even if the evaluation method is much
faster than Knuth's one, there could still be a long wait in order to get
the whole list.

With alternative 2) the PageLM would behave much the same as it now does:
as soon as a page is filled, it is possible to call addAreas. Note that
the last elements in the partial sequence cannot be considered as feasible
break. For example, if there is a block which creates 6 lines, the
sequence will be something like:
 penalty(not infinite)
 penalty(not infinite)
and the evaluation must stop at the second penalty; only when some
following elements are known it will be possible to decide whether the
last two lines could be at the end of a page.

If the IPD is always the same, I think the two alternatives are
equivalent, and the first one is better because it just needs a
different considerLegalBreak() method; as the output file cannot be
printed until the end of the process, the only advantage of 2) could be
memory usage.

 That's the part where I have a big question mark about changing
 available IPD. We may have to have a check that figures out if the
 available IPD changes within a page-sequence by inspecting the
 page-masters. That would allow us to switch automatically between
 total-fit and best-fit or maybe even first-fit.

If the IPD changes, I fear 2) must be necessarily used: if a block is
split between pages with different ipd, only a few lines need to be
Using 1), the LineLM should know how wide the lines are, but this cannot
be known as page breaking has not yet started.

The check could be done before starting the layout phase: if there is a
change, 2) is used, otherwise 1).
Maybe, the check could be even more sophisticated: for example, if the
first page is different, but the following are equally wide, we could use
2) to create the first page and then switch to 1).

 A remaining question
 mark is with side-floats as they influence the available IPD on a
 line-to-line basis.

This is a question mark for me too! :-)

 One thing for a deluxe strategy for book-style
 documents is certainly alignment of lines between facing pages. But
 that's something that's not important at the moment.

I have created and implemented a new property right about this! :-)

 I'd be very interested to hear what you think about the difficulty of
 changing available IPD. The more I think about it, however, the more I
 think the total-fit model gets too complicated for what we/I need right
 now. But I'm unsure here.

If changing ipd is really important and not just a theorical possibility,
we could start implementing 2, and later add the check and the algorithm
1: the getNextKnuthElements() in the block-level LM could be used in both


Re: [XML Graphics - FOP Wiki] Updated: PageLayout

2005-03-04 Thread Luca Furini

Jeremias Maerki wrote:

 However, Luca's example
 does not fully resolve in my brain. The penalty, for example, must not
 be infinite or it will not be eligible for break possibility. A legal
 break is defined by Knuth as a number b such that either
 (i) xb is a penalty item with pb  infinity, or (ii) xb is a glue item
 and xb-1 is a box item. As far as I can see Luca's example doesn't
 contain any legal break, but I guess that was simply an oversight.

Ops, you are right, the penalty should have a 0 penalty value (or maybe
greater, anyway not infinite), otherwise it is completely useless to set
its width, as it will never be used!

 The big problem I still have with both your examples is that the table
 header is very special in terms of the standard Knuth model. This model
 doesn't allow for conditional items at the beginning of a line. What
 Luca did in his example looks to me like forcing the model to do
 something it wasn't designed for.

Yes, line breaking has not something analogous to the repeated header in a
table, although the elements representing spaces in centered effect (which
have effect only if there is a break between them) are somewhat similar.

But I think that the main point is to have a representation whose height
is correct: the representation is not the content, its only purpose is to
allow the creation of pages satisfying the constraints (orphans, widows,
keep, ...): what to put into the pages concerns the LMs.

 I'm a bit sceptical that the code will
 be able to identify such special conditions reliably.

I think it would not be that difficult: for example, the penalty (whose
width is header height + footer height) could have a conventional

In the addAreas phase, if the last position in the iterator is that
particular Position, the LM will know that the table has been split
between pages and, according to the value of table-omit-*-at-break, will
add header and footer.



Re: Skype-conference on page-breaking?

2005-03-04 Thread Luca Furini

Jeremias Maerki wrote:

Anyway, I'd like to ask if we could hold to a brainstorming conference
call on page breaking either Sunday evening or next Monday or Tuesday
somewhere between 8:00 and 24:00 CET. Of course, on my wish list there
are Simon, Finn and Luca. I'm happy to call either of you on your normal
phone via SkypeOut if you don't have broadband. I hope I can get at
least one of you three on the line.

I'v very interested in page breaking, and I would be happy to contribute.

Unfortunately, I'm not much used to speaking english :-(, so I think I
would be much more comfortable with the idea of communicating via written

As I have said before (or maybe I forgot to ...) I have done a few
experiments trying to use Knuth's algorithm in page braking, and I have a
working implementation which handles only some block level formatting
objects (blocks and lists) and simplified documents (no footnotes or
floats, at the moment, and pages with equal length and width), but it has
some (I hope) interesting features: for example, it is able to adjust the
number of lines used for each paragraph in order to both fill the pages
and avoid orphans and widows.
In a few words, using the box - penalty - glue model it is possible to
represent paragraphs with an adjustable number of lines.

I started working on it a few months ago, and I could not keep it updated
with all the changes, but if you are interested I could try and recreate
these features using the most recent code. Anyway, this could be done
after we have reached a basic implementation.


Re: [XML Graphics - FOP Wiki] Updated: PageLayout

2005-02-28 Thread Luca Furini

Simon Pepping wrote:

+=== Space specifiers ===
+When the space specifiers resolve to zero around a page break, we are
+in the same situation as that of a word space in line breaking. It is
+represented by the sequence `box - glue - box`.

I add just a few thoughts about this subject.

If there cannot be a break between the two block (the first has
keep-with-next || the second has keep-with-previous || their block father
has keep-together), the representation can be box - infinite penalty -
glue - box.

+=== Possible page break between content elements ===
+Here the most general situation is that when the content is different
+with and without page break:
+ * content Cn when there is no page break,
+ * content Ca at the end of the page before the page break,
+ * content Cb at the start of the page after the page break.
+An example of this situation is a page break between table rows:
+no page break:page break:
+- -
+  row 1 row 1
+- -
+ border n  border a
+- -
+  row 2footer
+- -
+  page break
+  -
+   header
+  -
+   border b
+  -
+row 2
+  -
+This situation cannot be dealt with using Knuth's box/glue/penalty

Maybe there is no need to create new kinds of elements (not that it's
forbidden :-) , only the fewer they are, the simpler the algorithm is).

Header and footer, with their borders, are duplicated around each break if
table-omit-*-at-break is false; so, at each break the total height
increases by (border a + footer + header + border b) and decreases by
border n.

Here is a different representation which uses normal penalties; there are
two rows whose bpd is h1 and h2, header and footer with bpd hH and hF,
border before the footer , border after the header and border between rows
hA, hB and hN.

box(h1) - penalty(inf, hH + hA + hB + hF) - glue(hN) - box(h2) - box(hB +
hF) - box(hH + hA)

If there is no break, the overall bpd is (hH + hA + h1 + hN + h2 + hB +
hF), otherwise the first piece has bpd (hH + hA + h1 + hB + hF) and the
second one (hH + hA + h2 + hB + hF), and the border between the rows is

The elements representing the header and its border are moved at the end
of the sequence, but I don't think this is could be a real problem: the
TableLayoutManager would place it at its right place when adding areas.


Re: Error in computation of inline progression dimension ?

2005-02-16 Thread Luca Furini

Jeremias Maerki wrote:

 [Glen Mazza]
 So Luca is correct that both fo:simple-page-masters
 should generate the same overall margins of 50 pt.
 each, no?

No. :-)

Ok, now I am convinced you are right.
Thanks for all your explanations, I always found this part of the
recommendation quite obscure!


Error in computation of inline progression dimension ?

2005-02-15 Thread Luca Furini
I noticed a strange behaviour concerning margins that could be related to
the inheritance of start-indend and end-indent, which was discussed a few
weeks ago.
It seems that in some situations the margins are subtracted twice from the
available inline progression dimension.

In the little fo file I'm attaching there are two simple-page-masters:
- in one of them, left and right margins are set inside simple-page-master
- in the other, they are set inside region-body

In both cases, the page width is 200 points, with left and right margin
set to 50 points; so, the line width should be 100 points.

In the method PageSequenceLayoutManager.getViewportRectangle(), the
computed ipd is right when the margins are set in region-body, but it is 0
if they are set in simple-page-master, because relDims.ipd is already 100
and start- end-indent are 50.

As this method has not been modified recently, the error (if this
behaviour is really wrong) must be elsewhere ...


?xml version=1.0 encoding=UTF-8?

fo:root xmlns:fo=;



  fo:simple-page-master master-name=marginsInMaster









  fo:simple-page-master master-name=marginsInRegion





fo:region-body margin-left=50pt





fo:page-sequence master-reference=marginsInMaster

  hyphenate=true language=it text-align=justify

  fo:flow flow-name=xsl-region-body

fo:block background-color=pink font-size=16pt font-weight=bold 
space-after=5mmcorto piccolo mini super qualcosa corto qui e qui 
precipitevolissimevolmente e poi ancora bla bla bla e ancora qualche 



fo:page-sequence master-reference=marginsInRegion

  hyphenate=true language=it text-align=justify

  fo:flow flow-name=xsl-region-body

fo:block background-color=pink font-size=16pt font-weight=bold 
space-after=5mmcorto piccolo mini super qualcosa corto qui e qui 
precipitevolissimevolmente e poi ancora bla bla bla e ancora qualche 




Re: Page breaking [was: Markers added to the wrong page]

2005-02-08 Thread Luca Furini
Finn Bock wrote:

 I would pass the element on the immidiate parent, which recursively
 passes them on the top-level LM in the direction. For inline, the
 toplevel would be LineLM and for blocks it would be the PageLM.

Ok, I misunderstood what you wrote, now I think we were saying the same
thing using different words! :-)


Re: cvs commit: xml-fop/test/layoutengine/testcases normal-breaking2.xml

2005-02-02 Thread Luca Furini

Jeremias Maerki wrote:

 If someone has an idea about the ArrayOutOfBoundsException, all the
 better. I'm currently trying to debug that thing.

It's because of this line in LineLM.addAreas():
iCurrParIndex = 0;
If a block has properly handled preserved linefeeds, it will generate more
than one paragraph, so the ArrayList knuthParagraphs will have more than one
In your test file, there will be:
(Paragraph #0) !-- list level 1 --
(Paragraph #1) fo:list-block provisional-distance-between-starts=0.4cm
(Paragraph #2)provisional-label-separation=0.15cm
(Paragraph #3) !-- list item --

If the block is split between two pages, the method LineLM.addAreas will be
called twice: for example, there will be Paragraph #0 and #1 on page 1, and
the others on page 2.
The second time addAreas() is called iCurrParIndex is set to 0, and here is
the bug, because the index stored in the LineBreakPosition lbp refers to an
element in Paragraph #2!

Maybe the best place to store the correct value is inside the
LineBreakPositions; I'm attaching a little patch which modifies the
LineBreakPosition class (and also removes an old, unused method).

As per the ignored newline characters in the TextLM, I really can't remember
why I wrote those lines! :-)
It really looks like they shouldn't be ignored.


RCS file: 
retrieving revision 1.37
diff -u -r1.37
---  9 Dec 2004 15:53:36 -   1.37
+++  2 Feb 2005 14:15:18 -
@@ -95,16 +95,18 @@
 private static class LineBreakPosition extends LeafPosition {
 // int iPos;
+int iParIndex; // index of the Paragraph this Position refers to
 double dAdjust; // Percentage to adjust (stretch or shrink)
 double ipdAdjust; // Percentage to adjust (stretch or shrink)
 int startIndent;
 int lineHeight;
 int baseline;
-LineBreakPosition(LayoutManager lm, int iBreakIndex,
+LineBreakPosition(LayoutManager lm, int index, int iBreakIndex,
   double ipdA, double adjust, int ind, int lh, int bl) 
 super(lm, iBreakIndex);
 // iPos = iBreakIndex;
+iParIndex = index;
 ipdAdjust = ipdA;
 dAdjust = adjust;
 startIndent = ind;
@@ -139,7 +141,6 @@
 private int iReturnedLBP = 0;
 private int iStartElement = 0;
 private int iEndElement = 0;
-private int iCurrParIndex = 0;
 private KnuthNode bestDeactivatedNode = null;
@@ -762,6 +763,7 @@
 new LineBreakPosition(this,
+  knuthParagraphs.indexOf(par),
   lastElementIndex ,
   ratio, 0, indent,
   lineLead + middlefollow,
@@ -1337,135 +1339,6 @@
- * Make a line break for returning as the next break.
- * This makes the line break and calculates the height and
- * ipd adjustment factors.
- *
- * @param prevLineEnd previous line break index
- * @param target the target ipd value
- * @param textalign the text align in operation for this line
- * @return the line break position
- */
-private BreakPoss makeLineBreak(int prevLineEnd, MinOptMax target,
-int textalign) {
-// make a new BP
-// Store information needed to make areas in the LineBreakPosition!
-// lead to baseline is
-// max of: baseline fixed alignment and middle/2
-// after baseline is
-// max: top height-lead, middle/2 and bottom height-lead
-int halfLeading = (lineHeight - lead - follow) / 2;
-// height before baseline
-int lineLead = lead + halfLeading;
-// maximum size of top and bottom alignment
-int maxtb = follow + halfLeading;
-// max size of middle alignment below baseline
-int middlefollow = maxtb;
-// calculate actual ipd
-MinOptMax actual = new MinOptMax();
-BreakPoss lastBP = null;
-LayoutManager lastLM = null;
-for (Iterator iter = vecInlineBreaks.listIterator(prevLineEnd);
-iter.hasNext();) {
-BreakPoss bp = (BreakPoss);
-if (bp.getLead()  lineLead) {
-lineLead = bp.getLead();
-if (bp.getTotal()  maxtb) {
-maxtb = bp.getTotal();
-if (bp.getMiddle()  middlefollow) {
-middlefollow = bp.getMiddle();
-// the stacking size of textLM 

Re: Refactoring of knuth line breaking code.

2004-12-13 Thread Luca Furini
Finn Bock wrote:

 I've been playing around with the knuth line breaking code and made a
 slight refactoring of it. [...]
 These improvements could also be applied to the existing code, so I
 think the more interesting point is the quality of the refactoring

I think this is a very good patch, and it could be applied as it is now:
my last changes introduced a first fit algorithm used if the Knuth's one
failed, but your re-starting strategy seems much better.


Re: cvs commit: xml-fop/src/java/org/apache/fop/

2004-12-07 Thread Luca Furini
Glen Mazza wrote:

 Luca, I think we should be using getWidth() instead of
 getW(), correct?

Yes, it would be much clearer!
I'm going to rename:
  getW() - getWidth()
  getY() - getStretch()
  getZ() - getShrink()
  getP() - getPenaltyValue()

The last name is quite long: I first thought of getPenalty() or
getValue(), but they seemed less clear to me.


Re: Knuth linebreaking questions

2004-12-06 Thread Luca Furini

Finn Bock wrote:

 I tend to read that to mean that word spacing may be pushed beyond the
 specified range by justification. And I would think that unjustified
 alignment still has the option of using the word-spacing range but
 ofcourse has to stay within the range.

I'm not convinced ...
The effect of having left-aligned text and adjustable word-spacing would
be an output in which most lines are justified, but the ones in which the
adjustment ratio would be  1 ... I really don't think this would be
better than having all lines left-aligned! :-)

And if the user sets text-align=left but does not explicitly sets
word-spacing, and the default value is used, and for a lucky coincidence
the algorithm find breaking points involving ratios  1, the output would
show justified lines, instead of the left-aligned lines the user would
have likely expected.


Re: More questions on line breaking

2004-12-06 Thread Luca Furini
Finn Bock wrote:

 Ok, so it isn't really needed when the algorithm is implemented in java.
 Just by having the previous node linked from within bestActiveNode is
 enough to keep the inactive nodes alive.

 So inactiveList can be removed.

You are right, I'm going to remove it.


Re: Good news: Jeremias has been elected as an ASF member!

2004-12-02 Thread Luca Furini

 I have the great pleasure to announce that Jeremias Maerki has
 been elected as an ASF member at the last member's meeting
 during ApacheCon.



Re: Knuth linebreaking questions

2004-12-02 Thread Luca Furini

Finn Bock wrote:

(starting from the second question)
 And why not adjust the spacing within the user specified min/max for
 START and END alignment?

Should the user desire adjusted spaces, wouldn't it be better for him to
specify justified alignment? :-)
Seriously, the recommendation (at 7.16.2 letter-spacing and 7.16.8
word-spacing) states that these spaces may also be influenced by
justification, but says nothing about start and end alignments.

 I'm still not sure why it would be ok to ignore any user specified
 min and max values of 'word-spacing' during START and END alignment.
 If a user specifies a length range, what would the reason be for not
 using it? Perhaps with additional DEFAULT_SPACE_WIDTH.

When alignment is start or end, each space has always its .optimum width,
so there is no need to look at the .minimum and .maximum: the user most
preferred value is already used.
But the knuth algorithm would not work if there were no elements with
adjustable width (glue with stretchability and/or shrinkability); the
actual value used is not very relevant, because the computed adjustment
ratio will not be applied.

 Ok, performance is indeed a fine reason, but IMHO such quality vs.
 speed tradeoffs should eventually be made by the user rather than us.

Simon told the same:

# Note that in TeX such thresholds are user-adjustable parameters. I
# think they should eventually be so in FOP too, for those of us who
# have the most exquisite taste of line layout.

and I think it's a good idea; the algorithm should:

 1 find breaking points without hyphenation
 2 hyphenate
 3 find breaking points with hyphenation
 4 decide which ones are better

and point #4 uses the user-definable threshold; where should this constant
be stored? Inside the code of LineLM or in a configuration file?


Re: Knuth linebreaking questions

2004-11-30 Thread Luca Furini
Finn Bock wrote:

 1) What is the purpose of 2 glues for a normal space in END and START

 new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
 new KnuthPenalty(0, 0, false, , true));
 new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

The purpose is to give each line (but the last one) the same
stretchability, regardless of the number of spaces in it.

If the penalty is not used (there is no line ending there) the overall
effect of the 2 glues is a 0 stretchability and does not modify the line
total; if the penalty is used (a line ends there) then the stretchability
of the previous glue is added to the line total, which becomes 3 *
wordSpaceIPD.opt because the previous space, as said before, added 0 (the
following glue is suppressed).

In justified text, a line with many spaces can be adjusted in order to be
much shorter, or much longer.
If left-aligned text used the same elements, the algorithm would find the
same breaking points; but this time adjustment ratios are not used, so a
line with many spaces would be too much longer, or too much shorter, than
the other lines.
Using these elements, the algorithm creates lines whose unadjusted width is
quite the same.

 and why isn't the min and max of wordspaceIPD used.

Well, you just made me notice there is a little bug,
LineLayoutManager.DEFAULT_SPACE_WIDTH should be used insted! :-)

It's just a magic number: the point is that every TextLM should use the
same value.

 2) What does the threshold parameter to findBreakingPoints controll?
 It seems to be a performance parameter which control the number of
 active nodes, rather than a quality parameter.
 Or to frame my question
 differently, if threshold=1 finds a set of breaks, will threshold=5
 always pick the same set of breaks? Or can threshold=5 find a better set
 of breaks?

It controls both performance and quality: minimum quality.

If threshold = 1 finds a set of breaks, it is the best possible set of
breaks, because the adjustment ratio of each break is = 1 which means
that spaces and other adjustable objects will not need to be longer than
their .max width.

But with this optimal threshold the algorithm could fail, and find no set
of breaking points; so, a try with a higher threshold must be done.

If with threshold = 1 a set is found, with threshold = 5 the same set
would be found, but it would take more time, because a greater number of
active nodes are used.

 3) What is the reasoning for doing hyphenation only after threshold=1
 fails. Naive common sense tells me that if the user specify hyphenation
 we should do hyphenation before finding line breaks.

Finding hyphenation points is time-expansive (all words must be
hyphenated, not only the ones near a line's end), the sequence of
elements becomes longer, there are more feasible breaking points, and a
line ending with a - is less beautiful; so I thought that if a set of
breaking points could be find without hyphenation.

I just took the hyphenate property as a suggestion instead of an order! :-)

Note that the same algorithm with the same threshold could find a
different set of breaking points with and without hyphenation, because the
elements are different. Without hyphenation, spaces could need a little
higher adjustment, for example.

 4) I've compared your code to tex_wrap
 and the main difference is in the way new KnuthNodes are added to the
 active list. Is the BestRecords part of Knuth or is it your own
 invention? Why is it only fitness_class'es in BestRecord that is higher
 then minDemerits + incompatibleFitnessDemerit that is added to
 activeList? Why not all fitness_class'es in BestRecords?

At the moment I don't have the book at hand, but I am quite sure it's
*not* an invention of mine! :-)

As far as I can remember, the Knuth book uses 4 different variables, named
C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!)
and I just created this structure to store them.

I'll try and find some time to look at this ...

Thanks for your interest and your comments, they are most welcome!


Re: [Bug 32253] - Marker bugs

2004-11-19 Thread Luca Furini
Simon Pepping wrote:

 Indeed. Something like ICLM is needed, which creates an inline area
 containing the block areas.

A block inside another block

fo:blockNormal text fo:blockinner block/fo:block normal text./fo:block

creates 3 different paragraphs:
- Normal text
- inner block
- normal text.
and each paragraph's layout is unrelated to the other paragraphs' layout (there 
are 3 LineLM).

A block inside an inline inside a block

fo:blockNormal text fo:inline background-color=lightgreeninline text 1
fo:blockinner block/fo:block inline text 2/fo:inline normal

a) 3 different paragraphs too:
- Normal text inline text 1
- inner block
- inline text 2 normal text.
b) a single paragraph with all the text:
- Normal text inline text 1 inner block inline text 2 normal text.

I'd say a), but I'm not sure.
If this were true, there should be 3 different LineLMs.

This is the LM tree at the moment:

TextLM  InlineLM  TextLM
Normal text   |  normal text.
TextLM   BlockLM2 TextLM
inline text 1 | inline text 2
  inner block

LineLM1 tries to have get elements from all its chidren, and fails.

But, even if it could be given the elements representing inner block, it
could layout them wrongly, because of the block properties: the inner
block could have different alignment, borders, margins, indents, 

So, the LM tree could be:

  | ||
   LineLM   BlockLM2  LineLM
   --+--|   -+-
   |   ||   | |
TextLM InlineLM  LineLM InlineLMTextLM
Normal text  ||   |normal text.
   ||   |
TextLM   TextLM  TextLM
   inline text 1   inner block   inline text 2

This modified tree can be easily obtained from the previous one:
- the new BlockLM is created
- if the LM which should add it to its children list is an InlineLevelLM
or a LineLM, the new BlockLM is given to its parent, i.e. it will become a
child of the nearest BlockLM ancestor
- an instance of the LM which could not handle the new BlockLM (in the
example, InlineLM son of LineLM) must be created in order to handle inline
siblings of the inner fo:block.

I hope this can help ...


Re: commenting the Knuth code/centering issue

2004-11-11 Thread Luca Furini

I have tried to add some comments to the Knuth[Element, Box, Glue,
Penalty] classes.

As I am not sure they are clear enough (I'm not even sure they are written
in a proper English! :-) ) I'd like to hear your opinions before
committing them.


RCS file: 
retrieving revision 1.2
diff -u -r1.2
---   6 Sep 2004 19:07:12 -   1.2
+++   11 Nov 2004 17:56:22 -
@@ -18,6 +18,16 @@
 package org.apache.fop.layoutmgr;
+ * An instance of this class represents an unbreakable piece of content with
+ * fixed width: for example an image, a syllable (but only if letter spacing
+ * is constant), ...
+ *  A KnuthBox is never a feasible breaking point.
+ *  The represented piece of content is never suppressed.
+ *  Besides the inherited methods and attributes, this class has some more
+ * attributes to store information about the content height and its vertical
+ * positioning, and the methods used to get them.
+ */
 public class KnuthBox extends KnuthElement {
 private int lead;
 private int total;
RCS file: 
retrieving revision 1.2
diff -u -r1.2
---   6 Sep 2004 19:07:12 -   1.2
+++   11 Nov 2004 17:56:23 -
@@ -18,6 +18,13 @@
 package org.apache.fop.layoutmgr;
+ * This is the super class for KnuthBox, KnuthGlue and KnuthPenalty.
+ *  It stores information common to all sub classes, and the methods to get it:
+ * the width, a Position and a boolean marking KnuthElements used for some
+ * special feature (for example, the additional elements used to represent
+ * a space when text alignment is right, left or center).
+ */
 public abstract class KnuthElement {
 public static final int KNUTH_BOX = 0;
RCS file: 
retrieving revision 1.2
diff -u -r1.2
---  6 Sep 2004 19:07:12 -   1.2
+++  11 Nov 2004 17:56:23 -
@@ -18,6 +18,28 @@
 package org.apache.fop.layoutmgr;
+ * An instance of this class represents a piece of content with adjustable 
+ * width: for example a space between words of justified text.
+ *  A KnuthGlue is a feasible breaking point only if it immediately follows
+ * a KnuthBox.
+ *  The represented piece of content is suppressed if either the KnuthGlue
+ * is a chosen breaking point or there isn't any KnuthBox between the
+ * previous breaking point and the KnuthGlue itself.
+ *  So, an unsuppressible piece of content with adjustable width, for example
+ * a leader or a word with adjustable letter space, cannot be represented
+ * by a single KnuthGlue; it can be represented using the sequence:
+ *   KnuthBox(width = 0)
+ *   KnuthPenalty(width = 0, penalty = infinity)
+ *   KnuthGlue(...)
+ *   KnuthBox(width = 0)
+ * where the infinity penalty avoids choosing the KnuthGlue as a breaking point
+ * and the 0-width KnuthBoxes prevent suppression.
+ *  Besides the inherited methods and attributes, this class has two attributes
+ * used to store the stretchability (difference between max and opt width) and
+ * the shrinkability (difference between opt and min width), and the methods
+ * to get these values.
+ */
 public class KnuthGlue extends KnuthElement {
 private int stretchability;
 private int shrinkability;
RCS file: 
retrieving revision 1.2
diff -u -r1.2
---   6 Sep 2004 19:07:12 -   1.2
+++   11 Nov 2004 17:56:23 -
@@ -18,6 +18,21 @@
 package org.apache.fop.layoutmgr;
+ * An instance of this class represents information about a feasible
+ * breaking point; it does not represent any piece of content.
+ *  A KnuthPenalty is a feasible breaking point unless its value is infinity;
+ * a KnuthPenalty whose value is -infinity represents a forced break.
+ *  A KnuthPenalty is suppressed, and its width is ignored, if it is not a
+ * chosen breaking point; for example, a KnuthPenalty representing a
+ * hyphenation point has a width (the - width), which must be ignored if
+ * that point is not chosen as a breaking point.
+ *  Besides the inherited methods and attributes, this class has two more
+ * attributes and the methods used to get them: the penalty value, which is
+ * a kind of aesthetic cost (the higher the value, the more 

Re: commenting the Knuth code/centering issue

2004-11-06 Thread Luca Furini
Jeremias Maerki wrote:

 No, I don't think Luca has write access, yet. I know by now that the CLA
 is recorded but the account hasn't been created, yet, although I've
 already sent a reminder.

I have just received the e-mail confirming the creation of my account.

I am now reading the documents, so I should
soon be able to work.


Re: commenting the Knuth code/centering issue

2004-11-06 Thread Luca Furini
Glen Mazza wrote:

 [BTW, I'm considering getting that Digital Typography
 book by Knuth you had mentioned earlier.  Do you
 recommend it?  (I was thinking that given all the time
 I spend on FOP I should start looking a little more at
 the scientific aspects of this work.)]

I think it's very interesting.
It is a collection of essays on various topics, and besides the one about
line breaking (with fabulous features like paragraphs with different sized
lines ...) there is one about bidirectional text, one about fonts ...
It is surely an authoritative source on such matters.


Re: commenting the Knuth code/centering issue

2004-11-04 Thread Luca Furini
Glen Mazza wrote:

 The title centers correctly in 0.20.5, but is left-justified in 1.0.
 Luca, are you looking at this issue of text alignment in general?

Yes, this happens when the text is short and the algorithm is not able to
find a set of breaking points: the fallback method can't compute the
needed indent, and the text is left-aligned instead of centered.

One way to solve this problem is to use, instead of the constant
DEFAULT_SPACE_WIDTH, a value computed according to the line width and the
maximum acceptable adjustment; this would prevent the algorithm from
failing. This value should be computed by the LineLM, and the InlineLMs
have to know it.

Another way is to use, instead of considerLegalBreak(), a different method
to evaluate a possible break point if no breakpoints are found: this
alternative method would be much simpler, and would re-create the
behaviour of the old getNextBreakPoss method (add content to the existing
line if possible, otherwise start another line).

I think this could be a much better solution:
- it's simpler: it's just another private method in the LineLM
- it would solve not only this issue with centered text, but also the
other situations in which the Knuth's algorithm fails (for example, when
there is a word larger than the line: see Simon's test file id=12494 for
bug 29124)

I have already done it, so if you agree I could create a patch in a short

 Also, any chance we can get the Knuth classes commented so we have a
 better idea what KnuthBox, KnuthPenalty, KnuthGlue, etc. are for?

Sure, I should have done this before!
I'll try and comment these files as soon as possible.


Re: DO NOT REPLY [Bug 31206] - [PATCH] Improvements over the new line breaking algorithm

2004-10-15 Thread Luca Furini

Simon Pepping wrote:

 1. InlineStackingLM implements InlineLM.  LineLM extends
 InlineStackingLM and thus implements InlineLM. IMHO it should
 not. Implementing InlineLM should be equivalent to
 generatesInlineAreas returning true.

You are right, it's quite strange, but the LineLM still uses a few methods
inherited from InlineStackingLM. This was not due to the new algorithm, it
was already in the old code; I'm going to see if they still do something
useful ...

 2. TextLM now extends LeafNodeLM instead of AbstractLM. What is the
 gain?  I see no related changes in TextLM.

There isn't at the moment any practical gain, I just thought that, as a
text node has no children, a TextLM is a (special case of) LeafNodeLM.

 3. In LineLM:
 // this constant is used to create elements when text-align is center:
 // every TextLM descendant of LineLM must use the same value,
 // otherwise the line breaking algorithm does not find the right
 // break point
 public static final int DEFAULT_SPACE_WIDTH = 3336;
 private static final int INFINITE_RATIO = 1000;

 If these are static final, they might be better placed in
 InlineLM. Alternatively, they might be attributes of the LineLM
 object, which allows changing them per paragraph, e.g. depending on
 the font. But then the problem arises how to propagate them to the
 descendant LMs.

I decided to change and use a constant because the important thing is to
have the same value used by every LM, but this isn't the perfect solution;
if we try to center a short object (a single word, for example) in a long
line, it is likely that the algorithm fails because there isn't enough
stretchability in the line.
Maybe it's better to have the LineLM compute a value depending on the line
lenght and the maximum adjustment ratio; the child LM should ask the
LineLM for this value.

 4. The textheight of the large font is rather large. The property
 lineheight is not followed (reproduce existing behaviour).

 5. LineLayoutManager:675: line is always 3, so that firstElementIndex
 = 1 for the first line, and the first box is skipped in the line
 height calculation.

The second version of the patch, which I'm going to attach to the Bugzilla
issue, fixes these errors.

It also implement the vertical-align property: now the values of top,
bottom, middle and baseline should be supported.
I made a few tests with fo:inline, fo:character and fo:external-graphic,
and it seems to work.

IMPORTANT: I had to revert Finn Bock's changes to the PDFRenderer (dated
2004/09/22 13:22:16), otherwise leaders with svg use-content produce
errors in the pdf output.
There isn't any run-time error, but when I try to open the pdf file, I get
these warnings:
 - There was an error processing a page. Wrong operand type.
 - Illegal operation 'q' inside a text object.
 - Wrong operand type.
and the page with the svg leaders is left empty.
I think it could be something involving the saveGraphicsState() method.

Still to be done:
  - remove unused methods and variables
  - simplify InlineStackingLM methods as suggested by Simon
I'll try and fix these points as soon as possible.


Re: [VOTE:RESULT] Luca Furini for Committer

2004-10-08 Thread Luca Furini

Jeremias Maerki wrote:


 is your CLA on its way?

I sent it by fax a few minutes ago (sorry for the delay).
Do I have to send it by mail too?


Re: DO NOT REPLY [Bug 29124] - New line breaking algorithm

2004-09-13 Thread Luca Furini
Simon Pepping wrote:

 Still to be done:

 - Resolve the regressions mentioned above.

As concerns leader with use content, patch created and successfully tested.
The ContentLM calls getNextKnuthElements on his child InlineStackingLM, uses
the returned elements to calculate the pattern width and returns them to the
LeaderLM. The LeaderLM uses them when calling addAreas.

I also found a bug affecting leaders with leader-pattern = dots: the
TextArea with the dot (created in LeaderLM.getLeaderInlineArea) had width =
0; calling setWidth() fixes this problem.
There is still a little difference between a leader with leader-pattern =
dots and one with use-content and a single dot as content: the former is
placed a bit over the baseline, but I couldn't find the reason.

Note that using the fo file xml-fop/examples/fo/basic/ to test the
patch you won't see the leaders with leader-pattern = use-content, as they
don't have a width property and the default .opt value (12pt) is  than the
pattern width. Setting a larger width, or text-align-last = justify, makes
the leaders visible.

 - I support the idea to create an InlineLayoutManager interface, which
   extends LayoutManager.

Done, same patch (or maybe I should create a different one?). I also
removed the getWordSpaceIPD() method, as I find out that a constant value
works better: the LineLM and its child must use the same value, or the
result is not always correct.

 Can we be sure that U+A is always alone or the first item in a
 textArray; does this not depend on the Parser, how it calls the SAX
 characters method?

Right, it's better to handle the most general case.
The patch will fix this too.

I will try to fix the other points reported by Simon as soon as possible.


Re: Handling of text-align=justify when nesting blocks

2004-09-03 Thread Luca Furini
Arnd Bei├čner wrote:

 In the example, line 2 is neither the last nor the only line of a block,
 and it's also not a line ending in U+000A, so it should be justified.

Section 7.15.10 of the recommendation states that the property
text-align-last Specifies the alignment of the last line-area child of the
last block-area generated and returned by the formatting object, *and to any
line-area generated by the formatting object whose following sibling is a
block-area that is not a line-area*, and any lines in the block ending in

So, as a fo:block generates a block-area that is not a line-area (its
children are line-areas), I think line 2 should *not* be justified.


Re: DO NOT REPLY [Bug29124] - New line breaking algorithm

2004-09-03 Thread Luca Furini

Simon Pepping wrote:

 You mention that you have not implemented the Knuth algorithm for
 ContentLM. Would it be difficult to do that?

No, I have almost done.
I think I will be able to attach a patch including this fix this


Re: DO NOT REPLY [Bug 29124] - New line breaking algorithm

2004-08-24 Thread Luca Furini

Simon Pepping wrote:

 Nested inline and other LMs: The output contains errors, see the
 comments in the text. The errors occur when hyphenation is set to

Fixed: there were errors in the method addALetterSpaceTo of LeafNodeLM and

I also found a bug in the LeafNodeLM.addAreas method, affecting HEAD too:
the area is added to the area tree (with parentLM.addChild(curArea))
*before* widthAdjustArea is called, so its width is not correctly added to
the inline parent width and the output sometimes shows overlapped text
(when there is another child of inline parent after the leader area).

 Justification: This is a test fo you submitted earlier. According to
 the text in the file the second block should be hyphenated; it is
 not. Should it still be hyphenated, or can this not be enforced with
 the Knuth algorithm and text-align=start?

I cannot find a hyphenate property in the fo file you attached, so I'm not
sure whether I understand what you mean.
Anyway, hyphenate = true means, according to the recommendation (7.9.4),
that hyphenation may be used in the line-breaking algorithm, not that it
*must* be used.
As hyphenation is time-expansive and bad-looking, I think it should be
used only if necessary.

 No breakpoints: An exception is thrown, at
 LineLayoutManager.getNextBreakPoss( It
 occurs because breakpoints has size 0; the third call to
 findBreakingPoints also returned 0. This should not be possible; the
 algorithm should always return a breakpoint.

Right, I completely forgot to provide a fallback in case the algorithm
doesn't find a good set of breaking points.
I added a boolean argument called force to findBreakingPoints: if it is
true, and after the main loop there are no active nodes, the last
deactivated node is used to create LineBreakPositions.
There will zero or more good lines followed by a single line including
all the remaining content (this line will obviously get off the right

The method findBreakingPoints will be called no more than three times:
I) no hyphenation, adjustment ratios must be = 1
II) hyphenation (if allowed), or ratios up to 5
III) ratios up to 20, and if necessary force the creation of LineBreakPositions

 A few small remarks:

 Can you move the following log messages to trace log level:
 [DEBUG] AbstractLayoutManager - - Word to hyphenate: We


 In TextLM, returning null for a forced LF is not an idea that I like,
 because it overloads the null return value. Cannot you return an
 special Knuth element for LF? Alternatively, you could return null and
 process the paragraph. The second paragraph would then be produced and
 processed later.

A preserved linefeed can be represented by a penalty item whose value is
-infinite: +inf means that there can't be a break here, -inf means that
there must be a break (as there can't be a better breakpoint).

Preserved linefeeds inside inlines are much more problematical than I
first thought, but they should work now: I had to add a List argument to
the applyChanges() and getChangedKnuthElements() methods, to tell an ISLM
which children it has to consider.

 InlineStackingLM.getNextKnuthElements: 'if (lc.startsNewArea())' no
 longer used?

I tried to preserve the existing code as much as possible, so I didn't
touch that if statement.
Maybe I removed some lines in the LineLM so that lc.startsNewArea is never


Re: [Bug 29124] - New line breaking algorithm

2004-08-12 Thread Luca Furini

Simon Pepping wrote:

  - word spacing and letter spacing are now fully implemented, they can
  both have MinOptMax values; but I am still thinking about how to
  differentiate a user-defined zero value from a default zero value ...

 You cannot. A default value is a user-defined value supplied by the
 system to save the user the trouble of always having to enter a
 value. It is just a convenience, and you cannot attach a different
 meaning to it.

You are right: default values must be respected no less than expressed
values, I asked the wrong question.

The point is that the XSL recommendation states that the default
word-spacing value is normal, meaning The normal inter-word space, as
defined by the current font and/or the UserAgent, not zero.
At the moment, the SpaceVal variable in the TextInfo object used by the TLM
 .getSpace().min == .getSpace().max == 0
even if the word-spacing property was not set in the fo document.

So, the right question is: how can the TLM see if the word-spacing property
value is normal?

  - text-align-last is partially implemented; text-align-last = justify
  works only if text-align = justify too; this is because Knuth's
  algorithm doesn't  provide for a different alignment for the last line.

 TeX uses glue to achieve this, \parfillskip. It is the special amount
 of glue appended to the last line. In the TeXbook, p. 99, Knuth
 describes it as 'the special trick that allows the final line of a
 paragraph to be shorter than the others'. Setting \parfillskip to 0
 removes this ability. Usually \parfillskip has infinite

I fear this trick works only with justified text.
Knuth's book suggests a way to implement right/left and center alignment
which is not just justify text and then ignore the computed adjustment:
this different strategy involves using special sequences of elements
representing spaces.
For example, with left (or right) aligned text each space generates the
glue(width 0, stretch X, shrink 0)
penalty(value 0)
glue(width word-space-width, stretch -X, shrink 0)
If the line is not broken after the first glue element, the overall sretch
of these elements is 0: so, the total available stretch of each line is
always X, regardless of the number of spaces, while with the justified-text
strategy the more spaces are in a line, the more stretch that line will
The computed adjustment ratio refers to this constant value, and it is
completely useless if we want to justify the last line. We could use the
computed difference to calculate the space adjustment, but we don't know how
many spaces there are in the line.
Setting \parfillskip to 0 does not avail, as it just forces the algorithm to
find lines all with the same width.

Maybe, with text-align = left, right or center and text-align-last =
justify we should use the justified-text strategy with stretchable and
unshrinkable spaces.


Re: InlineLMs

2004-06-03 Thread Luca Furini

Simon, thanks for all the information.

At the moment, I am working on LeafNodeLM (now fo:leader works)
and InlineStackingLM.
I think I'll be able to post a second patch next week.


Re: Justification and line breaking

2004-05-25 Thread Luca Furini

Chris Bowditch wrote:

 Here is the stack trace, which is the same for both documents:

 Any thoughts would be appreciated.

JThe method startParagraphs dereferences only knuthParagraphs and
textIndent, so maybe there is a missing line concerning their initialization.

KnuthParagraphs (the ArrayList whose item are the different
linefeed-separated paragraphs) is initialized just before calling

  // it's the first time this method is called
  knuthParagraphs = new ArrayList(); /* here it is */
  breakpoints = new ArrayList();

  // convert all the text in a sequence of paragraphs made
  // of KnuthBox, KnuthGlue and KnuthPenalty objects
  Paragraph knuthPar = new Paragraph();

While textIndent is initialized in the TLM constructor (but this was already
in the code, it's not a change of mine):

  bTextAlignmentLast = blockProps.textAlignLast;
  textIndent = blockProps.firstIndent; /* here it is */
  hyphProps = propMgr.getHyphenationProps();

I hope this can help you; thanks for testing my patch.


Re: Justification and line breaking

2004-05-25 Thread Luca Furini

Simon Pepping wrote:

 I changed the constructor call as well. I changed the logger calls
 from getLogger() to log. log is a member of
 AbstractLayoutManager. These are changes applied by Glen, and
 apparently Luca made his diff before they were applied.

Yes; more exactly, I forgot to apply these changes to the code I was
already working on.
To avoid this kind of problems, I will try to keep my code updated.

 I am not surprised that your complex document did not run well. If I
 understand the situation well, then the patch works for LineLM with
 TextLM childLMs. It does not work with any other childLM. I expect the
 other childLMs to use the null implementation of getNextKnuthElement()
 and therefore to contribute no areas.

You are completely right; by the way, what other child LMs could the LLM

 I do not believe that the patch is mature for committing to the trunk
 code. See above. Luca, do you share my view?

Completely: I submitted the patch to get a first feedback, comments,
suggestions ...

I thank all that spent some time looking at it and testing it, and I
apologize if I have not been able to clearly explain what my aim was.

I'll work on this patch for some more time, and I'll let you know of the


Re: Justification and line breaking

2004-05-21 Thread Luca Furini

Chris Bowditch wrote:

 I'm just starting to look at your patch now. First thing that strikes
 me, and this was pointed out to me before I became a committer. Please
 try to avoid commenting out large chunks of code. If the code is no
 longer needed please delete it. If we need to go back to the old code,
 we can always get the previous versions from CVS.

Ok, sorry.

 Second thing, are you just ignoring word spacing and letter spacing at the
 moment? I think one of the other FOP Dev's hit the problem of how do we
 tell if property was specified or is it just the default before? Hopefully
 they will speak up with a work around. If we do go without the letter
 spacing property, then they really ought to stay as NotYetImplemented in the
 FOPropertyMapping and just put come comments into the code to explain why
 it cant be implemented ATM.

No, both properties are implemented (the letter spacing only partially).
When the word spacing is not .min == .max == 0 it is used to define the
inline progression dimension of a space; otherwise, a default value is used
Letter spacing is used too, but at the moment only its .opt value is used,
so letter spacing will be the same in every line of the paragraph. The
problem is that Knuth's algorithm doesn't explicitly take into account
letter space.


Re: Justification and line breaking

2004-05-20 Thread Luca Furini

Peter wrote:

 Do you know of a web-accessible version of the paper, or summary of the

Unfortunately I don't.
In the bugzilla issue I'm going to try and summarize it.


Justification and line breaking

2004-05-19 Thread Luca Furini

   Hi all

I am still thinking about justification and the more general problem of
line-breaking, and I have come to think that it's quite strange that the
LineLayoutManager should make choices about breaking points using only the
information provided by the TextLayoutManagers, while it should have a
wider knowledge of all the text.
(I see bug 28706 as an example of this strangeness: the LLM wants the TLM
to say if there is other text after the returned BreakPoss, but the TLM
doesn't know of the other TLMs' text)

At the moment, lines are built one at a time, and in normal cases only
underfull lines are taken into account: as both bpDim and availIPD have
.min == .opt == .max, no BreakPoss is added to vecPossEnd and the chosen
one is simply the last short BP returned by a TLM.
Even if bpDim had .min != .max, the choice would be made between a few
alternatives for the current line, without considering what will happen
next; this could generate an output alternating tight and loose lines,
which is not very beautiful.

So, I have tried to implement Knuth's line-breaking algorithm [1], which
calculates breaking points after having gathered information about a whole
Here are a few advantages of this algorithm:
- first of all, the output is very beautiful; there is not a big
  difference in width between spaces in consecutive lines, and the max
  space width is smaller than before
- the interaction between LLM and TLM is quite the same; the TLM returns a
  different kind of objects, much smaller
- the TLM code is simplified a bit, as it has no more to handle leading
  spaces, or calculate flags (which IMO are rather line-related than
- the LLM now can quite easily handle properties such as text-indent,
  text-align-last, word-spacing and letter-spacing

Could I open a bugzilla issue and attach a patch? It would be quite a raw
patch, as I took some short cuts to make it work and there could be some
useless variables, anyway it works and could be used to show the quality
of the output. I have tested it with text-only blocks, so I don't know
what could happen in more complex situations.


[1] D. E. Knuth and M. F. Plass, Breaking paragraphs into lines; I found
this essay in D. E. Knuth, Digital typography, published by CSLI

Re: [Bug 27773] - [PATCH] Hyphenation

2004-04-16 Thread Luca Furini

 I think it would be better to report this item in a patch of
 its own. It really is a new issue.

Ok, sorry.
I'm going to do as you suggest.