DO NOT REPLY [Bug 29124] New: - New line breaking algorithm

bugzilla Thu, 20 May 2004 08:29:24 -0700

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=29124>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://issues.apache.org/bugzilla/show_bug.cgi?id=29124

New line breaking algorithm

           Summary: New line breaking algorithm
           Product: Fop
           Version: 1.0dev
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: general
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Ok, so here is the patch I was talking about in the fop-dev mailing list.
I'm going to attach:
- the patch concerning existing files
- 5 new .java files (diff did not consider them)
- a few test fo files showing some properties (text-indent, text-align-last, 
word-spacing & letter-spacing)
- an fo file with some long paragraphs, which can be used to compare space 
widths; for example, HEAD and XEP produce very large spaces in some lines of 
the first paragraph.

I'll try to summarize the algorithm briefly, though I'm not sure I can be both 
clear and exhaustive :-) 

The first step in the algorithm is converting the paragraph into a sequence of 
elements; there are 3 kinds of elements:
- "box" elements, representing words or whatever else; a box has a width
- "glue" elements, representing adjustable spaces between boxes; a glue has an 
optimal width, a potential stretch and a potential shrink, i.e. its length is 
in the range [width - shrink, width + stretch]
- "penalty" elements, representing positions at which a line could be broken; 
a penalty has a width (e.g. the width of the "-" when hyphenating), a penalty 
value and a boolean; the penalty value is a kind of "aesthetic cost", and it 
can be a finite number, +infinity (no break allowed there) or -infinity (a 
forced break); penalties with value = 0 can be omitted

For example, the text "life is beautiful" will generate the sequence:

 box - glue - box - glue - box - penalty - box - penalty - box
  ^            ^            ^       ^       ^       ^       ^
  |            |            |       |       |       |       |
 life         is           beau     -       ti      -      ful
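
A minimal Java sketch of the three element types and of the sequence above. 
All class names here are hypothetical (they are not the classes in the patch), 
the widths are made-up numbers in arbitrary units, and the meaning of the 
penalty's boolean is not spelled out above, so it is carried along as an 
opaque flag:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element types; the classes in the patch may look different.
abstract class SketchElement {
    final int width;
    SketchElement(int width) { this.width = width; }
    abstract String kind();
}

// A box: a word or word fragment with a fixed width.
final class SketchBox extends SketchElement {
    SketchBox(int width) { super(width); }
    String kind() { return "box"; }
}

// A glue: an adjustable space, between width - shrink and width + stretch.
final class SketchGlue extends SketchElement {
    final int stretch;
    final int shrink;
    SketchGlue(int width, int stretch, int shrink) {
        super(width);
        this.stretch = stretch;
        this.shrink = shrink;
    }
    int minWidth() { return width - shrink; }
    int maxWidth() { return width + stretch; }
    String kind() { return "glue"; }
}

// A penalty: a possible break with a width (e.g. the hyphen), a cost, a flag.
final class SketchPenalty extends SketchElement {
    final int value;
    final boolean flag;
    SketchPenalty(int width, int value, boolean flag) {
        super(width);
        this.value = value;
        this.flag = flag;
    }
    String kind() { return "penalty"; }
}

final class SequenceDemo {
    // Builds the sequence for "life is beautiful" with made-up widths.
    static List<SketchElement> lifeIsBeautiful() {
        List<SketchElement> seq = new ArrayList<SketchElement>();
        seq.add(new SketchBox(1800));              // life
        seq.add(new SketchGlue(300, 150, 100));    // space
        seq.add(new SketchBox(700));               // is
        seq.add(new SketchGlue(300, 150, 100));    // space
        seq.add(new SketchBox(2000));              // beau
        seq.add(new SketchPenalty(250, 50, true)); // possible hyphen
        seq.add(new SketchBox(600));               // ti
        seq.add(new SketchPenalty(250, 50, true)); // possible hyphen
        seq.add(new SketchBox(1300));              // ful
        return seq;
    }

    // Joins the element kinds in the same notation as the diagram.
    static String describe(List<SketchElement> seq) {
        StringBuilder sb = new StringBuilder();
        for (SketchElement e : seq) {
            if (sb.length() > 0) {
                sb.append(" - ");
            }
            sb.append(e.kind());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(lifeIsBeautiful()));
    }
}
```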

Then, the algorithm finds the breaking points.
For each element in the paragraph, it checks whether the element is a 
"feasible break", i.e. whether there is a previous "active" feasible break 
such that the elements between the two breaks can fill a line, using the 
available stretch and shrink.
If the element is a feasible break, the algorithm computes a "score" that 
depends on the needed adjustment and on the penalty (the lower the score, the 
better the break), and creates a structure representing this element as an 
active feasible break.
Note that the number of active feasible breaks at any moment is quite small, 
as feasible breaks are deactivated if they are too close to the new one.
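
To make the feasibility test and the score more concrete, here is a rough Java 
sketch. It is an assumption, not the code in the patch: the class and method 
names are invented, the adjustment ratio and the score formula 
(1 + 100 * |ratio|^3 + penalty)^2 are borrowed from Knuth and Plass's 
description for non-negative finite penalties, and a line counts as feasible 
only when the ratio stays within [-1, 1], i.e. within the available shrink and 
stretch:

```java
// Hypothetical helper; names and formulas are assumptions, not the patch.
final class LineFitSketch {

    // How much of the available stretch (positive) or shrink (negative)
    // is needed to make a line of the given natural width fill lineWidth.
    static double adjustmentRatio(int lineWidth, int naturalWidth,
                                  int stretch, int shrink) {
        int diff = lineWidth - naturalWidth;
        if (diff == 0) {
            return 0.0;
        }
        if (diff > 0) {
            return stretch > 0 ? (double) diff / stretch
                               : Double.POSITIVE_INFINITY;
        }
        return shrink > 0 ? (double) diff / shrink
                          : Double.NEGATIVE_INFINITY;
    }

    // Feasible: the line fits using no more than the available adjustment.
    static boolean isFeasible(double ratio) {
        return ratio >= -1.0 && ratio <= 1.0;
    }

    // Lower is better; only valid for finite penalties >= 0.
    static double score(double ratio, int penalty) {
        double badness = 100.0 * Math.pow(Math.abs(ratio), 3);
        double s = 1.0 + badness + penalty;
        return s * s;
    }

    public static void main(String[] args) {
        double r = adjustmentRatio(1000, 900, 200, 50);
        System.out.println(r + " " + isFeasible(r) + " " + score(r, 0));
    }
}
```

For example, a line whose elements add up to 900 units, with 200 units of 
stretch, needs half of that stretch to fill a 1000-unit line, so it is 
feasible; a line that is 100 units too long with only 50 units of shrink is 
not.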

Some implementation notes:
1)
In order to minimize changes in other files, the LLM (LineLayoutManager) 
interacts with the TLM (TextLayoutManager) by calling brand new methods that I 
added to the LayoutManager interface, with null implementations in 
AbstractLayoutManager. I also added a KnuthPossPosIter, which is almost a 
clone of BreakPossPosIter.
I think a much more elegant solution would be to have the LLM call 
getNextBreakPoss and cast the returned object to KnuthElement. Maybe 
BreakPoss and KnuthElement could be subclasses of the same abstract class, 
containing only a Position object and the method to access it:

        BreakPoss (same name, but a different class)
            |
     ---------------
     |             |
 KnuthElement    TheClassPreviouslyKnownAsBreakPoss

All LMs call getNextBreakPoss, but the LLM casts the result to KnuthElement 
and the other LMs cast it to "TheClassPreviouslyKnownAsBreakPoss".
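
A rough Java sketch of this proposed hierarchy. Everything except the names 
BreakPoss, KnuthElement and getNextBreakPoss is a hypothetical stand-in: 
SketchPosition replaces the real Position class, LegacyBreakPoss plays the 
role of "TheClassPreviouslyKnownAsBreakPoss", the method's arguments are 
omitted, and the two stub layout managers exist only to show the cast:

```java
// Stand-in for the real Position class (assumption for this sketch).
final class SketchPosition {
    final int index;
    SketchPosition(int index) { this.index = index; }
}

// The proposed abstract base: only a Position and its accessor.
abstract class BreakPoss {
    private final SketchPosition position;
    protected BreakPoss(SketchPosition position) { this.position = position; }
    public SketchPosition getPosition() { return position; }
}

// Element used by the new line breaking algorithm.
class KnuthElement extends BreakPoss {
    final int width;
    KnuthElement(SketchPosition position, int width) {
        super(position);
        this.width = width;
    }
}

// Plays the role of "TheClassPreviouslyKnownAsBreakPoss".
class LegacyBreakPoss extends BreakPoss {
    LegacyBreakPoss(SketchPosition position) { super(position); }
}

// Simplified layout manager interface; the real method takes arguments.
interface SketchLayoutManager {
    BreakPoss getNextBreakPoss();
}

// A text-level LM stub that hands out Knuth elements.
final class TextLMStub implements SketchLayoutManager {
    public BreakPoss getNextBreakPoss() {
        return new KnuthElement(new SketchPosition(0), 1800);
    }
}

// A line-level LM stub that casts what it receives to KnuthElement.
final class LineLMStub {
    static int widthOfNext(SketchLayoutManager child) {
        KnuthElement element = (KnuthElement) child.getNextBreakPoss();
        return element.width;
    }
}
```

The cast keeps the LayoutManager interface unchanged, at the price of a 
ClassCastException at run time if a child LM returns the wrong subclass.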

2)
The letter space is at the moment set to .min = .opt and .max = .opt as there 
are a few problems with adjustable letter space (Knuth's original algorithm 
does not handle letter space); I also have some problems in counting the 
letter spaces in a hyphenated word.

3)
The LLM first tries to find breaking points without hyphenating words; only if 
it cannot find a set of break points does it run the algorithm again after 
hyphenating all words. This patch therefore includes the ones concerning 
hyphenation (bug 27773) and hyphenation of words with punctuation marks (bug 
28431).
I tried to reuse the old information (about the whole words) as much as 
possible, but I don't know whether this saves some time or is just an 
unneeded complication.
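
The two-pass strategy can be sketched as follows. This is only an 
illustration: SketchBreaker, StubBreaker and TwoPassDemo are invented names, 
and the stub simply pretends that breaks exist only once words have been 
hyphenated:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface standing in for the break finding algorithm.
interface SketchBreaker {
    // Returns the chosen break positions, or null if no set of breaks exists.
    List<Integer> findBreaks(boolean hyphenate);
}

// Stub that only succeeds when hyphenation is enabled; counts its calls.
final class StubBreaker implements SketchBreaker {
    int calls = 0;

    public List<Integer> findBreaks(boolean hyphenate) {
        calls++;
        return hyphenate ? Arrays.asList(5, 8) : null;
    }
}

final class TwoPassDemo {
    // First pass without hyphenation; second pass only if the first fails.
    static List<Integer> breakParagraph(SketchBreaker breaker) {
        List<Integer> breaks = breaker.findBreaks(false);
        if (breaks == null) {
            breaks = breaker.findBreaks(true);
        }
        return breaks;
    }

    public static void main(String[] args) {
        System.out.println(breakParagraph(new StubBreaker()));
    }
}
```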

4)
Concerning word and letter spacing, one problem is that at the moment I see no 
way to decide whether the fo property letter-spacing (or word-spacing) was set 
to 0 (and fop should respect this value), or it wasn't set and 0 is its 
default value (and fop could modify it in order to justify).

Comments, suggestions, etc. are most welcome!

Regards
    Luca
