DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=29124>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm Summary: New line breaking algorithm Product: Fop Version: 1.0dev Platform: Other OS/Version: Other Status: NEW Severity: Normal Priority: Other Component: general AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Ok, so here is the patch I was talking about in the fop-dev mailing list. I'm going to attach: - the patch concerning existing files - 5 new .java files (diff did not consider them) - a few test fo files showing some properties (text-indent, text-align-last, word-spacing & letter-spacing) - an fo file with some long paragraphs, which can be used to make space width comparison; for example, HEAD and xep have very large spaces in some lines of the first paragraph. I try to summarize briefly the algorithm, but I'm not so sure I can be both clear and exhaustive :-) The first step in the algorithm is converting the paragraph into a sequence of elements; there are 3 kind of elements: - "box" elements, representing words or whatever else; a box has a width - "glue" elements, representing adjustable spaces between boxes; a glue has an optimal width, a potential stretch and a potential shrink, i.e. its length is in the range [width - shrink, width + stretch] - "penalty" items represents positions in which a line could be broken; a penalty has a width (i.e. the "-" width when hyphenating), a penalty value and a boolean; the penalty value is a kind of "aesthetic cost", and it could be a finite number, +infinite (where there can't be a break) or -infinite (a forced break); penalties with value = 0 can be omitted For example, the text "life is beautiful" will generate the sequence: box - glue - box - glue - box - penalty - box - penalty - box ^ ^ ^ ^ ^ ^ ^ | | | | | | | life is beau - ti - ful Then, the algorithm finds the breaking points: For each element in the paragraph, it verifies if this element is a "feasible break", i.e. there is a previous "active" feasible break such that the elements between the two break can fill a line, using the available stretch and shrink. If the element is a feasible break, the algorithm computes a "score" depending on the needed adjustment and the penalty (the lower is the score, the better is the break), and create a structure representing this element as an active feasible break. Note that the number of active feasible break at each moment is quite small, as feasible breaks are deactivated if they are too close to the new one. Some implementation notes: 1) In order to minimize changes in other files, the LLM interacts with the TLM calling brand new methods that I added to the LayoutManager interface, providing null implementation in the AbstractLayoutManager. I also added a KnuthPossPosIter, which is almost a clone of BreakPossPosIter. I think a much more elegant solution would be having the LLM call getNextBreakPoss, and casting the returned object to KnuthElement. Maybe BreakPoss and KnuthElement could be subclasses of a same class, abstract, containing only a Position object and the method to access it: BreakPoss (same name, but a different class) | --------------- | | KnuthElement TheClassPreviouslyKnownAsBreakPoss All LM call getNextBreakPoss, but the LLM casts to KnuthElement and the other LM cast to "TheClassPreviouslyKnownAsBreakPoss" 2) The letter space is at the moment set to .min = .opt and .max = .opt as there are a few problems with adjustable letter space (Knuth's original algorithm does not handle letter space); I also have some problems in counting the letter spaces in a hyphenated word. 3) The LLM tries the first time to find breaking point without word hyphenation; only if it doesn't find a set of break points, it calls again the algorithm after having hyphenated all words, so this patch includes the ones concerning hyphenation (bug 27773) and hyphenation of word with punctuation marks (bug 28431). I tried to use the old information (about the whole words) as much as possible, but I don't know whether this saves some time or it is just an unneeded complication. 4) Concerning word and letter spacing, one problem is that at the moment I see no way to decide whether the fo property letter-spacing (or word-spacing) was set to 0 (and fop should respect this value), or it wasn't set and 0 is its default value (and fop could modify it in order to justify). Comments, suggestions, etc. are most welcome! Regards Luca