Hyphenation
Hi all! I am an italian student of the University of Bologna. I have tried to solve a few problems concerning hyphenation, in particular: - show the '-' at the end of the hyphenated lines - use the fo:hyphenate property to enable hyphenation, instead of the alignment - specify the hyphenation character using the fo:hyphenation-character property So, here are the modifications I suggest; I have done some (simple) tests and they seem to work. *** layoutmgr/LineLayoutManager.java *** I changed the condition tested to decide whether try hyphenate, now it only looks at the appropriate fo property 24a25 > import org.apache.fop.fo.Constants; 212c213,214 < if (bTextAlignment == TextAlign.JUSTIFY || prevBP == null) { --- > //if (bTextAlignment == TextAlign.JUSTIFY || prevBP == null) { > if (hyphProps.hyphenate == Constants.TRUE) { *** fo/TextInfo.java *** I added the hyphenation character, and removed the bCanHyphenate boolean, which is no longer necessary, because of the previous change 54c54,56 < public boolean bCanHyphenate = true; --- > //public boolean bCanHyphenate = true; > /** the hyphenation character to be used */ > public char hyphChar = '-'; *** fo/PropertyManager.java *** Sets the hyphenation character in the TextInfo object 496a497,498 > textInfo.hyphChar = this.propertyList.get( > PR_HYPHENATION_CHARACTER).getCharacter(); *** layoutmgr/TextLayoutManager.java *** I added a iHyphenated boolean in the AreaInfo class; it is set to true according to the flags of the break possibility chosen, and is value is looked at before creating the areas, to decide whether to add the hyphenation character 49a50,52 > private boolean iHyphenated; > //public AreaInfo(short iSIndex, short iBIndex, short iWS, > // MinOptMax ipd) { 51c54,55 < MinOptMax ipd) { --- > MinOptMax ipd, boolean iHyp) { > iHyphenated = iHyp; 114c118,119 < hyphIPD = textInfo.fs.getCharWidth('-'); --- > //hyphIPD = textInfo.fs.getCharWidth('-'); > hyphIPD = textInfo.fs.getCharWidth(textInfo.hyphChar); 208c213,214 < if (textArrayLength < iStopIndex || textInfo.bCanHyphenate == false) { --- > //if (textArrayLength < iStopIndex || textInfo.bCanHyphenate == false) { > if (textArrayLength < iStopIndex) { 401a408,409 > //vecAreaInfo.add( > // new AreaInfo(iWordStart, iNextStart, iWScount, ipd)); 403c411 < new AreaInfo(iWordStart, iNextStart, iWScount, ipd)); --- > new AreaInfo(iWordStart, iNextStart, iWScount, ipd, ((flags & > BreakPoss.HYPHENATED) != 0))); 499a508,512 > > // add hyphenation character if the last word is hyphenated > if (ai.iHyphenated) { > str += textInfo.hyphChar; > } Ok, that's all, I hope this can be useful. Bye Luca
Re: [Bug 27773] - [PATCH] Hyphenation
> I think it would be better to report this item in a patch of > its own. It really is a new issue. Ok, sorry. I'm going to do as you suggest. Luca
Justification and line breaking
Hi all I am still thinking about justification and the more general problem of line-breaking, and I have come to think that it's quite "strange" that the LineLayoutManager should make choices about breaking points using only the information provided by the TextLayoutManagers, while it should have a wider knowledge of all the text. (I see bug 28706 as an example of this strangeness: the LLM wants the TLM to say if there is other text after the returned BreakPoss, but the TLM doesn't know of the other TLMs' text) At the moment, lines are built one at a time, and in "normal" cases only underfull lines are taken into account: as both bpDim and availIPD have .min == .opt == .max, no BreakPoss is added to vecPossEnd and the chosen one is simply the last "short" BP returned by a TLM. Even if bpDim had .min != .max, the choice would be made between a few alternatives for the current line, without considering what will happen next; this could generate an output alternating tight and loose lines, which is not very beautiful. So, I have tried to implement Knuth's line-breaking algorithm [1], which calculates breaking points after having gathered information about a whole paragraph. Here are a few advantages of this algorithm: - first of all, the output is very beautiful; there is not a big difference in width between spaces in consecutive lines, and the max space width is smaller than before - the interaction between LLM and TLM is quite the same; the TLM returns a different kind of objects, much smaller - the TLM code is simplified a bit, as it has no more to handle leading spaces, or calculate flags (which IMO are rather line-related than text-related) - the LLM now can quite easily handle properties such as text-indent, text-align-last, word-spacing and letter-spacing Could I open a bugzilla issue and attach a patch? It would be quite a raw patch, as I took some short cuts to make it work and there could be some useless variables, anyway it works and could be used to show the quality of the output. I have tested it with text-only blocks, so I don't know what could happen in more complex situations. Regards Luca [1] D. E. Knuth and M. F. Plass, "Breaking paragraphs into lines"; I found this essay in D. E. Knuth, "Digital typography", published by CSLI Publications
Re: Justification and line breaking
Peter wrote: > Do you know of a web-accessible version of the paper, or summary of the > algorithm? Unfortunately I don't. In the bugzilla issue I'm going to try and summarize it. Regards Luca
Re: Applying Large Patches (was Re: Justification and line breaking)
Chris Bowditch wrote: > I think the easiest solution will be for > Luca to supply his complete LineLayoutManager.java and > TextLayoutManager.java (the two files that have more than > 1000 lines changed) Would you mind attaching them to the patch Luca? Ok, I'm going to do it. By the way, the code I was working on was not the latest one, but I did not modify files changed later (maybe the XMLrenderer, but it's a small change). Luca
Re: Justification and line breaking
Chris Bowditch wrote: > I'm just starting to look at your patch now. First thing that strikes > me, and this was pointed out to me before I became a committer. Please > try to avoid commenting out large chunks of code. If the code is no > longer needed please delete it. If we need to go back to the old code, > we can always get the previous versions from CVS. Ok, sorry. > Second thing, are you just ignoring word spacing and letter spacing at the > moment? I think one of the other FOP Dev's hit the problem of "how do we > tell if property was specified or is it just the default" before? Hopefully > they will speak up with a work around. If we do go without the letter > spacing property, then they really ought to stay as NotYetImplemented in the > FOPropertyMapping and just put come comments into the code to explain why > it cant be implemented ATM. No, both properties are implemented (the letter spacing only partially). When the word spacing is not .min == .max == 0 it is used to define the inline progression dimension of a space; otherwise, a default value is used instead. Letter spacing is used too, but at the moment only its .opt value is used, so letter spacing will be the same in every line of the paragraph. The problem is that Knuth's algorithm doesn't explicitly take into account letter space. Luca
Re: Justification and line breaking
Chris Bowditch wrote: > Here is the stack trace, which is the same for both documents: > [...] > > Any thoughts would be appreciated. JThe method startParagraphs dereferences only knuthParagraphs and textIndent, so maybe there is a missing line concerning their initialization. KnuthParagraphs (the ArrayList whose item are the different linefeed-separated paragraphs) is initialized just before calling startParagraph: ... // it's the first time this method is called knuthParagraphs = new ArrayList(); /* here it is */ breakpoints = new ArrayList(); // convert all the text in a sequence of paragraphs made // of KnuthBox, KnuthGlue and KnuthPenalty objects Paragraph knuthPar = new Paragraph(); knuthPar.startParagraph(); ... While textIndent is initialized in the TLM constructor (but this was already in the code, it's not a change of mine): ... bTextAlignmentLast = blockProps.textAlignLast; textIndent = blockProps.firstIndent; /* here it is */ hyphProps = propMgr.getHyphenationProps(); ... I hope this can help you; thanks for testing my patch. Regards Luca
Re: Justification and line breaking
Simon Pepping wrote: > I changed the constructor call as well. I changed the logger calls > from getLogger() to log. log is a member of > AbstractLayoutManager. These are changes applied by Glen, and > apparently Luca made his diff before they were applied. Yes; more exactly, I forgot to apply these changes to the code I was already working on. To avoid this kind of problems, I will try to keep my code updated. > I am not surprised that your complex document did not run well. If I > understand the situation well, then the patch works for LineLM with > TextLM childLMs. It does not work with any other childLM. I expect the > other childLMs to use the null implementation of getNextKnuthElement() > and therefore to contribute no areas. You are completely right; by the way, what other child LMs could the LLM have? > I do not believe that the patch is mature for committing to the trunk > code. See above. Luca, do you share my view? Completely: I submitted the patch to get a first feedback, comments, suggestions ... I thank all that spent some time looking at it and testing it, and I apologize if I have not been able to clearly explain what my aim was. I'll work on this patch for some more time, and I'll let you know of the progress. Regards Luca
Re: InlineLMs
Simon, thanks for all the information. At the moment, I am working on LeafNodeLM (now fo:leader works) and InlineStackingLM. I think I'll be able to post a second patch next week. Regards Luca
Re: [Bug 29124] - New line breaking algorithm
Simon Pepping wrote: > > - word spacing and letter spacing are now fully implemented, they can > > both have MinOptMax values; but I am still thinking about how to > > differentiate a user-defined zero value from a default zero value ... > > You cannot. A default value is a user-defined value supplied by the > system to save the user the trouble of always having to enter a > value. It is just a convenience, and you cannot attach a different > meaning to it. You are right: default values must be respected no less than expressed values, I asked the wrong question. The point is that the XSL recommendation states that the default word-spacing value is "normal", meaning "The normal inter-word space, as defined by the current font and/or the UserAgent", not zero. At the moment, the SpaceVal variable in the TextInfo object used by the TLM has .getSpace().min == .getSpace().max == 0 even if the word-spacing property was not set in the fo document. So, the right question is: how can the TLM see if the word-spacing property value is "normal"? > > - text-align-last is partially implemented; text-align-last = "justify" > > works only if text-align = "justify" too; this is because Knuth's > > algorithm doesn't provide for a different alignment for the last line. > > TeX uses glue to achieve this, \parfillskip. It is the special amount > of glue appended to the last line. In the TeXbook, p. 99, Knuth > describes it as 'the special trick that allows the final line of a > paragraph to be shorter than the others'. Setting \parfillskip to 0 > removes this ability. Usually \parfillskip has infinite > stretchability. I fear this trick works only with justified text. Knuth's book suggests a way to implement right/left and center alignment which is not just "justify text and then ignore the computed adjustment": this different strategy involves using special sequences of elements representing spaces. For example, with left (or right) aligned text each space generates the sequence: glue(width 0, stretch X, shrink 0) penalty(value 0) glue(width word-space-width, stretch -X, shrink 0) If the line is not broken after the first glue element, the overall sretch of these elements is 0: so, the total available stretch of each line is always X, regardless of the number of spaces, while with the justified-text strategy the more spaces are in a line, the more stretch that line will have. The computed adjustment ratio refers to this constant value, and it is completely useless if we want to justify the last line. We could use the computed difference to calculate the space adjustment, but we don't know how many spaces there are in the line. Setting \parfillskip to 0 does not avail, as it just forces the algorithm to find lines all with the same width. Maybe, with text-align = "left", "right" or "center" and text-align-last = "justify" we should use the justified-text strategy with stretchable and unshrinkable spaces. Regards, Luca
Re: DO NOT REPLY [Bug 29124] - New line breaking algorithm
Simon Pepping wrote: > Nested inline and other LMs: The output contains errors, see the > comments in the text. The errors occur when hyphenation is set to > true. Fixed: there were errors in the method addALetterSpaceTo of LeafNodeLM and InlineStackingLM. I also found a bug in the LeafNodeLM.addAreas method, affecting HEAD too: the area is added to the area tree (with parentLM.addChild(curArea)) *before* widthAdjustArea is called, so its width is not correctly added to the inline parent width and the output sometimes shows overlapped text (when there is another child of inline parent after the leader area). > Justification: This is a test fo you submitted earlier. According to > the text in the file the second block should be hyphenated; it is > not. Should it still be hyphenated, or can this not be enforced with > the Knuth algorithm and text-align="start"? I cannot find a hyphenate property in the fo file you attached, so I'm not sure whether I understand what you mean. Anyway, hyphenate = "true" means, according to the recommendation (7.9.4), that "hyphenation may be used in the line-breaking algorithm", not that it *must* be used. As hyphenation is time-expansive and bad-looking, I think it should be used only if necessary. > No breakpoints: An exception is thrown, at > LineLayoutManager.getNextBreakPoss(LineLayoutManager.java:495). It > occurs because breakpoints has size 0; the third call to > findBreakingPoints also returned 0. This should not be possible; the > algorithm should always return a breakpoint. Right, I completely forgot to provide a fallback in case the algorithm doesn't find a good set of breaking points. I added a boolean argument called "force" to findBreakingPoints: if it is true, and after the main loop there are no active nodes, the last deactivated node is used to create LineBreakPositions. There will zero or more "good" lines followed by a single line including all the remaining content (this line will obviously get off the right margin). The method findBreakingPoints will be called no more than three times: I) no hyphenation, adjustment ratios must be <= 1 II) hyphenation (if allowed), or ratios up to 5 III) ratios up to 20, and if necessary force the creation of LineBreakPositions > A few small remarks: > > Can you move the following log messages to trace log level: > [DEBUG] AbstractLayoutManager - - Word to hyphenate: We Done > In TextLM, returning null for a forced LF is not an idea that I like, > because it overloads the null return value. Cannot you return an > special Knuth element for LF? Alternatively, you could return null and > process the paragraph. The second paragraph would then be produced and > processed later. A preserved linefeed can be represented by a penalty item whose value is -infinite: +inf means that there can't be a break here, -inf means that there must be a break (as there can't be a better breakpoint). Preserved linefeeds inside inlines are much more problematical than I first thought, but they should work now: I had to add a List argument to the applyChanges() and getChangedKnuthElements() methods, to tell an ISLM which children it has to consider. > InlineStackingLM.getNextKnuthElements: 'if (lc.startsNewArea())' no > longer used? I tried to preserve the existing code as much as possible, so I didn't touch that if statement. Maybe I removed some lines in the LineLM so that lc.startsNewArea is never true? Regards, Luca
Re: Handling of text-align="justify" when nesting blocks
Arnd Beißner wrote: > In the example, line 2 is neither the last nor the only line of a block, > and it's also not a line ending in U+000A, so it should be justified. Section 7.15.10 of the recommendation states that the property text-align-last "Specifies the alignment of the last line-area child of the last block-area generated and returned by the formatting object, *and to any line-area generated by the formatting object whose following sibling is a block-area that is not a line-area*, and any lines in the block ending in U+000A." So, as a fo:block generates a block-area that is not a line-area (its children are line-areas), I think line 2 should *not* be justified. Regards Luca
Re: DO NOT REPLY [Bug29124] - New line breaking algorithm
Simon Pepping wrote: > You mention that you have not implemented the Knuth algorithm for > ContentLM. Would it be difficult to do that? No, I have almost done. I think I will be able to attach a patch including this fix this afternoon. Regards Luca
Re: DO NOT REPLY [Bug 29124] - New line breaking algorithm
Simon Pepping wrote: > Still to be done: > > - Resolve the regressions mentioned above. As concerns leader with use content, patch created and successfully tested. The ContentLM calls getNextKnuthElements on his child InlineStackingLM, uses the returned elements to calculate the pattern width and returns them to the LeaderLM. The LeaderLM uses them when calling addAreas. I also found a bug affecting leaders with leader-pattern = "dots": the TextArea with the dot (created in LeaderLM.getLeaderInlineArea) had width = 0; calling setWidth() fixes this problem. There is still a little difference between a leader with leader-pattern = "dots" and one with "use-content" and a single dot as content: the former is placed a bit over the baseline, but I couldn't find the reason. Note that using the fo file xml-fop/examples/fo/basic/leader.fo to test the patch you won't see the leaders with leader-pattern = "use-content", as they don't have a width property and the default .opt value (12pt) is < than the pattern width. Setting a larger width, or text-align-last = "justify", makes the leaders visible. > - I support the idea to create an InlineLayoutManager interface, which > extends LayoutManager. Done, same patch (or maybe I should create a different one?). I also removed the getWordSpaceIPD() method, as I find out that a constant value works better: the LineLM and its child must use the same value, or the result is not always correct. > 1. > Can we be sure that U+A is always alone or the first item in a > textArray; does this not depend on the Parser, how it calls the SAX > characters method? Right, it's better to handle the most general case. The patch will fix this too. I will try to fix the other points reported by Simon as soon as possible. Regards Luca
Re: DO NOT REPLY [Bug 31206] - [PATCH] Improvements over the new line breaking algorithm
Simon Pepping wrote: > 1. InlineStackingLM implements InlineLM. LineLM extends > InlineStackingLM and thus implements InlineLM. IMHO it should > not. Implementing InlineLM should be equivalent to > generatesInlineAreas returning true. You are right, it's quite strange, but the LineLM still uses a few methods inherited from InlineStackingLM. This was not due to the new algorithm, it was already in the "old" code; I'm going to see if they still do something useful ... > 2. TextLM now extends LeafNodeLM instead of AbstractLM. What is the > gain? I see no related changes in TextLM. There isn't at the moment any practical gain, I just thought that, as a text node has no children, a TextLM is a (special case of) LeafNodeLM. > 3. In LineLM: > // this constant is used to create elements when text-align is center: > // every TextLM descendant of LineLM must use the same value, > // otherwise the line breaking algorithm does not find the right > // break point > public static final int DEFAULT_SPACE_WIDTH = 3336; > private static final int INFINITE_RATIO = 1000; > > If these are static final, they might be better placed in > InlineLM. Alternatively, they might be attributes of the LineLM > object, which allows changing them per paragraph, e.g. depending on > the font. But then the problem arises how to propagate them to the > descendant LMs. I decided to change and use a constant because the important thing is to have the same value used by every LM, but this isn't the perfect solution; if we try to center a short object (a single word, for example) in a long line, it is likely that the algorithm fails because there isn't enough stretchability in the line. Maybe it's better to have the LineLM compute a value depending on the line lenght and the maximum adjustment ratio; the child LM should ask the LineLM for this value. > 4. The textheight of the large font is rather large. The property > lineheight is not followed (reproduce existing behaviour). > > 5. LineLayoutManager:675: line is always 3, so that firstElementIndex > = 1 for the first line, and the first box is skipped in the line > height calculation. The second version of the patch, which I'm going to attach to the Bugzilla issue, fixes these errors. It also implement the "vertical-align" property: now the values of top, bottom, middle and baseline should be supported. I made a few tests with fo:inline, fo:character and fo:external-graphic, and it seems to work. IMPORTANT: I had to revert Finn Bock's changes to the PDFRenderer (dated 2004/09/22 13:22:16), otherwise leaders with svg use-content produce errors in the pdf output. There isn't any run-time error, but when I try to open the pdf file, I get these warnings: - There was an error processing a page. Wrong operand type. - Illegal operation 'q' inside a text object. - Wrong operand type. and the page with the svg leaders is left empty. I think it could be something involving the saveGraphicsState() method. Still to be done: - remove unused methods and variables - simplify InlineStackingLM methods as suggested by Simon I'll try and fix these points as soon as possible. Regards Luca
Re: [VOTE:RESULT] Luca Furini for Committer
Jeremias Maerki wrote: > Luca, > > is your CLA on its way? I sent it by fax a few minutes ago (sorry for the delay). Do I have to send it by mail too? Regards Luca
Re: cvs commit: xml-fop/src/java/org/apache/fop/layoutmgr LineLayoutManager.java
Jeremias Maerki wrote: > jeremias2004/10/10 04:21:29 > > Modified:src/java/org/apache/fop/layoutmgr LineLayoutManager.java > Log: > This is supposed to fix a problem that surfaced with Finn's latest > change in PageSequence. There was an ArrayIndexOutOfBoundsException here in > LineLayoutManager when a static region was layouted for the second page > (instance is reused). It seems to me that "iCurrParIndex" could be made a > method-local variable instead of an instance variable. It took me a while to notice that these changes cause another problem for blocks containing preserved linefeeds and whose lines are parted on different pages. For these blocks, the array named knuthParagraphs has more than one element (the block will generate more than one paragraph) and the method LineLM.addAreas() is called more than once, so it isn't always right to set iCurrParIndex to 0. I'm attaching a fo file showing this problem. So, iCurrParIndex must be set to 0 in addAreas only for LineLM descendant of a StaticContentLM. But maybe it isn't necessary to call addAreas more than once for static content: as the static content is repeated on each page, wouldn't it be better to store the static area sub-tree and re-use it? Regards, Luca http://www.w3.org/1999/XSL/Format";> First paragraph in the static content. Second paragraph in the static content. Text Text The following block has a preserved linefeed. Text with preserved linefeeds, bla first first first first first first first first first first first first; this is the first paragraph. This is the second paragraph, after a preserved linefeed: second second second second second second second second. The following block has a preserved linefeed. Text with preserved linefeeds, bla first first first first first first first first first first first first; this is the first paragraph. This is the second paragraph, after a preserved linefeed: second second second second second second second second fisecond rst.
Re: commenting the Knuth code/centering issue
Glen Mazza wrote: > The title centers correctly in 0.20.5, but is left-justified in 1.0. > [...] > Luca, are you looking at this issue of text alignment in general? Yes, this happens when the text is short and the algorithm is not able to find a set of breaking points: the fallback method can't compute the needed indent, and the text is left-aligned instead of centered. One way to solve this problem is to use, instead of the constant DEFAULT_SPACE_WIDTH, a value computed according to the line width and the maximum acceptable adjustment; this would prevent the algorithm from failing. This value should be computed by the LineLM, and the InlineLMs have to know it. Another way is to use, instead of considerLegalBreak(), a different method to evaluate a possible break point if no breakpoints are found: this alternative method would be much simpler, and would re-create the behaviour of the old getNextBreakPoss method (add content to the existing line if possible, otherwise start another line). I think this could be a much better solution: - it's simpler: it's just another private method in the LineLM - it would solve not only this issue with centered text, but also the other situations in which the Knuth's algorithm fails (for example, when there is a word larger than the line: see Simon's test file id=12494 for bug 29124) I have already done it, so if you agree I could create a patch in a short time. > Also, any chance we can get the Knuth classes commented so we have a > better idea what KnuthBox, KnuthPenalty, KnuthGlue, etc. are for? Sure, I should have done this before! I'll try and comment these files as soon as possible. Regards, Luca
Re: commenting the Knuth code/centering issue
Jeremias Maerki wrote: > No, I don't think Luca has write access, yet. I know by now that the CLA > is recorded but the account hasn't been created, yet, although I've > already sent a reminder. I have just received the e-mail confirming the creation of my account. I am now reading the http://www.apache.org/dev/ documents, so I should soon be able to work. Regards, Luca
Re: commenting the Knuth code/centering issue
Glen Mazza wrote: > [BTW, I'm considering getting that Digital Typography > book by Knuth you had mentioned earlier. Do you > recommend it? (I was thinking that given all the time > I spend on FOP I should start looking a little more at > the scientific aspects of this work.)] I think it's very interesting. It is a collection of essays on various topics, and besides the one about line breaking (with fabulous features like paragraphs with different sized lines ...) there is one about bidirectional text, one about fonts ... It is surely an authoritative source on such matters. Regards, Luca
Re: commenting the Knuth code/centering issue
I have tried to add some comments to the Knuth[Element, Box, Glue, Penalty] classes. As I am not sure they are clear enough (I'm not even sure they are written in a proper English! :-) ) I'd like to hear your opinions before committing them. Regards, Luca Index: KnuthBox.java === RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/layoutmgr/KnuthBox.java,v retrieving revision 1.2 diff -u -r1.2 KnuthBox.java --- KnuthBox.java 6 Sep 2004 19:07:12 - 1.2 +++ KnuthBox.java 11 Nov 2004 17:56:22 - @@ -18,6 +18,16 @@ package org.apache.fop.layoutmgr; +/** + * An instance of this class represents an unbreakable piece of content with + * fixed width: for example an image, a syllable (but only if letter spacing + * is constant), ... + * A KnuthBox is never a feasible breaking point. + * The represented piece of content is never suppressed. + * Besides the inherited methods and attributes, this class has some more + * attributes to store information about the content height and its vertical + * positioning, and the methods used to get them. + */ public class KnuthBox extends KnuthElement { private int lead; private int total; Index: KnuthElement.java === RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/layoutmgr/KnuthElement.java,v retrieving revision 1.2 diff -u -r1.2 KnuthElement.java --- KnuthElement.java 6 Sep 2004 19:07:12 - 1.2 +++ KnuthElement.java 11 Nov 2004 17:56:23 - @@ -18,6 +18,13 @@ package org.apache.fop.layoutmgr; +/** + * This is the super class for KnuthBox, KnuthGlue and KnuthPenalty. + * It stores information common to all sub classes, and the methods to get it: + * the width, a Position and a boolean marking KnuthElements used for some + * special feature (for example, the additional elements used to represent + * a space when text alignment is right, left or center). + */ public abstract class KnuthElement { public static final int KNUTH_BOX = 0; Index: KnuthGlue.java === RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/layoutmgr/KnuthGlue.java,v retrieving revision 1.2 diff -u -r1.2 KnuthGlue.java --- KnuthGlue.java 6 Sep 2004 19:07:12 - 1.2 +++ KnuthGlue.java 11 Nov 2004 17:56:23 - @@ -18,6 +18,28 @@ package org.apache.fop.layoutmgr; +/** + * An instance of this class represents a piece of content with adjustable + * width: for example a space between words of justified text. + * A KnuthGlue is a feasible breaking point only if it immediately follows + * a KnuthBox. + * The represented piece of content is suppressed if either the KnuthGlue + * is a chosen breaking point or there isn't any KnuthBox between the + * previous breaking point and the KnuthGlue itself. + * So, an unsuppressible piece of content with adjustable width, for example + * a leader or a word with adjustable letter space, cannot be represented + * by a single KnuthGlue; it can be represented using the sequence: + * KnuthBox(width = 0) + * KnuthPenalty(width = 0, penalty = infinity) + * KnuthGlue(...) + * KnuthBox(width = 0) + * where the infinity penalty avoids choosing the KnuthGlue as a breaking point + * and the 0-width KnuthBoxes prevent suppression. + * Besides the inherited methods and attributes, this class has two attributes + * used to store the stretchability (difference between max and opt width) and + * the shrinkability (difference between opt and min width), and the methods + * to get these values. + */ public class KnuthGlue extends KnuthElement { private int stretchability; private int shrinkability; Index: KnuthPenalty.java === RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/layoutmgr/KnuthPenalty.java,v retrieving revision 1.2 diff -u -r1.2 KnuthPenalty.java --- KnuthPenalty.java 6 Sep 2004 19:07:12 - 1.2 +++ KnuthPenalty.java 11 Nov 2004 17:56:23 - @@ -18,6 +18,21 @@ package org.apache.fop.layoutmgr; +/** + * An instance of this class represents information about a feasible + * breaking point; it does not represent any piece of content. + * A KnuthPenalty is a feasible breaking point unless its value is infinity; + * a KnuthPenalty whose value is -infinity represents a forced break. + * A KnuthPenalty is suppressed, and its width is ignored, if it is not a + * chosen breaking point; for example, a KnuthPenalty representing a + * hyphenation point has a width (the "-" width), which must be ignored if + * that point is not chosen as a breaking point. + * Besides the inherited methods and attributes, this class has two more + * attributes and the methods used to get them: the penalty value, which is + * a kind of "aesthetic cost" (the higher the value, the mor
Re: [Bug 32253] - Marker bugs
Simon Pepping wrote: > Marker extends FObjMixed, which adds InlineStackingLM. This is a > problem. In the new model there should be a strict separation between > LMs implementing InlineLevelLM and those that do not. InlineStackingLM > does not, and probably should not do. I still have to really understand InlineStackingLM, I find it very enigmatic! It generates inline areas, but it has LineLM as a subclass ... Maybe it should implement both getNextBreakPoss and getNextKnuthElements: looking at the retrieve-marker's parent and / or at the marker's children should be enough to decide whether to call the one or the other. Anyway, I was wondering whether it is really necessary to add a new LM. If the fo tree is fo:page-sequence ---+--- | | fo:static-content fo:flow | | ... ... | | ret. marker's parentmarker's parent | | fo:retrieve-markerfo:marker ---+--- | | chld1 chld2 at the moment, RetrieveMarkerLM tries to achieve this (in the LM tree): ... | parentLM | RetrieveMarkerLM | InlineStackingLM ---+--- | | chldLM1 chldLM2 but, as a marker can only have children which could replace its retrieve-marker, wouldn't it be better to have just: ... | parentLM ---+--- | | chldLM1 chldLM2 ? Regards, Luca
Re: [Bug 32253] - Marker bugs
Simon Pepping wrote: > Indeed. Something like ICLM is needed, which creates an inline area > containing the block areas. A block inside another block Normal text inner block normal text. creates 3 different paragraphs: - Normal text - inner block - normal text. and each paragraph's layout is unrelated to the other paragraphs' layout (there are 3 LineLM). A block inside an inline inside a block Normal text inline text 1 inner block inline text 2 normal text. creates: a) 3 different paragraphs too: - Normal text inline text 1 - inner block - inline text 2 normal text. or b) a single paragraph with all the text: - Normal text inline text 1 inner block inline text 2 normal text. ??? I'd say a), but I'm not sure. If this were true, there should be 3 different LineLMs. This is the LM tree at the moment: BlockLM1 | LineLM1 | -+- ||| TextLM InlineLM TextLM "Normal text" | "normal text." | -+- ||| TextLM BlockLM2 TextLM "inline text 1" | "inline text 2" LineLM2 | TextLM "inner block" LineLM1 tries to have get elements from all its chidren, and fails. But, even if it could be given the elements representing "inner block", it could layout them wrongly, because of the block properties: the inner block could have different alignment, borders, margins, indents, So, the LM tree could be: BlockLM1 --+- | || LineLM BlockLM2 LineLM --+--| -+- | || | | TextLM InlineLM LineLM InlineLMTextLM "Normal text" || |"normal text." || | TextLM TextLM TextLM "inline text 1" "inner block" "inline text 2" This modified tree can be "easily" obtained from the previous one: - the new BlockLM is created - if the LM which should add it to its children list is an InlineLevelLM or a LineLM, the new BlockLM is given to its parent, i.e. it will become a child of the nearest BlockLM ancestor - an instance of the LM which could not handle the new BlockLM (in the example, InlineLM son of LineLM) must be created in order to handle inline siblings of the inner fo:block. I hope this can help ... Regards, Luca
Re: Knuth linebreaking questions
Finn Bock wrote: > 1) What is the purpose of 2 glues for a normal space in END and START > alignment: > > new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false)); > new KnuthPenalty(0, 0, false, , true)); > new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true)); The purpose is to give each line (but the last one) the same stretchability, regardless of the number of spaces in it. If the penalty is not used (there is no line ending there) the overall effect of the 2 glues is a 0 stretchability and does not modify the line total; if the penalty is used (a line ends there) then the stretchability of the previous glue is added to the line total, which becomes 3 * wordSpaceIPD.opt because the previous space, as said before, added 0 (the following glue is suppressed). In justified text, a line with many spaces can be adjusted in order to be much shorter, or much longer. If left-aligned text used the same elements, the algorithm would find the same breaking points; but this time adjustment ratios are not used, so a line with many spaces would be too much longer, or too much shorter, than the other lines. Using these elements, the algorithm creates lines whose unadjusted width is quite the same. > and why isn't the min and max of wordspaceIPD used. Well, you just made me notice there is a little bug, LineLayoutManager.DEFAULT_SPACE_WIDTH should be used insted! :-) It's just a "magic number": the point is that every TextLM should use the same value. > 2) What does the threshold parameter to findBreakingPoints controll? > It seems to be a performance parameter which control the number of > active nodes, rather than a quality parameter. > Or to frame my question > differently, if threshold=1 finds a set of breaks, will threshold=5 > always pick the same set of breaks? Or can threshold=5 find a better set > of breaks? It controls both performance and quality: minimum quality. If threshold = 1 finds a set of breaks, it is the best possible set of breaks, because the adjustment ratio of each break is <= 1 which means that spaces and other adjustable objects will not need to be longer than their .max width. But with this optimal threshold the algorithm could fail, and find no set of breaking points; so, a try with a higher threshold must be done. If with threshold = 1 a set is found, with threshold = 5 the same set would be found, but it would take more time, because a greater number of active nodes are used. > 3) What is the reasoning for doing hyphenation only after threshold=1 > fails. Naive common sense tells me that if the user specify hyphenation > we should do hyphenation before finding line breaks. Finding hyphenation points is time-expansive (all words must be hyphenated, not only the ones "near a line's end"), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought that if a set of breaking points could be find without hyphenation. I just took the "hyphenate" property as a suggestion instead of an order! :-) Note that the same algorithm with the same threshold could find a different set of breaking points with and without hyphenation, because the elements are different. Without hyphenation, spaces could need a little higher adjustment, for example. > 4) I've compared your code to tex_wrap > http://oedipus.sourceforge.net/texlib/ > and the main difference is in the way new KnuthNodes are added to the > active list. Is the BestRecords part of Knuth or is it your own > invention? Why is it only fitness_class'es in BestRecord that is higher > then minDemerits + incompatibleFitnessDemerit that is added to > activeList? Why not all fitness_class'es in BestRecords? At the moment I don't have the book at hand, but I am quite sure it's *not* an invention of mine! :-) As far as I can remember, the Knuth book uses 4 different variables, named C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!) and I just created this structure to store them. I'll try and find some time to look at this ... Thanks for your interest and your comments, they are most welcome! Regards Luca
Re: Good news: Jeremias has been elected as an ASF member!
> I have the great pleasure to announce that Jeremias Maerki has > been elected as an ASF member at the last member's meeting > during ApacheCon. Congratulations! Luca
Re: Knuth linebreaking questions
Finn Bock wrote: (starting from the second question) > And why not adjust the spacing within the user specified min/max for > START and END alignment? Should the user desire adjusted spaces, wouldn't it be better for him to specify justified alignment? :-) Seriously, the recommendation (at 7.16.2 "letter-spacing" and 7.16.8 "word-spacing") states that these spaces "may also be influenced by justification", but says nothing about start and end alignments. > I'm still not sure why it would be ok to ignore any user specified > min and max values of 'word-spacing' during START and END alignment. > If a user specifies a length range, what would the reason be for not > using it? Perhaps with additional DEFAULT_SPACE_WIDTH. When alignment is start or end, each space has always its .optimum width, so there is no need to look at the .minimum and .maximum: the user most preferred value is already used. But the knuth algorithm would not work if there were no elements with adjustable width (glue with stretchability and/or shrinkability); the actual value used is not very relevant, because the computed adjustment ratio will not be applied. > Ok, performance is indeed a fine reason, but IMHO such quality vs. > speed tradeoffs should eventually be made by the user rather than us. Simon told the same: # Note that in TeX such thresholds are user-adjustable parameters. I # think they should eventually be so in FOP too, for those of us who # have the most exquisite taste of line layout. and I think it's a good idea; the algorithm should: 1 find breaking points without hyphenation 2 hyphenate 3 find breaking points with hyphenation 4 decide which ones are "better" and point #4 uses the user-definable threshold; where should this constant be stored? Inside the code of LineLM or in a configuration file? Regards Luca
Re: Knuth linebreaking questions
Finn Bock wrote: > I tend to read that to mean that word spacing may be pushed beyond the > specified range by justification. And I would think that unjustified > alignment still has the option of using the word-spacing range but > ofcourse has to stay within the range. I'm not convinced ... The effect of having left-aligned text and adjustable word-spacing would be an output in which most lines are justified, but the ones in which the adjustment ratio would be > 1 ... I really don't think this would be better than having all lines left-aligned! :-) And if the user sets text-align="left" but does not explicitly sets word-spacing, and the default value is used, and for a lucky coincidence the algorithm find breaking points involving ratios < 1, the output would show justified lines, instead of the left-aligned lines the user would have likely expected. Regards Luca
Re: More questions on line breaking
Finn Bock wrote: > Ok, so it isn't really needed when the algorithm is implemented in java. > Just by having the previous node linked from within bestActiveNode is > enough to keep the inactive nodes alive. > > So inactiveList can be removed. You are right, I'm going to remove it. Thanks! Regards, Luca
Re: cvs commit: xml-fop/src/java/org/apache/fop/layoutmgrKnuthElement.java KnuthBox.java KnuthGlue.java KnuthPenalty.java
Glen Mazza wrote: > Luca, I think we should be using getWidth() instead of > getW(), correct? Yes, it would be much clearer! I'm going to rename: getW() -> getWidth() getY() -> getStretch() getZ() -> getShrink() getP() -> getPenaltyValue() The last name is quite long: I first thought of getPenalty() or getValue(), but they seemed less clear to me. Regards Luca
Re: Refactoring of knuth line breaking code.
Finn Bock wrote: > I've been playing around with the knuth line breaking code and made a > slight refactoring of it. [...] > These improvements could also be applied to the existing code, so I > think the more interesting point is the quality of the refactoring > job. I think this is a very good patch, and it could be applied as it is now: my last changes introduced a first fit algorithm used if the Knuth's one failed, but your re-starting strategy seems much better. Regards Luca
Re: cvs commit: xml-fop/test/layoutengine/testcases normal-breaking2.xml
Jeremias Maerki wrote: > If someone has an idea about the ArrayOutOfBoundsException, all the > better. I'm currently trying to debug that thing. It's because of this line in LineLM.addAreas(): iCurrParIndex = 0; If a block has properly handled preserved linefeeds, it will generate more than one paragraph, so the ArrayList knuthParagraphs will have more than one element. In your test file, there will be: (Paragraph #0) (Paragraph #1) (Paragraph #3) If the block is split between two pages, the method LineLM.addAreas will be called twice: for example, there will be Paragraph #0 and #1 on page 1, and the others on page 2. The second time addAreas() is called iCurrParIndex is set to 0, and here is the bug, because the index stored in the LineBreakPosition lbp refers to an element in Paragraph #2! Maybe the best place to store the correct value is inside the LineBreakPositions; I'm attaching a little patch which modifies the LineBreakPosition class (and also removes an old, unused method). As per the ignored newline characters in the TextLM, I really can't remember why I wrote those lines! :-) It really looks like they shouldn't be ignored. HTH Regards, Luca Index: LineLayoutManager.java === RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java,v retrieving revision 1.37 diff -u -r1.37 LineLayoutManager.java --- LineLayoutManager.java 9 Dec 2004 15:53:36 - 1.37 +++ LineLayoutManager.java 2 Feb 2005 14:15:18 - @@ -95,16 +95,18 @@ */ private static class LineBreakPosition extends LeafPosition { // int iPos; +int iParIndex; // index of the Paragraph this Position refers to double dAdjust; // Percentage to adjust (stretch or shrink) double ipdAdjust; // Percentage to adjust (stretch or shrink) int startIndent; int lineHeight; int baseline; -LineBreakPosition(LayoutManager lm, int iBreakIndex, +LineBreakPosition(LayoutManager lm, int index, int iBreakIndex, double ipdA, double adjust, int ind, int lh, int bl) { super(lm, iBreakIndex); // iPos = iBreakIndex; +iParIndex = index; ipdAdjust = ipdA; dAdjust = adjust; startIndent = ind; @@ -139,7 +141,6 @@ private int iReturnedLBP = 0; private int iStartElement = 0; private int iEndElement = 0; -private int iCurrParIndex = 0; private KnuthNode bestDeactivatedNode = null; @@ -762,6 +763,7 @@ breakpoints.add(insertIndex, new LineBreakPosition(this, + knuthParagraphs.indexOf(par), lastElementIndex , ratio, 0, indent, lineLead + middlefollow, @@ -1337,135 +1339,6 @@ } /** - * Make a line break for returning as the next break. - * This makes the line break and calculates the height and - * ipd adjustment factors. - * - * @param prevLineEnd previous line break index - * @param target the target ipd value - * @param textalign the text align in operation for this line - * @return the line break position - */ -private BreakPoss makeLineBreak(int prevLineEnd, MinOptMax target, -int textalign) { -// make a new BP -// Store information needed to make areas in the LineBreakPosition! - -// lead to baseline is -// max of: baseline fixed alignment and middle/2 -// after baseline is -// max: top height-lead, middle/2 and bottom height-lead -int halfLeading = (lineHeight - lead - follow) / 2; -// height before baseline -int lineLead = lead + halfLeading; -// maximum size of top and bottom alignment -int maxtb = follow + halfLeading; -// max size of middle alignment below baseline -int middlefollow = maxtb; - -// calculate actual ipd -MinOptMax actual = new MinOptMax(); -BreakPoss lastBP = null; -LayoutManager lastLM = null; -for (Iterator iter = vecInlineBreaks.listIterator(prevLineEnd); -iter.hasNext();) { -BreakPoss bp = (BreakPoss) iter.next(); -if (bp.getLead() > lineLead) { -lineLead = bp.getLead(); -} -if (bp.getTotal() > maxtb) { -maxtb = bp.getTotal(); -} -if (bp.getMiddle() > middlefollow) { -middlefollow = bp.getMiddle(); -} - -// the stacking size of textLM accumulate for each break -// so the ipd is only added at the end of each LM -if (bp.getLayoutManager() != lastLM) { -if
Re: Page breaking [was: Markers added to the wrong page]
Finn Bock wrote: >Using your description as a jumping point, here is my ideas for page >breaking. I suppose it is even more pie-in-the-sky since I haven't yet >written anything about it. As I have been doing a few experiments about page breaking, I'm happy to join in this conversation. >The algorithm that the PageLM uses are a slightly modified knuth (no >need to maintain fitnessclass, and with the ability to decide on a break >when there is N pages of lookahead). The elements return from the LMs >are boxes (for lines), spaces and penalties. Note that using the box - penalty - glue representation does not necessarily involve using Knuth's algorithm. A first-fit (or best-fit) algorithm could be enough in most situations; and if there are no adjustable elements in the fo document (for exaple, lots of paragraphs without adjustable spaces between them) Knuth's algorithm becomes a best-fit. Given its higher cost, maybe it could be used only when it could really produce a better output. >The elements are not >returned from the LMs but pushed from the LM into the pageLM: > >parent.generateElement(new Space(resolveBefore()); >parent.generateElement(new Box(lineHeigth); > I'm not sure it is always possible to do this: sometimes the representation of a block depends on the properties of a higher level block. For example: outer block | - | | innerinner blockblock 12 In order to decide whether there can be a page break between the lines generated by innerBlock1 and innerBlock2, we must know: - if inner block 1 has keep-with-next - if inner block 2 has keep-with-previous - if outer block has keep-together This can be done if the outer BlockLM calls receives the elements created by its child, looks at them and adds / removes / corrects; could this be done if the inner BlockLMs directly give their element to the top-level LM? BTW, what about your great refactoring of all the knuth code? Regards, Luca
Re: Page breaking [was: Markers added to the wrong page]
Finn Bock wrote: > I would pass the element on the immidiate parent, which recursively > passes them on the top-level LM in the direction. For inline, the > toplevel would be LineLM and for blocks it would be the PageLM. Ok, I misunderstood what you wrote, now I think we were saying the same thing using different words! :-) Regards Luca
Error in computation of inline progression dimension ?
I noticed a strange behaviour concerning margins that could be related to the inheritance of start-indend and end-indent, which was discussed a few weeks ago. It seems that in some situations the margins are subtracted twice from the available inline progression dimension. In the little fo file I'm attaching there are two simple-page-masters: - in one of them, left and right margins are set inside simple-page-master itself - in the other, they are set inside region-body In both cases, the page width is 200 points, with left and right margin set to 50 points; so, the line width should be 100 points. In the method PageSequenceLayoutManager.getViewportRectangle(), the computed ipd is right when the margins are set in region-body, but it is 0 if they are set in simple-page-master, because relDims.ipd is already 100 and start- end-indent are 50. As this method has not been modified recently, the error (if this behaviour is really wrong) must be elsewhere ... Regards, Luca http://www.w3.org/1999/XSL/Format"; xmlns:svg="http://www.w3.org/2000/svg";> corto piccolo mini super qualcosa corto qui e qui precipitevolissimevolmente e poi ancora bla bla bla e ancora qualche parola. corto piccolo mini super qualcosa corto qui e qui precipitevolissimevolmente e poi ancora bla bla bla e ancora qualche parola.
Re: Error in computation of inline progression dimension ?
Jeremias Maerki wrote: >> [Glen Mazza] >> So Luca is correct that both fo:simple-page-masters >> should generate the same overall margins of 50 pt. >> each, no? > >No. :-) Ok, now I am convinced you are right. Thanks for all your explanations, I always found this part of the recommendation quite obscure! Regards, Luca
Re: [XML Graphics - FOP Wiki] Updated: PageLayout
Simon Pepping wrote: >+=== Space specifiers === >+ >+When the space specifiers resolve to zero around a page break, we are >+in the same situation as that of a word space in line breaking. It is >+represented by the sequence `box - glue - box`. I add just a few thoughts about this subject. If there cannot be a break between the two block (the first has keep-with-next || the second has keep-with-previous || their block father has keep-together), the representation can be box - infinite penalty - glue - box. >+=== Possible page break between content elements === >+ >+Here the most general situation is that when the content is different >+with and without page break: >+ * content Cn when there is no page break, >+ * content Ca at the end of the page before the page break, >+ * content Cb at the start of the page after the page break. >+ >+An example of this situation is a page break between table rows: >+ >+{{{ >+no page break:page break: >+ >+- - >+ row 1 row 1 >+- - >+ border n border a >+- - >+ row 2footer >+- - >+ page break >+ - >+ header >+ - >+ border b >+ - >+row 2 >+ - >+}}} >+ >+This situation cannot be dealt with using Knuth's box/glue/penalty >+model. Maybe there is no need to create new kinds of elements (not that it's forbidden :-) , only the fewer they are, the simpler the algorithm is). Header and footer, with their borders, are duplicated around each break if table-omit-*-at-break is false; so, at each break the total height increases by (border a + footer + header + border b) and decreases by border n. Here is a different representation which uses normal penalties; there are two rows whose bpd is h1 and h2, header and footer with bpd hH and hF, border before the footer , border after the header and border between rows hA, hB and hN. box(h1) - penalty(inf, hH + hA + hB + hF) - glue(hN) - box(h2) - box(hB + hF) - box(hH + hA) If there is no break, the overall bpd is (hH + hA + h1 + hN + h2 + hB + hF), otherwise the first piece has bpd (hH + hA + h1 + hB + hF) and the second one (hH + hA + h2 + hB + hF), and the border between the rows is ignored. The elements representing the header and its border are moved at the end of the sequence, but I don't think this is could be a real problem: the TableLayoutManager would place it at its right place when adding areas. Regards Luca
Re: [XML Graphics - FOP Wiki] Updated: PageLayout
Jeremias Maerki wrote: > However, Luca's example > does not fully resolve in my brain. The penalty, for example, must not > be infinite or it will not be eligible for break possibility. A "legal > break" is defined by Knuth as "a number b such that either > (i) xb is a penalty item with pb < infinity, or (ii) xb is a glue item > and xb-1 is a box item." As far as I can see Luca's example doesn't > contain any legal break, but I guess that was simply an oversight. Ops, you are right, the penalty should have a 0 penalty value (or maybe greater, anyway not infinite), otherwise it is completely useless to set its width, as it will never be used! > The big problem I still have with both your examples is that the table > header is very special in terms of the standard Knuth model. This model > doesn't allow for conditional items at the beginning of a line. What > Luca did in his example looks to me like forcing the model to do > something it wasn't designed for. Yes, line breaking has not something analogous to the repeated header in a table, although the elements representing spaces in centered effect (which have effect only if there is a break between them) are somewhat similar. But I think that the main point is to have a representation whose height is correct: the representation is not the content, its only purpose is to allow the creation of pages satisfying the constraints (orphans, widows, keep, ...): what to put into the pages concerns the LMs. > I'm a bit sceptical that the code will > be able to identify such special conditions reliably. I think it would not be that difficult: for example, the penalty (whose width is header height + footer height) could have a conventional Position. In the addAreas phase, if the last position in the iterator is that particular Position, the LM will know that the table has been split between pages and, according to the value of table-omit-*-at-break, will add header and footer. Regards Luca
Re: Skype-conference on page-breaking?
Jeremias Maerki wrote: >Anyway, I'd like to ask if we could hold to a brainstorming conference >call on page breaking either Sunday evening or next Monday or Tuesday >somewhere between 8:00 and 24:00 CET. Of course, on my wish list there >are Simon, Finn and Luca. I'm happy to call either of you on your normal >phone via SkypeOut if you don't have broadband. I hope I can get at >least one of you three on the line. I'v very interested in page breaking, and I would be happy to contribute. Unfortunately, I'm not much used to speaking english :-(, so I think I would be much more comfortable with the idea of communicating via written words! As I have said before (or maybe I forgot to ...) I have done a few experiments trying to use Knuth's algorithm in page braking, and I have a working implementation which handles only some block level formatting objects (blocks and lists) and simplified documents (no footnotes or floats, at the moment, and pages with equal length and width), but it has some (I hope) interesting features: for example, it is able to adjust the number of lines used for each paragraph in order to both fill the pages and avoid orphans and widows. In a few words, using the box - penalty - glue model it is possible to represent paragraphs with an adjustable number of lines. I started working on it a few months ago, and I could not keep it updated with all the changes, but if you are interested I could try and recreate these features using the most recent code. Anyway, this could be done after we have reached a basic implementation. Regards Luca
Re: Skype-conference on page-breaking?
Jeremias Maerki wrote: >Would you consider sharing what you already >have? This may help us in the general discussion and may be a good >starting point. Ok, I'll try to. The main change in the LineLM is that the line breaking algorithm does not select only the node in activeList with fewest demerits: all the nodes whose demerits are <= a threshold are used to create LineBreakPositions, so for each paragraph there is a set of layout options (for example, a paragraph could create 8 to 10 lines, 9 being the layout with fewest demerits). According to the value of widows and orphans, the LineLM creates a sequence of elements: besides "normal" lines, represented by a box, there are "optional lines", represented by box(0) penalty(inf,0) glue(0,1,0) box(0) and "removable lines" box(0) penalty(inf,0) glue(1,0,1) box(0) A few complications arise if not every possible layout allows breaks between lines, but they all can be solved using boxes, glues and penalties (for example, if a paragraph needs 3 or 4 lines, if it uses 3 it cannot be parted). The BlockLM, and a block stacking LM in general, adds elements representing its children's spaces and keep condition, for example adding a 0 penalty or an infinite penalty according to child1.mustKeepWithNext(), child2.mustKeepWithPrevious() and this.mustKeepTogether(). The PageLM, once it has the list of elements representing a whole page-sequence (or the content before a forced page break), calls the same breaking algorithm, using only a different selection method which leaves only one node in activeList. It has now a rough sequence of pages: each one may may have a positive or negative difference (with respect to the page height); the glue elements representing adjustable lines or adjustable spaces in a page are collected in different lists and they are used to "negotiate" a block progression adjustment with the LM which created them. In this phase each LineLM knows how many lines it will finally create. Then, a new sequence of elements is created, and this time each element has a fixed width (as the adjustments have already been decided). This sequence is used to create the final pages; note that if the adjustments have been enough to perfectly fill the pages, a first fit algorithm would be enough to recreate the right page breaks. This phase is needed, at the moment, because the Positions that the LineLMs store in their elements are not LineBreakPosition (as they still don't know how many lines they have to create), but maybe it could be avoided in some way ... Don't hesitate to ask for further details, I'll try to answer as clearly as possible! As per the columns, I did not think about them yet, but if they are equally wide it shouldn't be terribly hard to handle them ... Regards Luca
Re: Skype-conference on page-breaking?
Jeremias Maerki wrote: > Luca, do you think your total-fit approach may be written in a way to > handle changing available IPDs and that look-ahead can be disabled to > improve processing speed at the cost of optimal break decisions? I think that a first fit algorithm could be implemented in two different ways: 1) wait until the list of elements representing a whole page-sequence is collected, and call findBreakingPoints(); this method will call a different considerLegalBreak() method, much simpler and faster than the knuth's one. 2) start building pages little by little: the FlowLM returns elements to the PageLM as soon as one of its own child returns them Alternative 1) is much like the total fit algorithm: breaks are computed at the end of each page-sequence; even if the evaluation method is much faster than Knuth's one, there could still be a long wait in order to get the whole list. With alternative 2) the PageLM would behave much the same as it now does: as soon as a page is filled, it is possible to call addAreas. Note that the last elements in the partial sequence cannot be considered as feasible break. For example, if there is a block which creates 6 lines, the sequence will be something like: box box penalty(not infinite) box penalty(not infinite) box box and the evaluation must stop at the second penalty; only when some following elements are known it will be possible to decide whether the last two lines could be at the end of a page. If the IPD is always the same, I think the two alternatives are equivalent, and the first one is "better" because it just needs a different considerLegalBreak() method; as the output file cannot be printed until the end of the process, the only advantage of 2) could be memory usage. > That's the part where I have a big question mark about changing > available IPD. We may have to have a check that figures out if the > available IPD changes within a page-sequence by inspecting the > page-masters. That would allow us to switch automatically between > total-fit and best-fit or maybe even first-fit. If the IPD changes, I fear 2) must be necessarily used: if a block is split between pages with different ipd, only a few lines need to be recreated. Using 1), the LineLM should know how wide the lines are, but this cannot be known as page breaking has not yet started. The check could be done before starting the layout phase: if there is a change, 2) is used, otherwise 1). Maybe, the check could be even more sophisticated: for example, if the first page is different, but the following are equally wide, we could use 2) to create the first page and then switch to 1). > A remaining question > mark is with side-floats as they influence the available IPD on a > line-to-line basis. This is a question mark for me too! :-) > One thing for a deluxe strategy for book-style > documents is certainly alignment of lines between facing pages. But > that's something that's not important at the moment. I have created and implemented a new property right about this! :-) > I'd be very interested to hear what you think about the difficulty of > changing available IPD. The more I think about it, however, the more I > think the total-fit model gets too complicated for what we/I need right > now. But I'm unsure here. If changing ipd is really important and not just a theorical possibility, we could start implementing 2, and later add the check and the algorithm 1: the getNextKnuthElements() in the block-level LM could be used in both cases. Regards Luca