Re: Borders in page regions
Seifeddine Dridi wrote:
> Does anybody know why it isn’t allowed to set borders in page regions ? The
> XSL specs says that border-width and padding values must be 0, but I don’t
> understand why it is enforcing this restriction, RenderX for instance allows
> borders in page regions.

You can activate the "relaxed validation mode", which allows borders and padding on regions; you will still get a warning (instead of a validation error), but they will be rendered fine. Relaxed validation is activated by using the -r option on the command line, or by setting strict-validation to false in the configuration file.

Hope this helps
Luca
Request for developer's powers on JIRA
Now that I'm back in harness, thanks to a lot of patient people, I only need the magical power to work on JIRA issues in order to be a (somewhat) useful committer. Glen, as I'm told you are FOP's JIRA administrator, could you please give the necessary privileges to my lfurini JIRA account? Do you need any additional info? Bye Luca
[jira] [Resolved] (FOP-2348) [PATCH] PDF File Attachment Extension is broken
[ https://issues.apache.org/jira/browse/FOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Furini resolved FOP-2348.
    Resolution: Fixed
    Fix Version/s: trunk

Patch applied in revision r1655099.
Revision r1659776 added an automatic testcase checking for this feature, so as to avoid regressions in the future.

[PATCH] PDF File Attachment Extension is broken
    Key: FOP-2348
    URL: https://issues.apache.org/jira/browse/FOP-2348
    Project: Fop
    Issue Type: Bug
    Components: renderer/pdf
    Affects Versions: trunk
    Reporter: Matthias Reischenbacher
    Assignee: Luca Furini
    Fix For: trunk
    Attachments: 2348-testcase.zip, 2348.patch

PDF File attachments are broken in latest trunk. I didn't investigate in detail, but I think it's since rev 1537948 or 1522934. When generating a PDF with file attachments a NullPointerException is thrown.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException
[ https://issues.apache.org/jira/browse/FOP-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Furini resolved FOP-2441.
    Resolution: Duplicate
    Fix Version/s: trunk
    Assignee: Luca Furini

As Thanasis Giannimaras commented, this is a duplicate of another bug, now resolved.

pdf:embedded-file extension is broken, gives NullPointerException
    Key: FOP-2441
    URL: https://issues.apache.org/jira/browse/FOP-2441
    Project: Fop
    Issue Type: Bug
    Components: renderer/pdf
    Affects Versions: trunk
    Reporter: Luca Furini
    Assignee: Luca Furini
    Priority: Minor
    Fix For: trunk
    Attachments: change.diff, test_attachment.fo

The extension property pdf:embedded-file (to attach files to the pdf) is not working, and generates a NullPointerException.

I noticed the problem while trying to write an answer to this StackOverflow question: http://stackoverflow.com/questions/28110607/unable-to-add-an-attachment-to-a-pdf-while-using-fop (the question is about a different problem, but while testing on fop-trunk I noticed this bug I'm reporting).

Looking at the revision history, I think the implementation of this extension has been broken since revision [1522934].

I'm going to attach a simple fo file showing the problem, together with a proposed patch.

I have been a fop committer for some time, followed by a looong period of just lurking the mailing list; I tried to commit the changes myself, but I guess my long inactivity period has caused the revocation of my commit privileges.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Updated] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException
Luis Bernardo wrote:
> I am under the impression that committership rights are never revoked but I could be wrong. Are you sure that you can log in to your Apache account? Maybe a year ago or so Apache forced a change in passwords. Did you change your password when that happened?

Yes, I remember changing the password when requested. When I try to commit I get a 403 Forbidden error after being asked for username and password (by comparison, if I enter a wrong username / password I keep being asked to enter them correctly).

I think my Apache account still exists, as I can log in to https://id.apache.org, and I'm in the committers list at http://people.apache.org/committer-index.html#lfurini (although I'm not assigned to any svn projects).

On the other hand, I have been inactive for several years, so I wouldn't be surprised or offended if someone / an automatic procedure revoked my powers ...

So ... does anyone have any ideas? :-)

Bye
Luca
[jira] [Updated] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException
[ https://issues.apache.org/jira/browse/FOP-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Furini updated FOP-2441:
    Attachment: test_attachment.fo

Simple fo file to reproduce the error
[jira] [Updated] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException
[ https://issues.apache.org/jira/browse/FOP-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Furini updated FOP-2441:
    Attachment: change.diff

Proposed patch
[jira] [Created] (FOP-2441) pdf:embedded-file extension is broken, gives NullPointerException
Luca Furini created FOP-2441:

    Summary: pdf:embedded-file extension is broken, gives NullPointerException
    Key: FOP-2441
    URL: https://issues.apache.org/jira/browse/FOP-2441
    Project: Fop
    Issue Type: Bug
    Components: renderer/pdf
    Affects Versions: trunk
    Reporter: Luca Furini
    Priority: Minor

The extension property pdf:embedded-file (to attach files to the pdf) is not working, and generates a NullPointerException.

I noticed the problem while trying to write an answer to this StackOverflow question: http://stackoverflow.com/questions/28110607/unable-to-add-an-attachment-to-a-pdf-while-using-fop (the question is about a different problem, but while testing on fop-trunk I noticed this bug I'm reporting).

Looking at the revision history, I think the implementation of this extension has been broken since revision [1522934].

I'm going to attach a simple fo file showing the problem, together with a proposed patch.

I have been a fop committer for some time, followed by a looong period of just lurking the mailing list; I tried to commit the changes myself, but I guess my long inactivity period has caused the revocation of my commit privileges.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Absolute-positioned block-containers using left and bottom
On Fri, Jul 4, 2008 at 6:09 PM, Andreas Delmelle [EMAIL PROTECTED] wrote:
> Now, I'm wondering... In theory, it should not be too difficult to get at this info, since ultimately it is also needed when computing 'top' or 'left' if they are specified as a percentage. In that case, the value is obtained through AbstractBaseLayoutManager.getBaseLength() - getReferenceAreaIPD() or getReferenceAreaBPD().

Yeah, thank you, that was it! Right under my nose, yet I could not see it ... :-)

Now everything should be ok in revision 674489.

Regards
Luca
Re: Absolute-positioned block-containers using left and bottom
On Wed, Jul 2, 2008 at 8:24 PM, Andreas Delmelle [EMAIL PROTECTED] wrote:
> If you have the area's own dimensions, and the complement properties (bottom-right), is that not enough? For the renderer:
> - top = (bottom - area-bpd - borders - padding)
> - left = (right - area-bpd - borders - padding)

Bottom is the distance from the nearest ancestor reference area *bottom* edge, not from the top one (the same for right), so we need to know the reference bpd and ipd. So, if I read the spec right, a 10pt bottom position set on the same absolutely positioned block-container would be translated into different top-positions according to the reference bpd (= the reference bottom edge).

> It seems so simple... Am I missing something?

*At least* one of us is missing something ;-)

> (It's hot here, too! ;-))

We probably need a summer team located in the southern hemisphere!

Regards
Luca
Absolute-positioned block-containers using left and bottom
(it's still me, I just subscribed with the gmail account I use more frequently, to avoid problems of messages not reaching the list ...)

On Wed, Jul 2, 2008 at 8:24 PM, Andreas Delmelle [EMAIL PROTECTED] wrote:
> If you have the area's own dimensions, and the complement properties (bottom-right), is that not enough? For the renderer:
> - top = (bottom - area-bpd - borders - padding)
> - left = (right - area-bpd - borders - padding)

Bottom is the distance from the nearest ancestor reference area *bottom* edge, not from the top one (the same for right), so we need to know the reference bpd and ipd. If I read the spec right, a 10pt bottom position set on the same absolutely positioned block-container would be translated into different top-positions according to the reference bpd (= the reference bottom edge).

> It seems so simple... Am I missing something?

*At least* one of us is missing something ;-)

> (It's hot here, too! ;-))

We probably need a summer team located in the southern hemisphere!

Regards
Luca
Re: Absolute-positioned block-containers using left and bottom
(I'm re-posting this message as I sent it yesterday and still cannot see it in the list archives, I hope I'm not duplicating it unnecessarily)

On Mon, Jun 23, 2008 at 5:12 PM, Luca Furini [EMAIL PROTECTED] wrote:
> If there is a block-container with both width and height set, its position can be correctly controlled using top and left (and indeed there are many testcases checking that) but bottom and right do not have any visible effect.

I've solved the bug for simple situations, but the solution is not nearly general enough to be committed.

The point is: right and bottom distances need to be respectively translated into x- and y-offset at some time, and in doing this we must know the ipd and bpd of the nearest ancestor reference area, as, for example,

    x-offset = reference-ipd - object-ipd - right-distance

My first idea was to set the offsets at the LM level, when creating areas, so that there would be no changes at all for the renderers, but I failed to find a way to obtain the nearest ancestor reference area, as areas have no parent pointer (and I couldn't even think of a nice way to find the appropriate region reference ...).

So, I'm almost convinced that the bottom- and right- distances should be preserved in the area tree, and translated into offsets during the rendering, where it would be possible to keep a nearestReferenceArea pointer updated just like current*PPosition is.

Comments, suggestions, warnings would be most welcome, as I fear the heat of these days is making me insane! :-)

Regards
Luca
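
The translation from right / bottom distances to offsets can be sketched in a few lines of Java. This is only an illustration, not FOP code: the class and method names are hypothetical, the writing mode is assumed to be lr-tb (so right is measured along the inline-progression dimension), and borders and padding are ignored. The numbers in main() are taken from the test file in the original report (6in x 5in page with 1in margins, so a 288pt x 216pt region-body), expressed in millipoints.

```java
/** Hypothetical helper, not part of FOP: turns right/bottom distances into offsets. */
final class AbsoluteOffsets {
    private AbsoluteOffsets() { }

    /** x-offset of an area placed rightDistance away from the reference area's right edge. */
    static int xOffsetFromRight(int referenceIpd, int objectIpd, int rightDistance) {
        return referenceIpd - objectIpd - rightDistance;
    }

    /** y-offset of an area placed bottomDistance away from the reference area's bottom edge. */
    static int yOffsetFromBottom(int referenceBpd, int objectBpd, int bottomDistance) {
        return referenceBpd - objectBpd - bottomDistance;
    }

    public static void main(String[] args) {
        // All lengths in millipoints; assumed lr-tb writing mode, no borders or padding.
        int x = xOffsetFromRight(288000, 51000, 77000);
        int y = yOffsetFromBottom(216000, 30000, 99000);
        System.out.println("x-offset = " + x + ", y-offset = " + y);
    }
}
```

With these values the block-container positioned through right and bottom starts exactly where the one positioned through left=109pt / top=57pt ends (109 + 51 = 160pt and 57 + 30 = 87pt), which is the 2x2 arrangement the report expects.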
Absolute-positioned block-containers using left and bottom
While playing a bit with absolutely positioned block-containers, I think I stumbled into a little bug.

If there is a block-container with both width and height set, its position can be correctly controlled using top and left (and indeed there are many testcases checking that) but bottom and right do not have any visible effect.

I'm attaching a simple file, whose expected output would show four colored block-containers adjacently placed 2x2 (I tried another formatter, and it behaves as expected).

I did not investigate any deeper, but I noticed that in the area tree xml we use only two attributes (top-position and left-position), and they are 0 when the corresponding block-container has @bottom / @right.

Tomorrow I'll work on this, obviously if no one arrives first or convinces me that the right output is what we already get :-)

Regards
Luca

    <?xml version="1.0" encoding="UTF-8"?>
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simple" page-width="6in" page-height="5in" margin="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simple">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="48pt">position</fo:block>
          <fo:block font-size="48pt" text-align="right"><fo:inline>NOT</fo:inline> ok!</fo:block>
          <fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="red" top="57pt" left="109pt">
            <fo:block/>
          </fo:block-container>
          <fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="yellow" top="57pt" right="77pt">
            <fo:block/>
          </fo:block-container>
          <fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="blue" bottom="99pt" left="109pt">
            <fo:block/>
          </fo:block-container>
          <fo:block-container absolute-position="absolute" width="51pt" height="30pt" background-color="green" bottom="99pt" right="77pt">
            <fo:block/>
          </fo:block-container>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>

Attachments: expectedOutput.pdf (Adobe PDF document), fopOutput.pdf (Adobe PDF document)
Re: Border and padding on page regions
On Thu, Jun 19, 2008 at 3:45 PM, Jeremias Maerki [EMAIL PROTECTED] wrote:
> There's both in FOP. block-container has the border on the viewport. table-cell has it on the reference area (table-cell doesn't generate a viewport). But I fear we might actually be wrong about having the border and padding on the viewport area.

Ok, so the region reference is the right place for borders and padding; a posteriori it seems reasonable: the viewport defines the window, the reference area starts defining what we see ... (but I could easily convince myself of the other option too :-) )

> It's interesting that we treat background and borders together in the renderers although 4.9.4 http://www.w3.org/TR/xsl11/#rend-border makes a distinction where the background is to be applied. But we don't support background-attachment so that didn't get noticed that way.

I could split the method AbstractPathOrientedRenderer.drawBackAndBorders() into two methods, drawBackground() and drawBorders(), as the background trait is still in the viewport while borders and padding will be in the reference area.

Thanks for the feedback (to Andreas too)

Regards
Luca
Border and padding on page regions
Some time ago (well, almost 2 years!) we spoke about the possibility of allowing users to define borders and padding for the page regions [1].

This week I finally found some time to do it, so I have it working on my local copy ... but then I was struck by a dilemma: should the additional traits for borders and padding be set on the region viewport (class RegionViewport) or on the region reference area (RegionReference)?

This sentence in the spec (4.2.2 Common Traits) made me decide to put them in the RegionReference, as the padding results in a reduction of the content rectangle bpd / ipd:

    Only a reference-area may have a block-progression-direction which is different from that of its parent.

But 6.4.14 (fo:region-body), where it says that padding and borders should be 0, also says that it's the region viewport that has margins (so it would have padding and borders too, if allowed). Moreover, the code already present in AbstractPathOrientedRenderer.handleRegionTraits() (not to mention Murphy's laws!) seems to suggest that these traits should belong to the region viewport.

So, a couple of questions:
- do we still think that supporting borders and padding on regions when relaxed validation is on would be something good (or, at least, not bad)?
- is RegionViewport the right place for the additional traits?

Regards
Luca

[1] http://www.nabble.com/Re%3A-svn-commit%3A-r225580xmlgraphics-fop-trunk-test-layoutengine-testcases-page-master4.xml-to511937.html#a511937
Re: Border and padding on page regions
On Thu, Jun 19, 2008 at 1:26 PM, Luca Furini [EMAIL PROTECTED] wrote:
> Only a reference-area may have a block-progression-direction which is different from that of its parent.

Oops, I realize only now that it says direction and not dimension :-)

Ok, so I think this definitely means that the traits should be in the region viewport ...

Sorry for the noise!

Regards
Luca
Re: svn commit: r668177 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/layoutmgr/ test/layoutengine/standard-testcases/
On Mon, Jun 16, 2008 at 4:52 PM, [EMAIL PROTECTED] wrote:
> Fixing the PageBreakingAlgorithm, replacing calls to getLineWidth() with getLineWidth(int) so as to take into account each page's real height. This fixes the positioning of footnotes when the page bpd is not the same for all pages.

This was a little nasty bug I stumbled upon a few days ago, and it took me some time to track back where the problem was ...

The PageBreakingAlgorithm, in particular the computeDifference() method, had some calls to getLineWidth() without parameters and some others with an int parameter indicating the page. When the page bpd changes from page to page, the two methods returned different values, with the effect that the algorithm first believed it could place a whole footnote in the page, and then found out that this led to an overflow.

In order to avoid similar problems, the parameter-less getLineWidth() method could maybe be deprecated?

Regards
Luca
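
To make the failure mode concrete, here is a small, self-contained Java model of the mismatch. It is a sketch under stated assumptions, not the FOP classes: PageHeights is a hypothetical stand-in, and the assumption that the parameter-less getLineWidth() answers with the first page's bpd is made only for the sake of the example.

```java
/** Hypothetical model, not FOP code: pages whose available bpd differs. */
final class PageHeights {
    private final int[] pageBpds; // available bpd of each page, in millipoints

    PageHeights(int... pageBpds) {
        this.pageBpds = pageBpds.clone();
    }

    /** Page-unaware variant: assumed here to answer for the first page only. */
    int getLineWidth() {
        return pageBpds[0];
    }

    /** Page-aware variant; pages past the last declared one reuse the last bpd. */
    int getLineWidth(int page) {
        return pageBpds[Math.min(page, pageBpds.length - 1)];
    }

    /** True if body content plus footnote fit in the given available height. */
    static boolean fits(int available, int content, int footnote) {
        return content + footnote <= available;
    }

    public static void main(String[] args) {
        PageHeights pages = new PageHeights(500000, 300000); // second page is shorter
        int content = 250000;
        int footnote = 100000;
        // Mixing the two variants gives contradictory answers for page 1:
        System.out.println(fits(pages.getLineWidth(), content, footnote));  // true
        System.out.println(fits(pages.getLineWidth(1), content, footnote)); // false
    }
}
```

The first check claims the footnote fits and the second one reports an overflow for the same page, which is the inconsistency described above; using the page-aware variant everywhere removes it.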
Re: Checking: difference between negative stretch and positive shrink?
Andreas L Delmelle wrote:
> Just wondering about some KnuthSequences for spaces I noticed during a debug-session:
>
>     glue w=0 stretch=10008 shrink=0
>     penalty w=0 p=0
>     glue w=3336 stretch=-10008 shrink=0
>
> What does it mean that the latter glue can be stretched by a negative amount? Why not:
>
>     glue w=3336 stretch=0 shrink=10008
>
> Is there a difference as to how the algorithm treats these?

Negative stretch is not the same as a positive shrink (and vice-versa): a negative stretch is used to cancel (or diminish) a positive one provided by some other elements: for each possible break point, however, the overall stretch / shrink should always be >= 0.

The meaning of the mini-sequence above is:
- if there is a break at the penalty element, there is some stretch for the line ending there
- otherwise, the overall stretch is zero

This is used with unjustified text to give each line the same amount of stretch, so that the algorithm builds lines with similar length (while in justified text a line with many spaces and few letters could be stretched a lot).

HTH
Luca
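
The two cases can be checked with a little arithmetic. The Java sketch below is only an illustration: the Glue class is a hypothetical minimal model (not FOP's KnuthGlue), and it assumes, as in the usual Knuth-style breaking, that glue following a chosen break is discarded.

```java
/** Hypothetical model of the space sequence discussed above (not FOP classes). */
final class SpaceSequence {
    /** Minimal glue: natural width and stretch, in millipoints. */
    static final class Glue {
        final int width;
        final int stretch;

        Glue(int width, int stretch) {
            this.width = width;
            this.stretch = stretch;
        }
    }

    static final Glue BEFORE_PENALTY = new Glue(0, 10008);
    static final Glue AFTER_PENALTY = new Glue(3336, -10008);

    /** Break at the penalty: only the first glue belongs to the line ending there. */
    static int stretchIfBroken() {
        return BEFORE_PENALTY.stretch;
    }

    /** No break: both glues stay in the line and their stretches cancel out. */
    static int stretchIfNotBroken() {
        return BEFORE_PENALTY.stretch + AFTER_PENALTY.stretch;
    }

    /** No break: the space keeps its natural width. */
    static int widthIfNotBroken() {
        return BEFORE_PENALTY.width + AFTER_PENALTY.width;
    }

    public static void main(String[] args) {
        System.out.println("broken:     stretch = " + stretchIfBroken());
        System.out.println("not broken: stretch = " + stretchIfNotBroken()
                + ", width = " + widthIfNotBroken());
    }
}
```

A glue with stretch=0 and shrink=10008 would behave differently: the unbroken space could shrink, and the line ending at the penalty would still get its stretch from the first glue only, so the negative stretch cannot be replaced by a positive shrink.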
Re: svn commit: r557347 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/layoutmgr/inline/ test/layoutengine/ test/layoutengine/stan
(I see that Jeremias agrees with Andreas about how to interpret the nested keeps, so I reply just once)

Andreas L. Delmelle wrote:
> In very rough terms, the logic behind it would be: if a given break #1 has a plain adjustment ratio of 3 and a governing keep of auto, and the next break #2, regardless of its adjustment ratio, would violate a keep-constraint with a higher value,
> => apply a 'correction factor' to the demerits of break #1 so that the influence of the plain adjustment ratio is significantly reduced, and it will be considered a good break in spite of its being way too short.
> If the effective demerit computation of break #2 takes into account the fact that that break itself would violate a keep (other than auto), its demerits would in turn be artificially increased, so if eventually presented with a choice, the algorithm would prefer the first break over the second.
> No idea if this would be feasible in practice though...
>
>     double effectiveRatio = (adjustmentRatio * keepRatio);
>
> where keepRatio is a double value based on the keep-values, that would either leave the adjustmentRatio as is (= 1), or increase/reduce the influence of the plain value.

Seems a good idea! Maybe we could *add* something to adjustmentRatio, so we could be quite sure that a break violating a keep suddenly becomes quite ugly even if its original ratio would be 1.

I am trying to imagine whether we have to handle in a different way breaks violating a keep and having a ratio < 0 ... maybe these are the ones that really need to be deprecated. Satisfying a keep condition without overflowing involves the creation of *short* lines, as what we want to avoid is violating a keep just to fill a line better.
So:
- probably there is no need to penalize a break violating a keep if it has a high ratio: successive breaks will surely be better, whether they still violate a keep or not
- a long break, on the other hand, should be used only if there isn't any other alternative (for example, because the content of the keep is a single word longer than a line), so such breaks should not enter the record structure, but only be saved as a lastTooLong solution.

So, what we need could be just to correct the violating breaks having a negative ratio, so that the ratio becomes < -1 and they do not become active nodes (unless there is a restart, in which case it's ok).

... but I'm basically writing down things as I think, so I could be missing some important point!

>> One last quick note concerning nesting: what if the inner inline had a *lower* force? feasible breaks in it should be first given a high penalty (to see if we can put everything together) and then a lower one (so the penalties should hold a nested chain of force values)?
>
> Indeed, easy enough if the parent inline has a keep of always [...] One might argue that by violating the inner keep, we would also violate the outer keep, and as such it makes no difference where the violation occurs... On the other hand, I wonder whether the use-case for such combinations would not precisely be to indicate to the formatter: If violating a keep is unavoidable, then this is where it should preferably happen...

Ok, you are for the "stronger keep wins" interpretation, and now I'm convinced too.

At first, my doubt was that in this way we have an explicitly specified value that gets overridden by another one specified on an *ancestor* node. This seemed to me a strange exception to the general principle that an explicit value overrides an inherited one, until I realized this is quite similar to the space resolution rules, where the space set for an object could win against one set in one of its descendants, because of its higher precedence.
So, now I think that an inner keep with lesser force should not have any effect. Luca
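
The "stronger keep wins" rule reached above can be written down as a one-line reduction. The Java sketch below is an assumption-laden illustration rather than FOP code: KeepStrength, AUTO and ALWAYS are hypothetical names, and keep forces are modelled as plain integers with auto = 0 and always = Integer.MAX_VALUE.

```java
/** Hypothetical sketch of the "stronger keep wins" interpretation (not FOP code). */
final class KeepStrength {
    static final int AUTO = 0;                   // assumed encoding of keep "auto"
    static final int ALWAYS = Integer.MAX_VALUE; // assumed encoding of keep "always"

    private KeepStrength() { }

    /**
     * Effective strength governing a break opportunity: the strongest keep
     * among all the nested inlines enclosing it, listed from outermost to innermost.
     */
    static int effective(int... nestedStrengths) {
        int strongest = AUTO;
        for (int strength : nestedStrengths) {
            strongest = Math.max(strongest, strength);
        }
        return strongest;
    }

    public static void main(String[] args) {
        // Outer keep 100, inner keep 500: the inner (stronger) one governs.
        System.out.println(effective(100, 500));
        // Outer keep 500, inner keep 100: the inner (weaker) one has no effect.
        System.out.println(effective(500, 100));
    }
}
```

Taking the maximum means that the order of nesting does not matter, which is exactly why an inner keep with lesser force has no effect.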
Re: svn commit: r557347 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/layoutmgr/inline/ test/layoutengine/ test/layoutengine/stan
Andreas L. Delmelle wrote:
> That's one detail I was still unsure about. Only if the other factors remain identical, the algorithm would prefer a break at penalty 50 over one at penalty 100... but if the value of the penalty is only of marginal influence as you suggest, then this would indeed not be enough.

I did some quick computations to see how the demerits change according to the penalty value and the adjustment ratio (see the table in the attached pdf).

It seems that the penalty value is highly relevant as long as the adjustment ratio varies between -1 and 1 (i.e. we are choosing among breaks that are quite good), but rapidly becomes less and less important as the adjustment ratio grows. For example, in order to let the algorithm prefer a break with ratio 2 (and no penalty value) to a different one with ratio 1.5, the penalty value for the second break should be at least 500; and no penalty value can make the algorithm prefer a break with ratio = 3 to another one with ratio = 2.5 (as we use 1000 as infinite penalty).

> In the example I posted earlier:
>
>     <fo:block>
>       Some text with auto keep-constraint
>       <fo:inline keep-together.within-line="100">
>         Some text with a keep.within-line constraint of 100
>         <fo:inline keep-together.within-line="500">
>           keep.within-line=500
>         </fo:inline>
>         Some more text in the first nested inline
>       </fo:inline>
>       More text after the first nested inline.
>     </fo:block>
>
> The acceptable set of breaks may turn out to give a result like (with '|' = the end-boundary of the line)
>
>     Some text with auto keep-constraint               |(1)
>     Some text with a keep.within-line constraint of   |(2)
>     100 keep.within-line=500                          |(3)
>     Some more text in the first nested inline         |(4)
>     More text after the first nested inline           |(5)
>
> Only the third and the fourth line I'm still unsure about. May the content in the fourth line be broken itself?

I'm quite unsure about the third line too, as the outer keep constraint affects also the space between the inner inline and "Some more text ...", so break #3 is violating the keep and even creating a very short line.

I would probably expect something like

    Some text with auto keep-constraint               |(1)
    Some text with a keep.within-line constraint of   |(2)
    100 keep.within-line=500 Some more text in        |(3)
    the first nested inline More text after the       |(4)
    first nested inline                               |(5)

where the inner keep with higher force is fully satisfied, and the outer one is violated twice (breaks #2 and #3).

But maybe the algorithm could still prefer

    Some text with auto keep-constraint Some          |(1)
    text with a keep.within-line constraint of        |(2)
    100 keep.within-line=500 Some more text in        |(3)
    the first nested inline More text after the       |(4)
    first nested inline                               |(5)

where the outer keep is violated thrice (breaks #1, #2 and #3).

So, maybe my sketched strategy would respect the keep priority (the inner keep will never be violated by a lower force one) but does not find the minimal set of violations. Or, at least, not if "minimal" means just "fewer violations"; if it could be interpreted as "fewer violations producing a good-looking result" it could be ok.

One last quick note concerning nesting: what if the inner inline had a *lower* force? Feasible breaks in it should be first given a high penalty (to see if we can put everything together) and then a lower one (so the penalties should hold a nested chain of force values)?

This fuzzy logic is complicated! :-)

Luca

Attachment: demerits.pdf (Adobe PDF document)
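
The kind of computation behind such a table can be sketched in Java. The formula used here is an assumption, since the message does not state it: a Knuth-Plass-style demerit (1 + 100·|r|³ + p)² for a non-negative penalty p and adjustment ratio r. The class and method names are hypothetical; only the value 1000 for the infinite penalty comes from the message.

```java
/** Hypothetical demerits calculator; the formula is an assumed Knuth-Plass-style one. */
final class Demerits {
    static final int INFINITE_PENALTY = 1000; // value quoted in the message above

    private Demerits() { }

    /** Demerits of a break with the given adjustment ratio and non-negative penalty. */
    static double demerits(double ratio, double penalty) {
        double f = 1 + 100 * Math.pow(Math.abs(ratio), 3) + penalty;
        return f * f;
    }

    /**
     * Penalty that a break with the better (smaller) ratio must carry in order
     * to tie with a penalty-free break having the worse (larger) ratio.
     */
    static double breakEvenPenalty(double betterRatio, double worseRatio) {
        return 100 * (Math.pow(Math.abs(worseRatio), 3) - Math.pow(Math.abs(betterRatio), 3));
    }

    public static void main(String[] args) {
        System.out.println("1.5 vs 2: " + breakEvenPenalty(1.5, 2.0));
        System.out.println("2.5 vs 3: " + breakEvenPenalty(2.5, 3.0));
    }
}
```

Under this assumed formula the break-even penalty for ratios 1.5 and 2 is 462.5, in line with the "at least 500" read from the table, while for ratios 2.5 and 3 it is 1137.5, which is above the infinite penalty of 1000 and so cannot be reached by any finite penalty value.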
Re: svn commit: r557347 - in /xmlgraphics/fop/trunk: ./ src/documentation/content/xdocs/ src/java/org/apache/fop/fo/ src/java/org/apache/fop/layoutmgr/inline/ test/layoutengine/ test/layoutengine/stan
Firstly, hi all! It has been quite a long time since I last posted or committed anything, but I'm still here! :-)

Then, congratulations for all the great progress fop is making!

And finally, concerning the keeps ...

Andreas L. Delmelle wrote:
> [inserting penalties with higher value to represent numeric keeps]
> This should steer the line-breaking algorithm in the right direction to satisfy all keep constraints, IIC. The only big difference compared to an auto keep-constraint, if I judge correctly, would then be that we would somehow have to use penalties to represent all legal break-opportunities. Instead of glues being considered as feasible breakpoints, they would always be preceded by a zero-width penalty having a value corresponding to the keep-constraint governing the base FO.

I'm not sure the steering capability of penalty values would be enough to get the prescribed result [section 4.8 Keeps and breaks (particularly the last paragraph)]: the algorithm could still prefer violating a keep with force = N to satisfy some keeps with force < N, as, IIRC, the demerits ultimately depend much more on the necessary stretch / shrink than on the penalty value.

I think that the breaking algorithm could be performed one time for each distinct force value. Something like this:

    lastConfirmedBreaks = ... the set of breaking points considering only "always" keeps
    ArrayList forceLevelList = findForceLevels(sequence); // in the reversed order
    int forceLevelIndex = 0;
    boolean tryAgain = true;
    while (tryAgain && forceLevelIndex <= forceLevelList.size() - 1) {
        revisedSequence = setPenaltyValue(sequence, lastConfirmedBreaks,
                                          forceLevelList.get(forceLevelIndex),
                                          HIGH_PENALTY_VALUE);
        ... compute the set of breaking points for revisedSequence
        if (... they are still acceptable) {
            lastConfirmedBreaks = ... these ones
            forceLevelIndex++;
        } else {
            tryAgain = false;
        }
    }

- in the sequence, keeps having force = always would be represented by +INF penalties;
- keeps with numeric force start with a 0 penalty value;
- the method setPenaltyValue() gives those with the given force a high value, while those with greater force (which have not been chosen as breaking points, this is why we must pass the computed breaks too) are set to +INF, so we are sure they will not be violated

If this approach is correct, the key point would be how to decide whether or not the computed set of breaks is still acceptable ...

Hope this helps ...
Luca
Re: Footnotes in the float branch
Vincent Hennebert wrote:
> Hi Luca,

Hi!

> I had a look at your patch and have several comments:
> - I see you re-enabled the noBreakBetween method; I don't think it's a good solution because it artificially prevents some nodes to be created, which even if bad may be necessary for some complex documents. See for example the attached fo file.

Right, it is quite an unlucky document!

Anyway, I still think that a footnote should be placed in a page following its citation only if there isn't really any other option: for example, the citation is inside a large block of unbreakable text, and the footnote itself is a large unbreakable block, and their cumulative height is taller than the page height (a situation that will surely happen sooner or later ;-) , but is much less likely than your example).

I think your example would not look so bad in the context of a page with some book-like width and height: yes, there would be quite a large space between the last content line and the first footnote line, but I think many users would prefer such an output to one having the footnote placed in the following page.

> I also documented a similar problem on the wiki [1]. While it makes the testcases work it actually creates some bad layout in other cases.

The one in the 4.1 / footnote section? It's a very interesting one, although I think it's quite another story. While in the previous example we have two valid options, and the algorithm chooses the ugliest one, here we have the algorithm a priori discarding the option that would be the best one. I think we could call this a bug, as there can be no doubt concerning what a user would expect.

I'm attaching [check; double check; look again; yes, it's there!] a testcase showing this kind of layout problem. Trunk leaves an empty space between the last content line and the first footnote one, the float branch places two more content lines, filling the empty space, and the patched branch behaves the same way.
[snip on the other good remarks] My feeling is that the Knuth algorithm can nicely handle such problems already as is. It's just a matter of defining the right demerits for deferred footnotes, and give a chance to too-short nodes with non-deferred footnotes to be considered WRT normal nodes with deferred ones. Demerits could not be enough: if there isn't any object with some stretch or shrink and the footnotes / floats do not fit exactly in the page but the content lines do, too-short nodes will only be considered when there is a restart and there isn't any deactivated node. Maybe we should be less restrictive on the ratio-based selection criterion. I seem to remember that there was also a problem with flushing floats on the last page (footnotes were unnecessarily deferred). I'd have to dig deeper into that. I'll try to illustrate my ideas in a patch in the next days. Ok, I'm looking forward to see it! Regards Luca footnote_positioning_6.xml Description: application/xml footnote_positioning_6.patched.pdf Description: Adobe PDF document
Footnotes in the float branch
Hi all I recently had the time (and the pleasure) to look at the before-float implementation branch, and I played a bit with it. I focused on the handling of footnotes, as I noticed that sometimes they were placed on a page following their citations without a real necessity to do it; as I wrote some time ago (and I remember there was some consensus on this) this behaviour is acceptable for before floats, but is probably not what a user would expect for footnotes. I have tried to fix this in the PageBreakingAlgorithm, computing a minimum required index for footnotes, so that no page break will be considered that unnecessarily defers some old footnotes to the next page. I'm attaching a diff file showing the changes (or maybe should I just apply it?); after applying the patch, there are 4 more passing testcases (footnote_footnote-separator, footnote_large, footnote_positioning_{4,5}) and no regressions. Testcases footnote_positioning_{2,3} still generate some run-time exception, and in the next days I'm going to see what's wrong with them. I add just a few comments about the new classes: I must admit that it took me a while to see and understand the interaction between the PageBreakingAlgorithm and the Footnotes / BeforeFloats Record, together with their inner Footnotes / BeforeFloats Progress. In particular, at the beginning I thought the *Progress classes were just convenience classes to get pieces of footnotes and floats without directly fiddling with element lists, and I found only later that their methods can actually create new active nodes. Another thing that I find a bit strange is that the PageBreakingAlgorithm does not directly interact with the before floats, as the calls to BeforeFloatsProgress.consider() are hidden in the FootnotesProgress class. 
So, I was wondering whether it wouldn't be clearer to have the PageBreakingAlgorithm control all the node creation logic, after having accessed information about footnotes and floats that could be placed in the page via the helper classes. WDYT? Regards Luca
Re: Footnotes in the float branch
On Mon, 26 Mar 2007, Luca Furini wrote: I'm attaching a diff file showing the changes Well, *now* I'm attaching bla bla :-) Regards Luca
Index: src/java/org/apache/fop/layoutmgr/breaking/FootnotesRecord.java
===
--- src/java/org/apache/fop/layoutmgr/breaking/FootnotesRecord.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/breaking/FootnotesRecord.java (working copy)
@@ -91,6 +91,21 @@
             addSeparator();
         }
     }
+
+    /**
+     *
+     */
+    public void handleDeferredFootnotes(int requestedLastIndex) {
+        boolean separatorAlreadyAdded = (alreadyInserted.getLength() > 0);
+        // check if we must add more footnotes
+        while (lastInsertedIndex < requestedLastIndex) {
+            next();
+        }
+        // if needed, add the separator
+        if (!separatorAlreadyAdded && alreadyInserted.getLength() > 0) {
+            addSeparator();
+        }
+    }
 
     /**
      * If the current page is a float-only page, handles the splitting of the last
Index: src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java
===
--- src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (working copy)
@@ -545,6 +545,17 @@
                 log.debug("Could not find a set of breaking points " + threshold);
                 return 0;
             }
+            // lastDeactivated was a good break, while lastTooShort and lastTooLong
+            // were bad breaks since the beginning;
+            // if it is not the node we just restarted from, lastDeactivated can
+            // replace either lastTooShort or lastTooLong
+            if (lastDeactivated != null && lastDeactivated != lastForced) {
+                if (lastDeactivated.adjustRatio > 0) {
+                    lastTooShort = lastDeactivated;
+                } else {
+                    lastTooLong = lastDeactivated;
+                }
+            }
             if (lastTooShort == null || lastForced.position == lastTooShort.position) {
                 if (isPartOverflowRecoveryActivated()) {
                     if (this.lastRecovered == null) {
Index: src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java
===
--- src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (working copy)
@@ -1285,7 +1285,12 @@
                     (new KnuthGlue(lineStartBAP, 0, 0,
                                    new LeafPosition(this, -1), false));
             } else {
+                // the first penalty is necessary in order to avoid the glue to be a feasible break
+                // while we are ignoring hyphenated breaks
                 hyphenElements.add
+                    (new KnuthPenalty(0, KnuthElement.INFINITE, false,
+                                      new LeafPosition(this, -1), false));
+                hyphenElements.add
                     (new KnuthGlue(0, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                                    new LeafPosition(this, -1), false));
                 hyphenElements.add
Index: src/java/org/apache/fop/layoutmgr/PageBreakingAlgorithm.java
===
--- src/java/org/apache/fop/layoutmgr/PageBreakingAlgorithm.java (revision 521755)
+++ src/java/org/apache/fop/layoutmgr/PageBreakingAlgorithm.java (working copy)
@@ -77,7 +77,7 @@
     /**
      * Are footnotes-only pages allowed?
      */
-    public static final boolean FOOTNOTES_ONLY_PAGES_ALLOWED = true;
+    public static final boolean FOOTNOTES_ONLY_PAGES_ALLOWED = false;
 
     /**
      * Additional demerits for an underfull page, which however has an acceptable fill ratio.
@@ -115,6 +115,14 @@
     private BeforeFloatsRecord beforeFloatsRecord;
     private FootnotesRecord.FootnotesProgress footnotesProgress;
     private BeforeFloatsRecord.BeforeFloatsProgress beforeFloatsProgress;
+    // number of new footnotes met since the last feasible break
+    private int newFootnotesCount = 0;
+    // the method noBreakBetween(int, int) uses these variables
+    // to store parameters and result of the last call, in order
+    // to reuse them and take less time
+    private int storedPrevBreakIndex = -1;
+    private int storedBreakIndex = -1;
+    private boolean storedValue = false;
 
     private ActiveNodeRecorder activeNodeRecorder = new ActiveNodeRecorder();
 
@@ -682,6 +690,7 @@
                 if (box instanceof KnuthBlockBox
Re: Before floats + footnotes
Vincent Hennebert wrote: I've had a quick look, that's not handled currently. At some place in the code the space-before set on the separator is converted into a MinOptMax(opt, opt, opt). If I remember correctly, the separator bpd is taken from the generated area (so there isn't any stretch or shrink) instead of the element sequence. I think I (?) did so after a few unsuccessful tries to get the dimension in a better way. Anyway, even if defining some elastic height for the separator would certainly help improve the situation, that's not something we can expect from users. I agree, the algorithm should be able to handle the most common situation without any special hint. I think that, after all, this could be fixed just by checking some additional condition before calling handleNode() the first time (when footnotes and before floats are not taken into account). Not sure it's that simple. Is a page containing only two lines of normal text with no deferred footnote more desirable than a full page with a deferred footnote? I think that might disturb the reader as well. That's why I got the idea of a minimum fill ratio. But obviously, as this is currently implemented that's not enough. I'll think more about it. You are right, there should probably be an upper limit to the amount of footnotes, and probably a user-configurable limit would be the ideal solution, so that the users could set it in the document using an extension property. While the old code forbade pages with too few footnotes, the minimum fill ratio avoids pages with too few content lines: we should find a way to combine these techniques, without eliminating each possible solution! :-) Regards Luca
Re: Before floats + footnotes
Vincent Hennebert wrote: I don't think there is much you can do in that case. It appears that the 15 lines of text at 12 pt exactly fill the 3 inch-high page. So that makes a feasible node which is always preferred to too-short nodes. Change the page-height to 3.1 inch and you no longer have the footnotes deferred to the next page. That's exactly why I introduced the MIN_NORMAL_PAGE_FILL_RATIO constant in PageBreakingAlgorithm: to give a chance to underfull pages with no deferred floats to be preferred over full pages with deferred ones. Keep the page-height of 3 inches and change that constant to 0.9 and you have your footnotes back on the first page. I agree that, in a sense, the error is in the fo file, defining a fixed-height footnote separator that does not fit the page well; with a height of 12 pt, all would be ok. Alternatively, it could be defined using a min-opt-max line height or space-*, allowing for some stretch and shrink (not sure this would be handled correctly at the moment, but this is not the point). But I don't agree with you when you say that makes a feasible node which is always preferred to too-short nodes. I'm not at all convinced that it is a feasible break. Even if the FO recommendation says that the footnote body could be placed in a page following the one with the anchor, I think it should be read in a restrictive interpretation, deferring a footnote only if there isn't any possible alternative. In this case, it is possible to place the footnote in the same page that contains its citation, so I think that the algorithm should not be allowed to prefer a break that defers it. Note [:-)] that the footnote could appear after *many* pages, if there are lots of 12pt-high lines of normal text. 
In this respect, from a user perspective, footnotes and before floats are quite different: while it's completely acceptable for a figure or a table to be placed in a page following the text referring to it, I'm sure most users would be quite disappointed to find out that a footnote has been unnecessarily deferred. So, while I think the idea of the page fill ratio is very good for the placement of before floats, I think footnotes should have a different handling, a preferential treatment limiting deferments to the extremely unlikely case of an unbreakable group of lines with a lot of footnotes, a few of which do not fit in the page (or some other extreme situations). The actual problem IMO is to define the right demerits for underfull pages and deferred before-floats and footnotes in order to have a decent result (i.e., that a human would expect) in every case. I don't think it would be enough: the expected break (the one with both footnotes on page 1) is a short solution and is not recorded, it just updates lastTooShort. As long as there is not a restart (and having just 12pt-high lines it will never happen), it doesn't have a chance to be used. I think that, after all, this could be fixed just by checking some additional condition before calling handleNode() the first time (when footnotes and before floats are not taken into account). Regards Luca
Before floats + footnotes
Hi all! At long last, I'm finally allowed some time to look at the float branch and ... wow! Really impressive, a great lot of good work! In order to apologize for my long absence :-) , I'm trying to see what's wrong with the failing testcases, in particular the ones with footnotes. Looking at the behaviour of the page breaking algorithm during the processing of testcase footnote_footnote-separator, I found out that: - the right page break (12 lines of content, some space, the separator and 2 footnote lines) does not create a new active node, it just updates lastTooShort; this is right, as there are no stretchable elements and the resulting adjustment ratio would be +inf; - but then, instead of having a restart, new active nodes are found that fill the page but push the footnotes in the following page. I'm going to see how best to fix this behaviour ... obviously if nobody else is quicker than me! :-) Regards Luca
Fix for bugs 41019 + 41121
Hi all I have a patch fixing bugs 41019 and 41121, for both trunk and float branch, and I'm wondering how it's best for me to proceed in order to avoid merging problems: should I change both trunk and branch, or just one of them? The patch is extremely simple and does not break any testcase: I only had to adjust the checks in a testcase because of the different line breaks. However, it adds some three lines to the TextLM, so maybe it's better if I wait for Simon to apply his unicode breaking changes? I'm attaching the patches, just to let you see if they interfere with someone else's work-in-progress. (sorry for repeating what I wrote some time ago, but I have experienced some e-mail problems and I probably lost some messages) Regards Luca
Index: test/layoutengine/standard-testcases/block-container_content_size_percentage.xml
===
--- test/layoutengine/standard-testcases/block-container_content_size_percentage.xml (revision 486106)
+++ test/layoutengine/standard-testcases/block-container_content_size_percentage.xml (working copy)
@@ -61,9 +61,9 @@
     <!-- from the spec: If that dimension is not specified explicitly (i.e., it depends on
          content's blockprogression-dimension), the value is interpreted as "auto". -->
     <!-- The 10% are ignored in this case. -->
-    <eval expected="28800" xpath="//flow/block[2]/@bpd"/> <!-- 2 lines -->
+    <eval expected="43200" xpath="//flow/block[2]/@bpd"/> <!-- 3 lines -->
     <eval expected="10" xpath="//flow/block[2]/@ipd"/>
-    <eval expected="28800" xpath="//flow/block[2]/block[1]/block[1]/@bpd"/>
+    <eval expected="43200" xpath="//flow/block[2]/block[1]/block[1]/@bpd"/>
     <eval expected="5" xpath="//flow/block[2]/block[1]/block[1]/@ipd"/>
 
     <!-- absolute -->
@@ -76,9 +76,11 @@
     <!-- from the spec: If that dimension is not specified explicitly (i.e., it depends on
          content's blockprogression-dimension), the value is interpreted as "auto". -->
     <!-- The 10% are ignored in this case. -->
-    <eval expected="43200" xpath="//flow/block[4]/@bpd"/> <!-- 3 lines -->
+    <eval expected="57600" xpath="//flow/block[4]/@bpd"/> <!-- 4 lines -->
     <eval expected="10" xpath="//flow/block[4]/@ipd"/>
-    <eval expected="43200" xpath="//flow/block[4]/block[1]/block[1]/@bpd"/>
+    <eval expected="28800" xpath="//flow/block[4]/block[1]/block[1]/@bpd"/> <!-- the first 2 lines ... -->
     <eval expected="5" xpath="//flow/block[4]/block[1]/block[1]/@ipd"/>
+    <eval expected="28800" xpath="//flow/block[4]/block[1]/block[2]/@bpd"/> <!-- ... and the other 2 lines -->
+    <eval expected="5" xpath="//flow/block[4]/block[1]/block[2]/@ipd"/>
   </checks>
 </testcase>
Index: src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java
===
--- src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (revision 486104)
+++ src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java (working copy)
@@ -1285,7 +1285,12 @@
                     (new KnuthGlue(lineStartBAP, 0, 0,
                                    new LeafPosition(this, -1), false));
             } else {
+                // the first penalty is necessary in order to avoid the glue to be a feasible break
+                // while we are ignoring hyphenated breaks
                 hyphenElements.add
+                    (new KnuthPenalty(0, KnuthElement.INFINITE, false,
+                                      new LeafPosition(this, -1), false));
+                hyphenElements.add
                     (new KnuthGlue(0, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
                                    new LeafPosition(this, -1), false));
                 hyphenElements.add
Index: src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java
===
--- src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (revision 486104)
+++ src/java/org/apache/fop/layoutmgr/BreakingAlgorithm.java (working copy)
@@ -545,6 +545,17 @@
                 log.debug("Could not find a set of breaking points " + threshold);
                 return 0;
             }
+            // lastDeactivated was a good break, while lastTooShort and lastTooLong
+            // were bad breaks since the beginning;
+            // if it is not the node we just restarted from, lastDeactivated can
+            // replace either lastTooShort or lastTooLong
+            if (lastDeactivated != null && lastDeactivated != lastForced) {
+                if (lastDeactivated.adjustRatio > 0) {
+                    lastTooShort = lastDeactivated;
+                } else {
+                    lastTooLong = lastDeactivated;
+                }
+            }
             if (lastTooShort == null || lastForced.position == lastTooShort.position) {
                 if
LineBreakUtils compilation error?
I've just updated my local copy of trunk and rebuilt. At first, I was not able to complete the compilation, as I received an error saying that the file src/java/org/apache/fop/text/linebreak/LineBreakUtils.java does not contain the class org.apache.fop.text.linebreak.LineBreakUtils. Indeed, the package declared inside the file was org.apache.commons.text.linebreak; after changing it to org.apache.fop.text.linebreak, ant ended successfully. Is this a little oversight, or was I simply not following the right compilation procedure? Regards Luca
Re: UAX#14 implementation
Manuel Mall wrote: After making the appropriate adjustment to the checks in that testcase ALL testcases are now passing! Wonderful! I'm really looking forward to seeing this great new feature! Just a couple of doubts concerning the differences with respect to the old implementation (I must confess I read the Unicode Annex quite quickly ...): Just discovered the first instance of an existing testcase which gives a different result. Under UAX#14 the following text (Note this is plain text not FO markup!): text-align=center .conditionality=retain linefeed-treatment=preserve. which appears in inline_border_padding_conditionality_2.xml has only a single break opportunity which is before the word linefeed-treatment. The space between center and .conditionality is not a break Does this happen because that space is just before a .? Another doubt: why aren't the - signs in text-align and linefeed-treatment possible breaks? Regards Luca
Re: DO NOT REPLY [Bug 41019] - Left-align oddness with long, unbreakable strings following
Vincent Hennebert wrote: I'd have to think more about it, but: - perhaps the compareNodes method should compare the line/page numbers for each node rather than the index in the Knuth sequence. Or some mixing of the two. The index can tell us which node allows laying out more content, the line number ... I am not able to see it as a very informative measure ... - if you restart using the last deactivated node you are sure that immediately after that you'll have to restart using the last too-short/too-long node, because no feasible break will be found (otherwise the list of active nodes wouldn't have been emptied). Yes, but I think we have a significant difference: in the first case we will have N good lines, a bad line and maybe some other good lines; in the second we have N-1 good lines, a quite-bad one (either too long or too short), then a bad one and finally some good ones. I've prepared a very small patch fixing a couple of things: - the TLM adds a zero-width infinite-value penalty to forbid breaks at the glue elements used for left/right aligned text (I'm going to check if a similar fix is needed elsewhere in the code) - the BreakingAlgorithm uses (if possible) lastDeactivated instead of either lastTooShort or lastTooLong. The patch is just a dozen lines long, and it was easy to apply it to the float branch. How should I proceed? Apply it to both trunk and branch? Only to the branch? I'm also going to mark bug 41121 as a duplicate of 41019, as the problem is exactly the same: the algorithm restarts from a very bad break instead of a good one (in that case, after the first word). Regards Luca
Re: DO NOT REPLY [Bug 41019] - Left-align oddness with long, unbreakable strings following
Chuck Bearden wrote: If in a left-aligned block some typical text words are followed by a string longer than the line-length and containing no spaces (e.g. a long URL), then the foregoing text will have premature line breaks, i.e. halfway to two-thirds the way into the line. I had a look at this, and what I found out is that the strange-looking lines are the combined effect of three different problems. So, sorry in advance for the long post, but breaking is never an easy matter! :-) 1) TextLM breaks the text even when a / or a - is found, handling them as hyphenation points with the usual sequence of glue + penalty + glue elements. The LineLM tries, in the first instance, to avoid using hyphenation points, so the penalty is not taken into account. But this has the side effect of using the first glue element as a feasible break (if the penalty were a feasible break too, it would surely be a better one, thus preventing the glue from being effectively chosen). This is probably the smallest of the problems, and can be solved just by adding an infinite penalty before the first glue element. But maybe we want to prevent this breaking from happening, as we can now use zero-width-spaces to explicitly insert breaking positions? 2) The presence of an inline object larger than the available width makes the algorithm deactivate all the active nodes and then restart with a second-hand node, as no line can be built that does not overflow. The restarting node was chosen, in BreakingAlgorithm.findBreakingPoints(), between lastTooShort and lastTooLong, neither of them being a good breaking point. There is a lastDeactivated node chosen among the deactivated nodes but it was not used. A deactivated node previously was an active one, so it is surely better than a node that failed to qualify; replacing either lastTooShort or lastTooLong (according to the adjustment) with lastDeactivated leads to a better set of breaks. However, this is not enough. 
The attached file small.20.pdf shows the result after fixing these first two problems. 3) At the moment, the LineLM can call findBreakingPoints() up to three times, the last one with a maximum adjustment ratio equal to 20. I came to the conclusion that this is really TOO much. I tried stopping after the second call (with max ratio = 5) and the result is much better (see attached file small.5.pdf). A high maximum adjustment ratio means that the algorithm is allowed to stretch spaces a lot in order to find a set of breaks which is *globally* better; this means that it can choose some not-so-beautiful breaks in order to build a set spanning over a larger portion of the paragraph. In our example: there can be a break just before the long url (a line ending after Consider:) only if we use an enormous adjustment ratio. With a smaller, more appropriate threshold, Consider: can no longer end a line, so the algorithm will restart from a previous point. In conclusion: the first two items are easily fixed, and I'm going to commit the changes in the afternoon (if there are no objections); concerning the question of the automatic break at /- characters, I'll probably leave the code unchanged for the moment, until we decide what is best. Concerning point #3, I'm going to have a closer look at the restarting mechanism ... Regards Luca small.20.pdf Description: Adobe PDF document small.5.pdf Description: Adobe PDF document
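To make the effect of the threshold in point 3 concrete, here is a minimal sketch of a Knuth-style adjustment ratio and of the feasibility test against a maximum ratio. The class and method names are hypothetical and the numbers in the usage note are invented for illustration; this is not the code in BreakingAlgorithm.

```java
final class AdjustRatioSketch {

    /**
     * Ratio by which the stretch (or shrink) of a line has to be used so
     * that its natural width matches the available width.
     */
    static double adjustmentRatio(int available, int natural, int stretch, int shrink) {
        int difference = available - natural;
        if (difference > 0) {
            // line too short: it can be fixed only if something stretches
            return stretch > 0 ? (double) difference / stretch : Double.POSITIVE_INFINITY;
        } else if (difference < 0) {
            // line too long: it can be fixed only if something shrinks
            return shrink > 0 ? (double) difference / shrink : Double.NEGATIVE_INFINITY;
        }
        return 0.0;
    }

    /** A break is feasible if the ratio lies between -1 and the threshold. */
    static boolean isFeasible(double ratio, double maxRatio) {
        return ratio >= -1.0 && ratio <= maxRatio;
    }
}
```

With an invented short line of natural width 60000, stretch 10000 and an available width of 200000, the ratio is 14: such a break is accepted with a threshold of 20 but rejected with a threshold of 5, which is the kind of difference described above for the line ending after Consider:.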
Re: XSL-FO 2.0 workshop in Heidelberg next week
Jeremias Maerki wrote: If anyone has any requirements for XSL-FO 2.0 which I should bring up at the workshop in Heidelberg next week, please let me know. Deadline 2006-10-16 so I have time to prepare. Luca, are you going, too? How do you travel? Yes, I'm going. I think I'll travel by train, but I haven't fixed all the details yet. I was waiting for more precise news to appear on the workshop site, but there have been no recent updates ... I should really start deciding anyway! I think I'll end up arriving the day before the workshop, and probably leaving the day after it, so we could find plenty of time to chat about fop. Regards Luca
Re: Necessary conditions to defer footnotes
Vincent Hennebert wrote: there is something I don't get with the handling of footnotes. When there is not enough room on the current page to place all the footnotes, the algorithm tries to find a place where to split them. But there is a condition: it must be possible to defer old footnotes (PageBreakingAlgorithm, l.332). And this is possible only if there is no legal breakpoint between the previous active node and the currently considered breakpoint (checkCanDeferOldFootnotes method). I don't understand this latter condition? This is to avoid keeping deferring part of the old footnotes when there is no real need to do it. Let me explain with an example: let's pretend we have a long footnote, which cannot be wholly placed on the same page where its citation is; so, when we start building the following page we should try to place all the remaining old footnote lines, if this is possible. However, it can happen that the breaking algorithm, without this check, prefers filling the page with normal lines, thus placing just a single footnote line and deferring the others to the next pages. For example, the footnote has 10 lines, and 3 are placed on the first page while the others are deferred one first time as there is not enough space for them; without this condition, it could happen that if there are no new footnotes (which would force a flush of the old one) the algorithm places just a single footnote line in the following seven pages, filling the remaining space with normal lines, while we want the footnote to be deferred again only if there is no way to place lines 4 to 10 together. And, reading the code, I don't understand if this method's purpose is to determine if it is /allowable/ to defer footnotes (am I authorized to defer footnotes if any), or if it is /possible/ (are there footnotes to defer). Ok, this is a bit subtle, but understanding that would help me get the intent of the algorithm. 
The former one, the method's purpose is to determine if the algorithm is allowed to break the footnote once again, which can happen only if we have added only the slightest bit of normal lines () and the remaining space is not enough. If there are no old footnotes the method returns false (which is maybe not very clear), but has no effect. HTH Luca
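The condition discussed above (old footnotes may be deferred again only if no legal breakpoint lies between the previous active node and the break being considered) can be illustrated with a small sketch. The element model and method bodies below are assumptions made for the example; they only borrow the names mentioned in the thread and are not the actual PageBreakingAlgorithm code.

```java
import java.util.List;

final class DeferCheckSketch {
    static final int INFINITE = 1000; // stand-in for an infinite penalty (assumption)

    enum Kind { BOX, GLUE, PENALTY }

    /** Hypothetical, simplified Knuth element. */
    static final class Element {
        final Kind kind;
        final int penalty;

        Element(Kind kind, int penalty) {
            this.kind = kind;
            this.penalty = penalty;
        }
    }

    /** Legal breaks: finite penalties, or glue immediately preceded by a box. */
    static boolean isLegalBreak(List<Element> seq, int i) {
        Element e = seq.get(i);
        if (e.kind == Kind.PENALTY) {
            return e.penalty < INFINITE;
        }
        if (e.kind == Kind.GLUE) {
            return i > 0 && seq.get(i - 1).kind == Kind.BOX;
        }
        return false;
    }

    /** True if no legal break lies strictly between the two indexes. */
    static boolean noBreakBetween(List<Element> seq, int prevBreak, int currentBreak) {
        for (int i = prevBreak + 1; i < currentBreak; i++) {
            if (isLegalBreak(seq, i)) {
                return false;
            }
        }
        return true;
    }

    /** Deferring is allowed only when there is something to defer and no earlier option. */
    static boolean canDeferOldFootnotes(List<Element> seq, int prevBreak,
                                        int currentBreak, boolean hasOldFootnotes) {
        return hasOldFootnotes && noBreakBetween(seq, prevBreak, currentBreak);
    }
}
```

As in the description above, the check returns false when there are no old footnotes at all.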
Re: Error message: Should be first
Jeremias Maerki wrote: One of my clients reported to me that he gets a Should be first error message on the log. This happens in (Page)BreakingAlgorithm.removeNode(). I get the impression that the code there is not finished rather than that is a real error condition. I'll try to extend removeNode() so it really removes the disabled node. That's quite strange ... The reason why the to-be-removed node should be the first one is this: active nodes are ordered by line (page) number and by index of the element where the feasible break can happen, so, for example, a node representing a break for page 13 at element #150 is (or at least it should be) before a break for page 13 at element #152; a node is removed when it is too far from the current feasible break being evaluated (or, in other words, between the node and the current position there is too much content to be placed in a single line / page), so in normal situations nodes are removed in order: for example, if we are evaluating a break at element #180, and we are too far from the node representing the break for page 13 at element #152, we will have already removed the node representing a break at page 13 element #150 (as it will be farther from the current element); this may no longer be true when there are footnotes: for example the break at element #152 could represent a page where we have placed one more normal line in page 13, but fewer footnote lines with regard to the break at element #150, so the node coming first allows placing more content than the following one, and we could need to remove the node at #152 *before* the one at element #150. However, this does not explain why this warning shows in what appears to be a very simple document. I'm going to have a closer look ... Regards Luca
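The ordering of active nodes described above (first by line or page number, then by the index of the element where the break happens) can be written as a comparator. This is a hypothetical sketch: the Node class below is a simplified stand-in, not FOP's KnuthNode.

```java
import java.util.Comparator;

final class NodeOrderSketch {

    /** Simplified stand-in for an active node. */
    static final class Node {
        final int line;     // line (or page) number ending at this break
        final int position; // index of the element where the break happens

        Node(int line, int position) {
            this.line = line;
            this.position = position;
        }
    }

    /** Orders nodes by line number first, then by element index. */
    static final Comparator<Node> ACTIVE_ORDER =
            Comparator.comparingInt((Node n) -> n.line)
                      .thenComparingInt(n -> n.position);
}
```

With this ordering the node for page 13 at element #150 precedes the one for page 13 at element #152, so a removal that always targets the head of the list only works while the head is also the node farthest from the current break.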
Re: Error message: Should be first
I've had another look at this. A few debug outputs show that the error arises when trying to remove the node KnuthNode at 734 4527603+682968-135942 line:10 prev:687 dem:11527.971465493918 while the list of active nodes contains [ KnuthNode at 734 4527603+682968-135942 line:10 prev:683 dem:11513.226030457132, KnuthNode at 734 4527603+682968-135942 line:10 prev:687 dem:11527.971465493918, ] This removal, however, happens at the end of the algorithm, when the best layout is chosen (just like Vincent pointed out), and in this situation a node could rightly be removed even if it's not the first one. We could maybe add a boolean parameter to removeNode(), stating whether it is allowed to remove the nodes out of order or not, and only the calls in filterActiveNodes() would have it true. HTH Luca
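The suggested boolean parameter might look roughly like the sketch below. It is written against a plain list of node labels rather than FOP's real active-node structure, and the warning counter merely stands in for the Should be first log message, so every name here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ActiveNodesSketch {
    private final List<String> active = new ArrayList<>();
    private int outOfOrderWarnings = 0;

    void add(String node) {
        active.add(node);
    }

    /**
     * Removes the node; complains only when the removal is out of order
     * and the caller did not declare that this is expected.
     */
    boolean removeNode(String node, boolean allowOutOfOrder) {
        int index = active.indexOf(node);
        if (index < 0) {
            return false; // not an active node
        }
        if (index > 0 && !allowOutOfOrder) {
            outOfOrderWarnings++; // stands in for the "Should be first" message
        }
        active.remove(index);
        return true;
    }

    int size() {
        return active.size();
    }

    int warnings() {
        return outOfOrderWarnings;
    }
}
```

In this sketch the node is removed in both cases and only the warning depends on the flag; whether an unexpected out-of-order removal should instead be refused is a separate design decision.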
Re: keep...=always and Knuth penalties
Jeremias Maerki wrote: On 19.06.2006 15:45:36 Luca Furini wrote: It seems to me that the prescribed behaviour requires a keep constraint with force = always to be satisfied *always* :-), even if this would mean having some overflowing content. Obviously, we disagree here. I read it so that always can also be relaxed if the keep cannot be satisfied. Did anyone check what other implementations do? A quick test shows that AntennaHouse's xslformatter satisfies all the keeps, even when this means having some content overflow the body region (the overflowing content is actually clipped), while RenderX's xep relaxes a keep constraint in order to avoid overflows. So, it seems the match is still a draw! ;-) Regards Luca
Re: keep...=always and Knuth penalties
Manuel Mall wrote: What is still unclear to me is if it is worthwhile to implement this two pass approach, i.e. use INFINITE penalties first and relax later, or if it is good enough for 99.99% of cases just to start with INFINITE-1 penalties for mandatory keeps? I think the second pass is necessary, in order to be sure that we are breaking a keep because there really isn't any other alternative. Otherwise, I'm sure that for each value INFINITE we use, we could create a (contrived) example where the algorithm prefers breaking the keep instead of using a different, legal (but somewhat uglier) break, thus behaving in a non-conformant way. Reading again the specs, I even start wondering whether it would really be right to allow a break between objects tied by a keep constraint: Each keep condition must also be satisfied, except when this would cause a break condition or a stronger keep condition to fail to be satisfied. If not all of a set of keep conditions of equal strength can be satisfied, then some maximal satisfiable subset of conditions of that strength must be satisfied (together with all break conditions and maximal subsets of stronger keep conditions, if any). It seems to me that the prescribed behaviour requires a keep constraint with force = always to be satisfied *always* :-), even if this would mean having some overflowing content. More than this, even a keep with force = N could be broken only in order to satisfy a keep with greater force, and not to avoid an overflow. I seem to recall that in Knuth's paper the author talks about a symbol he introduced in tex to represent a space that could be used as a line break in dire straits, having a penalty value = inf-1 (where inf was the special finite value representing infinity). Maybe we could similarly add some soft-keep extensions? Regards Luca
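The two-pass idea can be reduced to a toy example: try first with mandatory keeps at INFINITE, and relax them to INFINITE-1 only when no other candidate break exists. The sketch below is hypothetical and deliberately ignores everything a real breaker considers (widths, demerits, restarts); it only shows the control flow.

```java
import java.util.OptionalInt;

final class TwoPassKeepSketch {
    static final int INFINITE = 1000; // stand-in for an infinite penalty (assumption)

    /** Index of the cheapest candidate with a finite penalty, if any. */
    static OptionalInt cheapestBreak(int[] penalties) {
        int best = -1;
        for (int i = 0; i < penalties.length; i++) {
            if (penalties[i] < INFINITE && (best < 0 || penalties[i] < penalties[best])) {
                best = i;
            }
        }
        return best < 0 ? OptionalInt.empty() : OptionalInt.of(best);
    }

    /** First pass with strict keeps; second pass relaxes them to INFINITE - 1. */
    static OptionalInt chooseBreak(int[] penalties) {
        OptionalInt strict = cheapestBreak(penalties);
        if (strict.isPresent()) {
            return strict; // a legal break exists, so no keep is violated
        }
        int[] relaxed = new int[penalties.length];
        for (int i = 0; i < penalties.length; i++) {
            relaxed[i] = Math.min(penalties[i], INFINITE - 1);
        }
        return cheapestBreak(relaxed);
    }
}
```

A keep is broken here only when the strict pass finds nothing, which is the guarantee that starting directly with INFINITE-1 penalties cannot give.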
Re: [GSoC] Wiki page for progress informations
Jeremias Maerki wrote: did you already investigate how footnotes are implemented? Can you say anything about how similar the problem of footnotes is to before-floats? Just so you don't have to start from scratch while there may be something to build upon. After all, the footnotes also contain some logic to move certain parts to a different page than where anchor is located. A few quick comments about the footnote implementation: 1) the FootnoteLM returns only the sequence of elements representing the inline part (not the footnote-body part); it just adds to the last (inline) box a reference to the FootnoteBodyLM. 2) the LineLM, after computing the breaks, adds to each (block) box representing a line the references to the FootnoteBodyLM whose citations are in that line 3) during the remaining of the element collection phase, these references are not used (but in the creation of combined element lists, when they should be copied inside the new elements) 4) the PageSequenceLM.PageBreaker.getNextKnuthElements() method, after receiving all the (block) elements, scans them looking for footnote information, gets the elements from the referenced FootnoteBodyLM and puts them in a different list (at the moment a list of lists, but this is sub-optimal), and from the footnote-separator (in a separate list) 5) these lists are looked at in PageBreakingAlgorithm.computeDifference(), where we try to add some footnote content to the normal page content using getFootnoteSplit(), and in computeDemerits(), where some extra demerits are added if we break a footnote or some footnotes are deferred. This last point at the moment is performed using many PageBreakingAlgorithm private variables, which is maybe not the best way to do it, as we must be very careful about their initialization and their use, especially when the algorithm restarts. 
I think that a state object could be used to store these values and be passed explicitly between the methods instead of relying on the class members, but concerning this I'd like to hear the opinions of the other committers ... Insertion of before-floats could be implemented in a similar way, giving precedence to the footnote insertion (as it is subject to stricter constraints). An important difference between a footnote and a before-float is that the latter does not have an inline part, so (if we want to follow the same pattern) we need either to store the reference inside a previously-created box or to add some new elements containing the reference (but we must be sure that these elements cannot be parted from the previous ones, see the constraints in section 6.10.2 in the spec). A crucial point is the demerit function as, if I remember correctly, it greatly affects the computational complexity of the breaking algorithm (there should be a paper by M. Plass concerning this). HTH Another thing that we may need to keep in mind: There was lots of desire from the user community that FOP supports large documents (long-term goal, not necessarily yours). I wrote that a first-fit algorithm could help free memory earlier. Obviously, for complex before-float situations a total-fit approach is probably more interesting as it can come up with more creative solutions. I'm just mentioning it so we keep the bigger picture in mind and since there could be conflicting goals. A first approximation of a first-fit algorithm could be achieved quite quickly by having a BreakingAlgorithm interface which is implemented by a TotalFitBA (the existing implementation) and a FirstFitBA which would have a much simpler considerLegalBreak() method that, instead of the complex set of nodes, just keeps track of a single node. 
This would surely decrease the memory footprint, but is not (I think) what we really want, as this simplified algorithm would still be performed on the whole sequence of elements. In order to start processing the sequence as soon as we receive a few elements we need to make some deeper changes. An idea (I just had it now, so I did not fully consider all its implications). At the moment, the block-level LMs collect elements from their children and return just a single sequence (if there are no break conditions); we could have a parameter requesting them to return after they receive each child sub-sequence, and have a canStartComputingBreak() method that returns true if the sequence contains enough elements and we are using a first-fit algorithm, or false otherwise ... Sorry for the long post ... and for the long absence too, but it seems that just after thinking "great, now I've really got some time to spend on FOP" I receive tons of other things to do ... :-( Regards Luca
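As a rough illustration of the first-fit idea, here is a small Python sketch of a breaker that keeps only a single piece of state and can be fed sub-sequences as they arrive. The class name FirstFitBreaker, its add() method and the use of plain widths instead of Knuth elements are assumptions made for this example; it is not the FOP BreakingAlgorithm API.

```python
class FirstFitBreaker:
    """Greedy breaker: fills each line/page until the next item overflows.

    Only the space used so far is remembered (the "single node"),
    so items already placed can be released by the caller.
    """

    def __init__(self, available):
        self.available = available  # width/height available per line or page
        self.used = 0               # space consumed on the current line/page
        self.index = 0              # index of the next item to be placed
        self.breaks = []            # indexes of items that start a new line/page

    def add(self, widths):
        """Consume one child sub-sequence and return the breaks found so far."""
        for w in widths:
            # Break before this item if it does not fit, unless the line is
            # empty (an oversized item is placed alone and overflows).
            if self.used > 0 and self.used + w > self.available:
                self.breaks.append(self.index)
                self.used = 0
            self.used += w
            self.index += 1
        return self.breaks
```

Feeding the breaker one sub-sequence at a time gives the same breaks as feeding it the whole sequence, which is what makes early processing possible with first-fit and impossible with total-fit.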
Re: some footnotes not being displayed
Jeremias Maerki wrote: No idea if anyone else has time to look into it. I don't think it's an easy fix, or at least easy to isolate, because footnote handling is not trivial. Having a good test case is instrumental in finding the problem quickly. Usually, this step is 60% of fixing a bug. I'm going to look at this bug: it appears to be (more or less) the same bug as #37579: while the sequences of elements representing the list-item-label and the list-item-body are combined, information about footnotes is lost. While having a quick fix should not be difficult, I'd like to see if there is an elegant way to deal with this kind of problem without having to replicate the same code wherever an element combination is performed. At the moment, I'm thinking of an element sequence iterator, moving from one feasible break to the next and returning an object with all the needed information (width, stretch, shrink, footnotes, ...). I think such an object could come in handy for several classes ... If anyone has other ideas, suggestions, objections, just let me know! Regards Luca
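The iterator idea could look roughly like the following Python sketch. The Element and Segment classes, the segments() generator and the value of INFINITE are hypothetical names chosen for this example; feasible breaks are taken to be finite penalties and glues immediately preceded by a box, as in the Knuth model.

```python
from dataclasses import dataclass, field

INFINITE = 1000  # assumed value for an unbreakable penalty


@dataclass
class Element:
    kind: str                 # "box", "glue" or "penalty"
    width: int = 0
    stretch: int = 0
    shrink: int = 0
    penalty: int = 0
    footnotes: list = field(default_factory=list)


@dataclass
class Segment:
    width: int = 0
    stretch: int = 0
    shrink: int = 0
    footnotes: list = field(default_factory=list)


def segments(elements):
    """Yield one Segment per run of elements between feasible breaks."""
    current = Segment()
    prev = None
    for el in elements:
        feasible = (el.kind == "penalty" and el.penalty < INFINITE) or (
            el.kind == "glue" and prev is not None and prev.kind == "box"
        )
        if feasible:
            yield current
            current = Segment()
        if el.kind != "penalty":
            # A penalty width only matters if the break is taken; ignored here.
            current.width += el.width
            current.stretch += el.stretch
            current.shrink += el.shrink
        # Footnote references travel with the segment that contains them.
        current.footnotes.extend(el.footnotes)
        prev = el
    yield current
```

Code that combines two element lists (for example label and body of a list item) could then walk both iterators in step and copy the footnotes of each segment into the combined elements, instead of re-implementing the scan.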
Re: some footnotes not being displayed
I've started looking at the patch attached to bug #37579, for the moment concentrating on footnotes inside lists. Concerning shortcoming 2) (from the bug comment): "2) Footnotes from list-item-body starts at the same position (from the starting edge) than the list-item-body itself and not at the starting edge of the region-body." I'm not sure whether what happens is wrong: isn't this the correct result of the inheritance of indents? Shortcoming 1) ("1) Footnotes in list-item-label produce a 'Cannot find LM to handle given FO for LengthBase.' AFAICS in the getBaseLength method of AbstractBaseLayoutManager.") is quite related to this: the message is due to a failed attempt to retrieve the value for end-indent (setting end-indent to a fixed value gets rid of the message). The method AbstractBaseLayoutManager.getBaseLength() iterates over the LM tree, moving from a LM to its parent: in this case, the traversed LMs are: BlockLM, FootnoteBodyLM, FlowLM, PageSequenceLM, null. It seems that the FootnoteBodyLM should have, in this case, a ListItemContentLM parent (or maybe some kind of reference, so as not to break the passing of elements to the PageSequenceLM). One last note: in the attached example for lists, there is a footnote inside a static-content, commented out "as if this is uncommented a runtime error results" (quote from the comment). A run time error is never a good thing; anyway the spec states that "It is an error if the fo:footnote occurs as a descendant of a flow that is not assigned to a region-body" (section 6.10.3 fo:footnote); this should maybe raise a validation exception ... Tomorrow I will try to finish fixing this. As a quick fix, it should be enough to apply Gerhard Oettl's patch and explicitly set indents on the footnote bodies. Regards Luca
Re: Generalized Knuth-Plass Linebreaking Algorithm
Simon Pepping wrote: [...] See http://www.leverkruid.nl/GKPLinebreaking/index.html. Please, let me know what you think of it. I'm going to read it carefully, it seems very interesting! Regards Luca
Re: letter-spacing
Jeremias Maerki wrote: Still trying to fix my problem with letter-spacing and fixed width spaces. Do I understand that correctly that XSL-FO's view of letter-spacing is different than, say, PDF's? PDF's character spacing (PDF 1.4, 5.2.1) is designed so it advances the cursor for each (!) character by the Tc value. Yes, I remember that when I was working on letter spacing it took me a while to understand what was wrong with the resulting pdf! :-) letter-spacing=1pt: |_t__e__x__t_ _t__e__x__t_ _t__e__x__t_| At the moment, fop has |t__e__x__t t__e__x__t t__e__x__t| in other words there are letter spaces only between letters, and not between a letter and a space. The recommendation states that The algorithm for resolving the adjusted values between word spacing and letter spacing is User Agent dependent. (7.17.2 in the candidate recommendation), so I think this is not a wrong behaviour: it just assumes that word spaces have a higher precedence than letter spaces. Another little difference: each letter space depends on the preceding letter size, instead of depending on both the preceding and following letters sizes; but this has some visible effect only when a word is composed of letters having different sizes. PDF's character spacing would work like this, I think (although the last character space needs to be eliminated by the layout manager [1]): |t__e__x__t__ __t__e__x__t__ t__e__x__t|(__) -- [1] This is why the word spacing adjustment stored in the textAreas is not the computed one, but is specifically modified in order to counterbalance the 2 letter spaces that the pdf will add. If I'm right here (not really sure, that's why I'm asking), it would mean that we should probably stop using the Tc feature from PDF and instead control the glyph positioning ourselves like we already do in PostScript. WDYT? As long as we have just two character categories (letter / spaces) the two pdf operators were enough. 
Now, with fixed width spaces too, which should be unaffected both by word spacing (thus differing from spaces) and by letter spacing (thus differing from normal letters), two operators are too few. I don't think we need to set the horizontal positioning of each character or word, but just fix the placement of a character sequence following a fixed width space, removing the letter spaces wrongly added by the Tc operator, by alternating character sequences and horizontal adjustments in the TJ array. HTH Regards Luca
Re: letter-spacing
Jeremias Maerki wrote: The recommendation states that The algorithm for resolving the adjusted values between word spacing and letter spacing is User Agent dependent. (7.17.2 in the candidate recommendation), so I think this is not a wrong behaviour: it just assumes that word spaces have a higher precedence than letter spaces. No, actually in both cases the precedence is force so all spaces survive the resolution process. So, just to check I understood: - according to the pdf specifications between two words there is 1 word space + 2 letter spaces - according to the xsl recommendation there is 1 word space + 1 letter space (or better, two half letter spaces) - fop currently puts just a word space Is this correct? But I still don't understand what the words concerning adjusted values between word spacing and letter spacing are supposed to mean ... However, while I was out for a few hours I was thinking about this and I came to the conclusion that it may make sense to keep an array of character offsets as an attribute of a WordArea in the area tree. It would probably be the best way to deal with kerning too. My only concern is about the resulting pdf size: if we specify an offset for each character, wouldn't it become (at least) twice as big as before? Regards Luca
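The three cases in the summary above can be written down as simple arithmetic. This is only a sketch of the summary as stated in this thread (which is itself posed as a question), not a statement of what the PDF or XSL specifications require; the function names and the model labels are made up for the example. Here word_space stands for the full advance of the space character including any word spacing.

```python
def gap_between_words(model, word_space, letter_space):
    """Space between the last letter of a word and the first of the next."""
    if model == "pdf":
        # Tc is added after every glyph, the space included:
        # one letter space after the letter, one after the space.
        return word_space + 2 * letter_space
    if model == "xsl":
        # One word space plus two half letter spaces.
        return word_space + letter_space
    if model == "fop":
        # Letter spaces only between letters, none next to a space.
        return word_space
    raise ValueError("unknown model: " + model)


def word_space_for_pdf(target_gap, letter_space):
    """Word space to emit so that the PDF gap equals target_gap.

    This mirrors the counterbalancing of the 2 extra letter spaces
    mentioned in the footnote of the previous message.
    """
    return target_gap - 2 * letter_space
```

For example, with a word space of 5 and a letter space of 1 the three models give gaps of 7, 6 and 5 respectively, and emitting a word space of 3 brings the PDF result back to 5.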
Re: white-space-collapse not working in trunk?
(moved from fop-users as we are going into the implementation details) Manuel Mall wrote: the shorthand property white-space=pre should be used or its expanded equivalents: linefeed-treatment=preserve white-space-collapse=false white-space-treatment=preserve wrap-option=no-wrap If you do that the bug I was referring to would show because white-space-treatment=preserve is not correctly implemented. In order to preserve all spaces, we could use the elements that are now generated for a nbsp: box w=0 penalty inf glue (elastic or not, according to the alignment) They are not suppressed and they do not allow a break, so I think they should fit quite well this situation too, when white-space-treatment = preserve and wrap-option=no-wrap. If wrap-option=wrap, however, we must add some penalties in order to allow a break between spaces; we must be careful, as if there are 3 spaces between two words, there are 4 possible breaks (ignoring, at the moment, unicode breaking rules), so we just cannot add a penalty before or after the other elements. Is this ok, or am I missing some important detail? Regards Luca
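A small Python sketch of the no-wrap case described above: each preserved space becomes the nbsp triple (box w=0, infinite penalty, glue), and a helper checks that the resulting list offers no legal break. The tuple representation, the function names and the INFINITE value are assumptions for this example; legal breaks follow the usual Knuth rules (a finite penalty, or a glue immediately preceded by a box).

```python
INFINITE = 1000  # assumed value for an unbreakable penalty


def preserved_space_elements(n_spaces, space_width):
    """Elements for n preserved, non-breaking spaces."""
    elements = []
    for _ in range(n_spaces):
        elements.append(("box", 0))             # keeps the glue from being discarded
        elements.append(("penalty", INFINITE))  # forbids a break here
        elements.append(("glue", space_width))  # the visible space itself
    return elements


def legal_breaks(elements):
    """Indexes at which a break would be allowed."""
    result = []
    for i, (kind, value) in enumerate(elements):
        if kind == "penalty" and value < INFINITE:
            result.append(i)
        elif kind == "glue" and i > 0 and elements[i - 1][0] == "box":
            result.append(i)
    return result


def break_opportunities_when_wrapping(n_spaces):
    """With wrap-option=wrap, n spaces between two words give n + 1 breaks."""
    return n_spaces + 1
```

The wrap case is deliberately left as a count only, since the message does not fix the element sequence for it.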
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
Manuel Mall wrote: 1. The suppress-at-line-break property can be applied to all characters. I would take the position at the moment that explicit specification of the suppress-at-line-break property is not supported and we worry about it at a later stage. I would certainly argue against just supporting it in the context of nbsp. Ok, it's better to take a step at a time! 2. When we discussed UAX#14 line breaking on this list last year Joerg pointed out that he had a table driven implementation for it. At the time I took a look, liked it, and updated it for compliance to the latest UAX#14 spec and then shelved it for integration into FOP. That is when we move determining line break opportunities to the LineLM level (which we discussed extensively before) we get UAX#14 linebreaking as part of it by integrating Joerg's implementation. As a consequence I recommend against putting any UAX#14 specific stuff at the lower levels (e.g. TextLM) now in the context of fixing the nbsp problem. It will disappear anyway and IMO is therefore not worth the effort. Ok, so for the moment I'll avoid considering interaction between spaces, and just fix the character-by-character element creation, which is ready and should be enough to handle the most common situations. This also solves another bug concerning a nbsp being removed when starting a line. I'll make the commit in a few minutes. Regards Luca
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
Manuel Mall wrote: IMO nbsp (and any other Unicode special spaces) are outside the scope of XSL-FO whitespace handling. XSL-FO refers to whitespace as defined in XML. In XML only #x20, #x9, #xA, and #xD are considered whitespace. Therefore nbsp does not need to be considered when looking at white-space-treatment and white-space-collapse. Would that approach remove the complications you mentioned? Thanks for the clarification, Manuel! This solves the first supposed problem (interaction between nbsp and pretty-printing spaces), but the second one is still open: what happens if we have someContent<nbsp><space>otherContent ? *IF* (and I'm not at all sure about this) there can be a break, then both spaces should be discarded: in order to implement the correct behaviour for this almost hypothetical situation, we would need to create elements for both spaces as a whole (and they could belong to different LMs), otherwise the algorithm would not be able to ignore the nbsp during the line breaking. Anyway I think this is quite an unlikely combination of entities and properties :-) ; as I see you are already working on something else, for the moment I will prepare a patch for the most common situations. Regards Luca
Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output
Manuel Mall wrote: This solves the first supposed problem (interaction between nbsp and pretty-printing spaces), but the second one is still open: what happens if we have someContent<nbsp><space>otherContent ? *IF* (and I'm not at all sure about this) there can be a break, then both spaces should be discarded: IMO yes there can be a break and no only the space needs to be removed. Again the argument is that nbsp is not whitespace as per XSL-FO definition and need not to be removed. What makes you think that both the nbsp and the space needs to be removed around a fop generated linebreak? Oops, I forgot to add an important condition: if the user explicitly states that the nbsp must be discarded around a line break: <fo:inline suppress-at-line-break="suppress">&nbsp;</fo:inline> Well, the more I look at this, the more it seems unlikely to ever happen ... we are probably having a highly theoretical disquisition! :-) Anyway, I was still not sure whether there could be a break so I looked back at the Unicode Annex #14. "GL Non-breaking (Glue) (XB/XA) (normative) Non-breaking characters prohibit breaks on either side, but that prohibition can be overridden by SP or ZW. In particular, when NBSP follows SPACE, there is a break opportunity after the SPACE and NBSP will go as visible space onto the next line. See also WJ. The following lists the characters of line break class GL with additional description. 00A0 NO-BREAK SPACE (NBSP) 202F NARROW NO-BREAK SPACE (NNBSP) 180E MONGOLIAN VOWEL SEPARATOR (MVS) NO-BREAK SPACE is the preferred character to use where two words should be visually separated but kept on the same line, as in the case of a title and a name Dr.<NBSP>Joseph Becker. When SPACE follows NBSP, there is no break, because there never is a break in front of SPACE. NARROW NO-BREAK SPACE is used in Mongolian. The mongolian vowel separator acts like a NNBSP in its line breaking behavior. 
It additionally affects the shaping of certain vowel characters as described in [Unicode] Section 12.3, Mongolian. So, it seems there could be a break between SPACE and NBSP (with NBSP starting the next line), but not between NBSP and SPACE. Can we say this is settled? Regards Luca
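The conclusion drawn from the quoted UAX #14 text can be captured in a few lines of Python. This sketch covers only the SPACE / NBSP interaction discussed here, not the full line breaking algorithm; the function name and the choice to return None for pairs it does not model are assumptions of the example.

```python
SP = "\u0020"    # SPACE
NBSP = "\u00a0"  # NO-BREAK SPACE (line break class GL)


def break_allowed_between(before, after):
    """True/False if the quoted rules decide the case, None otherwise."""
    if after == SP:
        # "there never is a break in front of SPACE"
        return False
    if before == SP:
        # SP overrides the GL prohibition: break after the space,
        # and a following NBSP goes onto the next line as visible space.
        return True
    if before == NBSP or after == NBSP:
        # GL prohibits breaks on either side.
        return False
    return None  # other character pairs are outside this sketch
```

So SPACE followed by NBSP allows a break after the SPACE, while NBSP followed by SPACE does not, which matches the reading given above.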
Re: line breaking and whitespace handling
Manuel Mall wrote: As far as I remember our last discussion was about who should generate the Knuth element lists: The individual layout managers or the Line layout manager. You argued in favour of retaining the current system and I tended to favour the moving it up the hierarchy to the line LM. I never spelled out why I am tending to favour the line LM. It all boils down to in my mind: Do we need to create LM spanning Knuth elements? If the answer is Yes then my gut feel is we are better off doing it at line LM level instead of passing context around in argument lists. If the answer is No then leaving it at lower level LMs is fine. I see your point, and I agree with you that elements representing inline spaces, borders and padding must take into account information coming from multiple fo nodes. Moving the generation to the LineLM level avoids the need to pass rich (and large) context information to the children: but the downside is that the children must give the LineLM their Positions, as the addAreas() phase still counts on the Positions stored in the elements in order to know what to place where (or we must rethink this phase too). One reason to have LM spanning Knuth elements could be for consecutive whitespace (BTW is it 'white space' or 'whitespace' - I don't have a clue?) The xsl recommendation 1.0 had white space (but whitespace in the quotation from the css specs), 1.1 has both white space and whitespace ... they should really make up their mind! :-) which we need to discard around formatter generated linebreaks. Or which we may have to stretch/shrink for justification. What I am saying is if whitespace-collapse=false it may make things easier (and more economic) if we model consecutive whitespace as a single glue element. What is a more complicated case is having an fo:inline with border/padding and whitespace before and after the border: [example] What about using the UnresolvedElements? 
Just as for the block-level space resolution, each inlineLM could append at the beginning and at the end of its element list an UnresolvedElement storing its border, padding and spacing information. Before performing the line breaking, when all UnresolvedElements are known, their information can be combined to create the actual elements. Another issue which came up since our last discussion but not really related to the issue above is that because of markers we cannot do whitespace handling at fo level in all cases but must rerun the fo level type whitespace handling again at LM level when we have the actual whitespace related property values which apply in the retrieve-marker context as they can be different to the values of the same properties in the marker context. Maybe in this case we could use UnresolvedElements for the inner spaces too (the spaces in the middle of a node text, whose handling in the previous situation did not need other information). WDYT? Regards Luca
Re: line breaking and whitespace handling
Manuel Mall wrote: What about using the UnresolvedElements? Just as for the block-level space resolution, each inlineLM could append at the beginning and at the end of its element list an UnresolvedElement storing its border, padding and spacing information. I don't know anything about the UnresolvedElements as I so far have not studied the block level LMs. But this reminds about another requirement we may need to consider: Proper conditional start/end space resolution. This is currently not done. I don't think we even have testcases for it. When Jeremias did the block level before/after stuff the idea was that may be we can port this to the inline LMs for the start/end space resolution. So, we could start from here, using UnresolvedElements to handle inline space resolution, then take into account conditional borders and paddings, and finally trailing / leading space characters. In the end, it all boils down to computing how much space we have to allocate between two words if there isn't a break, and how much after one and before the other if there is. It sounds almost easy put like this, so there must be a trick somewhere! :-) [white space handling within markers] Maybe in this case we could use UnresolvedElements for the inner spaces too (the spaces in the middle of a node text, whose handling in the previous situation did not need other information). Not sure here - I am more inclined to reuse the fo logic, that is iterate over all characters in a paragraph and tell the LMs which one to delete probably combined with the Unicode UAX#14 linebreaking. Ok, indeed if all we have to decide is whether to discard or retain a space character it's better to reuse what we already have. Regards Luca
Off line for a week
Hi all! I apologize for having been not very active for the last weeks, but at long last things should change: next week I will be in San Jose (California) attending a conference about digital publishing, and after that I should have some time to spend working on FOP (and I really can't wait to!). Regards Luca
Re: Hyphetation broken with last commits
Manuel Mall wrote: Luca, why does our line breaking algorithm insist on having at least one Box in a paragraph? Is that inherent in the Knuth algorithm, i.e. can't it deal with empty paragraphs, that is paragraphs containing only Glue/Pen elements? If I remember correctly, a sequence starting with glue / penalty elements would not make the algorithm crash, but the produced output would take into account the width of the glues too, while it should not. This happens because there is no previous break, whose handling would have the effect of ignoring glues and penalties between the break and the next box. We could maybe move the leading space removal to the beginning of the breaking algorithm itself, which could then check whether there are any elements left and create an empty line break if there are none. HTH, unfortunately these days I'm really really busy and I don't have much time to look at this. Luca
Re: 4.3.2 Overconstrained space-specifiers
Jeremias Maerki wrote: You will have seen that I've been working on overconstrained documents. 5.3.4 Overconstrained Geometry is more or less implemented, so now I need to have a look at 4.3.2 which proves quite difficult to understand. At least I can't make much sense of it ATM. [...] If anyone has an idea what rule 4 in 4.3.1 or the section 4.3.2 is about I'd love to read your thoughts. Otherwise, I will run this through the XSL editors list. I always thought (probably wrongly) that these sections of the spec refer to the page regions, maybe because of the property display-align, and more as a way to formally justify what is usually done than as prescribing some particular behaviour. To be clearer (I hope :-)): region viewports usually have a well-known height (unless there is only a single page whose height is unbounded); their area children don't always fill them completely. The content areas are placed at the top / center / bottom of the viewport according to the value of display-align: but, as these extra spaces may conflict with the space properties of the first and last child areas, we need, from a formal point of view, a rule saying that we are allowed to do this, otherwise the specs would be inconsistent. In other words, I always read these rules as: spaces added at the top / bottom of a page to implement display-align have greater precedence than the space-before or space-after traits of the child areas. In my opinion, rule 4 should state something like this: the maximum value of the space-specifier is set to the difference between the containing height and the content height. Don't know if this makes any sense ... Regards Luca
Re: Kerning
Starting from your final summary: Manuel Mall wrote: IMO FOP should limit itself to: a) Use kerning only for consecutive characters within the same fo Ok, but more on this later in this message ... b) Limit itself to the kerning information in the font Ok c) Only apply kerning if the letter-spacing property has the value normal (and the font supports it) Isn't this condition too strong? I see kerning as an extra space, something that is added to the letter spacing, not something that replaces it. A simple example with our kerning couple AV: a) at the moment kerning is not implemented, so with normal letter-spacing the space between A and V seems bigger than the space between I and L, for example; b) we implement kerning, so the space between A and V is reduced and it visually looks like the space between I and L; c) what if we have a negative letter-spacing? if we don't apply kerning any more, we go back to a): the space between A and V would seem bigger than the space between I and L. In other words: if the kerning value stored in the font is correct, it should always be added to the letter spacing: it would make the characters overlap only when the letter-spacing alone would make normal characters overlap, and in this case this should be considered the desired output. In the end XSL-FO has the letter-spacing property which users (and programs generating XSL-FO) can use to adjust kerning. A little doubt concerning letter spaces: at the moment, a letter space is assigned to the preceding character. Is this correct? I don't remember any section in the specs stating about the ownership of letter spaces ... I think that everything is simpler, from the point of view of both users and implementors, if each letter space is owned by the preceding (or following) formatting object, but this does not mean it is what the specs require! 
An example: if we have the text WORD where each letter is a fo:character, the first three fo:characters ATM have a letter space each, and the fourth has none. All is ok as long as the fo:characters have no (or equal) letter-spacing, but what if each fo:character has a different letter-spacing property? The output differs according to which fo:characters control the letter spaces. Regards Luca
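Going back to the kerning argument, here is a small Python sketch of kerning treated as an extra amount added to the letter spacing, with each letter space owned by the preceding character as described above. The glyph widths and the kerning value for the pair AV are invented numbers (in thousandths of the font size), and the function name is hypothetical.

```python
# Hypothetical font metrics, in 1/1000 of the font size.
WIDTHS = {"A": 667, "V": 667, "I": 278, "L": 556}
KERNING = {("A", "V"): -80}


def advances(text, letter_spacing):
    """Advance of each character: width + letter space + kerning.

    The letter space and the kerning of a pair are both assigned to the
    first character of the pair; the last character gets neither.
    """
    result = []
    for i, ch in enumerate(text):
        advance = WIDTHS[ch]
        if i + 1 < len(text):
            advance += letter_spacing
            advance += KERNING.get((ch, text[i + 1]), 0)
        result.append(advance)
    return result
```

With this model the pair AV stays 80 units tighter than an unkerned pair for every letter-spacing value, negative ones included, which is the behaviour argued for in case c) above.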
Re: DO NOT REPLY [Bug 37743] - exception: border-style (shorthand)
First of all, thanks for your comments: I really tend to forget in a short time all the details concerning white space! Manuel Mall wrote: Glyphs are only allowed to be merged if they carry the same / matching set of property values. Personally I would not be concerned if we therefore limit that logic to within a LM. While it is possible that someone could write something like <fo:block><fo:inline>a</fo:inline><fo:inline>&#x0308;</fo:inline> and the a and &#x0308; could be combined into an &#x00e4; IMO this is a pretty degenerated case. Seems reasonable: so, we can delete glyph substitution from the list of things we must consider in this phase. But, now I think of it, we must consider kerning too, so the list does not get any thinner! my summary is: a) We both seem to want the same outcome, that is add required features and at the same time get rid of some of the workarounds currently used. Agreed. b) We both agree that the character by character analysis is done at Line LM level. Agreed. c) Your initial thought is that the Line LM should then provide enough information to the LMs to generate their Knuth sequences while my initial thought is that the Line LM generates the Knuth sequences and provides enough information for the LMs to generate their areas. If you agree with this summary may be we can concentrate on discussing the pros and cons of the two approaches mentioned in item c) above? Ok, I'll send a new message soon! Regards Luca
Re: Indent Inheritance and Collapsing Border Model
Jeremias Maerki wrote: The first concerns indent inheritance [...] So what I'd like to do is implement the alternative behaviour as a configurable option in the FO tree. The default would still be what the specification describes (see [1]), but users would be able to set a switch that would make FOP reset start-indent and end-indent to zero in cases where in the area tree a reference area boundary would be crossed (block-containers and table-cell, mainly). I agree with the need to provide users what they expect, but I did not understand where this switch will be: in the configuration file (+1) or in the document itself as an extension property / element (not so enthusiastic about that)? In the first case the file would be correct, only its rendering will be deliberately wrong: the user is aware that he is requiring a non-standard rendering *to the formatter*. In the second the document itself would require a non-standard rendering, which only our implementation will provide; in other words, it seems to me that this solution would give the impression that the file itself is enough to achieve the expected result, while it is not. Or maybe you were thinking of something else? The second issue is about the collapsing border model. Currently, having an fo:table with no explicit border-collapse=separate results in a warning message in the log as well as frequent exceptions due to the fact that this border model not completely implemented. I would like to modify the FO tree in a way that a table always reports being in separate border model mode. The other idea would have been to change the default but I don't particularly like that approach because it breaks the spec. Obviously, this is only a temporary measure until the collapsing border model becomes usable. I agree with you, I prefer the first option. Regards Luca
Re: svn commit: r345909 - in /xmlgraphics/fop/trunk: src/java/org/apache/fop/fo/flow/ src/java/org/apache/fop/layoutmgr/ src/java/org/apache/fop/layoutmgr/inline/ test/java/org/apache/fop/
I wrote: Implementation of hyphenation-ladder-count. Just a couple of annotations: - this implementation does not store any extra information inside the nodes: the algorithm checks whether a break is ok or not using a for loop; if you prefer, I could change this so that the number of consecutive lines ending with a hyphen is stored inside the nodes, and the check takes a constant time - the spec states that this property "Specifies a limit on the number of successive hyphenated line-areas the formatter may generate *in a block-area*"; so, if the value is 2 and a block creates 5 lines, the first 4 lines could all end with a hyphen provided there is a break after the second one. This implementation would not create such a set of lines: anyway, the produced output still satisfies the condition; in other words, we check a stricter condition. Regards Luca
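The loop-based check described in the first annotation might look like this Python sketch. The Node class with previous and hyphenated fields and the function name are assumptions for illustration, not the actual FOP node structure.

```python
class Node:
    """A chosen break: link to the previous break and a hyphen flag."""

    def __init__(self, previous, hyphenated):
        self.previous = previous      # Node of the preceding line, or None
        self.hyphenated = hyphenated  # True if that line ends with a hyphen


def ladder_ok(last_node, ends_with_hyphen, ladder_count):
    """True if a new line after last_node respects hyphenation-ladder-count."""
    if not ends_with_hyphen:
        return True
    run = 1  # the candidate line itself
    node = last_node
    # Walk back over the consecutive hyphenated lines.
    while node is not None and node.hyphenated:
        run += 1
        if run > ladder_count:
            return False
        node = node.previous
    return True
```

The constant-time variant mentioned above would store the length of the current run in each node, so the check becomes a single comparison at the cost of one extra integer per node.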
Re: Hyphenation
Manuel Mall wrote: Not sure what other committers and the PMC think but as a vote on the release has started I would suggest no further changes to the codebase unless agreed? What I am saying is - by all means do the development but don't put it back into svn until after the release. Ok, this seems a good idea. Regards Luca
Illegal property values
While working on the implementation of hyphenation-ladder-count, I noticed that at the moment the property system can return illegal values coming from the fo file instead of the fallback value defined by the specs. There are significant differences in wording between XSL 1.0 and 1.1: for example, concerning hyphenation-ladder-count 1.0 has (7.15.2): integer an integer greater than or equal to 1 While 1.1 (7.16.2) reads: number an integer greater than or equal to 1. If a zero, negative, or non-integer value is provided, the value will be rounded to the nearest integer value greater than or equal to 1 So, should the property be improperly set to -0.5: - if we want to follow closely 1.0, we should stop with an error - if we follow 1.1 we should continue using 1 instead, maybe with a warning message There are other properties with a validity range and a fallback value: column-count, initial-page-number, column-number, number-columns-repeated, number-columns-spanned, number-rows-spanned, hyphenation-{push, remain}-character-count; only hyphenation-ladder-count does not have a fallback value in 1.0, so maybe this was just an oversight. Note that the fallback value is different, in general, from the default value, as it is derived from the illegal value by rounding. At the moment the layout process continues with the incorrect values, and this could create errors in several different places; for example a non-integer value would probably create an error if we assign it to an integer variable, a negative integer value could create an IllegalArgumentException if we use it as the size of an array (this happens, for example, with a negative column-count) ... Regards Luca
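A minimal Python sketch of the XSL 1.1 fallback for properties such as hyphenation-ladder-count ("rounded to the nearest integer value greater than or equal to 1"). The function name is hypothetical, and rounding exact halves upwards is an assumption, since the quoted wording does not say how ties are resolved.

```python
import math


def fallback_at_least_one(value):
    """Round to the nearest integer, then clamp to a minimum of 1."""
    rounded = math.floor(value + 0.5)  # assumption: halves round up
    return max(1, rounded)
```

Applying such a function when the property is read (possibly logging a warning when the result differs from the specified value) would keep negative or fractional numbers from ever reaching the layout code, for example as the size of an array.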
Re: Hyphenation
Manuel Mall wrote: Hmm, just changed the value to 3000 (I think that's the value suggested in the article) and there is no change in hyphenation behaviour with the above mentioned example. That makes me a bit suspicious... I traced the behaviour of the breaking algorithm applied to the first paragraph of the example (the one with 4 consecutive lines ending with a hyphen) and it seems to me that the algorithm works well: the chosen set of breaks has about 15000 demerits, while the existing three alternatives either have some more demerits and the same quantity of consecutive flagged lines or about 3 demerits. It seems that our example, and in particular its first paragraph, perfectly follows Murphy's laws! Tomorrow I should have some time to implement hyphenation-ladder-count and fix the penalty values for justified / unjustified text. Regards Luca
Re: Leading/trailing space removal in LineLM
Manuel Mall wrote: So we end up with only two cases to consider: preserve white space and remove white space around a line break created by the Knuth algorithm. 1. Preserve white space: IMO in this case the space itself is actually not a break opportunity but there are now two break opportunities: one before the space and one after the space. That is a sequence like 'abc#x20;def' is more like 'abc#x200b;#xa0;#x200b;def' or in a more readable notation 'abczwspnbspzwspdef'. That is our normal space becomes a non-breakable space flanked by zero-width spaces which represent the break opportunities. If this is correct the Knuth elements would look like: glue w=0 box w=0 pen +INFINITE glue w=space pen glue w=0 Is this sequence correct? The first and last glue represent the zwsp and are break opportunities. The box prevents the removal of the space if a break is created before the space. The penalty prevents the space to be considered as a break opportunity. Of course as usual these sequences are further complicated in the absence of justification and in the presence of border/padding. I like your idea of expanding a preserved space into zwsps and nbsp; this allows us to forget alignments and borders / padding as we just have to insert the appropriate elements for the non breaking space. 
The sequence is very good, as it has a couple of interesting properties:
- it interacts with the surrounding elements with just a single glue element
- if there are two (or more) consecutive, non-collapsed spaces the sequence has just 3 feasible breaks, not 4
However, I have a doubt: reading the Unicode document about line breaking, it seems to me that, regardless of the quantity of consecutive spaces, there is only *one* feasible break, after the last one (Unicode Standard Annex #14, section 2 Definitions, in particular the definition of direct break and indirect break)
--- begin quoted text ---
Direct Break - a line break opportunity exists between two adjacent characters of the given line breaking classes. This is indicated in the rules below as B ÷ A, where B is the character class of the character before and A is the character class of the character after the break. If they are separated by one or more space characters, a break opportunity also exists after the last space. In the pair table, the optional space characters are not shown.
Indirect Break - a line break opportunity exists between two characters of the given line breaking classes only if they are separated by one or more spaces. In this case, a break opportunity exists after the last space. No break opportunity exists if the characters are immediately adjacent. This is indicated in the pair table below as B % A, where B is the character class of the character before and A is the character class of the character after the break. Even though space characters are not shown in the pair table, an indirect break can only occur if one or more spaces follow B. In the notation of the rules in Section 6, Line Breaking Algorithm this would be represented as two rules: B × A and B SP+ ÷ A.
--- end quoted text ---
I still have not read the document from top to bottom, and I could have misunderstood even the sections I read :-), but I think this point must be clarified before we continue. Regards Luca
Re: Leading/trailing space removal in LineLM
Manuel Mall wrote: Luca wrote a longer response to this but my mail reader doesn't like the character set (is that topical or what?). Sorry, it looks really horrible ... still don't know what went wrong, but I won't do it again! :-) Any way at end Luca ask the question about the UAX#14 line breaking algorithm and its handling of spaces. My answer to that is: a) Yes UAX#14 always breaks at the of a sequence of spaces b) But is also says that it assumes any trailing spaces in a line are being removed This conflicts with XSL-FO which can force spaces being retained therefore adjustments to the algorithm are necessary to cater for that. One possible adjustment is simply changing what is given to the algorithm as indicated above, ie sp becomes zwspnbspzwsp. Ok, so back to your previous message: 2. Removal of white space: This is the current behaviour but it works only for a single space and not for a sequence of spaces. Actually because the algorithm removes leading glues/penalties it is mainly a problem for trailing white space. I am not sure how to best tackle this. What comes to mind is: a) Do the same as for leading glues/penalties at the end of the line. However I am not sure how tricky it would be to determine the boundary because any 'blocking boxes' (see 1. above) are only placed before but not after elements. This options suffers from the problem that it will not remove leading/trailing white space across inline boundaries with border/padding as these generate zero width boxes to block removal of the glue elements for the border/padding. b) Do not generate individual Knuth sequences for each white space character but instead collect all consecutive white space and create one glue-penalty sequence for it. Again I am uncertain of the consequences of doing that. To do that correctly we would need to collect white space across inline boundaries. 
This firstly breaks the current getNextKnuth approach which assumes each LM can generate its sequences without knowledge of its neighbours. It would also break the current area info structures as a single Knuth element could now refer to text snippets from different LMs.

I'm not sure I follow you in all the details of white space handling and here we have borders too ... :-) I like b) most: after all, this is somewhat similar to the space resolution, as we have interactions between spaces coming from different nodes, and it's difficult to have each LM decide on its own. And I think we could find a way to keep the 1-1 relationship between AreaInfo objects and Positions. I have tried to play with the elements, and here are a few results: I hope they can help!

At the moment, the sequence for a single space with borders and padding is:
1 glue w=endBP
2 penalty w=0
3 glue w=(spaceIPD - endBP - startBP)
4 box w=0
5 infinite penalty
6 glue w=startBP
total width = spaceIPD
if break at #2 = endBP / startBP

If we have two (or more) spaces, we could use the sequence:
1 glue w=endBP
2 penalty w=0
3 glue w=(- endBP - startBP)
4 glue w=spaceIPD1
5 glue w=spaceIPD2
6 box w=0
7 infinite penalty
8 glue w=startBP
total width = spaceIPD1 + spaceIPD2
if break at #2 = endBP / startBP

Glues #4 and #5 have a Position pointing to different AreaInfo objects (from different LMs). This should solve (?) the case of ignore-if-surrounding.
If white-space-treatment is ignore-if-after, and we have two consecutive spaces we could use the sequence:
1 glue w=endBP
2 penalty w=0
3 glue w=(spaceIPD - endBP)
4 penalty w=0
5 glue w=(spaceIPD - startBP)
6 box w=0
7 infinite penalty
8 glue w=startBP
total width = 2 * spaceIPD
if break at #2 = endBP / startBP
if break at #4 = endBP + spaceIPD / startBP

With three or more consecutive spaces:
1 glue w=endBP
2 penalty w=0
3 glue w=(spaceIPD - endBP)
4 penalty w=0
5 glue w=spaceIPD
6 penalty w=0
7 glue w=(spaceIPD - startBP)
8 box w=0
9 infinite penalty
10 glue w=startBP
total width = 3 * spaceIPD
if break at #2 = endBP / startBP
if break at #4 = endBP + spaceIPD / startBP
if break at #6 = endBP + 2 * spaceIPD / startBP

I did not find a sequence for ignore-if-before yet ... Regards Luca
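The ignore-if-after sequences listed above follow a regular pattern, which could be generated along these lines. This is just a sketch to check the element count and the total width: the classes below are hypothetical stand-ins, not FOP's Knuth element classes, and "penalty w=0" is assumed to mean a penalty with zero width and zero value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these are NOT FOP's Knuth element classes.
final class SpaceSequenceSketch {

    enum Kind { BOX, GLUE, PENALTY }

    /** Stand-in value for an "infinite" penalty (assumption). */
    static final int INFINITE = 1000;

    static final class Element {
        final Kind kind;
        final int width;   // in millipoints
        final int penalty; // only meaningful for PENALTY elements

        Element(Kind kind, int width, int penalty) {
            this.kind = kind;
            this.width = width;
            this.penalty = penalty;
        }
    }

    /** Builds the sequence for 'count' consecutive spaces of equal width. */
    static List<Element> ignoreIfAfter(int count, int spaceIPD, int startBP, int endBP) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        List<Element> seq = new ArrayList<>();
        seq.add(new Element(Kind.GLUE, endBP, 0));
        for (int i = 0; i < count; i++) {
            seq.add(new Element(Kind.PENALTY, 0, 0));
            int width = spaceIPD;
            if (i == 0) {
                width -= endBP;   // the first space glue compensates the end border/padding
            }
            if (i == count - 1) {
                width -= startBP; // the last space glue compensates the start border/padding
            }
            seq.add(new Element(Kind.GLUE, width, 0));
        }
        seq.add(new Element(Kind.BOX, 0, 0));
        seq.add(new Element(Kind.PENALTY, 0, INFINITE));
        seq.add(new Element(Kind.GLUE, startBP, 0));
        return seq;
    }

    /** Sum of all element widths, i.e. the width when no break is chosen. */
    static int totalWidth(List<Element> seq) {
        int total = 0;
        for (Element e : seq) {
            total += e.width;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Element> three = ignoreIfAfter(3, 3000, 500, 400);
        System.out.println(three.size() + " elements, total width " + totalWidth(three));
    }
}
```

With count = 1 the same loop gives the six-element sequence currently used for a single space, so the single-space case does not need separate code in this sketch.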
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
Manuel Mall wrote: But we need to know which spaces can be adjusted, and which cannot. If we don't want to duplicate the logic for the space recognition, the SpaceAreas must simply have a boolean value stating whether the space is adjustable, so that the renderers won't need to look at the space and decide. I don't get that point. Isn't it enough for the renderer to know the offset for the area in question? What additional decisions would the renderer make based on the adjust flag? Or do you mean we still have the twsAdjust on the TextArea and the offset is only relative to twsAdjust? Do we really gain anything with that instead of making the offset the corrected twsAdjust value? At the moment we still use the twsAdjust value, and the individual offset would be an additional adjustment. Maybe there is little gain, but when the font is not multi-byte this saves us from setting the offset on each adjustable SpaceArea and using it in the renderer. It's not much, both in terms of time and output length: but if there is an easy way to adjust all the spaces at once ... why should we do it another way? :-) [...] So, what if we rename offset -> spaceAfter? It seems to me that we are here speaking of the same thing using two different names. :-) Fair enough, I agree we do. Good! We just have to reach an agreement on this last detail, and I'll implement the changes. Regards Luca
Re: White space handling Wiki page
Manuel Mall wrote: Side note: FOP doesn't quite do the same internally, i.e. a character explicitly specified using <fo:character .../> is handled separately from 'plain text'. If someone would write a style sheet which does a transform of every character into a <fo:character/> object and would feed the output to FOP the formatting results would be lets say VERY DISAPPOINTING. Actually something like: <fo:block background-color="yellow">word1<fo:character character=" "/><fo:character character=" "/>word2<fo:character character=" "/>word3<fo:character character=" "/></fo:block> currently causes an exception! This is a problem of the whitespace-related code, but anyway the CharacterLM always creates a sequence of elements corresponding to a non-space character, so the only feasible breaks recognized by the algorithm would be the hyphenation points inside the words ... I think that just as TextArea and Character both extend an AbstractTextArea, TextLM and CharLM should have a common super class holding the createElementsFor*() methods. It would not be necessary to add a SpaceArea or a WordArea child to a Character area (but we could decide to do it anyway just for analogy). Regards Luca
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
I wrote: Manuel Mall wrote: There is no need to expose creation of the Space/Word areas directly to TextLayoutManager either. TextArea could easily expose an addWord and an addSpace method instead of the monolithic setText. In the end it probably boils down to me arguing that the setText logic currently in TextArea IMO should be in TextLayoutManager (and probably based on its data structures) because it is an operation closely coupled to layout and not to areas. Ok. Done: http://svn.apache.org/viewcvs.cgi?view=rev&rev=328882 I added a boolean attribute in SpaceArea that is true for adjustable spaces (at the moment it is not used, but I will fix it soon). At the moment the offset in SpaceArea and WordArea are unused, but this is how I think they could be used: if, because of the rounding in the adjustment computation, the applied adjustment is different than the needed one, the TextLM should distribute this difference (a few millipoints) among the SpaceAreas and / or WordAreas, setting their offset. The renderers will use this according to their own adjustment rule: for example the PDFRenderer would add it to the text adjustment if the character is multibyte. The offset could come in handy for the cjk support (bug 36977): in this case there are no adjustable spaces, and if text is justified all the difference between line width and unadjusted character width could be handled modifying the offsets of some special characters. Regards Luca
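The distribution of the rounding difference described above could look more or less like this. It is only a sketch under my own assumptions: OffsetDistributionSketch and distribute() are hypothetical names, the difference is expressed in millipoints, and the leftover millipoints are simply given to the first areas of the line.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT code from TextLayoutManager.
final class OffsetDistributionSketch {

    private OffsetDistributionSketch() { }

    /**
     * Splits 'difference' millipoints among 'areaCount' areas so that
     * the returned offsets always sum up exactly to 'difference'.
     */
    static int[] distribute(int difference, int areaCount) {
        if (areaCount <= 0) {
            throw new IllegalArgumentException("areaCount must be positive");
        }
        int[] offsets = new int[areaCount];
        // Integer division truncates towards zero, so the remainder
        // has the same sign as the difference.
        Arrays.fill(offsets, difference / areaCount);
        int remainder = difference % areaCount;
        int step = Integer.signum(remainder);
        for (int i = 0; i < Math.abs(remainder); i++) {
            offsets[i] += step; // one extra millipoint for the first areas
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 7 millipoints among 3 areas.
        System.out.println(Arrays.toString(distribute(7, 3)));
    }
}
```

Each renderer would then be free to apply or ignore the single offsets according to its own adjustment rule.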
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
Manuel Mall wrote: I have a question on this. You break in TextArea the text into words based on CharUtilities.isAnySpace. Is this guaranteed to be consistent with the breaking and adjustment calculations in TextLayoutManager? I am concerned we may be using different rules for word breaking in different places. As far as consistency is concerned, I agree with you: the handling of the different kinds of spaces (breaking, non-breaking, fixed width, ...) is still quite incomplete and dispersed over different classes. Just to add another example, the CharacterLM implicitly expects its character to be a non-space character and has its own lines of code concerning the creation of the elements, while it could share the methods already called by the TextLM. Having a single, centralized class taking care of the breaking (be it a Java utility class or a Fop one) and a single, shared method implementing the creation of the elements would surely increase consistency and clarity. Somehow it doesn't feel right to me that TextLayoutManager does all the breaking and calculations and then we give the whole chunk to TextArea and it breaks it again using a possibly different algorithm but still using the adjustment value calculated by TextLayoutManager. 
When I was trying to fix bug 36238 I initially started modifying TextLM#createTextArea(), using the AreaInfo objects to create WordAreas and SpaceAreas, but I then decided to move the string splitting inside TextArea because:
1) if WordAreas and SpaceAreas are not directly created by the LMs, there is no need to change a single line of code inside the classes creating TextAreas; this is not a real reason supporting the choice, just a handy consequence of it;
2) if TextArea still provides a getText() method, the renderers are not forced to render the text word by word and space by space if their word spacing treatment is not affected by multi-byte characters; but once again, this is not a real reason as we could provide this method anyway;
3) although both SpaceArea and WordArea have an offset attribute it is ATM not used, so these areas do not carry any formatting information; their only purpose is to highlight spaces, thus allowing some specific renderer to handle them correctly regardless of their encoding; in other words, we are not losing breaking and calculations, we simply do not need them anymore as we already know exactly which text will be placed in each line, and how wide it will be once it's correctly adjusted;
4) the text that will be placed in a line cannot be directly taken from textArray (in the TextLM), and the string str should be used instead anyway, as it may be different from the concatenation of the single pieces of text; at the moment the only difference concerns the hyphenation character - added at the end of the line, but I suspect that in different languages there could be other differences; so, we cannot simply create a WordArea for each AreaInfo object.
So, if you find it strange to break the text, put it together and split it again, me too! :-) But this initial feeling disappeared when I realized that the final splitting does not involve breaking in its proper sense, but just classification of characters. 
This is why I did what I did; if I did not manage to convince you ... you can try and convince me! :-) Regards Luca
Re: DO NOT REPLY [Bug 36238] - text-align=justify doesnt' work on custom fonts
Yesterday I added a couple of comments concerning this bug; at the moment I haven't received the bugzilla email yet, so here is a copy-and-paste of the last message. I added a comment after the copied text, so this message would not be completely useless even if you received the original one! :-) --- Additional Comment #8 From Luca Furini 2005-10-18 12:53 Quotation from the pdf reference, version 1.6, section 5.2.2 Word spacing: Word spacing is applied to every occurrence of the single-byte character code 32 in a string when using a simple font or a composite font that defines code 32 as a single-byte code. It does not apply to occurrences of the byte value 32 in multiple-byte codes. So, it seems that at least we have found where the problem lies ... does anyone have an idea how to solve it too? :-) --- At the moment, my only idea about how to fix this is to go back to the creation of several text areas, one for each word or space: so the multibyte space character could be converted to the single-byte space, or we could leave it as it is and force the adjustment by modifying the ipd of the area created for a space. A disadvantage of this solution would be the big increase in the area tree size. An advantage could be the possibility to get rid of errors due to the adjustment rounding: at the moment the letter space can lead to an error of the order of the number of letter spaces, as the adjustment is rounded up to the nearest millipoint and is applied to all the letter spaces in a line. Having distinct text areas for each word, we could correct this error by setting each area's ipd appropriately. Regards Luca
Re: svn commit: r321084 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr/inline/LineLayoutManager.java
Fixing a ClassCastException due to the incorrect pattern of elements representing a space checked when there are inline borders and padding. The testcase inline_border_padding_block_nested_2.xml still does not pass: there is a failing check concerning ipda. But at least there are no more exceptions! :-) Regards Luca
Re: Inline border / padding and nested blocks
Manuel Mall wrote: inline_border_padding_block_nested.xml. If you run the test case as is you get an "Expect inline sequence as first sequence when last paragraph is not null" message. The first message refers to the first block in the testcase: I think this has something to do with the correct mixing of block and inline sequences, as the content of the inner block is placed in the first line, while it should be in the second. The output should be: Before inline starting with a block after block After inline but we get starting with a block Before inline after block After inline Note that the text before and after the inline (containing the nested block) appear in the same line, and this means their elements ended up in the same sequence, while they should be in two different sequences. I'm going to look at what happens in detail ... If you comment everything out and uncomment the last block you get a ClassCastException on a Knuth element. This happens during LineLM.removeElementsForTrailingSpaces(): as you wrote some time ago, at the moment when the LineLM meets a glue element at the end of a sequence it could wrongly deduce it represents a trailing space, while it represents borders / paddings. I'm going to look at the possible patterns that the elements for border and padding can have, and fix the method. Regards Luca
Re: Inline border / padding and nested blocks
Manuel Mall wrote: Is that actually conceptually the right thing to do, that is removing the trailing spaces before the end of a block as part of the Knuth handling? For leading spaces it is done somewhere completely different (and currently in the same piece of code it is done incorrectly for embedded spaces). I'm not sure it is the best place to do it, although I think that before the breaks are computed trailing spaces should exist no more: otherwise, the content width would take into account the width of these spaces too, and right / center alignment could be incorrect. Moreover, a glue just before the elements appended by the LineLM could be a feasible break, and this would create an empty page after the last one with some content. In other words, that removal is there as it could not be performed any later: but the sooner we get rid of the trailing spaces, the better! :-) I have a picture in mind with all white space handling done as part of the layout (area tree building) but before the actual Knuth sequences are constructed. But that's only a rough idea driven by the description of white space handling in the 1.1WD. Would you like to share it with us? I always find the specs quite obscure as far as white space handling is concerned, so your explanation could really be of great help! Regards Luca
Re: Inline border / padding and nested blocks
Manuel Mall wrote: I would appreciate if you could please have a look at test case inline_border_padding_block_nested.xml. If you run the test case as is you get an "Expect inline sequence as first sequence when last paragraph is not null" message. If you comment everything out and uncomment the last block you get a ClassCastException on a Knuth element. For both issues I am a bit out of my depth and hope you could help. First of all, my compliments for your wonderful work! I'll surely have a look at what happens, although I may not have time to do this until Monday. Regards Luca
Re: Knuth algorithm problem
Jeremias Maerki wrote: I think I've just stumbled over a problem in the Knuth algorithm. I'm going to see what happens ... Regards Luca
Re: svn commit: r306656 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr: BreakingAlgorithm.java PageBreakingAlgorithm.java
Fixing a bug reported by Jeremias affecting the handling of glue and penalty elements after a break when the algorithm restarts. Now it should be ok. A nasty little bug, anyway ... Unfortunately, I had to duplicate a few lines (a for loop looking for glue elements after a feasible break): the point is that there are three different variables (the width, stretch and shrink) that must be modified during this loop. I'm going to see if there is some possible refactoring of this piece of code. Regards Luca
Re: Maintainability
Peter B. West wrote: Thanks to Luca for his (perhaps entirely co-incidental) posting to the Wiki. Well, not entirely co-incidental! :-) I started writing the page some time ago, but never found the time to finish it: your message made me think I really couldn't put this off any longer, so I added a few things and posted what was ready. I'd try and add the missing parts as soon as possible. Comments, questions, suggestions, and especially additions :-) are most welcome! Regards Luca
Re: Maintainability
Peter B. West wrote: There seem to be some misapprehensions about what you are attempting; perhaps they are mine, so please clarify this. As I understand it, the mature, well-documented technology is the line-breaking, as in Breaking Lines Into Paragraphs. Using this model for page-breaking is something that has been speculated about, in particular by Plass. However, in implementing this, you and the others are breaking new ground. If this is the case, then it is quite inaccurate to describe the page-breaking as mature, well-understood, well-documented and well-behaved technology. Is that fair? As Manuel has quickly answered, the box-penalty-glue model can be applied to both line breaking and page breaking, and this is already clearly stated in the cited article. In the TeXbook you can find out that horizontal lists (representing the content of paragraphs) and vertical lists (representing content in the block progression direction) are made of the same elements: boxes, glues and penalties. So, I think we can surely state that the model suits the page breaking problem too. What TeX does not do is perform Knuth's breaking algorithm in order to produce page breaks: it performs a simpler algorithm instead. But this is due to the resource limits existing at the moment when TeX was devised, and in the cited paper Knuth explicitly says so. The page breaking problem has some more difficulties, concerning objects whose placement does not follow the main flow, for example floating figures; in this case, the difficulty is the other side of freedom (the position of these objects has few constraints) and comes from trying to place them in the best possible way, which could lead to high computational complexity: should this be too much, it would be enough to use a simpler strategy instead, for example placing them in the first page where they fit, and the problem would be solved. 
So, I think we can say that the algorithm can be applied without any concern to page breaking too. Regards Luca
Re: Another page-related question: page-position=last
Jeremias Maerki wrote: What is the expected output? In this case it has to generate a blank page IMO. Oh, right, I did not think of an empty page! :-) The problem is with the page x of y hack that won't work like this if the last empty block ends up on the second-to-last page. [...] What about the following approach? Run the breaker without special last-page handling, then inspect the allocated BPD for the last part. If it fits into the last page, just exchange the page-master (*) and paint it there. If it doesn't fit, paint it using the non-last page-master and add a blank page with the last page-master. If there's a box w=0 at the end of the element list, force a new part and paint that on the last page to handle the page x of y case. I think this would work with my idea too: in this case, if the last empty block and the difference in page bpd (that cannot be parted) do not fit in the non-last page under construction, they would be placed in a new page; so, a page-number-citation pointing to the empty block would return the last page-number. This would avoid the need to exchange page-masters, and to have a special handling for zero-width box at the end of the sequence. Regards Luca
Re: Another page-related question: page-position=last
Jeremias Maerki wrote: It's an interesting idea. However, I suspect this will probably not be necessary. We should be able to make the breaker clever enough to handle this particular case. When the page bpd depends on the page-masters, things become very strange. Not only is it difficult to implement the page-master choice, it is difficult even to understand what the expected result should be! :-) For example: let's suppose the breaker is working, and it has to place the last 25 lines of a page-sequence. The page-master for the last page has a bpd allowing no more than 20 lines, while the other page-masters can contain up to 30 lines. What happens? If the breaker starts building a last page it soon realizes that it would not contain all the remaining content, so it would no longer be a last page. But if it starts building a non-last page, it reaches the end of the content, and has to turn it into a last page, which is impossible. What is the expected output? The only way I see to satisfy the property is to create two more pages: one non-last page, partially empty, with less than 25 lines (24 or fewer, if there are keeps, widows or orphans) and a last page with the remaining lines. This sort of problem happens only if the last page is smaller than the previous ones: otherwise, the breaker can always try to build a non-last page, eventually moving all its content into a last page. Now I think of this ... an idea, that could work at least when the non-last pages have the same bpd and the last page a smaller one, could be to modify a little the elements appended at the end of the sequence, so that they have a width equal to the difference (nonLastBPD - lastBPD). This way, the last page created by the breaker will have an apparent width of nonLastBPD, but the content placed inside it will have an overall bpd equal to nonLastBPD - (nonLastBPD - lastBPD) = lastBPD What do you think? Regards Luca
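The arithmetic of this idea can be checked with the numbers of the example (pages of 30 and 20 lines, 25 lines left). The sketch below is only an illustration: LastPageBpdSketch and its methods are hypothetical names, and the line height of 12000 millipoints is an arbitrary assumption.

```java
// Hypothetical sketch, NOT code from the page breaker.
final class LastPageBpdSketch {

    private LastPageBpdSketch() { }

    /** Width of the extra element appended at the end of the sequence. */
    static int fillerWidth(int nonLastBPD, int lastBPD) {
        if (lastBPD > nonLastBPD) {
            throw new IllegalArgumentException("the last page must not be taller");
        }
        return nonLastBPD - lastBPD;
    }

    /**
     * True if the remaining content, followed by the filler, fits into a
     * page whose apparent bpd is nonLastBPD; this is the same as asking
     * whether the content alone fits into lastBPD.
     */
    static boolean fitsOnLastPage(int remainingBPD, int nonLastBPD, int lastBPD) {
        return remainingBPD + fillerWidth(nonLastBPD, lastBPD) <= nonLastBPD;
    }

    public static void main(String[] args) {
        int line = 12000;          // assumed line height, in millipoints
        int nonLast = 30 * line;   // non-last pages hold 30 lines
        int last = 20 * line;      // the last page holds 20 lines
        // 25 remaining lines do not fit: one more non-last page is needed.
        System.out.println(fitsOnLastPage(25 * line, nonLast, last)); // false
        // 20 remaining lines fill the last page exactly.
        System.out.println(fitsOnLastPage(20 * line, nonLast, last)); // true
    }
}
```

Since the filler cannot be parted, the breaker would move at least 5 of the 25 lines to a non-last page by itself, without any special last-page logic.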
Re: Indefinite page-width / page-height
Andreas L Delmelle wrote: Currently, I have solved this locally by creating the pageVP with the indefinite dimension set to Integer.MAX_VALUE. The only things I'm still looking for are ways to: a) retrieve the accumulated content-height/-width (or: the difference between the initial page-height/-width and the content-height/-width up to that point) The difference is stored into PageBreakPosition.difference I'm guessing the place where all this should happen is PageSeqLM.finishPage() Maybe it's easier to put this in PSLM.PageBreaker.finishPart(): it already has a PageBreakPosition parameter, so it should be enough to add something like getCurrentPV().setBPD(Integer.MAX_VALUE - pbp.difference). HTH Regards Luca
Re: undefined page length
Andreas L Delmelle wrote: BTW: Is it a correct assessment that implementing this should turn out to be far simpler than fixed page-sizes? IIC, theoretically, the whole page-breaking algorithm can be ignored for indefinite page-heights. getAvailableBPD() would always return, say, Integer.MAX_VALUE? I don't think the breaking algorithm can be completely ignored, but it's a good idea to have getAvailableBPD() return an almost infinite value. Once the PageBreakingAlgorithm has created the single PageBreakPosition, it would be possible to use the stored difference in order to set the correct page height (otherwise the page would have height = Integer.MAX_VALUE even if it contains just a few lines). Regards Luca
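The computation of the real page height from the stored difference is very simple; here is a minimal sketch of it. IndefinitePageHeightSketch and contentHeight() are hypothetical names, and I assume the difference stored in the break position is the available bpd minus the bpd actually used by the content.

```java
// Hypothetical sketch, NOT code from PageSequenceLayoutManager.
final class IndefinitePageHeightSketch {

    private IndefinitePageHeightSketch() { }

    /**
     * Recovers the height used by the content from the (almost infinite)
     * available bpd and the unused space left over by the breaker.
     */
    static int contentHeight(int availableBPD, int difference) {
        if (difference < 0 || difference > availableBPD) {
            throw new IllegalArgumentException("difference out of range");
        }
        return availableBPD - difference;
    }

    public static void main(String[] args) {
        int available = Integer.MAX_VALUE;  // "indefinite" page height
        int used = 150000;                  // a few lines of content
        int difference = available - used;  // what the breaker would store
        System.out.println(contentHeight(available, difference)); // 150000
    }
}
```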
Build error?
Hi all. I'm noticing a strange problem: fop builds correctly, but then it seems it is not working at all. I'm using it from the command line under win xp, and even if I don't get any run time exception no output file is created. Launching fop with no parameters, or with wrong parameters (missing files ...) does not create any error: simply, nothing happens. I have compiled fop on two different computers, so I don't think this is a local configuration problem. Hasn't anyone else noticed this? Regards Luca
Re: wrap-option property
Jeremias Maerki wrote: wrap-option is one of those few properties which work in 0.20.5 but are not yet available in FOP Trunk. Luca, what do you think how difficult it would be to implement it at least for, let's say, fo:block? I imagine it would suffice to trick the breaker into not choosing any break possibilities except at the end of the sequence. Yes, it seems a very good idea: just an additional boolean parameter for findBreakingPoints(), similar to hyphenationAllowed. Or we could use just a single int instead of two booleans: a parameter whose value could be set using three constants, for example ALL_BREAKS, NO_HYPHENATION, NO_WRAP. Maybe it could be even easier: a LineBreakPosition could be created without even performing the line breaking algorithm, as we already know we will create just one line, and what the indexes of its first and last element will be. But maybe this would prevent us from knowing useful information created by the algorithm (difference, indent, ...). I'm going to work on this immediately. I think we will need something similar in the StaticContentLM and the BlockContainerLM so overflow can be handled better. At the moment, only the first part until the first break point found by the breaker is properly painted. Afterwards, the BCLM simply adds the additional parts but this can lead to unexpected results as I have seen in one document already. Sorry, I don't quite get what you mean ... what are these unexpected results? Regards Luca
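The single int parameter with three constants could work more or less like this. It is just a sketch of the idea: the constant names are the ones proposed above, while the class BreakModeSketch, the method isBreakAllowed() and its two boolean arguments are hypothetical and do not reflect the real signature of findBreakingPoints().

```java
// Hypothetical sketch, NOT the real BreakingAlgorithm API.
final class BreakModeSketch {

    static final int ALL_BREAKS = 0;
    static final int NO_HYPHENATION = 1;
    static final int NO_WRAP = 2;

    private BreakModeSketch() { }

    /**
     * Decides whether a feasible break may be considered.
     *
     * @param mode             one of the three constants above
     * @param hyphenationPoint true if the break is inside a word
     * @param endOfSequence    true if the break is the forced final one
     */
    static boolean isBreakAllowed(int mode, boolean hyphenationPoint,
                                  boolean endOfSequence) {
        switch (mode) {
            case ALL_BREAKS:
                return true;
            case NO_HYPHENATION:
                return endOfSequence || !hyphenationPoint;
            case NO_WRAP:
                // wrap-option="no-wrap": only the end of the sequence is left
                return endOfSequence;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(isBreakAllowed(NO_WRAP, false, false)); // false
        System.out.println(isBreakAllowed(NO_WRAP, false, true));  // true
    }
}
```

Compared with two booleans, the single parameter makes the meaningless combination (hyphenation allowed but no wrapping) impossible to express.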
Re: baseline-shift and KnuthInlineBoxes
Manuel Mall wrote: if we have a baseline-shift, eg. some X<fo:inline font-size="smaller" baseline-shift="super">2</fo:inline> ... how is that intended to be modelled with respect to the lead, height, and middle values to be stored in the created KnuthInlineBoxes for the fo:inline? I think that more (or different) information needs to be stored in the KnuthInlineBoxes in order to fully implement the properties concerning the vertical positioning of objects. Lead, total and middle are only enough to handle vertical-align = top, bottom or middle; anyway, maybe three attributes could be enough: one identifying the alignment baseline (alphabetic, ideographic, text-before-edge, ...) and two specifying the box height above and below this baseline. The LineLM should look at these values when creating the lines: each box height will be interpreted differently according to its baseline: I think this will be the tricky part of this work! HTH, even if it's not much :-) Regards Luca
Re: svn commit: r280854 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java
I wrote: Correct handling of the combination of hyphenation and text-align = center, left or right. In the end, I found out that this was not the same problem as bug 36533, but another bug specifically concerning the elements created to represent hyphens. I think that Manuel has been the first person ever to test hyphenation together with non-justified text, thus awaking this sleeping bug! :-) There is still a detail that has not yet been fixed: the correct handling of characters that can be used as break points (for example a / character in the middle of a long URL, that could be used as an emergency break). I'm going to fix that too, I just wanted to commit this correction as soon as possible in order to avoid run-time exceptions. Regards Luca
Re: svn commit: r280520 - /xmlgraphics/fop/trunk/src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java
I wrote: Factorized the creation of the elements in TextLM: now both getNextKE() and getChangedKE() call the same methods createElementsForASpace() and createElementsForAWordFragment(). This should definitely solve bug 36533. Besides removing duplicated lines and inconsistencies, I hope this could help make this part of the code a little more readable and easily understandable. I'm going to see if these methods can be moved up to the LeafNodeLM, thus being available for all subclasses. Manuel, I hope you don't have to spend a lot of time merging these changes with the work you are doing; I think you could add further parameters to createElementsForASpace(), to pass the variables you need for borders and padding. Regards Luca
Re: fo:inline bpd
Manuel Mall wrote: yes, that is an option. What I am unsure about here is that the children, typically text areas, do not take the line spacing into account when reporting their bpd, that is the usually 10% space above and below the character. So what is the correct bpd for an fo:inline which has text area children: is it just the max bpd of its children or is it max bpd plus any line spacing settings from its parent? Oh, yes, the half-leading trait ... If I understand the specs correctly (4.5 Line areas), this line spacing must be added to the bpd of each inline area too. As it is the same for all inline areas, it could be stored in the LayoutContext by the LineLM. Regards Luca
Re: Space-resolution doesn't work
Jeremias Maerki wrote: I'll start from scratch to come up with a better strategy of implementing these rules. I'll probably start by documenting a few cases in the Wiki and try to develop the right element list for them. After that I'll try to find out how exactly to implement everything. Help is welcome.

I think spaces and keeps are quite similar and very connected: in both cases, the constraints can involve formatting objects that are not at the same depth in the tree. So, my idea for handling space resolution is to have a LM ask its children about their spaces, and create the necessary elements (while at the moment each LM creates elements for its own spaces).

For example, take an LM tree in which an outer BlockLM has three children (BlockLM 1, BlockLM 2, BlockLM 3), one of which has in turn two children (BlockLM A, BlockLM B). BlockLM1.getNextKnuthElements() would return to the outer BlockLM only the elements representing its block content, without any space. In order to decide which elements it has to create, the outer BlockLM could have some lines like:

    // currentChild = BlockLM 1, nextChild = BlockLM 2
    space1 = currentChild.getSpaceAfter();
    space2 = nextChild.getSpaceBefore();
    if (this.mustKeepTogether()
            || currentChild.mustKeepWithNext() && !nextChild.hasBreakBefore()
            || !currentChild.hasBreakAfter() && nextChild.mustKeepWithPrevious()) {
        // there cannot be a break between the two children
        createElementsForSpace(resolve(space1, space2, false, false));
    } else {
        // there can be a break between the children
        createElementsForSpace(resolve(space1, null, false, true),
                               resolve(null, space2, true, false),
                               resolve(space1, space2, false, false));
    }

where:
- the method createElementsForSpace() can have a single space parameter (returning a sequence that has no feasible breaks [1]) or three different space parameters (returning a sequence with a feasible break [2]);
- resolve() takes two spaces and two booleans, signalling if the space will be at the beginning / end of a page (as this affects the resolved space);
- getSpaceAfter() would be something like

    return resolve(this.spaceAfter, lastChild.getSpaceAfter(), false, false);

  and, vice versa, getSpaceBefore() would be

    return resolve(this.spaceBefore, firstChild.getSpaceBefore(), false, false);

(a similar mechanism could be used for keeps) but I'm not sure that adding two spaces at a time would always give the same result. Otherwise, we could follow the implementation of keeps, using the LayoutContext to keep track of the spaces met and not yet converted into elements.

Regards Luca

[1] this would be a simple glue element, preceded by a penalty with value = inf
[2] maybe a sequence glue - penalty - glue - box - PENALTY - glue, where:
- glue #1 is the resolved space after block 1 if a break occurs
- glue #3 is the resolved space before block 2 if a break occurs
- penalty is a feasible break
- PENALTY forbids a break
- glue #2 is the difference between the resolved space if there is no break and glue #1 + glue #3
Re: svn commit: r279551 - in /xmlgraphics/fop/trunk: src/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java test/layoutengine/testcases/wrapper_text-transform_1.xml
Manuel Mall wrote: this is my code after integrating your patch to add the knuth elements for line end / start border/padding for the common justify=start or end case. What I am getting now is a space at the beginning of each line break!:

    if (lineStartBAP != 0 || lineEndBAP != 0) {
        sequence.add(new KnuthGlue(lineEndBAP, 0, 0,
                new LeafPosition(this, -1), true));
        sequence.add(new KnuthPenalty(0, 0, false,
                new LeafPosition(this, -1), true));
        sequence.add(new KnuthGlue(wordSpaceIPD.opt - (lineStartBAP + lineEndBAP),
                wordSpaceIPD.max - wordSpaceIPD.opt,
                wordSpaceIPD.opt - wordSpaceIPD.min,
                new LeafPosition(this, -1), true));
        sequence.add(new KnuthInlineBox(0, 0, 0, 0,
                notifyPos(new LeafPosition(this, -1)), true));
        sequence.add(new KnuthPenalty(0, KnuthElement.INFINITE, false,
                new LeafPosition(this, -1), true));
        sequence.add(new KnuthGlue(lineStartBAP, 0, 0,
                new LeafPosition(this, vecAreaInfo.size() - 1), false));
    } else {
        ...
    }

The LeafPosition(this, vecAreaInfo.size() - 1) (the Position containing the index of the AreaInfo object storing information about the space) should belong to the element that is discarded if a line break happens: i.e. the second glue instead of the third. With this change, this sequence should be correct for a space in justified text.
With left- / right-aligned text the overall stretch and shrink of the sequence should not be changed, so the sequence should be:

    sequence.add(new KnuthGlue(lineEndBAP, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthPenalty(0, 0, false,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthGlue(wordSpaceIPD.opt - (lineStartBAP + lineEndBAP),
            - 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
            new LeafPosition(***), false));
    sequence.add(new KnuthInlineBox(0, 0, 0, 0,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthPenalty(0, KnuthElement.INFINITE, false,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthGlue(lineStartBAP, 0, 0,
            new LeafPosition(this, -1), true));

With centered text the combined sequence should be:

    sequence.add(new KnuthGlue(lineEndBAP, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthPenalty(0, 0, false,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthGlue(wordSpaceIPD.opt - (lineStartBAP + lineEndBAP),
            - 6 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
            new LeafPosition(***), false));
    sequence.add(new KnuthInlineBox(0, 0, 0, 0,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthPenalty(0, KnuthElement.INFINITE, false,
            new LeafPosition(this, -1), true));
    sequence.add(new KnuthGlue(lineStartBAP, 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, 0,
            new LeafPosition(this, -1), true));

The Position marked *** should be a LeafPosition(this, vecAreaInfo.size() - 1); as it is in the element most closely connected with the real space (if this element is ignored, the space is too), maybe it is this one that must be notified.

[from your other message] I am also unsure what the correct knuth element sequences are in the case of the forced line break and for hyphenation.
A forced line break should not be very different from the real end of the inline, so I think it should be enough to add, before the penalty, a box/glue element (according to the conditionality [1]) whose width is lineEndBAP. In this case, the next returned sequence should start with the elements for the initial border and padding. As for the hyphenation, I think we could use the same sequence created for a space (according to the alignment), but with the first penalty (the second element) having the width of the -. While answering your message I noticed that there are some inconsistencies in the TextLM: for example, the LineLM.DEFAULT_SPACE_WIDTH is not used everywhere it should be ... I'll try and find some time to fix them. I hope I did not answer you too late, otherwise ... tomorrow is another day :-) The time difference between Italy and Australia can hinder communication! Regards Luca [1] in effects, as a preserved
Re: e-g with padding and borders
Manuel Mall wrote: Next problem: border conditionality - how do I model that with the Knuth approach? At the time I add the Border/Padding start/end boxes we don't have line breaks so they really only cover the .conditionality=discard case. How do I tell the algorithm to leave enough space at the end of each line (and the beginning of the next line) for the borders (in the case of .conditionality=retain)? The sequence of elements representing the inline content starts and ends with a box [1]. Adding another box at the beginning and at the end of the sequence implements retain, as a line break is never allowed to separate two adjacent boxes: so, the left border and padding will always be in the same line as the first piece of content, and the breaking algorithm will always reserve enough space. In order to implement discard, glue elements must be used instead: these elements are discarded if they are chosen as a line break or they are adjacent to a line break, and in this case borders and padding will not be painted. I think that a single box or glue element could be created, representing both border and padding, unless the conditionalities of these properties can be different: for example, if it were possible to have border-start.conditionality = discard and padding-start.conditionality = retain, two distinct elements would necessarily have to be created. Regards Luca [1] Or, better, everything should work well if the first and last elements are boxes. Should there be spaces at the beginning and at the end of the inline having borders, they should be handled as non-breaking spaces, in order to avoid a break between the start border and the first word, or between the last word and the end border.
Re: Line LM, Inline LM and LAST_AREA
Manuel Mall wrote: But if we have a long fo:inline stretching multiple lines this seem to give the wrong results from the Inline LM perspective. For example if the fo:inline finishes in the middle of a line followed by more text the Line LM will not set the LAST_AREA flag when calling addAreas on the Inline LM as there are more areas on the line. Therefore the Inline LM thinks its not done with yet although it is and the reverse is true on the first line of a multi-line inline. The LineLM.addAreas() method creates one line at a time (a line for each LineBreakPosition), and asks its children to add their inline areas for the line area being created. It sets the LAST_AREA flag if the child LM is the one that created the last element placed in this line: for each line, there is one and only one child LM that receives a LayoutContext with this flag set, unless there are bugs :-) If the content of an inline is divided among several lines, the method InlineLM.addAreas() will be called once per line, and every time but the last it will have the LAST_AREA flag on. Some time ago there was a thread about a similar subject [1]: the problem, then, was the opposite, i.e. to find out which is the last area generated by a LM, regardless of line breaks. I think there is a bit of ambiguity in the names: at the moment, the LAST_AREA flag signals to a LM that it is adding the last inline area in a line, or the last block area in a page, but this can cause confusion with the is-last area trait described by the specs (4.2.2 Common traits). Maybe we can find a more meaningful and unambiguous name. Regards Luca [1] Markers: Determining the last generated area for a LM, http://nagoya.apache.org/eyebrowse/ReadMsg?listId=63msgNo=11296
Re: e-g with padding and borders
Manuel Mall wrote: These two paragraphs confuse me - sorry. My understanding was: discard = start/end borders/padding only at the start and end of the whole fo:inline; retain = as discard plus start/end borders/padding on the start and end of every line the fo:inline spans.

Sorry, you are completely right, I did not understand you were referring to the extra borders needed around a line break. What we need is one or more elements whose overall behaviour is this:
- they represent a space (or another legal break point)
- if they are not used as a break, they behave like a normal space (or like a not-used hyphenation point)
- if they are chosen as a break, they must add something both at the end of the line they end, and at the beginning of the next line

This is quite similar to the behaviour of the sequence of elements representing a space in a centered text (in the TextLM.getNextKnuthElements() method); so, in this case we could use:
1 glue     width = border/padding at the end of the line = A
2 penalty  width = 0, value = 0
3 glue     width = space.opt - (A + B), stretch = space.max - space.opt, shrink = space.opt - space.min
4 box      width = 0
5 penalty  width = 0, value = infinity
6 glue     width = border/padding at the beginning of the line = B

so:
- element 1 is a legal break point, but it is never chosen as 2 is better
- element 2 is a legal break point: if it is chosen, the ending line will reserve a width of A for border and padding, and the next line will reserve a width of B (the glue 3 is discarded)
- element 3 is NOT a legal break because of the preceding penalty
- element 5 is NOT a legal break because of its value
- element 6 is NOT a legal break because of the preceding penalty
- if there is no break, the overall width is A + (space.opt - (A + B)) + B = space.opt

In order to make all this work, the TextLM should
- know that it is working on text with non-conditional borders
- combine this sequence with the one it would create in a normal situation

Regards Luca
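The width arithmetic of the six-element sequence above can be checked with a small toy model. This is a sketch under assumptions, not FOP code: the Element class, the Kind enum and the INFINITE value are invented for illustration, stretch and shrink are left out, and the only breaking rule modelled is the usual Knuth-style one that glues and penalties following a break are dropped up to the next box.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the six-element sequence; not FOP's KnuthElement classes. */
class BorderSpaceSketch {
    enum Kind { BOX, GLUE, PENALTY }

    static final int INFINITE = 1000; // assumed stand-in for an "infinite" penalty value

    static final class Element {
        final Kind kind;
        final int width;
        final int value; // penalty value; unused for boxes and glues

        Element(Kind kind, int width, int value) {
            this.kind = kind;
            this.width = width;
            this.value = value;
        }
    }

    /** Builds elements 1..6 for a space of optimal width opt and border/padding widths a and b. */
    static List<Element> spaceWithBorders(int a, int b, int opt) {
        List<Element> seq = new ArrayList<Element>();
        seq.add(new Element(Kind.GLUE, a, 0));              // 1: end-of-line border/padding
        seq.add(new Element(Kind.PENALTY, 0, 0));           // 2: the legal break
        seq.add(new Element(Kind.GLUE, opt - (a + b), 0));  // 3: discarded if 2 is chosen
        seq.add(new Element(Kind.BOX, 0, 0));               // 4: stops the discarding
        seq.add(new Element(Kind.PENALTY, 0, INFINITE));    // 5: forbids a break
        seq.add(new Element(Kind.GLUE, b, 0));              // 6: start-of-line border/padding
        return seq;
    }

    /** Width contributed when no break is taken: boxes and glues count, penalties do not. */
    static int widthWithoutBreak(List<Element> seq) {
        int total = 0;
        for (Element e : seq) {
            if (e.kind != Kind.PENALTY) {
                total += e.width;
            }
        }
        return total;
    }

    /** Width left at the end of the line when breaking at the penalty at breakIndex. */
    static int widthBeforeBreak(List<Element> seq, int breakIndex) {
        int total = seq.get(breakIndex).width; // a penalty's width counts only at a break
        for (int i = 0; i < breakIndex; i++) {
            if (seq.get(i).kind != Kind.PENALTY) {
                total += seq.get(i).width;
            }
        }
        return total;
    }

    /** Width at the start of the next line: glues and penalties are dropped up to the first box. */
    static int widthAfterBreak(List<Element> seq, int breakIndex) {
        int i = breakIndex + 1;
        while (i < seq.size() && seq.get(i).kind != Kind.BOX) {
            i++;
        }
        int total = 0;
        for (; i < seq.size(); i++) {
            if (seq.get(i).kind != Kind.PENALTY) {
                total += seq.get(i).width;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Element> seq = spaceWithBorders(1500, 500, 3000);
        System.out.println("no break:     " + widthWithoutBreak(seq));
        System.out.println("before break: " + widthBeforeBreak(seq, 1));
        System.out.println("after break:  " + widthAfterBreak(seq, 1));
    }
}
```

With A = 1500, B = 500 and space.opt = 3000, the unbroken sequence is as wide as a plain space, while a break at element 2 leaves A on the ending line and B on the next one.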
Re: SVG Image cropping/positioning
Richard W. wrote: I'm starting now. I've had to rename inline_block_nested_\#36248.xml to inline_block_nested_bug36248.xml to get the junit task to build. I had to rename that file too; I have win xp. Regards Luca
Re: [Xmlgraphics-fop Wiki] Update of ExtensionPoints by JeremiasMaerki
Speaking of extensions, I'd like to resurrect the layout extensions that were part of the code used to start the Knuth branch, but I want to be sure I'm allowed to do it. The set of extensions (a couple of new properties, and some new values for an existing one) is aimed at giving the user more control over the page breaking: in particular, via these extensions it is possible to give the application a list of properties that can be adjusted in order to fill all the available bpd of a region (in addition to / instead of the spaces between blocks [1]). I started writing a wiki page about these extensions at http://wiki.apache.org/xmlgraphics-fop/LayoutExtensions (I really should take some time to finish it!). My highest-priority, short-term task is still to fix the behaviour of page-number and page-number-citation, as I think these formatting objects must work in the next release: I am almost done, I just have to finish handling the case of justified text. After that, obviously if there are no objections against this, I'd like to spend some time on the extensions, which I'm sure could come in handy for fop-users producing book-style (or report-style) documents. For example, here is a link to a message in the xsl-editors mailing list requesting a feature which is completely equivalent to one of the layout extensions: http://lists.w3.org/Archives/Public/xsl-editors/2005JulSep/0007.html (many thanks to Jeremias for pointing it out to me!). Should I be allowed to keep working on this subject, I could answer him that fop will soon be able to cope with his request. Regards Luca [1] ... which makes me think that I should work on space resolution rules too ... my to-do list keeps growing longer and longer! :-(
Re: FOP Visuals
Jeremias Maerki wrote: For those who don't want to run BatchDiffer themselves, I've uploaded a ZIP full of PNGs, one per layout engine test case combined from output from the PDF, PS and Java2D renderers. Just an idea ... what about an option to have the output from two renderers and the XOR between the two? It could help noticing small differences, in the order of a few points, that could otherwise pass unnoticed. Regards Luca
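The XOR idea above could look roughly like the following sketch. It is independent of BatchDiffer, whose API is not shown in this thread; the class and method names (XorDiffSketch, xor, countDifferences) are hypothetical, and only java.awt.image.BufferedImage from the standard library is used. Identical pixels come out black, so any non-black pixel marks a difference between the two renderings.

```java
import java.awt.image.BufferedImage;

/** Toy pixel-wise XOR of two rendered pages; not part of FOP. */
class XorDiffSketch {
    /** Returns an image whose pixels are the XOR of the RGB values of a and b. */
    static BufferedImage xor(BufferedImage a, BufferedImage b) {
        // Assumption: only the overlapping area is compared if the sizes differ.
        int w = Math.min(a.getWidth(), b.getWidth());
        int h = Math.min(a.getHeight(), b.getHeight());
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int diff = (a.getRGB(x, y) ^ b.getRGB(x, y)) & 0x00FFFFFF;
                out.setRGB(x, y, diff);
            }
        }
        return out;
    }

    /** Counts the pixels that are not black, i.e. that differ between the two inputs. */
    static int countDifferences(BufferedImage xorImage) {
        int count = 0;
        for (int y = 0; y < xorImage.getHeight(); y++) {
            for (int x = 0; x < xorImage.getWidth(); x++) {
                if ((xorImage.getRGB(x, y) & 0x00FFFFFF) != 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BufferedImage a = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        b.setRGB(1, 2, 0xFFFFFF); // one pixel differs
        System.out.println("differing pixels: " + countDifferences(xor(a, b)));
    }
}
```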
Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch
Chris Bowditch wrote: + * Conditional space support, i.e. space-before.conditionality=retain Chris, doesn't this work already? As far as I can remember the correct space resolution is still missing, so for example the space-after of a block is not added to the space-after of the following block (they are just appended, and this has some side effects on keeps), but the conditionality should be handled correctly. I have just tested the simplest example possible (just a block with text and a space-before with conditionality = retain) and it seems ok. Regards Luca
Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch
Chris Bowditch wrote: I just knocked up a small test case and although retain is honoured, discard is ignored. I knew it wasn't quite yet working but didn't realise retain was working :) I'll update the Wiki. Could you please also attach your file? I have tested a simple sequence of blocks with conditional spaces and the output seems ok; the output of the testcase space-block2.xml seems correct too (I'm going to add checks). Maybe I forgot to fix some LM. Regards Luca
Re: [Xmlgraphics-fop Wiki] Update of ReleasePlanFirstPR by ChrisBowditch
Chris Bowditch wrote: Here is the sample: Thanks! I have tested a simple sequence of blocks with conditional spaces and the output seems ok; the output of the testcase space-block2.xml seems correct too (I'm going to add checks). Not true, space-block2.xml does not work. On the second page, there should not be any space between the two paragraphs. I'm no longer sure I follow you ... :-) In your example the second block has a conditional space before, but it is not the first child of a reference area (not the first in the page) so I would expect it not to be suppressed. Should all conditional spaces always be suppressed, regardless of their position, what would be the point in using them? :-) As for the testcase spaces-block2, I similarly think there should be a space between the first and second block on page 2; anyway, in this case the actual behaviour is probably wrong as the space resolution rules (if I understand them correctly) seem to imply that it should be only 10 points. Regards Luca
Re: page-number and page-number-citation problem
J.Pietschmann wrote: Maybe I'm wrong in trying to do so, but I'd like to handle both formatting objects in the same way. If page numbers can be resolved to strings early, it should be done. All the hassle for space readjusting, and perhaps reflowing content, should be reserved for forward references, if only for performance reasons. Sorry, my last message was not very clear (and / or I misunderstood your comments). The point is that the real page numbers are not known until the addAreas() phase, when pages are actually created. The Knuth-style page breaking algorithm gets a representation of a whole page-sequence (or part of it, if there are break conditions) and then computes all the page breaks at once: so, the fo:page-numbers comprised in that page-sequence cannot know in which page they will be placed, and the line breaking is necessarily performed using elements whose width could be just a guess. What I meant when I said that both page-number and page-number-citation should be handled in the same way was this: during the line breaking their real value is equally unknown. Well, to be more precise the value of a page-number is *always* unknown during line breaking, while a page-number-citation could refer to an object in a previous page-sequence, so it could be known: in this case the method PNCLM.get() already returns a TextArea with the real value and its ipd (maybe you were referring to this? this won't be changed at all). [from the other message] - sometimes, when a particularly elegant output is needed, it would really be desirable to have a two-steps algorithm, with line-breaking performed again once the actual width of each object is known. Well, it's not for particular elegant output, it's for the case of having multiple page number citations which point to five digit page numbers in the same line. 
Real life examples include references to page numbers in roman number format, which easily get into the six character range, and enumerating references in book indices, where the problem may be amplified as an index is usually set in several narrow columns. Great examples, I did not think of them! I imagine that, should the index be in a page-sequence preceding the ones with the content, the line breaking of it could be really ugly, due to the provisional width of the references. This example is really interesting: in this case, a re-flowing of the index pages might not achieve a better output, should it be performed before the breaking of the page-sequence with the content; and it could be avoided just by deferring the breaking of this page-sequence, so that the first breaking can already work using the real values for all page-number-citations. If we see each page-sequence as a node, and a page-number-citation as a directed edge from one node (the target page-sequence) to another one (the page-sequence containing the page-number-citation), this is a well-known problem: the topological sorting of a graph. If the graph is acyclic then there is a sorting of its nodes such that for each edge going from a node A to a node B, A precedes B in the sorting order; i.e., the page-sequences could be ordered so that each one is flowed when all its page-number-citations are already known. Very interesting indeed ... as soon as I finish working on the line-adjusting I'll spend some more thought on this ... (sorry for the long message!) Regards Luca
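The ordering described above can be sketched with Kahn's algorithm for topological sorting. This is only an illustration under assumptions: page-sequences are identified by their index in document order, each edge {from, to} means that page-sequence "to" contains a page-number-citation whose target lies in page-sequence "from", and the class and method names are hypothetical, not part of FOP. A null result means the citations form a cycle, so no ordering can resolve every citation in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy flow-order computation for page-sequences; not part of FOP. */
class PageSequenceOrderSketch {
    /**
     * Returns an order in which every page-sequence is flowed after all the
     * page-sequences it cites, or null if the citation graph has a cycle.
     */
    static List<Integer> flowOrder(int n, int[][] edges) {
        List<List<Integer>> successors = new ArrayList<List<Integer>>();
        int[] inDegree = new int[n];
        for (int i = 0; i < n; i++) {
            successors.add(new ArrayList<Integer>());
        }
        for (int[] e : edges) {
            successors.get(e[0]).add(e[1]); // the target must be flowed before the citing one
            inDegree[e[1]]++;
        }
        // Among the page-sequences that are ready, prefer document order.
        PriorityQueue<Integer> ready = new PriorityQueue<Integer>();
        for (int i = 0; i < n; i++) {
            if (inDegree[i] == 0) {
                ready.add(i);
            }
        }
        List<Integer> order = new ArrayList<Integer>();
        while (!ready.isEmpty()) {
            int node = ready.poll();
            order.add(node);
            for (int next : successors.get(node)) {
                if (--inDegree[next] == 0) {
                    ready.add(next);
                }
            }
        }
        return order.size() == n ? order : null;
    }

    public static void main(String[] args) {
        // Page-sequence 0 is an index citing objects in page-sequences 1 and 2.
        System.out.println(flowOrder(3, new int[][] {{1, 0}, {2, 0}}));
        // Two page-sequences citing each other: no suitable order exists.
        System.out.println(flowOrder(2, new int[][] {{0, 1}, {1, 0}}));
    }
}
```

In the cyclic case some citations would still need a provisional width and a later adjustment, as discussed earlier in the thread.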
Re: page-number and page-number-citation problem
J.Pietschmann wrote: In the maintenance branch, the formatted page number string was produced just as a new page was set up. I wonder whether the page sequence LM can put the current page number string into the layout context? This could work for page-numbers but not for page-number-citations, as they could refer to an object in a different (and not yet paginated) page-sequence. Maybe I'm wrong in trying to do so, but I'd like to handle both formatting objects in the same way. Regards Luca
Re: page-number and page-number-citation problem
Firstly, thank you all for your suggestions. All your interesting replies led me to this conclusion: - in most cases, it is enough to make some local adjustments in each line containing page-numbers or page-number-citations; - sometimes, when a particularly elegant output is needed, it would really be desirable to have a two-step algorithm, with line-breaking performed again once the actual width of each object is known. So, I'll start implementing the general-purpose solution, storing the needed information inside an object (rather than directly as new attributes of areas) so as to reduce memory usage. Regards Luca
page-number and page-number-citation problem
There is a layout problem with fo:page-number and fo:page-number-citation, already pointed out but still unresolved. I think these formatting objects are very similar, even if their actual handling is quite different: they both must be replaced by a piece of information (a page number) that is (or could be) not available during the line breaking, so that a provisional width is used instead of the real one during the creation of the elements. The method PageNumberCitationLM.get() allocates the width of the string MMM if the id is not already known; PageNumberLM.get() calls getCurrentPV().getPageNumberString(), but, as pagination is performed later, it always gets the page-sequence initial page number (I am going to add a testcase showing a situation in which this makes some text overlap). The real number could be known as soon as the pagination for the current page-sequence is done (for a fo:page-number) or even later (if there is a fo:page-number-citation whose referenced object is in a page-sequence following the current one). In both cases, if there is a difference between the allocated width and the real one, indents and / or adjustment ratios should be re-computed. The computation, in itself, is easy, as the LineLM already has all the necessary information: line width, unadjusted width, available stretch and shrink. The point is that this information is stored in the LineBreakPositions, while the actual value (and the actual width) is set directly into the area tree.
In order to adjust the inline content of a line when the page number is resolved, I see two alternative strategies: 1) the LineLM has to handle this: this needs the LineAreas to hold a reference to the LineLM that creates them, and that knows all the needed information; 2) the LineArea has to handle this: this means that the LineArea (and the InlineAreas too) must be given the information about MinOptMax ipd and provisional adjust ratio. I don't like 1 very much, because I think the creator LM is not a significant attribute of an area, but 2 involves adding many attributes too (and maybe even less significant!) ... What do you think? Does anyone see a different strategy? Regards Luca
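The re-computation mentioned above (line width, unadjusted width, available stretch and shrink) could be sketched like this. It is a minimal illustration under assumptions: the class and method names are hypothetical, widths are plain ints, and the ratio follows the usual Knuth-style definition (difference divided by the available stretch when the line is too short, by the available shrink when it is too long); it is not taken from FOP's LineLM.

```java
/** Toy re-computation of a line's adjustment ratio; not part of FOP. */
class LineReadjustSketch {
    /**
     * Knuth-style adjustment ratio: positive when the content must stretch,
     * negative when it must shrink. Returns 0 when the content fits exactly
     * or, as a simplification, when no stretch/shrink is available.
     */
    static double adjustRatio(int lineWidth, int naturalWidth, int stretch, int shrink) {
        int difference = lineWidth - naturalWidth;
        if (difference > 0 && stretch > 0) {
            return (double) difference / stretch;
        }
        if (difference < 0 && shrink > 0) {
            return (double) difference / shrink;
        }
        return 0.0;
    }

    /** Recomputes the ratio once a provisional page-number width is replaced by the real one. */
    static double readjust(int lineWidth, int unadjustedWidth, int stretch, int shrink,
                           int provisionalWidth, int realWidth) {
        int newNaturalWidth = unadjustedWidth - provisionalWidth + realWidth;
        return adjustRatio(lineWidth, newNaturalWidth, stretch, shrink);
    }

    public static void main(String[] args) {
        // A line laid out with a provisional width of 15000 for the page number.
        System.out.println(adjustRatio(100000, 96000, 8000, 4000));
        // The resolved page number turns out to be only 11000 wide.
        System.out.println(readjust(100000, 96000, 8000, 4000, 15000, 11000));
    }
}
```

Whichever of the two strategies is chosen, these four values plus the provisional width are what has to be reachable when the page number is finally resolved.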