Bug report for Fop [2008/10/26]
Legend:
  Status:   UNC=Unconfirmed  NEW=New  ASS=Assigned  OPN=Reopened  VER=Verified
            (Closed/Resolved skipped)
  Severity: BLK=Blocker  CRI=Critical  REG=Regression  MAJ=Major
            MIN=Minor  NOR=Normal  ENH=Enhancement  TRV=Trivial

|Bug ID|Sta|Sev|Date Posted|Description
| 1063|New|Nor|2001-03-21|fop does not handle large fo files
| 3280|New|Nor|2001-08-27|PCL Renderer doesn't work
| 3824|New|Blk|2001-09-25|MIF option with tables
| 4030|New|Nor|2001-10-08|IOException creating Postscript with graphics on S
| 4535|New|Maj|2001-10-31|PCL renderer 1.13 not rendering SVG
| 5010|New|Enh|2001-11-21|Better error reporting needed
| 5124|New|Maj|2001-11-27|fo:block-container is not rendered properly using
| 6305|New|Nor|2002-02-07|Using fo:table-and-caption results in empty output
| 6427|New|Enh|2002-02-13|Adding additional Type 1 fonts problem
| 6997|New|Nor|2002-03-09|[PATCH] Row-spanned row data breaks over a page wi
| 7241|New|Nor|2002-03-19|keep-with-previous, keep-with-next only working on
| 8003|Ass|Maj|2002-04-12|FopImageFactory never releases cached images
| 8463|New|Nor|2002-04-24|SVG clipping in external.fo example doc when rende
| 8767|Ass|Min|2002-05-03|Image and solid colour background rectangle sizes
| 9379|New|Nor|2002-05-24|MIF Renderer generates incorrect MIF code
|10379|New|Enh|2002-07-01|Improvement to FOP Classloader
|12494|New|Nor|2002-09-10|fop produces pdf file which Acrobat Reader refuses
|12610|New|Enh|2002-09-13|[PATCH] onLoad Action for PDF documents or how to
|13586|New|Blk|2002-10-13|fop will not work on linux alpha because jre is br
|13734|New|Nor|2002-10-17|Hyphenation does not work correctly on long string
|14248|New|Enh|2002-11-05|51-page FO example, could be added to the samples
|14352|New|Enh|2002-11-07|It would be nice if FOP could be plugged into popu
|14356|New|Nor|2002-11-07|*NOT* embedding TrueTypeFont in PDF causes Acrobat
|14419|New|Enh|2002-11-10|Implement SourceResolver, Image Resolver
|14529|New|Nor|2002-11-13|SVG url references do not work
|14679|New|Enh|2002-11-19|Pluggable renderers
|14924|New|Nor|2002-11-28|Table disappears when it ends when the page ends
|15020|New|Enh|2002-12-03|Reuse of fo-tree
|16017|Opn|Nor|2003-01-13|Jpeg's and the PDF Serializer
|16130|New|Nor|2003-01-15|PS-Renderer emits lots of redundant moveto's
|16156|New|Nor|2003-01-16|0.20.5rc SVG example does not work
|16713|New|Nor|2003-02-03|Hyphenation error in tables
|17369|New|Nor|2003-02-25|Footnote duplication
|17380|New|Nor|2003-02-25|Batik Component will not recognize fe SVG elem
|17921|New|Nor|2003-03-12|Kerning is broken for standard fonts
|18292|New|Nor|2003-03-24|24 bit PNG not displayed correctly
|18801|New|Nor|2003-04-08|visibility property is not implemented
|19228|New|Blk|2003-04-22|[PATCH] Child LayoutContext is null in certain cir
|19341|Ver|Nor|2003-04-26|leader doesn't work since 0.20.5.rc2
|19695|New|Enh|2003-05-06|[PATCH] Allow fox:destination as child of fox:outl
|19717|New|Enh|2003-05-07|Lets add support for JimiClassesPro.zip to build.x
|19769|Ass|Enh|2003-05-08|Indefinite page size is not implemented
|20280|Ass|Enh|2003-05-27|text-align and text-align-last only partially impl
|20407|New|Enh|2003-06-02|[PATCH] Configure image caching using the configur
|20827|New|Enh|2003-06-17|Derive other font styles and weights from normal T
|21265|Opn|Nor|2003-07-02|referencing a custom font (TTF or Adobe Type 1) fo
|21905|New|Nor|2003-07-26|large list-item-label bleeds into following block
|21982|New|Maj|2003-07-30|NullPointer Exception in LazyFont with embedded fo
|22450|New|Maj|2003-08-15|Unterminated iteration in JPEGReader class
|22627|Opn|Nor|2003-08-21|Update mirror/download page HEADER README (was [
|24148|New|Nor|2003-10-27|Kerning upsets text-align=end
DO NOT REPLY [Bug 46048] Wrong images used (how to clear image cache?)
https://issues.apache.org/bugzilla/show_bug.cgi?id=46048

--- Comment #21 from M.H. [EMAIL PROTECTED] 2008-10-27 06:11:41 PST ---

Just downloaded the current trunk, built it, and got an I/O exception while reading the font cache (org.apache.fop.fonts.EmbedFontInfo; local class incompatible: stream classdesc serialVersionUID = -9075848379822693399, local class serialVersionUID = 8755432068669997367). Discarding font cache file. Error.

As only fop.jar and xmlgraphics-commons.jar changed, and I double-checked that my classpath contained these, I wonder what else is wrong.

-- Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
You are receiving this mail because: you are the assignee for the bug.
DO NOT REPLY [Bug 46048] Wrong images used (how to clear image cache?)
https://issues.apache.org/bugzilla/show_bug.cgi?id=46048

--- Comment #22 from Jeremias Maerki [EMAIL PROTECTED] 2008-10-27 06:44:34 PST ---

(In reply to comment #21)
> Just downloaded the current trunk, built it, and got an I/O exception while
> reading the font cache (org.apache.fop.fonts.EmbedFontInfo; local class
> incompatible: stream classdesc serialVersionUID = -9075848379822693399,
> local class serialVersionUID = 8755432068669997367). Discarding font cache
> file. Error. As only fop.jar and xmlgraphics-commons.jar changed, and I
> double-checked that my classpath contained these, I wonder what else is
> wrong.

You can safely ignore that. It just means that the font cache file is being rebuilt.
Re: Layouts tree pruning for the prototype
Hi Dario,

I’ve finally had the time to look at your patch. IIUC your idea is to do best-fit on all but the final X lines, and total-fit on the final X lines. I have to say, I’m a bit doubtful of the interest of this approach in terms of performance and quality, compared to a plain best-fit version (see below):

Dario Laera wrote:
> Hi Vincent,
>
> I wrote a new patch that implements a kind of ?-fit algorithm at page level. Let N be the page number of the last page layout obtained and X a variable representing the depth of the layout tree I want to keep: each time N gets increased (and its value is greater than X) I perform a pruning of the layout tree, choosing as the new root the best layout for the page with index N-X.
>
> Here is an example: the two graphs below represent the trees of possible page layouts; in the first tree we have N=4. If X was set to 3, it's time to prune the tree by choosing the best subtree with a page-1 layout as root. In the example I suppose that 1a has lower demerits than 1b, so I prune the tree from 1a. Alternatively I could choose the page-1 layout that has the best leaf as a child (actually not implemented).
>
>         0
>        / \
>     *1a*  1b
>     /  \   |
>   2a    2b 2c
>   / \  / \ / \
> 3a 3b 3c 3d 3e 3f
>  |
> 4a
>
>      1a
>     /  \
>   2a    2b
>   / \   / \
> 3a 3b 3c 3d
>  |
> 4a

In this example (X = 3), total-fit is performed until a first layout for line 4 is found. Then the best first line is chosen and the tree is cut accordingly (1b and descendants). When a first layout for line 5 is found, same again: total-fit is performed on lines 2 to 5, giving the tree that has 1a as root, then the best second line is chosen and the rest is cut off (say, 2a and descendants). And so on and so on.

So, roughly speaking, given a paragraph of n lines, you’re doing (n - x) “local”, x-line-wide total fits, along with (n - x) best fits. In the end result, (n - x) lines will have been chosen using the best-fit method (despite the total fits running in parallel), and the final x lines using a total fit with line (n - x - 1) as the root.
This sounds like a lot of work for a result that in the end may not look much better than best-fit. Also, the visual result might look strange, with the final lines looking more “even” than the starting ones (but this has to be confirmed on real test cases). I’m not sure there’s a real-life use case for such an optimization: people wanting speed will still be less happy than with plain best-fit, and people wanting quality are ready to pay the price for it anyway.

A thing that might be interesting is to regularly cut down the number of active nodes, every x lines: once all of the layouts for line x have been found, select the best active node and discard all the other ones. Then do that again for lines 2x, 3x, 4x, etc. While similar, this has the advantage that every bunch of x lines will have been determined by a local total-fit method. In fact the paragraph will be made of a sum of local optima, which may or may not correspond to the global optimum. But even in that case, I’m not sure this is worth the additional complexity in a piece of code that’s already complicated enough.

Another option might be to set a limit on the amount of consumed memory (the number of active nodes, which in turn will have an effect on the processing time). Once the maximal number of active nodes is reached, start discarding the nodes with the highest demerits. But it remains to be seen whether such a heuristic proves to be efficient, and what limit to set. As we can see in your other message, figures may change radically from one document to another.

In the end, I would be more inclined to implement a gradation in layout quality (best fit, total fit over page sequence, total fit over document, total fit over document + paragraph line numbers, etc.), rather than one or several pruning methods. I think it should be easier to implement yet provide enough flexibility to satisfy all kinds of users. Sorry if that sounds a bit negative.
Since this is anything but a simple topic, I may well have missed the interest of your approach. At any rate good ideas sometimes emerge from the oddest experiments, so feel free to continue your investigations...

> The X value determines the trade-off between good aesthetic results and time/memory performance: a high value should obtain results similar to total-fit; a low value, say 1, should obtain the fastest performance. This approach would allow finalizing (creating the areas, rendering) the page that has been chosen as the new root and, if possible, freeing memory that is no longer necessary (also FO-tree objects and related LMs). This possibility apart, it seems to me that in general the new prototype can benefit from this approach. The implementation adds a low overhead in terms of memory and computation that is largely outweighed by the effect of pruning. Each LM has a
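The pruning step discussed in this thread can be sketched with a toy layout tree. This is only an illustration of the idea: the class and method names below are hypothetical, not FOP's actual prototype API. Once page N is reached with N > X, the tree is re-rooted at the lowest-demerits layout for page N-X; the discarded siblings (1b and its descendants in the example above) become unreachable and can be garbage collected.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the layout tree pruning described above
// (illustrative names, not FOP's actual prototype classes).
class PageLayout {
    final String label;
    final double demerits;
    final List<PageLayout> children = new ArrayList<>();

    PageLayout(String label, double demerits) {
        this.label = label;
        this.demerits = demerits;
    }

    PageLayout addChild(String label, double demerits) {
        PageLayout child = new PageLayout(label, demerits);
        children.add(child);
        return child;
    }
}

public final class TreePruning {
    /**
     * Re-root the tree at the best (lowest-demerits) child; the other
     * subtrees are dropped, so their memory can be reclaimed.
     */
    static PageLayout prune(PageLayout root) {
        PageLayout best = null;
        for (PageLayout c : root.children) {
            if (best == null || c.demerits < best.demerits) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PageLayout root = new PageLayout("0", 0);
        root.addChild("1a", 10);  // 1a has lower demerits than 1b
        root.addChild("1b", 25);
        System.out.println(prune(root).label);  // prints 1a
    }
}
```

Dario's alternative (choosing the page-1 layout whose subtree contains the best leaf, "actually not implemented") would replace the comparison on the children's own demerits with a recursive search for the best leaf under each child.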
Choosing a better threshold in line breaking
Hi all,

I found the reason why breaking paragraphs into short lines is really slow and memory hungry: the threshold on the adjustment ratio, set to 20 for the last try, is too high and makes EVERY legal breakpoint a feasible breakpoint too. A check should be performed to avoid this situation and then choose a better threshold.

For example, if I have a 2-column A4 page layout, the line width is ~14. The glue stretchability, as far as I can see in the TextLM class, is often set to 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, which is equal to ~1. When you compute the adjustment ratio for a line that has just one glue you get r = 14/1 = 14, which is lower than the threshold of 20.0, so an active node is added.

A better threshold can be chosen as follows: let idealDifference be a reasonable size we choose as a good threshold. We can assume 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH as the default stretchability and compute a better threshold this way:

idealRatio = idealDifference / (3 * LineLayoutManager.DEFAULT_SPACE_WIDTH);

and bound that value: 1.0 <= idealRatio <= 20.

How to choose idealDifference? A naive solution, but probably not so bad, can be:

idealDifference = iLineWidth / 2;

A more sophisticated (maybe too sophisticated) solution can choose it by looking at the average box length: we can see how many average boxes fit on a line (wordsPerLine) and compute:

avgWord = avgBox + LineLayoutManager.DEFAULT_SPACE_WIDTH;
idealDifference = iLineWidth - (avgWord * (wordsPerLine / 2));

The tests I've run have shown a noticeable performance improvement. Comments are welcome!

Dario

autoThreshold.diff
Description: Binary data
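Dario's naive variant can be sketched as a small self-contained method. The names mirror the mail (iLineWidth, DEFAULT_SPACE_WIDTH); the method itself is hypothetical, not FOP code, and the bounds 1.0 and 20.0 are the ones quoted above.

```java
// Sketch of the proposed adaptive threshold (hypothetical helper, not
// part of FOP; names follow the mail above).
public final class AdaptiveThreshold {
    static double compute(double iLineWidth, double defaultSpaceWidth) {
        double idealDifference = iLineWidth / 2.0;       // the naive choice
        double stretchability = 3.0 * defaultSpaceWidth; // default glue stretch
        double idealRatio = idealDifference / stretchability;
        // bound it: 1.0 <= idealRatio <= 20.0
        return Math.max(1.0, Math.min(idealRatio, 20.0));
    }

    public static void main(String[] args) {
        // the 2-column A4 example from the mail: line width ~14,
        // stretchability ~1 (i.e. space width ~1/3), giving a threshold
        // of ~7 instead of the fixed 20
        System.out.println(compute(14.0, 1.0 / 3.0));
    }
}
```

With this choice, the one-glue line from the example (r = 14) would no longer become an active node, since 14 exceeds the computed threshold of ~7.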
Re: Choosing a better threshold in line breaking
On Oct 27, 2008, at 18:36, Dario Laera wrote:
> Hi all,
> I found the reason why breaking paragraphs into short lines is really slow
> and memory hungry: the threshold of the adjustment ratio, set to 20 at the
> last try, is too high and makes EVERY legal breakpoint a feasible
> breakpoint too. A check should be performed to avoid such situation and to
> choose then a better

Well, AFAIU, this is intended, BUT... this 'last try' should /only/ occur in case the earlier tries did not yield any breaks. IIC, it is really meant as a last resort, and should indeed lead to *any* possible break being considered (anything goes at that point). Do you mean that this last try is /always/ performed (even when we already have a set of feasible breaks)?

Cheers

Andreas
Re: Active node tree pruning in FOP trunk
Hi,

Dario Laera wrote:
> Hi Simon, thanks for your reply.
>
> On 23 Oct 2008, at 21:43, Simon Pepping wrote:
>> Hi Dario, This is an interesting study. I need some more time to
>> understand the implications fully. At first sight you prove that for
>> normal documents the improvement is small. The paragraphs need to get
>> long before your strategy makes a difference. This is interesting,
>> however, for long chapters with many pages, as you mentioned in your
>> earlier email.
>
> ATM I prefer to talk about paragraphs only: in the tests I've done today I
> saw that for page breaking there is always just one active node. So it's
> clear why formatting the XSL-FO recommendation, which is over 400 pages
> long but with short paragraphs, doesn't get faster. I need to investigate
> in this area.
>
>> It is clear why long paragraphs make a difference. Why does one- or
>> two-column layout make a large difference? Simply due to the twice larger
>> number of pages? I do not understand the left-aligned case. Is this not
>> just the same as a first-fit layout?
>
> Nice questions... I'm trying to understand this behavior too; the first
> time I implemented the pruning on the prototype it was for another reason,
> and I accidentally noticed the performance boost :) About one or two
> columns, or better, long or short lines: again, I don't know why, maybe
> it's just because of the double number of breaks; a thing I noted is that
> for the same number of active nodes, with shorter lines the gap between
> startLine and endLine is wider than with long lines. I don't know if this
> is meaningful.

In left-aligned mode this can be explained by looking at [1]: a line can be cut if its natural length is between (line-width - 3 sp-width) and line-width. So, at any break point in the paragraph, where the total width is w, you can typeset the paragraph on a number of lines between w / line-width (every line full) and w / (line-width - 3 sp-width) (every line “empty”). This interval increases when the line width decreases.
Example: with sp-width = 0.1 cm, line-width = 17 cm and w = 1700 cm: between 100 and 101 lines. With line-width = 8 cm (double column): between 213 and 220 lines. Obviously the interval increases with the number of lines, but this becomes unrealistic.

[1] http://wiki.apache.org/xmlgraphics-fop/LineBreaking

> About left-aligned or justified: with the latter *sometimes* having
> threshold=1.0 is enough (I think because of stretchable glues), so
> obviously the number of active nodes is reduced, while the former will
> always fall into threshold=20.0 and into force mode (talking about my
> tests). Anyway, while I'm not sure short/long lines really make a
> difference, it's evident that non-justified text produces a lot more
> active nodes than justified text.

In justified mode, every line must fit the line width with quite a low tolerance. In left-aligned mode the algorithm can be much more liberal, and a rather big amount of blank space is allowed at the end of the line compared to justify. Hence the higher number of active nodes for ragged lines. In justified mode, longer lines should lead to more break opportunities, so more active nodes. Indeed, on every line there will be more spaces to play with, giving the content more chance to have enough shrinkability or stretchability to fit on the line. In left-aligned mode this is the other way around: spaces are not stretchable, so having more of them doesn’t help. However, at every break opportunity a big stretchable space is added, which is likely to make the content fit. The shorter the line, the more often that space is added, and the more break opportunities it results in.

[snip]

> [from another post] A really strange thing: is it theoretically possible
> that the amount of active nodes is greater than the number of Knuth
> elements? As you can see in the graph, for short lines there are 14000
> active nodes but only ~7000 Knuth elements...
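The arithmetic behind these intervals can be checked with a small helper. This is a sketch, not FOP code: it applies only the ragged-right rule quoted above (a line can end when its natural length lies between line-width - 3 sp-width and line-width), and the units are whatever the inputs use (cm in the example).

```java
// Feasible line-count range for ragged-right text, following the rule
// from the wiki page cited above: every line's natural length lies
// between (lineWidth - 3 * spaceWidth) and lineWidth.
public final class LineCountInterval {
    static int[] interval(double totalWidth, double lineWidth, double spaceWidth) {
        // every line completely full -> fewest lines
        int min = (int) Math.ceil(totalWidth / lineWidth);
        // every line as "empty" as allowed -> most lines
        int max = (int) Math.floor(totalWidth / (lineWidth - 3 * spaceWidth));
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        int[] single = interval(1700, 17, 0.1);  // single column
        int[] twoCol = interval(1700, 8, 0.1);   // double column
        System.out.println(single[0] + ".." + single[1]);  // prints 100..101
        System.out.println(twoCol[0] + ".." + twoCol[1]);  // prints 213..220
    }
}
```

This reproduces Vincent's figures and makes the point visible: halving the line width roughly doubles the line count, and widens the feasible interval from 2 candidate counts to 8.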
Line breaking is a bin-packing problem [2], so the potential number of active nodes is exponential in the number of words in your paragraph. So, yes, it’s normal to have more active nodes than elements in the sequence.

[2] http://en.wikipedia.org/wiki/Bin_packing_problem

Vincent
Re: Layouts tree pruning for the prototype
Vincent,

How do I remove myself from this list?

Thanks

From: Vincent Hennebert [EMAIL PROTECTED]
To: fop-dev@xmlgraphics.apache.org
Sent: Monday, October 27, 2008 1:01:43 PM
Subject: Re: Layouts tree pruning for the prototype

[snip]
Re: Unsubscribe
Edward wrote:
> Vincent,
> How do I remove myself from this list?

You have to send a mail to [EMAIL PROTECTED] with the e-mail address you used to subscribe. On many mailing lists this information can be retrieved by looking at the headers of messages coming from the lists; here:

List-Unsubscribe: mailto:[EMAIL PROTECTED]

Vincent
Re: Active node tree pruning in FOP trunk
On Mon, Oct 27, 2008 at 06:00:06PM +, Vincent Hennebert wrote:
> Hi,
>
>> [from another post] A really strange thing: is it theoretically possible
>> that the amount of active nodes is greater than the number of Knuth
>> elements? As you can see in the graph, for short lines there are 14000
>> active nodes but only ~7000 Knuth elements...
>
> Line breaking is a bin-packing problem [2], so the potential number of
> active nodes is exponential in the number of words you have in your
> paragraph. So, yes, it's normal to have more active nodes than elements in
> the sequence.

Each feasible breakpoint can have 4 nodes, one for each line class. If half of the Knuth elements are legal breakpoints (a simple sequence of box - glue), then there could be at most 4 * 3500 nodes. Note that you are using the term "active node" in the wrong sense, viz. the set of nodes kept by the program. Active nodes are the subset of those nodes that participate in the inner loop. Indeed, all nodes were active at some point.

Regards, Simon

-- Simon Pepping home page: http://www.leverkruid.eu
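Simon's bound is worth spelling out, since it exactly matches the figures from the graph. The sketch below is illustrative only; the constant of four line classes is the usual fitness-class count in Knuth-Plass style breakers, and the "half the elements are legal breakpoints" assumption is the simple box-glue sequence he mentions.

```java
// Upper bound on stored nodes per Simon's reasoning: each legal
// breakpoint can carry one node per line (fitness) class, and a
// Knuth-Plass style breaker distinguishes four such classes.
public final class NodeBound {
    static final int LINE_CLASSES = 4;

    static int maxStoredNodes(int legalBreakpoints) {
        return LINE_CLASSES * legalBreakpoints;
    }

    public static void main(String[] args) {
        // ~7000 Knuth elements, half of them legal breakpoints
        // (a simple box-glue sequence) -> 3500 breakpoints
        System.out.println(maxStoredNodes(7000 / 2));  // prints 14000
    }
}
```

So the observed 14000 nodes for ~7000 Knuth elements sit exactly at this upper bound, which is consistent with the forced last try turning every legal breakpoint into a feasible one.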
Re: Choosing a better threshold in line breaking
On 27 Oct 2008, at 18:53, Andreas Delmelle wrote:
> Do you mean that this last try is /always/ performed (even when we already
> have a set of feasible breaks)?

It's not always performed (so it's formally correct), but in my tests it's rarely avoided; more precisely it was avoided just once, with the file my_franklin_rep-jus.fo, which is composed of many paragraphs in 1 column with justified text. What I think (obviously, I may be wrong, as has been proven in other mails ;) is that another intermediate try, with a judicious threshold, can be performed, leading to the same final result but with much better performance, provided this intermediate try doesn't fail like the previous ones. Anyway, I always run my tests with hyphenation enabled; I should try disabling it to see if the second try is run with threshold=5 and whether it succeeds.

Dario