Bug report for Fop [2008/10/26]

2008-10-27 Thread bugzilla
+-----+---+---+----------+---------------------------------------------------
| Bugzilla Bug ID
|     | Status: UNC=Unconfirmed NEW=New ASS=Assigned
|     |         OPN=Reopened    VER=Verified   (Skipped Closed/Resolved)
|     |   | Severity: BLK=Blocker CRI=Critical  REG=Regression  MAJ=Major
|     |   |           MIN=Minor   NOR=Normal    ENH=Enhancement TRV=Trivial
|     |   |   | Date Posted
|     |   |   |          | Description
+-----+---+---+----------+---------------------------------------------------
| 1063|New|Nor|2001-03-21|fop does not handle large fo files|
| 3280|New|Nor|2001-08-27|PCL Renderer doesn't work |
| 3824|New|Blk|2001-09-25|MIF option with tables|
| 4030|New|Nor|2001-10-08|IOException creating Postscript with graphics on S|
| 4535|New|Maj|2001-10-31|PCL renderer 1.13 not rendering SVG   |
| 5010|New|Enh|2001-11-21|Better error reporting needed |
| 5124|New|Maj|2001-11-27|fo:block-container is not rendered properly using |
| 6305|New|Nor|2002-02-07|Using fo:table-and-caption results in empty output|
| 6427|New|Enh|2002-02-13|Adding additional Type 1 fonts problem|
| 6997|New|Nor|2002-03-09|[PATCH] Row-spanned row data breaks over a page wi|
| 7241|New|Nor|2002-03-19|keep-with-previous, keep-with-next only working on|
| 8003|Ass|Maj|2002-04-12|FopImageFactory never releases cached images  |
| 8463|New|Nor|2002-04-24|SVG clipping in external.fo example doc when rende|
| 8767|Ass|Min|2002-05-03|Image and solid colour background rectangle sizes |
| 9379|New|Nor|2002-05-24|MIF Renderer generates incorrect MIF code |
|10379|New|Enh|2002-07-01|Improvement to FOP Classloader|
|12494|New|Nor|2002-09-10|fop produces pdf file which Acrobat Reader refuses|
|12610|New|Enh|2002-09-13|[PATCH] onLoad Action for PDF documents or how to |
|13586|New|Blk|2002-10-13|fop will not work on linux alpha because jre is br|
|13734|New|Nor|2002-10-17|Hyphenation does not work correctly on long string|
|14248|New|Enh|2002-11-05|51-page FO example, could be added to the samples |
|14352|New|Enh|2002-11-07|It would be nice if FOP could be plugged into popu|
|14356|New|Nor|2002-11-07|*NOT* embedding TrueTypeFont in PDF causes Acrobat|
|14419|New|Enh|2002-11-10|Implement SourceResolver, Image Resolver  |
|14529|New|Nor|2002-11-13|SVG url references do not work|
|14679|New|Enh|2002-11-19|Pluggable renderers   |
|14924|New|Nor|2002-11-28|Table disappears when it ends when the page ends  |
|15020|New|Enh|2002-12-03|Reuse of fo-tree  |
|16017|Opn|Nor|2003-01-13|Jpeg's and the PDF Serializer |
|16130|New|Nor|2003-01-15|PS-Renderer emits lots of redundant moveto's  |
|16156|New|Nor|2003-01-16|0.20.5rc SVG example does not work|
|16713|New|Nor|2003-02-03|Hyphenation error in tables   |
|17369|New|Nor|2003-02-25|Footnote duplication  |
|17380|New|Nor|2003-02-25|Batik Component will not recognize fe SVG elem|
|17921|New|Nor|2003-03-12|Kerning is broken for standard fonts  |
|18292|New|Nor|2003-03-24|24 bit PNG not displayed correctly|
|18801|New|Nor|2003-04-08|visibility property is not implemented  |
|19228|New|Blk|2003-04-22|[PATCH] Child LayoutContext is null in certain cir|
|19341|Ver|Nor|2003-04-26|leader doesn't work since 0.20.5.rc2  |
|19695|New|Enh|2003-05-06|[PATCH] Allow fox:destination as child of fox:outl|
|19717|New|Enh|2003-05-07|Lets add support for JimiClassesPro.zip to build.x|
|19769|Ass|Enh|2003-05-08|Indefinite page size is not implemented   |
|20280|Ass|Enh|2003-05-27|text-align and text-align-last only partially impl|
|20407|New|Enh|2003-06-02|[PATCH] Configure image caching using the configur|
|20827|New|Enh|2003-06-17|Derive other font styles and weights from normal T|
|21265|Opn|Nor|2003-07-02|referencing a custom font (TTF or Adobe Type 1) fo|
|21905|New|Nor|2003-07-26|large list-item-label bleeds into following block |
|21982|New|Maj|2003-07-30|NullPointer Exception in LazyFont with embedded fo|
|22450|New|Maj|2003-08-15|Unterminated iteration in JPEGReader class|
|22627|Opn|Nor|2003-08-21|Update mirror/download page HEADER  README (was [|
|24148|New|Nor|2003-10-27|Kerning upsets text-align=end   |

DO NOT REPLY [Bug 46048] Wrong images used (how to clear image cache?)

2008-10-27 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=46048

--- Comment #21 from M.H. [EMAIL PROTECTED]  2008-10-27 06:11:41 PST ---
I just downloaded the current trunk, built it, and get the following error:

I/O exception while reading font cache (org.apache.fop.fonts.EmbedFontInfo;
local class incompatible: stream classdesc serialVersionUID =
-9075848379822693399, local class serialVersionUID = 8755432068669997367).
Discarding font cache file.

As only fop.jar and xmlgraphics-commons.jar changed, and I double-checked
that my classpath contains them, I wonder what else is wrong.




DO NOT REPLY [Bug 46048] Wrong images used (how to clear image cache?)

2008-10-27 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=46048

--- Comment #22 from Jeremias Maerki [EMAIL PROTECTED]  2008-10-27 06:44:34 PST ---
(In reply to comment #21)
> I just downloaded the current trunk, built it, and get the following error:
>
> I/O exception while reading font cache (org.apache.fop.fonts.EmbedFontInfo;
> local class incompatible: stream classdesc serialVersionUID =
> -9075848379822693399, local class serialVersionUID = 8755432068669997367).
> Discarding font cache file.
>
> As only fop.jar and xmlgraphics-commons.jar changed, and I double-checked
> that my classpath contains them, I wonder what else is wrong.

You can safely ignore that. It just means that the font cache file is being
rebuilt.
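
For reference, the mechanism behind the warning: Java serialization refuses
to deserialize an object whose class no longer carries the serialVersionUID
recorded in the stream, and FOP treats the resulting InvalidClassException
as a cue to throw the cache file away and rebuild it. A minimal sketch of
that general pattern (a hypothetical loader, not FOP's actual code):

    import java.io.*;

    public class CacheLoader {
        /** Loads a serialized cache; discards the file if the class has changed. */
        static Object loadOrDiscard(File cacheFile) {
            try (ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(cacheFile))) {
                return in.readObject();
            } catch (InvalidClassException ice) {
                // serialVersionUID mismatch: the cache was written by an older
                // build of the class. Delete it; it will be rebuilt on demand.
                cacheFile.delete();
                return null;
            } catch (IOException | ClassNotFoundException e) {
                return null;
            }
        }
    }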




Re: Layouts tree pruning for the prototype

2008-10-27 Thread Vincent Hennebert
Hi Dario,

I’ve finally had the time to look at your patch. IIUC your idea is to do
best-fit on all but the final X lines, and total-fit on those final
X lines. I have to say, I’m a bit doubtful about the benefit of this
approach in terms of performance and quality, compared to a plain
best-fit version (see below):

Dario Laera wrote:
> Hi Vincent,
>
> I wrote a new patch that implements a kind of ?-fit algorithm at
> page-level. Let N be the page number of the last page layout obtained
> and X a variable representing the depth of the layout tree I want to
> keep: each time N increases (and its value is greater than X) I
> perform a pruning of the layout tree, choosing as the new root the best
> layout for the page with index N-X. Here is an example: the two graphs
> below represent the trees of possible page layouts; in the first tree
> we have N=4. If X was set to 3, it's time to prune the tree by choosing
> the best subtree with a page-1 layout as root. In the example I suppose
> that 1a has lower demerits than 1b, so I prune the tree from 1a.
> Alternatively I could choose the page-1 layout whose subtree contains
> the best leaf (actually not implemented).
>
>                  0
>              /       \
>         *1a*            1b
>       /      \           |
>      2a        2b        2c
>     /  \      /  \      /  \
>   3a   3b   3c   3d   3e   3f
>   |
>   4a
>
>           1a
>         /    \
>      2a        2b
>     /  \      /  \
>   3a   3b   3c   3d
>   |
>   4a

In this example (X = 3), total-fit is performed until a first layout for
line 4 is found. Then the best first line is chosen and the tree is cut
accordingly (1b and its descendants are discarded). When a first layout
for line 5 is found, the same happens again: total-fit is performed on
lines 2 to 5, giving the tree that has 1a as root, then the best second
line is chosen and the rest is cut off (say, 2a and its descendants). And
so on.

So, roughly speaking, given a paragraph of n lines, you’re doing (n - x)
“local”, x-line-wide total fits, along with (n - x) best fits. In the
end result, (n - x) lines will have been chosen using the best-fit
method (despite the total fits running in parallel), and the final
x lines using a total fit rooted at line (n - x - 1).

This sounds like a lot of work for a result that in the end may not look
much better than best-fit. Also, the visual result might look strange,
with the final lines looking more “even” than the starting ones (but
this has to be confirmed on real testcases).

I’m not sure there’s a real-life use case for such an optimization:
people wanting speed will still be less happy than with plain best-fit,
people wanting quality are ready to pay the price for it anyway.

A thing that might be interesting is to regularly cut down the number of
active nodes, every x lines: once all of the layouts for line x have been
found, select the best active node and discard all the other ones. Then
do that again for lines 2x, 3x, 4x, etc. While similar, this has the
advantage that every group of x lines will have been determined by
a local total-fit method. In fact the paragraph will be made of a sum of
local optima, which may or may not correspond to the global optimum. But
even in that case, I’m not sure this is worth the additional complexity
for a piece of code that’s already quite complicated enough.
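
A rough sketch of what that periodic pruning could look like, with
hypothetical names (Node, demerits, activeNodes) standing in for whatever
the breaker actually uses:

    import java.util.List;

    /** Illustrative only: every x lines, keep just the best active node. */
    final class PeriodicPruner {

        static final class Node {
            final int line;          // index of the line this layout ends with
            final double demerits;   // accumulated demerits for this layout
            Node(int line, double demerits) { this.line = line; this.demerits = demerits; }
        }

        /** Call once all feasible layouts for lineIndex have been generated. */
        static void pruneIfNeeded(List<Node> activeNodes, int lineIndex, int x) {
            if (x <= 0 || lineIndex % x != 0 || activeNodes.isEmpty()) {
                return;
            }
            Node best = activeNodes.get(0);
            for (Node n : activeNodes) {
                if (n.demerits < best.demerits) {
                    best = n;        // lowest demerits wins
                }
            }
            activeNodes.clear();
            activeNodes.add(best);   // every later layout grows from this single root
        }
    }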

Another option might be to set a limit on the amount of consumed memory
(the number of active nodes, which in turn will have an effect on the
processing time). Once the maximal number of active nodes is reached,
start discarding the nodes with the highest demerits. But it remains to
be seen whether such a heuristic proves efficient, and what limit to set.
As we can see in your other message, figures may change radically from
one document to the other.
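
A sketch of that second idea as well, a hard cap where the worst nodes are
evicted first (again purely illustrative, not the prototype’s actual data
structures):

    import java.util.Comparator;
    import java.util.PriorityQueue;

    /** Illustrative only: keep at most maxNodes active nodes, dropping the worst first. */
    final class BoundedActiveNodeSet {

        static final class Node {
            final double demerits;
            Node(double demerits) { this.demerits = demerits; }
        }

        private final int maxNodes;
        // The head of this queue is always the node with the highest demerits,
        // i.e. the first candidate for eviction.
        private final PriorityQueue<Node> nodes =
                new PriorityQueue<>(Comparator.comparingDouble((Node n) -> n.demerits).reversed());

        BoundedActiveNodeSet(int maxNodes) { this.maxNodes = maxNodes; }

        void add(Node node) {
            nodes.add(node);
            if (nodes.size() > maxNodes) {
                nodes.poll();   // evict the node with the highest demerits
            }
        }
    }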

In the end, I would be more inclined to implement a gradation in layout
quality (best fit, total fit over page sequence, total fit over document,
total fit over document + paragraph line numbers, etc.), rather than one
or several pruning methods. I think it should be easier to implement yet
provide enough flexibility to satisfy all kinds of users.

Sorry if that sounds a bit negative. Since this is anything but a simple
topic, I may well have missed the interest of your approach. At any rate,
good ideas sometimes emerge from the oddest experiments, so feel free to
continue your investigations...


> The X value determines the trade-off between good aesthetic results and
> time/memory performance: a high value should obtain results similar to
> total-fit, a low value, say 1, should obtain the fastest performance.
>
> This approach would allow us to finalize (create the areas, render) the
> page that has been chosen as the new root and, if possible, to free
> memory that is no longer necessary (also FO-tree objects and the
> corresponding LMs). This possibility apart, it seems to me that in
> general the new prototype can benefit from this approach.
>
> The implementation adds a low overhead in terms of memory and
> computation that is largely outweighed by the effect of pruning. Each
> LM has a

Choosing a better threshold in line breaking

2008-10-27 Thread Dario Laera

Hi all,

I found the reason why breaking paragraphs into short lines is really
slow and memory hungry: the threshold of the adjustment ratio, set to
20 at the last try, is too high and makes EVERY legal breakpoint a
feasible breakpoint too. A check should be performed to avoid such a
situation and then to choose a better threshold.


For example, if I have a two-column A4 page layout, the line width is
~14. The glue stretchability, as far as I can see in the TextLM class,
is often set to 3 * LineLayoutManager.DEFAULT_SPACE_WIDTH, which is
equal to ~1. When you compute the adjustment ratio for a line that has
just one glue you get r = 14/1 = 14, which is lower than the
threshold of 20.0, so an active node is added.
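
In code, with the same illustrative numbers (not the actual FOP variables):

    public class AdjustmentRatioExample {
        public static void main(String[] args) {
            // Knuth-Plass adjustment ratio for a line that needs stretching:
            //   r = (available width - natural width) / total stretch
            double difference   = 14.0;  // line width minus the line's natural length
            double totalStretch = 1.0;   // ~3 * DEFAULT_SPACE_WIDTH: a single glue
            double r = difference / totalStretch;   // 14.0
            boolean feasible = r <= 20.0;           // true: kept as an active node
            System.out.println("r = " + r + ", feasible at threshold 20: " + feasible);
        }
    }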


A better threshold can be chosen as follows: let idealDifference be a
reasonable size we choose as a good threshold. We can assume 3 *
LineLayoutManager.DEFAULT_SPACE_WIDTH as the default stretchability and
compute a better threshold this way:

idealRatio = idealDifference / (3 * LineLayoutManager.DEFAULT_SPACE_WIDTH);


and bound that value:

1.0 <= idealRatio <= 20.0

How to choose idealDifference? A naive solution, but probably not so  
bad, can be:


idealDifference = iLineWidth / 2;

A more sophisticated (maybe too sophisticated) solution can choose it by
looking at the average box length: we can estimate how many average-sized
boxes fit on a line (wordsPerLine) and execute:


avgWord = avgBox + LineLayoutManager.DEFAULT_SPACE_WIDTH;
idealDifference = iLineWidth - (avgWord * (wordsPerLine / 2));
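
Putting the two variants together, a sketch of the heuristic (names like
iLineWidth, avgBox and wordsPerLine follow the mail, the clamping bounds
are the 1.0 and 20.0 above, and the whole thing is illustrative rather
than the actual patch):

    /** Illustrative sketch of the proposed threshold heuristic. */
    final class ThresholdHeuristic {

        /** Naive variant: aim at half a line of slack. */
        static double naiveThreshold(int iLineWidth, int defaultSpaceWidth) {
            int idealDifference = iLineWidth / 2;
            return clamp(idealDifference / (3.0 * defaultSpaceWidth));
        }

        /** Variant based on the average box (word) width. */
        static double avgBoxThreshold(int iLineWidth, int avgBox, int defaultSpaceWidth) {
            int avgWord = avgBox + defaultSpaceWidth;
            int wordsPerLine = iLineWidth / avgWord;    // average words fitting on a line
            int idealDifference = iLineWidth - (avgWord * (wordsPerLine / 2));
            return clamp(idealDifference / (3.0 * defaultSpaceWidth));
        }

        private static double clamp(double idealRatio) {
            return Math.max(1.0, Math.min(20.0, idealRatio));   // bound to [1.0, 20.0]
        }
    }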


The tests I've run have shown a noticeable performance improvement.
Comments are welcome!



Dario




autoThreshold.diff
Description: Binary data

 

Re: Choosing a better threshold in line breaking

2008-10-27 Thread Andreas Delmelle

On Oct 27, 2008, at 18:36, Dario Laera wrote:


> Hi all,
>
> I found the reason why breaking paragraphs into short lines is
> really slow and memory hungry: the threshold of the adjustment
> ratio, set to 20 at the last try, is too high and makes EVERY legal
> breakpoint a feasible breakpoint too. A check should be performed
> to avoid such a situation and then to choose a better


Well, AFAIU, this is intended, BUT...
This 'last try' should /only/ occur in case the earlier tries did not  
yield any breaks.
IIC, it is really meant as a last resort, and should indeed lead to  
*any* possible break being considered (anything goes at that point).


Do you mean that this last try is /always/ performed (even when we  
already have a set of feasible breaks)?



Cheers

Andreas


Re: Active node tree pruning in FOP trunk

2008-10-27 Thread Vincent Hennebert
Hi,


Dario Laera wrote:
> Hi Simon,
>
> thanks for your reply.
>
> On 23 Oct 2008, at 21:43, Simon Pepping wrote:
>
>> Hi Dario,
>>
>> This is an interesting study. I need some more time to understand the
>> implications fully. At first sight you prove that for normal documents
>> the improvement is small. The paragraphs need to get long before your
>> strategy makes a difference. This is interesting, however, for long
>> chapters with many pages, as you mentioned in your earlier email.
>
> ATM I prefer to talk about paragraphs only: in the tests I've done today
> I saw that for page breaking there is always just one active node. So
> it's clear why formatting the XSL-FO recommendation, which is over 400
> pages long but has short paragraphs, doesn't get faster. I need to
> investigate in this area.
>
>> It is clear why long paragraphs make a difference. Why does one- or
>> two-column layout make a large difference? Simply due to the twice
>> larger number of pages? I do not understand the left-aligned case. Is
>> this not just the same as a first-fit layout?
>
> Nice questions... I'm trying to understand this behavior too; the first
> time I implemented the pruning on the prototype it was for another
> reason, and I accidentally noticed the performance boost :)
> About one or two columns, or better, long or short lines: again, I don't
> know why, maybe it's just because of the double number of breaks; a
> thing I noted is that for the same number of active nodes, with shorter
> lines the gap between startLine and endLine is wider than with long
> lines. I don't know if this is meaningful.

In left-aligned mode this can be explained by looking at [1]: a line can
be cut if its natural length is between (line-width - 3 sp-width) and
line-width. So, at any break point in the paragraph, where the total
width is w, you can typeset the paragraph on a number of lines between
w / line-width (every line full) and w / (line-width - 3 sp-width)
(every line “empty”). This interval increases when the line width
decreases.

Example: sp-width = 0.1 cm, line-width = 17 cm, w = 1700 cm:
between 100 and 101 lines
with line-width = 8 cm (double column):
between 213 and 220 lines
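
Spelt out as a quick throwaway check (same numbers; ceiling for the fullest
packing, floor for the emptiest):

    public class LineCountInterval {
        public static void main(String[] args) {
            double spWidth = 0.1;    // cm
            double w = 1700.0;       // total natural width of the paragraph, in cm
            for (double lineWidth : new double[] {17.0, 8.0}) {
                int min = (int) Math.ceil(w / lineWidth);                  // every line full
                int max = (int) Math.floor(w / (lineWidth - 3 * spWidth)); // every line "empty"
                System.out.println(lineWidth + " cm: between " + min + " and " + max + " lines");
            }
        }
    }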

Obviously the interval widens as the number of lines grows, but such
extreme layouts become unrealistic.

[1] http://wiki.apache.org/xmlgraphics-fop/LineBreaking

> About left-aligned or justified: with the latter *sometimes* having
> threshold=1.0 is enough (I think because of stretchable glues), so
> obviously the number of active nodes is reduced, while the former will
> always fall into threshold=20.0 and into force mode (talking about my
> tests). Anyway, while I'm not sure short/long lines really make a
> difference, it's evident that non-justified text produces a lot more
> active nodes than justified text.

In justified mode, every line must fit the line width with quite a low
tolerance. In left-aligned mode the algorithm can be much more liberal,
and a rather big amount of blank space is allowed at the end of the line
compared to justified text. Hence the higher number of active nodes for
ragged lines.

In justified mode, longer lines should lead to more break opportunities,
so more active nodes. Indeed, on every line there will be more spaces to
play with, giving the content a better chance of having enough
shrinkability or stretchability to fit on the line.
In left-aligned mode this is the other way around: spaces are not
stretchable, so having more of them doesn’t help. However, at every break
opportunity a big stretchable space is added that is likely to make the
content fit. The shorter the line, the more often that space is added,
and the more break opportunities result.

<snip/>

[from another post]
> A really strange thing: is it theoretically possible that the number
> of active nodes is greater than the number of Knuth elements? As you
> can see in the graph, for short lines there are 14000 active nodes
> but only ~7000 Knuth elements...

Line breaking is a bin-packing problem [2], so the potential number of
active nodes is exponential in the number of words you have in your
paragraph. So, yes, it’s normal to have more active nodes than elements
in the sequence.

[2] http://en.wikipedia.org/wiki/Bin_packing_problem


Vincent


Re: Layouts tree pruning for the prototype

2008-10-27 Thread Edward
Vincent,

How do I remove myself from this list? 

Thanks


Re: Unsubscribe

2008-10-27 Thread Vincent Hennebert
Edward wrote:
> Vincent,
>
> How do I remove myself from this list?

You have to send a mail to [EMAIL PROTECTED]
with the e-mail address you used to subscribe.
On many mailing lists this information can be retrieved by looking at
the headers of messages coming from the lists; here:
List-Unsubscribe: mailto:[EMAIL PROTECTED]


Vincent


Re: Active node tree pruning in FOP trunk

2008-10-27 Thread Simon Pepping
On Mon, Oct 27, 2008 at 06:00:06PM +, Vincent Hennebert wrote:
> Hi,
>
> [from another post]
>> A really strange thing: is it theoretically possible that the number
>> of active nodes is greater than the number of Knuth elements? As you
>> can see in the graph, for short lines there are 14000 active nodes
>> but only ~7000 Knuth elements...
>
> Line breaking is a bin-packing problem [2], so the potential number of
> active nodes is exponential in the number of words you have in your
> paragraph. So, yes, it's normal to have more active nodes than elements
> in the sequence.

Each feasible breakpoint can have 4 nodes, one for each line class. If
half of the Knuth elements are legal breakpoints (a simple sequence of
box - glue), then there could be at most 4 * 3500 = 14000 nodes, which
matches the figure you are seeing.

Note that you are using the term active node in the wrong sense, viz. for
the whole set of nodes kept by the program. Active nodes are the subset
of those nodes that participate in the inner loop. Indeed, all nodes
were active at some point.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu


Re: Choosing a better threshold in line breaking

2008-10-27 Thread Dario Laera


On 27 Oct 2008, at 18:53, Andreas Delmelle wrote:

> Do you mean that this last try is /always/ performed (even when we
> already have a set of feasible breaks)?


It's not always performed (so it's formally correct), but in my tests it
is rarely avoided; more precisely it was avoided just once, with the file
my_franklin_rep-jus.fo, which is composed of many paragraphs in one
column with justified text. What I think (obviously, I may be wrong, as
has been proved in other mails ;) is that another intermediate try, with
a judicious threshold, could be performed, leading to the same final
result but with much better performance, provided this intermediate try
doesn't fail like the previous ones.
Anyway, I always run my tests with hyphenation enabled; I should try
disabling it to see if the second try, with threshold=5, is run and if
it doesn't fail.


Dario