Hi Simon,
thanks for your reply.
On 23 Oct 2008, at 21:43, Simon Pepping wrote:
Hi Dario,
This is an interesting study. I need some more time to understand the
implications fully. At first sight you prove that for normal documents
the improvement is small. The paragraphs need to get long before your
strategy makes a difference. This is interesting, however, for long
chapters with many pages, as you mentioned in your earlier email.
For the moment I prefer to talk about paragraphs only: in the tests I
did today I saw that for page breaking there is always just one active
node. So it's clear why formatting the XSL-FO recommendation, which is
over 400 pages long but has short paragraphs, doesn't get faster. I
need to investigate this area.
It is clear why long paragraphs make a difference. Why does one- or
two-column layout make a large difference? Simply due to the twice
larger number of pages? I do not understand the left-aligned case. Is
this not just the same as a first-fit layout?
Nice questions... I'm trying to understand this behavior too. The
first time I implemented the pruning in the prototype it was for
another reason, and I accidentally noticed the performance boost :)
About one or two columns, or better, long or short lines: again, I
don't know why; maybe it's just because of the doubled number of
breaks. One thing I noted is that, for the same number of active
nodes, the gap between startLine and endLine is wider with shorter
lines than with long lines. I don't know if this is meaningful.
About left-aligned or justified: with the latter, threshold=1.0 is
*sometimes* enough (I think because of stretchable glues), so
obviously the number of active nodes is reduced, while the former
always ends up at threshold=20.0 and in force mode (talking about my
tests). Anyway, while I'm not sure short/long lines really make a
difference, it's evident that non-justified text produces a lot more
active nodes than justified text.
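
The escalation described above (threshold=1.0 first, then
threshold=20.0, then force mode) can be sketched roughly as follows.
This is only an illustration in Python, not FOP's actual Java code:
break_with_fallback and the find_breaks callback with its
(threshold, force) signature are hypothetical names, and it assumes
that force mode is tried at the last threshold.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical callback: returns the chosen break positions, or None when
# no feasible set of breaks exists at this threshold (and force is False).
FindBreaks = Callable[[float, bool], Optional[Sequence[int]]]


def break_with_fallback(
    find_breaks: FindBreaks,
    thresholds: Tuple[float, ...] = (1.0, 20.0),
) -> Tuple[float, bool, Optional[Sequence[int]]]:
    """Try each threshold in turn, then fall back to force mode.

    Returns (threshold used, whether force mode was needed, breaks).
    """
    for th in thresholds:
        breaks = find_breaks(th, False)
        if breaks is not None:
            # A feasible layout exists at this threshold: stop escalating.
            return th, False, breaks
    # Assumption: force mode is run at the last (loosest) threshold.
    last = thresholds[-1]
    return last, True, find_breaks(last, True)
```

With a callback that only succeeds at the looser threshold, the
function reports 20.0 without force; with one that never succeeds
unforced, it reports 20.0 with force, which is the case the
left-aligned tests always hit.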
I hope to give you some decent answers in the next few days. Precise
answers faster than mine would also be appreciated :P
A more theoretical measurement would be the maximum number of active
nodes.
In stat-nopruning.txt you will find the maximum number of active nodes
for each paragraph without pruning (the max value); th is the threshold
and lines is the line count of the final layout. The last line for
each test file doesn't matter because it refers to page breaking.
Today I developed a kind of self-activating, self-regulating pruning:
when the number of active nodes exceeds a threshold (I used 300) the
pruning gets activated, and the treeDepth (TD) is chosen as the mean
of startLine and endLine. Initially I was setting TD to startLine, but
then I noticed that with short lines the pruning was activated when
startLine was 5 and endLine was 44 (!), so I decided that the mean was
a better choice. I can't explain how it's possible that the same text
can be laid out in 5 short lines (I'm talking about 2 columns on A4)
and in 44 lines...
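
A minimal sketch of the activation rule described above, written in
Python rather than FOP's Java, with hypothetical function names. The
rounding down of the mean is an assumption that matches the attached
statistics (sl = 5, el = 44 gives td = 24). The rule for the later
REDUCE steps is not described in this message: scaling td by 2/3 is
only a guess that happens to reproduce the logged values (24, 16, 10,
6, 4 and 76, 50).

```python
ACTIVE_NODE_LIMIT = 300  # the value used in these tests


def should_prune(active_node_count: int, limit: int = ACTIVE_NODE_LIMIT) -> bool:
    """Pruning is activated once the active node count exceeds the limit."""
    return active_node_count > limit


def initial_tree_depth(start_line: int, end_line: int) -> int:
    """TD is the mean of startLine and endLine (assumed to round down)."""
    return (start_line + end_line) // 2


def reduced_tree_depth(td: int) -> int:
    """Guessed REDUCE step: inferred from the logged numbers, not confirmed."""
    return td * 2 // 3
```

For example, initial_tree_depth(59, 93) gives 76 and
initial_tree_depth(5, 44) gives 24, as in the two "Active pruning"
lines of the attached statistics.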
You can find the statistics from auto pruning in the other attached
file. I will try to produce accurate graphs that outline the trends of
the variables, hoping that will help in understanding some of these
behaviors.
Dario
##
# max = max value for activeNodeCount
# sl = startLine
# el = endLine
# line = line number of the node that has exceeded the activeNodeCount
# threshold
# td = the treeDepth to be used
#
## Trasform fo/my_franklin_rep-1blk-2c-jus.fo without pruning
Active pruning max = 301 sl = 59 el = 93 line = 66 td = 76
REDUCE pruning max = 338 sl = 76 el = 117 line = 78 td = 50
findBreakinPoints max = 368 th = 20.0 lines = 544 forced
findBreakinPoints max = 1 th = 1.0 lines = 15 forced
30.06 real 7.92 user 0.73 sys
## Trasform fo/my_franklin_rep-1blk-2c.fo without pruning
Active pruning max = 301 sl = 5 el = 44 line = 24 td = 24
REDUCE pruning max = 301 sl = 30 el = 65 line = 56 td = 16
REDUCE pruning max = 302 sl = 30 el = 65 line = 57 td = 10
REDUCE pruning max = 301 sl = 35 el = 67 line = 63 td = 6
REDUCE pruning max = 302 sl = 35 el = 67 line = 64 td = 4
findBreakinPoints max = 1446 th = 20.0 lines = 561 forced
findBreakinPoints max = 1 th = 1.0 lines = 16 forced
31.04 real 8.07 user 0.74 sys
## Trasform fo/my_franklin_rep-1blk-jus.fo without pruning
findBreakinPoints max = 61 th = 1.0 lines = 240
findBreakinPoints max = 1 th = 1.0 lines = 7 forced
28.88 real 7.05 user 0.72 sys
## Trasform fo/my_franklin_rep-1blk.fo without pruning
Active pruning max =