Re: Knuth linebreaking questions

2004-12-06 Thread Luca Furini

Finn Bock wrote:

 I tend to read that to mean that word spacing may be pushed beyond the
 specified range by justification. And I would think that unjustified
 alignment still has the option of using the word-spacing range but
 of course has to stay within the range.

I'm not convinced ...
The effect of having left-aligned text and adjustable word-spacing would
be an output in which most lines are justified, but not the ones in which
the adjustment ratio would be > 1 ... I really don't think this would be
better than having all lines left-aligned! :-)

And if the user sets text-align=left but does not explicitly set
word-spacing, so that the default value is used, and by a lucky coincidence
the algorithm finds breaking points involving ratios < 1, the output would
show justified lines instead of the left-aligned lines the user would
likely have expected.

Regards
Luca



Re: Knuth linebreaking questions

2004-12-02 Thread Luca Furini

Finn Bock wrote:

(starting from the second question)
 And why not adjust the spacing within the user specified min/max for
 START and END alignment?

Should the user desire adjusted spaces, wouldn't it be better for him to
specify justified alignment? :-)
Seriously, the recommendation (at 7.16.2 letter-spacing and 7.16.8
word-spacing) states that these spaces may also be influenced by
justification, but says nothing about start and end alignments.

 I'm still not sure why it would be ok to ignore any user specified
 min and max values of 'word-spacing' during START and END alignment.
 If a user specifies a length range, what would the reason be for not
 using it? Perhaps with additional DEFAULT_SPACE_WIDTH.

When alignment is start or end, each space always has its .optimum width,
so there is no need to look at the .minimum and .maximum: the user's most
preferred value is already used.
But the Knuth algorithm would not work if there were no elements with
adjustable width (glue with stretchability and/or shrinkability); the
actual value used is not very relevant, because the computed adjustment
ratio will not be applied.

 Ok, performance is indeed a fine reason, but IMHO such quality vs.
 speed tradeoffs should eventually be made by the user rather than us.

Simon said the same:

# Note that in TeX such thresholds are user-adjustable parameters. I
# think they should eventually be so in FOP too, for those of us who
# have the most exquisite taste of line layout.

and I think it's a good idea; the algorithm should:

 1. find breaking points without hyphenation
 2. hyphenate
 3. find breaking points with hyphenation
 4. decide which ones are better

and point #4 uses the user-definable threshold; where should this constant
be stored? Inside the code of LineLM or in a configuration file?
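A minimal Java sketch of the four steps listed above, under stated assumptions: `TwoPassBreaking`, `BreakFinder` and `Result` are hypothetical names, not FOP classes, and the user-definable threshold is read as a demerits limit under which the unhyphenated layout is kept. That is only one possible reading of step 4.

```java
import java.util.Optional;

// Hypothetical sketch of the four-step plan; these names are not FOP classes.
final class TwoPassBreaking {

    /** Outcome of one breaking-point search: a label and its total demerits. */
    static final class Result {
        final String label;
        final double demerits;

        Result(String label, double demerits) {
            this.label = label;
            this.demerits = demerits;
        }
    }

    /** Stand-in for the real search; empty means no feasible set of breaks. */
    interface BreakFinder {
        Optional<Result> find(boolean withHyphenation);
    }

    static Result choose(BreakFinder finder, double threshold) {
        // 1. Find breaking points without hyphenation.
        Optional<Result> plain = finder.find(false);
        // 2 + 3. Hyphenate and find breaking points again (folded into the finder here).
        Optional<Result> hyphenated = finder.find(true);
        // 4. Keep the unhyphenated layout while its demerits stay within the
        //    user-definable threshold; otherwise take the cheaper layout.
        if (plain.isPresent() && plain.get().demerits <= threshold) {
            return plain.get();
        }
        if (!plain.isPresent()) {
            return hyphenated.orElseThrow(() -> new IllegalStateException("no feasible breaks"));
        }
        if (!hyphenated.isPresent()) {
            return plain.get();
        }
        return hyphenated.get().demerits < plain.get().demerits ? hyphenated.get() : plain.get();
    }
}
```

The TeX-like shortcut (skip steps 2 and 3 when the unhyphenated result is already under the threshold) would simply return before the second `find` call.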

Regards
Luca



Re: Knuth linebreaking questions

2004-12-02 Thread Finn Bock
And why not adjust the spacing within the user specified min/max for
START and END alignment?
[Luca]
Should the user desire adjusted spaces, wouldn't it be better for him to
specify justified alignment? :-)
Seriously, the recommendation (at 7.16.2 letter-spacing and 7.16.8
word-spacing) states that these spaces may also be influenced by
justification, but says nothing about start and end alignments.

I tend to read that to mean that word spacing may be pushed beyond the
specified range by justification. And I would think that unjustified
alignment still has the option of using the word-spacing range but
of course has to stay within the range.

I'm still not sure why it would be ok to ignore any user specified
min and max values of 'word-spacing' during START and END alignment.
If a user specifies a length range, what would the reason be for not
using it? Perhaps with additional DEFAULT_SPACE_WIDTH.

[Luca]
When alignment is start or end, each space always has its .optimum width,
so there is no need to look at the .minimum and .maximum: the user's most
preferred value is already used.

Is there anything that prevents using a non-.optimum value within the
range if the result is judged to be better (with a lower demerit)?

Ok, performance is indeed a fine reason, but IMHO such quality vs.
speed tradeoffs should eventually be made by the user rather than us.

[Luca]
Simon said the same:
# Note that in TeX such thresholds are user-adjustable parameters. I
# think they should eventually be so in FOP too, for those of us who
# have the most exquisite taste of line layout.
and I think it's a good idea; the algorithm should:
 1. find breaking points without hyphenation
 2. hyphenate
 3. find breaking points with hyphenation
 4. decide which ones are better
and point #4 uses the user-definable threshold; where should this constant
be stored? Inside the code of LineLM or in a configuration file?

An extension attribute?
   <fo:block fox:knuth-threshold="5"> ... </fo:block>
I suspect that the other Knuth parameters should be specified the same
way. But it is not a high priority IMO.

regards,
finn


Re: Knuth linebreaking questions

2004-12-02 Thread Simon Pepping
On Thu, Dec 02, 2004 at 12:16:55PM +0100, Finn Bock wrote:
 and point #4 uses the user-definable threshold; where should this constant
 be stored? Inside the code of LineLM or in a configuration file?
 
 An extension attribute?
 
   <fo:block fox:knuth-threshold="5"> ... </fo:block>
 
 I suspect that the other Knuth parameters should be specified the same
 way. But it is not a high priority IMO.

It is not a layout specification in the FO file; it is a fine-tuning
of the algorithm applied by a particular FO Processor. It should be in
the user configuration. It may be specified in the configuration file,
or it may be specified by the calling application in the configuration
object FOUserAgent.userConfig. In the configuration file it should be
something like:

<line-layout>
  <hyphenation-threshold>5</hyphenation-threshold>
  <!-- other parameters -->
</line-layout>

FOUserAgent should get appropriate methods to extract the layout
part of the configuration and pass it on to a client class,
e.g. LineLM. Cf. FOUserAgent.getUserRendererConfig().
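As a rough illustration, here is how a client class might read the threshold out of a `<line-layout>` element like the one Simon proposes. This sketch uses plain JAXP/DOM parsing of an XML string; `LineLayoutConfig`, `hyphenationThreshold` and the fallback of 1.0 are hypothetical, and it does not use FOP's actual configuration objects or FOUserAgent methods.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; FOP's real configuration API is not shown here.
final class LineLayoutConfig {
    /** Assumed fallback when the configuration has no threshold. */
    static final double DEFAULT_HYPHENATION_THRESHOLD = 1.0;

    /** Reads line-layout/hyphenation-threshold from a user config XML string. */
    static double hyphenationThreshold(String configXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            NodeList layouts = doc.getElementsByTagName("line-layout");
            if (layouts.getLength() == 0) {
                return DEFAULT_HYPHENATION_THRESHOLD;
            }
            NodeList values = ((Element) layouts.item(0))
                    .getElementsByTagName("hyphenation-threshold");
            if (values.getLength() == 0) {
                return DEFAULT_HYPHENATION_THRESHOLD;
            }
            return Double.parseDouble(values.item(0).getTextContent().trim());
        } catch (Exception e) {
            // Parser setup, malformed XML or a non-numeric value all end up here.
            throw new IllegalArgumentException("unreadable line-layout configuration", e);
        }
    }
}
```

Keeping the parsing in one place means LineLM would only ever see a number, whether it came from the configuration file or from a configuration object set by the calling application.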

TeX's terms are pretolerance and tolerance for the two values of
maxAdjustment.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Knuth linebreaking questions

2004-12-01 Thread Simon Pepping
On Tue, Nov 30, 2004 at 07:27:29PM +0100, Luca Furini wrote:
 Finn Bock wrote:
 
  3) What is the reasoning for doing hyphenation only after threshold=1
  fails. Naive common sense tells me that if the user specifies hyphenation
  we should do hyphenation before finding line breaks.
 
 Finding hyphenation points is time-consuming (all words must be
 hyphenated, not only the ones near a line's end), the sequence of
 elements becomes longer, there are more feasible breaking points, and a
 line ending with a - is less beautiful; so I thought it better to skip
 hyphenation if a set of breaking points could be found without it.
 
 I just took the hyphenate property as a suggestion instead of an order! :-)

This is the practice in TeX too. It may be considered as a
satisfactory implementation of hyphenate=true: Take hyphenation into
account, when your line layout algorithm considers it a better
solution to hyphenate these lines. This algorithm does not think it
necessary to try hyphenation when there is a non-hyphenated solution
with an amount of demerits below a certain threshold.

Note that in TeX such thresholds are user-adjustable parameters. I
think they should eventually be so in FOP too, for those of us who
have the most exquisite taste of line layout.
 
 Note that the same algorithm with the same threshold could find a
 different set of breaking points with and without hyphenation, because the
 elements are different. Without hyphenation, spaces could need a little
 higher adjustment, for example.
 
  4) I've compared your code to tex_wrap
  http://oedipus.sourceforge.net/texlib/
  and the main difference is in the way new KnuthNodes are added to the
  active list. Is the BestRecords part of Knuth or is it your own
  invention? Why is it only the fitness_classes in BestRecords that are higher
  than minDemerits + incompatibleFitnessDemerit that are added to
  activeList? Why not all fitness_classes in BestRecords?
 
 At the moment I don't have the book at hand, but I am quite sure it's
 *not* an invention of mine! :-)
 
 As far as I can remember, the Knuth book uses 4 different variables, named
 C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!)
 and I just created this structure to store them.
 
The algorithm distinguishes four classes of lines: tight, normal,
loose, very loose. When two consecutive lines are not of the same or
of two adjacent classes, it adds a penalty of
incompatibleFitnessDemerit. For each breakpoint b there is one best
line of each class leading to it. If the demerits of the best line of
class i, best.getDemerits(i), are not less than the minimum demerits of
all four classes, best.getMinDemerits(), plus incompatibleFitnessDemerit,
that line can never be selected: continuing from the overall best line
costs at most incompatibleFitnessDemerit more. The optimization omits
it from the list of best breakpoints. Knuth mentions that, in his
computational experiments, it saves him 25% of the executions of his
loop.
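A small Java sketch of the pruning rule just described, with a hypothetical `FitnessPruning` class standing in for FOP's BestRecords. It keeps ties (`<=`) so that the best class itself always survives even when incompatibleFitnessDemerit is 0; that detail is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not FOP's BestRecords class.
final class FitnessPruning {
    // Fitness classes: 0 = tight, 1 = normal, 2 = loose, 3 = very loose.
    private final double[] best = new double[4];

    FitnessPruning() {
        Arrays.fill(best, Double.POSITIVE_INFINITY); // no line recorded yet
    }

    /** Keeps the lowest demerits seen for a line of this class ending at breakpoint b. */
    void record(int fitnessClass, double demerits) {
        best[fitnessClass] = Math.min(best[fitnessClass], demerits);
    }

    double minDemerits() {
        return Arrays.stream(best).min().getAsDouble();
    }

    /** Classes that still deserve a new active node at breakpoint b. */
    List<Integer> survivors(double incompatibleFitnessDemerit) {
        double limit = minDemerits() + incompatibleFitnessDemerit;
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < best.length; i++) {
            // A class above the limit can never win: the overall best line
            // pays at most incompatibleFitnessDemerit extra for any next line.
            if (best[i] != Double.POSITIVE_INFINITY && best[i] <= limit) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For example, with demerits 100 (tight), 50 (normal) and 5000 (very loose) and incompatibleFitnessDemerit = 3000, only the tight and normal classes become active nodes.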

Regards, Simon




Re: Knuth linebreaking questions

2004-12-01 Thread J.Pietschmann
Finn Bock wrote:
3) What is the reasoning for doing hyphenation only after threshold=1 
fails. Naive common sense tells me that if the user specifies hyphenation
we should do hyphenation before finding line breaks.

The purpose of professional typography and layout is to
assist the reader: provide an easy reading with minimal
distractions. Typographic concepts reflect this. Justified
text makes it easier to identify paragraphs. Unfortunately,
long words may cause word spaces to be stretched into large
white blobs which disrupt reading. Hyphenation is essential
to cut down on space allocated for text justification,
especially for languages which can form arbitrarily long
compound words. Hyphenation, of course, has its own drawback:
words are mostly identified by the letters at the beginning
and the end, and hyphenation disrupts this. Several lines
ending in hyphenated words may also cause the reader to pick
up the wrong continuation line (that's the reason for having
the hyphenation-ladder-count property). This tradeoff between
using hyphenation in order to avoid visual artefacts and
having lots of hyphenated words disrupting the flow has to be
balanced.
J.Pietschmann


Re: Knuth linebreaking questions

2004-12-01 Thread Finn Bock

1) What is the purpose of 2 glues for a normal space in END and START
alignment:
new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
new KnuthPenalty(0, 0, false, , true));
new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));
[Luca Furini]
The purpose is to give each line (but the last one) the same
stretchability, regardless of the number of spaces in it.
If the penalty is not used (there is no line ending there) the overall
effect of the 2 glues is a 0 stretchability and does not modify the line
total; if the penalty is used (a line ends there) then the stretchability
of the previous glue is added to the line total, which becomes 3 *
wordSpaceIPD.opt because the previous space, as said before, added 0 (the
following glue is suppressed).
In justified text, a line with many spaces can be adjusted in order to be
much shorter, or much longer.
If left-aligned text used the same elements, the algorithm would find the
same breaking points; but this time adjustment ratios are not used, so a
line with many spaces would be much too long, or much too short, compared
with the other lines.
Using these elements, the algorithm creates lines whose unadjusted widths
are nearly the same.

Ok, thank you for the explanation.
I'm still not sure why it would be ok to ignore any user specified min 
and max values of 'word-spacing' during START and END alignment. If a 
user specifies a length range, what would the reason be for not using 
it? Perhaps with additional DEFAULT_SPACE_WIDTH.

And why not adjust the spacing within the user specified min/max for 
START and END alignment?


3) What is the reasoning for doing hyphenation only after threshold=1
fails. Naive common sense tells me that if the user specifies hyphenation
we should do hyphenation before finding line breaks.
Finding hyphenation points is time-consuming (all words must be
hyphenated, not only the ones near a line's end), the sequence of
elements becomes longer, there are more feasible breaking points, and a
line ending with a - is less beautiful; so I thought it better to skip
hyphenation if a set of breaking points could be found without it.
I just took the hyphenate property as a suggestion instead of an order! :-)
Note that the same algorithm with the same threshold could find a
different set of breaking points with and without hyphenation, because the
elements are different. Without hyphenation, spaces could need a little
higher adjustment, for example.

Ok, performance is indeed a fine reason, but IMHO such quality vs. speed
tradeoffs should eventually be made by the user rather than us.

Thank you for taking the time to explain it all in such great detail.
regards,
finn


Knuth linebreaking questions

2004-11-30 Thread Finn Bock
Hi Luca (and others),
I've been trying to get my head around the line breaking code and during
that process some questions have come up. I urge you *not* to take
anything I ask as a sign of criticism or as a request for changes. I 
don't have the Knuth paper where the algorithm is described so perhaps 
the answers would be obvious if I read it.

1) What is the purpose of 2 glues for a normal space in END and START 
alignment:

new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
new KnuthPenalty(0, 0, false, , true));
new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));
and why aren't the min and max of wordSpaceIPD used?

2) What does the threshold parameter to findBreakingPoints control?
It seems to be a performance parameter which controls the number of
active nodes, rather than a quality parameter. Or to frame my question
differently, if threshold=1 finds a set of breaks, will threshold=5
always pick the same set of breaks? Or can threshold=5 find a better set
of breaks?

3) What is the reasoning for doing hyphenation only after threshold=1 
fails. Naive common sense tells me that if the user specifies hyphenation
we should do hyphenation before finding line breaks.

4) I've compared your code to tex_wrap
   http://oedipus.sourceforge.net/texlib/
and the main difference is in the way new KnuthNodes are added to the 
active list. Is the BestRecords part of Knuth or is it your own 
invention? Why is it only the fitness_classes in BestRecords that are higher
than minDemerits + incompatibleFitnessDemerit that are added to
activeList? Why not all fitness_classes in BestRecords?

regards,
finn


Re: Knuth linebreaking questions

2004-11-30 Thread Luca Furini
Finn Bock wrote:

 1) What is the purpose of 2 glues for a normal space in END and START
 alignment:

 new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
 new KnuthPenalty(0, 0, false, , true));
 new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

The purpose is to give each line (but the last one) the same
stretchability, regardless of the number of spaces in it.

If the penalty is not used (there is no line ending there) the overall
effect of the 2 glues is a 0 stretchability and does not modify the line
total; if the penalty is used (a line ends there) then the stretchability
of the previous glue is added to the line total, which becomes 3 *
wordSpaceIPD.opt because the previous space, as said before, added 0 (the
following glue is suppressed).

In justified text, a line with many spaces can be adjusted in order to be
much shorter, or much longer.
If left-aligned text used the same elements, the algorithm would find the
same breaking points; but this time adjustment ratios are not used, so a
line with many spaces would be much too long, or much too short, compared
with the other lines.
Using these elements, the algorithm creates lines whose unadjusted widths
are nearly the same.
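To make the arithmetic concrete, here is a small Java sketch that adds up the space elements of one start-aligned line. The `Glue` class and `lineSpaces` method are hypothetical stand-ins, not FOP's KnuthGlue/KnuthPenalty; widths are plain integers, and the shrink and the penalty (both 0) are left out.

```java
// Hypothetical model of the glue/penalty/glue triple; not FOP's KnuthGlue.
final class StartAlignedSpaces {

    /** Minimal glue: natural width and stretchability (shrink is 0 here). */
    static final class Glue {
        final int width;
        final int stretch;

        Glue(int width, int stretch) {
            this.width = width;
            this.stretch = stretch;
        }
    }

    /**
     * Width and stretch contributed by the spaces of one line that contains
     * innerSpaces unbroken spaces and ends at the penalty of one more space.
     * Returns {width, stretch}.
     */
    static int[] lineSpaces(int innerSpaces, int opt) {
        Glue beforePenalty = new Glue(0, 3 * opt);
        Glue afterPenalty = new Glue(opt, -3 * opt);
        int width = 0;
        int stretch = 0;
        for (int i = 0; i < innerSpaces; i++) {
            // Not broken: both glues stay, and their stretch cancels out.
            width += beforePenalty.width + afterPenalty.width;
            stretch += beforePenalty.stretch + afterPenalty.stretch;
        }
        // Broken at the penalty: the first glue ends this line,
        // the second is suppressed at the start of the next line.
        width += beforePenalty.width;
        stretch += beforePenalty.stretch;
        return new int[] {width, stretch};
    }
}
```

Whatever the number of spaces, the line gets the same stretch of 3 * opt, while each inner space keeps its optimum width.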

 and why aren't the min and max of wordSpaceIPD used?

Well, you just made me notice there is a little bug:
LineLayoutManager.DEFAULT_SPACE_WIDTH should be used instead! :-)

It's just a magic number: the point is that every TextLM should use the
same value.

 2) What does the threshold parameter to findBreakingPoints control?
 It seems to be a performance parameter which controls the number of
 active nodes, rather than a quality parameter.
 Or to frame my question
 differently, if threshold=1 finds a set of breaks, will threshold=5
 always pick the same set of breaks? Or can threshold=5 find a better set
 of breaks?

It controls both performance and quality: it sets the minimum acceptable quality.

If threshold = 1 finds a set of breaks, it is the best possible set of
breaks, because the adjustment ratio of each break is <= 1, which means
that spaces and other adjustable objects will not need to be longer than
their .max width.

But with this optimal threshold the algorithm could fail, and find no set
of breaking points; so, a try with a higher threshold must be done.

If with threshold = 1 a set is found, with threshold = 5 the same set
would be found, but it would take more time, because a greater number of
active nodes are used.
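A minimal Java sketch of how the threshold acts, using the adjustment ratio as defined by Knuth and Plass: the shortfall divided by the total stretch, or the (negative) excess divided by the total shrink. A break is feasible when the ratio lies between -1 and the threshold. The class and method names are hypothetical, not FOP's.

```java
// Hypothetical helper; the names are illustrative, not FOP's.
final class AdjustmentRatio {

    /** Knuth-Plass adjustment ratio of a candidate line. */
    static double ratio(double lineWidth, double natural, double stretch, double shrink) {
        if (natural < lineWidth) {
            // Line too short: fraction of the available stretch that is needed.
            return stretch > 0 ? (lineWidth - natural) / stretch : Double.POSITIVE_INFINITY;
        }
        if (natural > lineWidth) {
            // Line too long: negative fraction of the available shrink.
            return shrink > 0 ? (lineWidth - natural) / shrink : Double.NEGATIVE_INFINITY;
        }
        return 0.0;
    }

    /** Feasible: spaces shrink no further than .min and the ratio stays within the threshold. */
    static boolean feasible(double ratio, double threshold) {
        return ratio >= -1.0 && ratio <= threshold;
    }
}
```

A line with natural width 80, stretch 10 and target width 100 needs a ratio of 2: it is rejected with threshold 1 but accepted with threshold 5, which is why a second, looser pass can succeed where the first one fails.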

 3) What is the reasoning for doing hyphenation only after threshold=1
 fails. Naive common sense tells me that if the user specifies hyphenation
 we should do hyphenation before finding line breaks.

Finding hyphenation points is time-consuming (all words must be
hyphenated, not only the ones near a line's end), the sequence of
elements becomes longer, there are more feasible breaking points, and a
line ending with a - is less beautiful; so I thought it better to skip
hyphenation if a set of breaking points could be found without it.

I just took the hyphenate property as a suggestion instead of an order! :-)

Note that the same algorithm with the same threshold could find a
different set of breaking points with and without hyphenation, because the
elements are different. Without hyphenation, spaces could need a little
higher adjustment, for example.

 4) I've compared your code to tex_wrap
 http://oedipus.sourceforge.net/texlib/
 and the main difference is in the way new KnuthNodes are added to the
 active list. Is the BestRecords part of Knuth or is it your own
 invention? Why is it only the fitness_classes in BestRecords that are higher
 than minDemerits + incompatibleFitnessDemerit that are added to
 activeList? Why not all fitness_classes in BestRecords?

At the moment I don't have the book at hand, but I am quite sure it's
*not* an invention of mine! :-)

As far as I can remember, the Knuth book uses 4 different variables, named
C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!)
and I just created this structure to store them.

I'll try and find some time to look at this ...

Thanks for your interest and your comments, they are most welcome!

Regards
Luca





Re: Knuth linebreaking questions

2004-11-30 Thread Glen Mazza
[Finn]
3) What is the reasoning for doing hyphenation only after threshold=1
fails. Naive common sense tells me that if the user specifies hyphenation
we should do hyphenation before finding line breaks.

[Luca]
Finding hyphenation points is time-consuming (all words must be
hyphenated, not only the ones near a line's end), the sequence of
elements becomes longer, there are more feasible breaking points, and a
line ending with a - is less beautiful; so I thought it better to skip
hyphenation if a set of breaking points could be found without it.

I've just started to read Knuth's chapter on breaking paragraphs into 
lines, and from what I've read, he considers excessive hyphenation bad
form. The main benefits he gives for taking the entire paragraph into
account when deciding where to break lines (as opposed to the more 
traditional just-look-at-the-current-line analysis) are a reduced need 
for hyphenation and a reduced number of over-spaced lines (i.e., too few 
words on a line requiring large spaces between them for the line to be 
justified.)

Glen