Jamis Buck <[EMAIL PROTECTED]> writes:

> I know TeX has an algorithm for hyphenating words, and fop
> (Formatting Objects to PDF, on the Apache site) uses the same
> algorithm.  Seeing how it's all open-source, mightn't it be possible
> to port that algorithm to Python and use the same pattern files that
> TeX and fop use?  If that turns out to be possible, then you've
> already got multiple languages taken care of, since I know fop, at
> least, has many different language hyphenation files.

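TeX applies pattern matching to find the permissible hyphenation
points in a word.  The line breaker then treats each of them as a
possible break with a penalty attached, and weighs that penalty
against the badness of the lines it would produce.  This means that it
will sometimes prefer stretching the text slightly over a poor break.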
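A minimal sketch of that trade-off, assuming a single line judged in isolation (real TeX optimizes the whole paragraph at once and adds further demerits, e.g. for consecutive hyphenated lines).  It uses the TeXbook's approximation badness = 100 * (stretch used / stretch available)^3, capped at 10000, and the demerits formula (\linepenalty + badness)^2 + penalty^2 for a nonnegative penalty, with plain TeX's defaults \linepenalty=10 and \hyphenpenalty=50.  The function names and the stretch figures are hypothetical.

```python
def badness(stretch_used: float, stretch_available: float) -> int:
    """Approximate TeX badness of a line: about 100 * (t/s)**3, capped at 10000."""
    if stretch_used <= 0:
        return 0  # line set at its natural width
    if stretch_available <= 0:
        return 10000  # stretching needed but none available: "infinitely bad"
    return min(round(100 * (stretch_used / stretch_available) ** 3), 10000)


def demerits(bad: int, penalty: int = 0, line_penalty: int = 10) -> int:
    """Demerits of one line ending at a break whose penalty is 0 <= penalty < 10000."""
    return (line_penalty + bad) ** 2 + penalty ** 2


HYPHEN_PENALTY = 50  # plain TeX's default \hyphenpenalty

# Hypothetical line with 4pt of stretch available.  Breaking between words
# needs 3pt of it; hyphenating a word lets the line get by with 1pt.
plain = demerits(badness(3, 4))                               # (10 + 42)**2
hyphenated = demerits(badness(1, 4), penalty=HYPHEN_PENALTY)  # (10 + 2)**2 + 50**2
print(plain, hyphenated)  # prints 2704 2644: here the hyphen is (just) cheaper
```

If the unhyphenated line needed only 2pt of stretch instead of 3pt, its demerits would drop well below 2644, and the stretched, unhyphenated line would win.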

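Short excerpt: "." means word beginning or word end, and a digit
between two letters rates that gap.  An odd number allows a hyphen
there, an even number forbids one, and when several patterns match,
the highest number at each gap wins.  % introduces my comments.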

.co3e           % co-ercion
.co4r           % no break at co-r (even), so never co-rduroy
.cor5ner        % cor-nerstone
.de4moi         % no break at de-moi (even), so never de-moire
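Since the question was about porting this to Python, here is a minimal sketch of the pattern lookup (Frank Liang's scheme, which TeX uses), loaded with only the four patterns above.  In that scheme an odd digit permits a hyphen at its gap and an even digit forbids one, and the highest digit from all matching patterns wins at each gap.  `parse_pattern`, `hyphenate` and the brute-force substring scan are a hypothetical illustration, not TeX's or fop's code; real implementations store the patterns in a trie and also handle exception lists.  `left_min=2` and `right_min=3` mirror plain TeX's \lefthyphenmin and \righthyphenmin defaults for English.

```python
import re

# The four patterns quoted above; a real pattern file has thousands.
PATTERNS = [".co3e", ".co4r", ".cor5ner", ".de4moi"]


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split '.cor5ner' into ('.corner', [0, 0, 0, 0, 5, 0, 0, 0]).

    values[i] is the digit in the gap just before letters[i]
    (values[-1] is the gap after the last letter).
    """
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word: str, patterns: list[str], left_min: int = 2, right_min: int = 3) -> str:
    """Insert hyphens at every gap whose highest matching digit is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # '.' marks word beginning and end
    levels = [0] * (len(padded) + 1)   # levels[k]: gap just before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for i, v in enumerate(values):
                    levels[start + i] = max(levels[start + i], v)
    # word[j] is padded[j + 1], so the gap before word[j] is levels[j + 1];
    # left_min/right_min keep hyphens away from the word edges.
    pieces, last = [], 0
    for j in range(left_min, len(word) - right_min + 1):
        if levels[j + 1] % 2 == 1:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "-".join(pieces)


for w in ["cornerstone", "coercion", "corduroy", "demoire"]:
    print(hyphenate(w, PATTERNS))
```

With just these four patterns it prints cor-nerstone, co-ercion, corduroy and demoire; the full English set would add further breaks, and higher digits from other patterns could override some of these.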

You see that .co4r rules out co-rnerstone, while .cor5ner allows
cor-nerstone.  Whether TeX actually breaks there is up to the line
breaker, depending on how much air the alternatives put in the layout.

There are rule sets like this for many languages.  The English set
consists of 4400 patterns, the Norwegian set of 1500.  Clearly not
something to put in the viewer...


Kjetil T.
