Hello, Werner,
On Mon, 24 Mar 2025, Werner LEMBERG wrote:
Hello Barbara,
The hyphenation exception list in TUGboat has been accumulating more
and more words used in chemistry and similar fields -- pharmacology,
medicine, etc. [...]
German is heavily compounded, and the hyphenation patterns there
seem to cope well with the situation, although I have no idea how
that is accomplished.
I think the reason for the good (but certainly not perfect) behaviour
of the German hyphenation patterns is twofold.
(1) German tends to hyphenate words as done in the original language.
While recent developments in German orthography support
different hyphenation schemes – based on sound rather than on
etymology – the TeX patterns for German are rather conservative
and almost always follow the etymology.
I'm pretty much convinced that words associated with chemistry, medicine,
and related fields are most easily understood when hyphenated based on
etymology, even though that conflicts with US hyphenation, which is
based largely on pronunciation. Thus trying to develop patterns for
the combination of words in these fields and "ordinary" English words
could be expected to do damage to the existing US patterns, resulting
in even more exceptions than there are now. That's a major reason why
I think they should be treated separately, at least for English.
(2) The word list on which the German hyphenation patterns are based
contains quite a few words from the natural sciences (often
tagged in the comments with 'chem.', 'biol.', 'phys.', etc.); due
to Liang's algorithm this helps in hyphenating words that are
similar in structure.
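
To illustrate why this works, here is a minimal Python sketch of how Liang-style patterns are applied to a word. It is not the TeX implementation; the function names, the `left_min`/`right_min` defaults, and the small pattern set (written in the style of the US English patterns and chosen only for illustration) are assumptions. Digits in a pattern score the inter-letter position they sit in, the highest score at each position wins, and an odd score permits a break. Because patterns match substrings, a word that is not in the word list still gets sensible breaks if it shares letter sequences with words that are.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0  # number of non-digit characters seen so far
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return `word` with '-' at every break the patterns allow.

    left_min/right_min are assumed defaults for the minimum number
    of letters kept before the first and after the last break.
    """
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    # scores[i] belongs to the position just before padded[i]
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is None:
                continue
            for offset, weight in enumerate(weights):
                idx = start + offset
                scores[idx] = max(scores[idx], weight)
    pieces = []
    last = 0
    # A break before word[k] corresponds to scores[k + 1].
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:  # odd score: break allowed
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


# Illustrative patterns only; not a complete or authoritative set.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
            "1na", "n2at", "1tio", "2io", "o2n"]

print(hyphenate("hyphenation", PATTERNS))  # hy-phen-ation
```

Note how the even-valued patterns (for example `he2n` and `hena4`) suppress breaks that a lower odd value would otherwise allow; this interplay of competing patterns is what lets entries from the natural sciences influence similar words.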
Have a look at our repository:
https://repo.or.cz/wortliste.git
Thanks for the links. I will certainly take a good look.
The main file is
https://repo.or.cz/wortliste.git/blob/HEAD:/wortliste
Its format is explained in
https://repo.or.cz/wortliste.git/blob/HEAD:/dokumente/README.wortliste
(German only, sorry).
I can cope with German reasonably well, but thanks for the heads up.
Hyphenated lists of chemical and pharmacological substances can be
found at
https://repo.or.cz/wortliste.git/blob/HEAD:/zusatzlisten/arzneiwirkstoffnamen
https://repo.or.cz/wortliste.git/blob/HEAD:/zusatzlisten/arzneiwirkstoffnamen-supplement
(note that we don't use these two lists yet for the German hyphenation
patterns).
I'll be interested to see how closely the spelling of the actual words
matches that of the equivalent words in English. If there's a significant
difference, that would tend to make me think that separate lists of
exceptions are more viable than "universal" patterns.
-- bb
Werner