Hello, Werner,

On Mon, 24 Mar 2025, Werner LEMBERG wrote:

Hello Barbara,

The hyphenation exception list in TUGboat has been accumulating more
and more words used in chemistry and similar fields -- pharmacology,
medicine, etc.  [...]

German is heavily compounded, and the hyphenation patterns there
seem to cope well with the situation, although I have no idea how
that is accomplished.

I think the reason for the good (but certainly not perfect) behaviour
of the German hyphenation patterns is twofold.

(1) German tends to hyphenate words as is done in their language of
   origin.  While recent developments in German orthography support
   different hyphenation schemes – based on sound rather than on
   etymology – the TeX patterns for German are rather conservative
   and follow the etymology almost always.

I'm pretty much convinced that words associated with chemistry, medicine,
and related fields are most easily understood when hyphenated based on
etymology, even though that conflicts with US hyphenation, which is
based largely on pronunciation.  Thus trying to develop patterns for
the combination of words in these fields and "ordinary" English words
could be expected to do damage to the existing US patterns, resulting
in even more exceptions than there are now.  That's a major reason why
I think they should be treated separately, at least for English.

(2) The word list on which the German hyphenation patterns are based
   contains quite a few words from the natural sciences (often
   tagged in the comments with 'chem.', 'biol.', 'phys.', etc.); given
   how Liang's algorithm works, this helps in hyphenating other words
   that are similar in structure.

Have a look at our repository:

 https://repo.or.cz/wortliste.git

Thanks for the links.  I will certainly take a good look.

The main file is

 https://repo.or.cz/wortliste.git/blob/HEAD:/wortliste

Its format is explained in

 https://repo.or.cz/wortliste.git/blob/HEAD:/dokumente/README.wortliste

(German only, sorry).

I can cope with German reasonably well, but thanks for the heads up.

Hyphenated lists of chemical and pharmacological substances can be
found at

 https://repo.or.cz/wortliste.git/blob/HEAD:/zusatzlisten/arzneiwirkstoffnamen
 https://repo.or.cz/wortliste.git/blob/HEAD:/zusatzlisten/arzneiwirkstoffnamen-supplement

(note that we don't use these two lists yet for the German hyphenation
patterns).

I'll be interested to see how closely the spellings of these words
match those of their English equivalents.  If there's a significant
difference, that would tend to make me think that separate lists of
exceptions are more viable than "universal" patterns.
                                                -- bb
   Werner
