Yuri wrote:
> When I am looking at the algorithm results, I keep seeing a lot of
> inconsistencies.
>
> Original hyphen.tex has some testcases in the end, that are supposedly
> the correct hyphenation points:
No, these are not test cases; they are explicit hyphenations (i.e.,
exceptions) that correct the results that would otherwise be obtained
using only the patterns.
> But when I run the algorithm with patterns from hyphen.tex, I get these
> results:
> as·so·ci·ate
> as·so·ci·ates
> de·cli·na·tion
> obli·ga·to·ry
> phi·lan·throp·ic
> p·re·sen·t
> p·re·sents
> pro·jec·t
> pro·ject·s
> re·ciproc·i·ty
> rec·og·nizance
> re·for·ma·tion
> re·tri·bu·tion
> table
Yes, that is exactly the point. Those words are known to be hyphenated
incorrectly using the patterns alone, whence the list of exceptions.
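To make the relationship concrete, here is a minimal Python sketch of how an exception list takes precedence over the patterns. It is not TeX's own code: the three patterns are toy entries chosen for illustration rather than the full set from hyphen.tex, and the names parse_pattern, pattern_hyphenate and hyphenate are hypothetical. The minimum fragment lengths of 2 and 3 are assumed to mirror plain TeX's default \lefthyphenmin and \righthyphenmin.

```python
def parse_pattern(pat):
    """Split a pattern such as '1na' into its letters and inter-letter digits."""
    letters, digits = "", [0]
    for ch in pat:
        if ch.isdigit():
            digits[-1] = int(ch)      # digit sits before the next letter
        else:
            letters += ch
            digits.append(0)          # default value after this letter
    return letters, digits


def pattern_hyphenate(word, patterns, left_min=2, right_min=3):
    """Pattern-only hyphenation: odd maxima mark permitted breaks."""
    padded = "." + word + "."         # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)  # scores[p] is the value before padded[p]
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            digits = patterns.get(padded[i:j])
            if digits:
                for k, d in enumerate(digits):
                    scores[i + k] = max(scores[i + k], d)
    pieces, start = [], 0
    # A break before word[n] leaves n letters on the left, len(word) - n on the right.
    for n in range(left_min, len(word) - right_min + 1):
        if scores[n + 1] % 2 == 1:
            pieces.append(word[start:n])
            start = n
    pieces.append(word[start:])
    return "-".join(pieces)


def hyphenate(word, exceptions, patterns):
    """Consult the exception list first; use the patterns only as a fallback."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return pattern_hyphenate(key, patterns)


# Toy patterns (an assumption for illustration, not the hyphen.tex set).
PATTERNS = dict(parse_pattern(p) for p in ["1cl", "1na", "1ti"])

# Exceptions written as in a \hyphenation{...} list; an entry without
# hyphens means "never break this word".
EXCEPTIONS = {e.replace("-", ""): e
              for e in ["dec-li-na-tion", "ta-ble", "present", "project"]}

print(pattern_hyphenate("declination", PATTERNS))      # de-cli-na-tion
print(hyphenate("declination", EXCEPTIONS, PATTERNS))  # dec-li-na-tion
```

With the exception list in place the pattern result is never consulted for a listed word, which is why an implementation that ignores the list reproduces the uncorrected breaks.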
> Available correct answers from the Merriam-Webster dictionary:
> as·so·ci·ate
> dec·li·na·tion
> oblig·a·to·ry
> phil·an·throp·ic
> pres·ent
> proj·ect
> rec·i·proc·i·ty
> re·cog·ni·zance
> ref·or·ma·tion
> ret·ri·bu·tion
> ta·ble
TeX gives these break-points for your word list:
as-so-ciate
as-so-ciates
dec-li-na-tion
oblig-a-tory
phil-an-thropic
present
presents
project
projects
reci-procity
re-cog-ni-zance
ref-or-ma-tion
ret-ri-bu-tion
ta-ble
Thus there are differences, but it is quite possible that Don Knuth did
not use Merriam-Webster as his authoritative source for hyphenation in
American English.
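The nature of those differences can be checked mechanically. The following is a minimal Python sketch, assuming the two lists quoted in this message are transcribed with "-" as the break marker; the names break_positions and compare are hypothetical, and the plural forms are omitted because the dictionary list does not include them.

```python
def break_positions(hyphenated, sep="-"):
    """Return the set of letter offsets at which a break is marked."""
    positions, count = set(), 0
    for ch in hyphenated:
        if ch == sep:
            positions.add(count)   # break falls after `count` letters
        else:
            count += 1
    return positions


# Break-points reported by TeX, as listed in this message.
TEX = ["as-so-ciate", "dec-li-na-tion", "oblig-a-tory", "phil-an-thropic",
       "present", "project", "reci-procity", "re-cog-ni-zance",
       "ref-or-ma-tion", "ret-ri-bu-tion", "ta-ble"]

# Merriam-Webster break-points, as quoted in this message.
MW = ["as-so-ci-ate", "dec-li-na-tion", "oblig-a-to-ry", "phil-an-throp-ic",
      "pres-ent", "proj-ect", "rec-i-proc-i-ty", "re-cog-ni-zance",
      "ref-or-ma-tion", "ret-ri-bu-tion", "ta-ble"]


def compare(tex_list, mw_list):
    """Per word, list dictionary breaks TeX omits and TeX breaks the dictionary lacks."""
    report = {}
    for t, m in zip(tex_list, mw_list):
        word = t.replace("-", "")
        if word != m.replace("-", ""):
            raise ValueError("lists are not aligned at " + word)
        tp, mp = break_positions(t), break_positions(m)
        report[word] = {"missing": sorted(mp - tp), "extra": sorted(tp - mp)}
    return report


report = compare(TEX, MW)
for word, diff in report.items():
    print(word, diff)
```

For this particular list every "extra" entry is empty: TeX omits some of the dictionary's break-points but proposes none that the dictionary lacks.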
> Additionally, the produced "gen·uine" hyphenation split isn't correct
> (should be " gen·u·ine"), the word "toothache" isn't split at all, and
> "p·neu·mo·ni·a" result is wrong too (should be " pneu·mo·nia").
TeX
This is TeX, Version 3.14159265 (TeX Live 2014/W32TeX) (preloaded
format=tex)
**\showhyphens {genuine toothache pneumonia}
gen-uine toothache pneu-mo-nia
Thus "pneumonia" is hyphenated correctly, "genuine" arguably so
(depending on whether or not one regards the "u" as syllabic) and
"toothache" is indeed wrong.
Philip Taylor