Yuri wrote:
> When I am looking at the algorithm results, I keep seeing a lot of
> inconsistencies.
>
> Original hyphen.tex has some testcases in the end, that are supposedly
> the correct hyphenation points:

No, these are not test cases; they are explicit hyphenations (i.e., exceptions) that correct the results that would otherwise be obtained using only the patterns.

> But when I run the algorithm with patterns from hyphen.tex, I get these
> results:
> as·so·ci·ate
> as·so·ci·ates
> de·cli·na·tion
> obli·ga·to·ry
> phi·lan·throp·ic
> p·re·sen·t
> p·re·sents
> pro·jec·t
> pro·ject·s
> re·ciproc·i·ty
> rec·og·nizance
> re·for·ma·tion
> re·tri·bu·tion
> table

Yes, that is exactly the point. Those words are known to be hyphenated incorrectly using the patterns alone, whence the list of exceptions.

> Available correct answers from the Merriam-Webster dictionary:
> as·so·ci·ate
> dec·li·na·tion
> oblig·a·to·ry
> phil·an·throp·ic
> pres·ent
> proj·ect
> rec·i·proc·i·ty
> re·cog·ni·zance
> ref·or·ma·tion
> ret·ri·bu·tion
> ta·ble

TeX gives these break-points for your word list:

    as-so-ciate
    as-so-ciates
    dec-li-na-tion
    oblig-a-tory
    phil-an-thropic
    present
    presents
    project
    projects
    reci-procity
    re-cog-ni-zance
    ref-or-ma-tion
    ret-ri-bu-tion
    ta-ble

Thus there are differences, but it is quite possible that Don Knuth did not use Merriam-Webster as his authoritative source for hyphenation in American English.

> Additionally, the produced "gen·uine" hyphenation split isn't correct
> (should be "gen·u·ine"), the word "toothache" isn't split at all, and
> "p·neu·mo·ni·a" result is wrong too (should be "pneu·mo·nia").

    TeX
    This is TeX, Version 3.14159265 (TeX Live 2014/W32TeX) (preloaded format=tex)
    **\showhyphens {genuine toothache pneumonia}
    gen-uine toothache pneu-mo-nia

Thus "pneumonia" is hyphenated correctly, "genuine" arguably so (depending on whether or not one regards the "u" as syllabic) and "toothache" is indeed wrong.

Philip Taylor
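To make the patterns-versus-exceptions distinction concrete, here is a minimal Python sketch of Liang-style pattern hyphenation in which an exception entry, when present, replaces the pattern result. It is an illustration, not TeX's implementation. The three patterns "p1r", "e1s" and "n1t" are invented toy patterns, not taken from hyphen.tex, chosen only so that the pattern-only result reproduces the "p·re·sen·t" shape quoted above. All function names are hypothetical. The left_min/right_min defaults of 2 and 3 mirror plain TeX's \lefthyphenmin and \righthyphenmin. An implementation that omits such limits can emit one-letter fragments like "p·" or "·t".

```python
def parse_pattern(pat):
    """Split a Liang-style pattern such as 'e1s' into ('es', [0, 1, 0]).

    weights[i] is the value for the gap just before letters[i]; the final
    entry is the gap after the last letter. '.' marks a word boundary.
    """
    letters = []
    weights = [0]
    for ch in pat:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def parse_exceptions(entries):
    """Turn entries like 'ta-ble' into {'table': {2}} (sets of break offsets).

    An entry with no hyphen, such as 'present', maps to an empty set,
    meaning 'never hyphenate this word'.
    """
    table = {}
    for entry in entries:
        offsets, pos = set(), 0
        for ch in entry:
            if ch == "-":
                offsets.add(pos)
            else:
                pos += 1
        table[entry.replace("-", "").lower()] = offsets
    return table


def pattern_breaks(word, patterns):
    """Return offsets k (a break between word[k-1] and word[k]) whose
    maximum pattern value is odd."""
    w = "." + word + "."
    points = [0] * (len(w) + 1)
    for letters, weights in patterns:
        start = w.find(letters)
        while start != -1:  # visit every (possibly overlapping) occurrence
            for j, v in enumerate(weights):
                points[start + j] = max(points[start + j], v)
            start = w.find(letters, start + 1)
    # points[i] is the gap before w[i]; word[k] sits at w[k + 1].
    return {k for k in range(1, len(word)) if points[k + 1] % 2 == 1}


def hyphenate(word, patterns, exceptions, left_min=2, right_min=3):
    key = word.lower()
    breaks = exceptions.get(key)  # an exception replaces the pattern result
    if breaks is None:
        breaks = pattern_breaks(key, patterns)
    # Keep only breaks leaving at least left_min / right_min letters.
    allowed = sorted(k for k in breaks if left_min <= k <= len(word) - right_min)
    pieces, last = [], 0
    for k in allowed:
        pieces.append(word[last:k])
        last = k
    pieces.append(word[last:])
    return pieces


# Toy patterns invented for this illustration; NOT the hyphen.tex patterns.
PATTERNS = [parse_pattern(p) for p in ["p1r", "e1s", "n1t"]]
EXCEPTIONS = parse_exceptions(["present", "presents", "ta-ble"])

print("·".join(hyphenate("present", PATTERNS, {}, left_min=1, right_min=1)))  # p·re·sen·t
print("·".join(hyphenate("present", PATTERNS, {})))          # pre·sent
print("·".join(hyphenate("present", PATTERNS, EXCEPTIONS)))  # present
```

An empty exception entry such as "present" suppresses every break, whatever the patterns say. That is consistent with TeX printing present and project without break-points in the list above. In this sketch the minima are also applied to exception entries. That is a design choice here, and it does not affect "ta-ble".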