When I look at the output of the hyphenation algorithm, I keep seeing a lot of inconsistencies.

The original hyphen.tex has some test cases at the end that are supposedly the correct hyphenation points:
as-so-ciate
as-so-ciates
dec-li-na-tion
oblig-a-tory
phil-an-thropic
present
presents
project
projects
reci-procity
re-cog-ni-zance
ref-or-ma-tion
ret-ri-bu-tion
ta-ble

But when I run the algorithm with patterns from hyphen.tex, I get these results:
as·so·ci·ate
as·so·ci·ates
de·cli·na·tion
obli·ga·to·ry
phi·lan·throp·ic
p·re·sen·t
p·re·sents
pro·jec·t
pro·ject·s
re·ciproc·i·ty
rec·og·nizance
re·for·ma·tion
re·tri·bu·tion
table

The correct answers available from the Merriam-Webster dictionary:
as·so·ci·ate
dec·li·na·tion
oblig·a·to·ry
phil·an·throp·ic
pres·ent
proj·ect
rec·i·proc·i·ty
re·cog·ni·zance
ref·or·ma·tion
ret·ri·bu·tion
ta·ble

Additionally, the result "gen·uine" is incorrect (it should be "gen·u·ine"), the word "toothache" is not split at all, and the result "p·neu·mo·ni·a" is wrong too (it should be "pneu·mo·nia").

I tried the Hyphenator.js JavaScript implementation (https://github.com/mnater/hyphenator) with the pattern set from hyphen.tex and reviewed its algorithm in detail; it seems correct. I have not tried the TeX implementation.
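
For reference, the pattern-matching step of Liang's algorithm can be sketched roughly as follows. This is a minimal Python sketch, not the Hyphenator.js code: the names parse_pattern, hyphenate and TOY_PATTERNS are hypothetical, and the eight patterns are only the small subset used in the well-known "hyphenation" example, not the full set from hyphen.tex.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    weights[i] is the digit written before the i-th letter (0 if none);
    the last entry is the digit after the final letter.
    """
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns):
    """Return the pieces of `word` split at the odd-valued positions."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[i]: gap before padded[i]
    # Try every substring; where patterns overlap, the largest digit wins.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for k, w in enumerate(weights):
                    scores[start + k] = max(scores[start + k], w)
    # An odd score allows a break, an even score forbids it.
    pieces, last = [], 0
    for j in range(1, len(word)):
        if scores[j + 1] % 2 == 1:         # gap before word[j]
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces

# Toy subset only (assumption: illustrative, not the hyphen.tex pattern set).
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]

print("-".join(hyphenate("hyphenation", TOY_PATTERNS)))  # hy-phen-ation
```

With this toy subset the competing digits resolve to breaks after "hy" and after "phen" only. The sketch covers just the pattern lookup; it does not model anything else an implementation may do before or after that step.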

Franklin Liang's paper says that this algorithm almost always produces correct results.

So how can these discrepancies be explained? Why aren't even the test cases from hyphen.tex reproducible? Is the algorithm implementation incorrect, or is something missing?


Yuri
