[tex-hyphen] Accuracy of the hyphenation algorithm

Yuri Tue, 28 Jul 2015 16:38:26 -0700

When I look at the algorithm's results, I keep seeing a lot of inconsistencies.

The original hyphen.tex has some test cases at the end that are supposedly the correct hyphenation points:

as-so-ciate
as-so-ciates
dec-li-na-tion
oblig-a-tory
phil-an-thropic
present
presents
project
projects
reci-procity
re-cog-ni-zance
ref-or-ma-tion
ret-ri-bu-tion
ta-ble

But when I run the algorithm with the patterns from hyphen.tex, I get these results:

as·so·ci·ate
as·so·ci·ates
de·cli·na·tion
obli·ga·to·ry
phi·lan·throp·ic
p·re·sen·t
p·re·sents
pro·jec·t
pro·ject·s
re·ciproc·i·ty
rec·og·nizance
re·for·ma·tion
re·tri·bu·tion
table

Correct answers available from the Merriam-Webster dictionary:

as·so·ci·ate
dec·li·na·tion
oblig·a·to·ry
phil·an·throp·ic
pres·ent
proj·ect
rec·i·proc·i·ty
re·cog·ni·zance
ref·or·ma·tion
ret·ri·bu·tion
ta·ble
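
Comparing break positions rather than whole strings makes the discrepancies easier to count. Below is a minimal Python sketch; `breaks` and `compare` are hypothetical helper names, and the separator is assumed to be the middle dot used in the lists above.

```python
def breaks(hyphenated: str, sep: str = "·") -> tuple[str, set[int]]:
    """'as·so·ci·ate' -> ('associate', {2, 4, 6}): letters before each separator."""
    word, positions = "", set()
    for ch in hyphenated.strip():
        if ch == sep:
            positions.add(len(word))
        else:
            word += ch
    return word, positions


def compare(produced: str, reference: str, sep: str = "·") -> dict:
    """Report which reference breaks were missed and which produced breaks are extra."""
    word_a, got = breaks(produced, sep)
    word_b, want = breaks(reference, sep)
    if word_a != word_b:
        raise ValueError(f"different words: {word_a!r} vs {word_b!r}")
    return {
        "word": word_a,
        "missing": sorted(want - got),  # in the dictionary, absent from the output
        "extra": sorted(got - want),    # in the output, absent from the dictionary
    }


pairs = [("de·cli·na·tion", "dec·li·na·tion"),
         ("p·re·sen·t", "pres·ent"),
         ("table", "ta·ble")]
for produced, reference in pairs:
    print(compare(produced, reference))
```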

Additionally, the produced split "gen·uine" isn't correct (it should be "gen·u·ine"), the word "toothache" isn't split at all, and the "p·neu·mo·ni·a" result is wrong too (it should be "pneu·mo·nia").

I tried the Hyphenator.js JavaScript implementation (https://github.com/mnater/hyphenator) with the pattern set from hyphen.tex, reviewed its algorithm in detail, and it seems correct. I didn't try the TeX implementation.
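
For reference, here is a minimal Python sketch of the pattern step of Liang's algorithm as TeX applies it: every pattern that matches a substring of the dot-delimited word contributes its digits, the highest value in each inter-letter slot wins, and an odd value allows a break. The sketch also assumes two TeX conventions that a pattern-only port could leave out: words given in a \hyphenation list are looked up as exceptions before any patterns are applied, and breaks fewer than 2 letters from the start or 3 letters from the end are suppressed (plain TeX's \lefthyphenmin=2 and \righthyphenmin=3). The function names and the patterns in the usage lines are hypothetical, chosen only for illustration; the real hyphen.tex set has thousands of patterns.

```python
import re


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a TeX-style pattern such as 'hy3ph' into its letters ('hyph')
    and the digit before each letter position (0 where no digit is given)."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word: str, patterns: dict, exceptions: dict,
              left_min: int = 2, right_min: int = 3) -> list[str]:
    """Split word into pieces at allowed hyphenation points.

    patterns:   maps pattern letters to values, as built by parse_pattern
    exceptions: maps a lowercase word to its hyphenated form, e.g. 'ta-ble'
    left_min / right_min: minimum letters before the first and after the
    last break (plain TeX uses 2 and 3).
    """
    key = word.lower()
    breaks = set()
    if key in exceptions:
        # An exception entry replaces the pattern result for that word.
        count = 0
        for ch in exceptions[key]:
            if ch == "-":
                breaks.add(count)
            else:
                count += 1
    else:
        text = "." + key + "."          # '.' marks the word boundaries
        scores = [0] * (len(text) + 1)  # scores[j]: value just before text[j]
        for start in range(len(text)):
            for end in range(start + 1, len(text) + 1):
                values = patterns.get(text[start:end])
                if values:
                    # Overlapping patterns: keep the highest value per slot.
                    for i, v in enumerate(values):
                        scores[start + i] = max(scores[start + i], v)
        # Word letter k sits at text index k + 1; an odd value before it
        # allows a break after the first k letters.
        breaks = {k for k in range(1, len(key)) if scores[k + 1] % 2 == 1}

    # Suppress breaks too close to either end of the word.
    allowed = sorted(k for k in breaks if left_min <= k <= len(key) - right_min)
    pieces, last = [], 0
    for k in allowed:
        pieces.append(word[last:k])
        last = k
    pieces.append(word[last:])
    return pieces


patterns = dict(parse_pattern(p) for p in
                ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"])
print(hyphenate("hyphenation", patterns, {}))           # ['hy', 'phen', 'ation']

edge = dict(parse_pattern(p) for p in ["p1re", "n1t"])  # hypothetical patterns
print(hyphenate("present", edge, {}))                   # ['present']
print(hyphenate("present", edge, {}, left_min=1, right_min=1))  # ['p', 'resen', 't']
print(hyphenate("table", {}, {"table": "ta-ble"}))      # ['ta', 'ble']
```

With both limits set to 1, the two hypothetical patterns turn "present" into "p-resen-t", which shows how one-letter fragments like those in the results above can appear when the edge limits are not enforced.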

Franklin Liang's paper says that this algorithm almost always produces correct results.

So how can these discrepancies be explained? Why aren't even the test cases from hyphen.tex reproducible? Is the algorithm implementation incorrect? Is something missing?



Yuri
