I'm guessing the bad "p-neu-mo-ni-a" may be caused by missing support for LEFTHYPHENMIN and RIGHTHYPHENMIN in the implementation used. Off the top of my head, these are both at least 2 for English.
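
To illustrate what I mean, here is a minimal sketch in Python of how such minimums could be applied after the patterns have produced their break points. This is not the code of phpSyllable or of the implementation Yuri used; the function name apply_hyphen_mins and its parameters are made up for this example, and the default of 2 for both values is just my recollection from above (plain TeX, if I remember correctly, uses 2 and 3).

```python
def apply_hyphen_mins(pieces, left_min=2, right_min=2):
    """Drop break points that leave fewer than left_min letters before
    the break or fewer than right_min letters after it.

    `pieces` is the word as split by the patterns alone,
    e.g. ["p", "neu", "mo", "ni", "a"].
    """
    word = "".join(pieces)

    # Turn the pieces into break offsets within the word.
    breaks = []
    offset = 0
    for piece in pieces[:-1]:
        offset += len(piece)
        breaks.append(offset)

    # Keep only the breaks that respect both minimums.
    kept = [b for b in breaks
            if b >= left_min and len(word) - b >= right_min]

    # Rebuild the pieces from the surviving break offsets.
    result = []
    start = 0
    for b in kept:
        result.append(word[start:b])
        start = b
    result.append(word[start:])
    return result


if __name__ == "__main__":
    print("-".join(apply_hyphen_mins(["p", "neu", "mo", "ni", "a"])))  # pneu-mo-nia
    print("-".join(apply_hyphen_mins(["p", "re", "sen", "t"])))        # pre-sent
```

With both minimums at 2, the stray one-letter fragments at either end of "p-neu-mo-ni-a" are merged back into their neighbours, which gives the same "pneu-mo-nia" that TeX reports below.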

kind regards,
Martijn van der Lee
(developer of the phpSyllable implementation for PHP at https://github.com/vanderlee/phpSyllable).

2015-07-29 10:07 GMT+02:00 Philip Taylor <[email protected]>:

>
> Yuri wrote:
>
> > When I am looking at the algorithm results, I keep seeing a lot of
> > inconsistencies.
> >
> > Original hyphen.tex has some testcases in the end, that are supposedly
> > the correct hyphenation points:
>
> No, these are not test cases; they are explicit hyphenations (i.e.,
> exceptions) that correct the results that would otherwise be obtained
> using only the patterns.
>
> > But when I run the algorithm with patterns from hyphen.tex, I get these
> > results:
> > as·so·ci·ate
> > as·so·ci·ates
> > de·cli·na·tion
> > obli·ga·to·ry
> > phi·lan·throp·ic
> > p·re·sen·t
> > p·re·sents
> > pro·jec·t
> > pro·ject·s
> > re·ciproc·i·ty
> > rec·og·nizance
> > re·for·ma·tion
> > re·tri·bu·tion
> > table
>
> Yes, that is exactly the point. Those words are known to be hyphenated
> incorrectly using the patterns alone, whence the list of exceptions.
>
> > Available correct answers from the Merriam-Webster dictionary:
> > as·so·ci·ate
> > dec·li·na·tion
> > oblig·a·to·ry
> > phil·an·throp·ic
> > pres·ent
> > proj·ect
> > rec·i·proc·i·ty
> > re·cog·ni·zance
> > ref·or·ma·tion
> > ret·ri·bu·tion
> > ta·ble
>
> TeX gives these break-points for your word list :
>
> as-so-ciate
> as-so-ciates
> dec-li-na-tion
> oblig-a-tory
> phil-an-thropic
> present
> presents
> project
> projects
> reci-procity
> re-cog-ni-zance
> ref-or-ma-tion
> ret-ri-bu-tion
> ta-ble
>
> Thus there are differences, but it is quite possible that Don Knuth did
> not use Merriam-Webster as his authoritative source for hyphenation in
> <Am.E>.
>
> > Additionally, the produced "gen·uine" hyphenation split isn't correct
> > (should be " gen·u·ine"), the word "toothache" isn't split at all, and
> > "p·neu·mo·ni·a" result is wrong too (should be " pneu·mo·nia").
>
> TeX
> This is TeX, Version 3.14159265 (TeX Live 2014/W32TeX) (preloaded
> format=tex)
> **\showhyphens {genuine toothache pneumonia}
>
> gen-uine toothache pneu-mo-nia
>
> Thus "pneumonia" is hyphenated correctly, "genuine" arguably so
> (depending on whether or not one regards the "u" as syllabic) and
> "toothache" is indeed wrong.
>
> Philip Taylor
>
