Hi Maksim,

> Many thanks for sample ldf-file!

You’re welcome :-)

About the consonant clusters, my view was actually that it can be worth including all the possible clusters as unhyphenatable patterns at word boundaries (regardless of where the clusters themselves were found), because there are still not that many of them and it makes sense to be a little prudent. For example, could someone make an abbreviation by stopping the word right after the 3-consonant cluster? Just a thought.

> Also I've made some progress in determining if hyphenation in the middle of
> дж/дз is allowed.
> Here is the script https://github.com/msalau/hyph-be/blob/master/list-dz.py
> And output https://github.com/msalau/hyph-be/blob/master/list-dz.txt
> I started with empty PATTERNS and added patterns until all words are covered.
> There are still 95 words (7 patterns) to be determined, but overall picture
> is already clear:
> hyphenation is allowed in 579 words (39 patterns) and is prohibited in 1280
> words (69 patterns).
> So I can conclude that hyphenation of дж/дз is an exception.
> I'll try to find someone to review the list.

This is sound. I wouldn’t quite call hyphenation of дж and дз an “exception”, as it occurs in roughly one third of the words, but the pragmatic choice is clearly to prohibit it by default and to allow it only in the words where it is permitted.

> There is also a alternative and 100% correct way: prohibit hyphenation in the
> middle of дж/дз and right before it.
> E.g.: 8д8ж 8д8з
> This will be valid for all cases :)

Of course :-)

Best,
Arthur
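
P.S. In case it helps the review, here is a minimal sketch of how a pattern such as 8д8ж suppresses break points under Liang-style pattern matching (odd level allows a break, even level forbids it, the highest level wins). This is not your list-dz.py: the function names, the toy patterns 1д and 1ж, the sample word and the 2/2 minimum lengths are all assumptions made up for the illustration, not the real Belarusian pattern set.

```python
import re


def parse_pattern(pattern):
    """Split a TeX-style pattern such as '8д8ж' into its letters and gap levels."""
    letters = re.sub(r"\d", "", pattern)
    # digits[i] is the level of the gap before letters[i];
    # the last entry is the gap after the final letter.
    digits = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            digits[pos] = int(ch)
        else:
            pos += 1
    return letters, digits


def break_points(word, patterns, left_min=2, right_min=2):
    """Return indices k where word[:k] + '-' + word[k:] is an allowed break.

    left_min/right_min are illustrative assumptions, not values from the ldf file.
    """
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    levels = [0] * (len(dotted) + 1)   # levels[i] is the gap before dotted[i]
    for pattern in patterns:
        letters, digits = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, value in enumerate(digits):
                # The highest level seen at a gap wins.
                levels[start + offset] = max(levels[start + offset], value)
            start = dotted.find(letters, start + 1)
    # The gap before word[k] is the gap before dotted[k + 1]; odd means "break allowed".
    return [
        k
        for k in range(left_min, len(word) - right_min + 1)
        if levels[k + 1] % 2 == 1
    ]


# Toy patterns (hypothetical): allow a break before every д and every ж.
toy = ["1д", "1ж"]
print(break_points("ураджай", toy))            # [3, 4]
print(break_points("ураджай", toy + ["8д8ж"]))  # []
```

With only the toy patterns, breaks are offered both before д and between д and ж; adding 8д8ж overrides both, which is the "valid for all cases" behaviour you describe. Words where the break is actually allowed would then need their own higher odd-level patterns.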
