Thank you all for your replies!

My programming abilities are quite limited, and I realize that few people need to make hyphenation dictionaries, hence the lack of good Unicode support. But would someone be willing to walk me through the process step by step? I am a little confused as to how best to map the Khmer Unicode characters to 8-bit values.

I think it would be quite useful to post a tutorial of the process once I am done, so that others can more easily create hyphenation dictionaries for languages that don't have them yet (I have yet to find a good tutorial anywhere).

Thanks again for your help,
Nathan

On Fri, Oct 10, 2014 at 3:57 AM, Philip Taylor <[email protected]> wrote:
>
> Mojca Miklavec wrote:
>
> > No, the patterns should work just fine with a large alphabet.
>
> This part I do not understand, Mojca; surely the patterns /define/ the
> size of the alphabet, do they not ? If letter <xqqyn> is not in the
> patterns, then TeX cannot hyphenate a word containing letter <xqqyn>,
> can it ?
>
> > 2.) The fact that "patgen" is limited, "opatgen" is defunct and
> > nobody else stepped up yet to create a new tool in some modern
> > programming language with built-in Unicode support (or with some
> > modern C(++) libraries) has nothing to do with XeTeX's ability to
> > interpret patterns. If we don't have a tool that can generate
> > patterns for large alphabets that doesn't mean that XeTeX cannot
> > handle such patterns.
>
> It was, in part, the existence or otherwise of such a tool that
> interested me, as well as whether XeTeX could natively handle such
> patterns were they to be generatable ... A /very/ quick look at
> Patgen.web suggests (to me) that a re-implementation in Perl might be
> the fastest way forward (Patgen is run so infrequently that the run-time
> overheads of an interpreted language are irrelevant) but I regret I have
> too much on my plate at the moment to volunteer to investigate further.
>
> > The patterns can probably be created with patgen with some ugly
> > tweaking (as Jonathan suggested).
>
> That is indeed seriously ugly. The sooner the whole of the TeX suite
> has native UTF-8 clones, the more chance there is of TeX surviving into
> the 22nd century, it seems to me.
>
> ** Phil.
>
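For the question of mapping Khmer characters to 8-bit values, here is a rough sketch in Python of one way the "ugly tweaking" might look. It is not taken from this thread and is not a tested patgen recipe. It assumes that the pattern tool will accept the byte values 0x80 to 0xFF as letters, that the word list uses no more than 128 distinct non-ASCII characters, and that hyphens in the word list are plain ASCII "-". The function names and the two sample strings are made up for illustration, and the hyphen position in the first sample is not linguistically verified.

```python
def build_mapping(words):
    """Give each distinct non-ASCII character a byte value in 0x80..0xFF."""
    chars = sorted({ch for word in words for ch in word if ord(ch) > 0x7F})
    if len(chars) > 128:
        raise ValueError(f"{len(chars)} distinct characters do not fit in 128 slots")
    return {ch: 0x80 + i for i, ch in enumerate(chars)}


def encode(text, mapping):
    """Unicode text -> 8-bit bytes; ASCII such as '-' passes through unchanged."""
    return bytes(mapping[ch] if ord(ch) > 0x7F else ord(ch) for ch in text)


def decode(data, mapping):
    """8-bit bytes -> Unicode text, using the same mapping in reverse."""
    reverse = {byte: ch for ch, byte in mapping.items()}
    return "".join(reverse[b] if b >= 0x80 else chr(b) for b in data)


# Two sample strings written with escapes so the code points are unambiguous.
# The hyphen in the first one is only there to show that ASCII survives.
words = [
    "\u1797\u17b6-\u179f\u17b6",
    "\u1781\u17d2\u1798\u17c2\u179a",
]

mapping = build_mapping(words)
encoded = [encode(w, mapping) for w in words]      # what the 8-bit tool would see
decoded = [decode(e, mapping) for e in encoded]    # translate results back afterwards
```

The same mapping has to be kept and reused in reverse on whatever patterns the tool produces, so that the final pattern file is in UTF-8 again. Characters outside the Khmer block that occur in the word list (for example zero-width spaces) would simply take up further slots in the same table.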
