I used patgen for Georgian. Georgian characters are 3 bytes long in UTF-8, so when patgen asks for a number of characters I enter 3*N1, 3*N2, where N1 and N2 are the values patgen would expect for one-byte characters. :) I had no success with opatgen (the Unicode patgen); I was unable to compile it for Windows. It is a nice program and library, but unfortunately it has not been updated in a long time.
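A few lines of Python can check the byte width and show the arithmetic behind the 3*N trick. This is only a sketch. It assumes that patgen sees each byte as a separate letter, which is why the trick is needed. The function names are illustrative, and the sample values 2 and 5 are not patgen defaults.

```python
# Sketch: confirm that Georgian and Khmer code points take 3 bytes in UTF-8,
# then scale one-byte (character-based) lengths to byte-based ones.

def utf8_widths(first: int, last: int) -> set[int]:
    """Distinct UTF-8 byte lengths of all code points in [first, last]."""
    return {len(chr(cp).encode("utf-8")) for cp in range(first, last + 1)}

def scale_lengths(n1: int, n2: int, width: int) -> tuple[int, int]:
    """Turn character-based values N1, N2 into byte-based values."""
    return n1 * width, n2 * width

GEORGIAN = (0x10A0, 0x10FF)  # Unicode "Georgian" block
KHMER = (0x1780, 0x17FF)     # Unicode "Khmer" block

if __name__ == "__main__":
    print(utf8_widths(*GEORGIAN))   # {3}
    print(utf8_widths(*KHMER))      # {3}
    # e.g. if you would answer "2 5" for a one-byte encoding:
    print(scale_lengths(2, 5, 3))   # (6, 15)
```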
This will work for Khmer too, since Khmer characters are also 3 bytes long in UTF-8 (see http://utf8-chartable.de/unicode-utf8-table.pl).

Another way is to convert your words (actually, their characters) to a 1-byte encoding, then use patgen in the normal way and convert the generated patterns back to UTF-8. But, as I mentioned, the approach above worked for me. Afterwards I just converted the generated patterns to the 1-byte encoding for Georgian TeX (T8M). (This is not necessary if you don't need hyphenation support for 1-byte TeX engines.) The two sets of generated patterns work for 1-byte TeX engines and for UTF-8, respectively. Pattern loading is handled by the hyph-utf8 package.

> Subject: tex-hyphen Digest, Vol 58, Issue 1
> Date: Thu, 9 Oct 2014 12:00:01 +0200
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 9 Oct 2014 08:12:57 +0700
> From: Nathan Wells
> Subject: [tex-hyphen] Help with UTF-8 Language
>
> Hello,
> I am not sure if this is the right place to ask, but I am trying to create
> hyphenation rules for a UTF-8 language (Khmer). I've tried patgen, but I
> can't get it to work (some have said it doesn't support UTF-8?).
>
> I would like to use the output for Hunspell as well as Tex.
>
> I've asked this question here as well (and I've included sample data):
> http://tex.stackexchange.com/questions/205154/patgen-to-create-hyphenation-dictionary-for-utf-8-language?noredirect=1#comment479516_205154
>
> But thought I would ask on this mailing list since it seems there are many
> experts here.
>
> Thanks for your help,
> Nathan
>
> ------------------------------
>
> Message: 2
> Date: Thu, 9 Oct 2014 09:23:02 +0200
> From: Werner LEMBERG
> Subject: Re: [tex-hyphen] Help with UTF-8 Language
>
> > I've asked this question here as well (and I've included sample data):
> > http://tex.stackexchange.com/questions/205154/patgen-to-create-hyphenation-dictionary-for-utf-8-language?noredirect=1#comment479516_205154
>
> I've answered it there.
>
>
> Werner
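The alternative route mentioned earlier is to convert to a 1-byte encoding, run patgen, and then convert the patterns back. Here is a rough Python sketch of it, under several assumptions. The character-to-byte mapping is ad hoc and is not the T8M encoding used for Georgian TeX. It only works while the script has at most 128 distinct non-ASCII characters. The function names are hypothetical. The patgen run itself would happen between the encode and decode steps and is not shown.

```python
# Rough sketch of the "convert to a 1-byte encoding first" route.
# The mapping is ad hoc (NOT T8M): each distinct non-ASCII character
# gets a code point in 0x80..0xFF, written out as a single Latin-1 byte.

def build_mapping(words: list[str]) -> dict[str, str]:
    """Assign every distinct non-ASCII character a code point in 0x80..0xFF."""
    chars = sorted({ch for w in words for ch in w if ord(ch) > 0x7F})
    if len(chars) > 128:
        raise ValueError("more distinct characters than free byte values")
    return {ch: chr(0x80 + i) for i, ch in enumerate(chars)}

def to_one_byte(text: str, mapping: dict[str, str]) -> bytes:
    """Substitute mapped characters; ASCII (hyphens, digits, dots) is kept."""
    return "".join(mapping.get(ch, ch) for ch in text).encode("latin-1")

def from_one_byte(data: bytes, mapping: dict[str, str]) -> str:
    """Map 1-byte text (word list or patgen output) back to the original script."""
    reverse = {v: k for k, v in mapping.items()}
    return "".join(reverse.get(ch, ch) for ch in data.decode("latin-1"))

if __name__ == "__main__":
    word = "ქარ-თუ-ლი"                 # illustrative hyphenated Georgian word
    mapping = build_mapping([word])
    encoded = to_one_byte(word, mapping)
    print(len(word.encode("utf-8")))     # 23
    print(len(encoded))                  # 9
    print(from_one_byte(encoded, mapping) == word)  # True
```

Patterns consist of letters plus ASCII digits and dots. The digits and dots pass through unchanged, so the same reverse mapping can convert the generated patterns as well as the word list.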
