It seems that I may have reinvented the wheel (and created an inferior model).
For a PDF explanation of Lao syllabification, see this link: http://www.tcllab.org/events/uploads/valaxay-lao.pdf

Thank you,
Brian Wilson

On Wed, Apr 28, 2010 at 5:00 PM, <[email protected]> wrote:
> Send tex-hyphen mailing list submissions to
>       [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>       http://tug.org/mailman/listinfo/tex-hyphen
> or, via email, send a message with subject or body 'help' to
>       [email protected]
>
> You can reach the person managing the list at
>       [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of tex-hyphen digest..."
>
>
> Today's Topics:
>
>    1. Re: tex patterns as lua files (Mojca Miklavec)
>    2. Re: tex patterns as lua files (Karl Berry)
>    3. Lao Word wrap (Brian Wilson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Apr 2010 15:04:21 +0200
> From: Mojca Miklavec <[email protected]>
> To: "About TeX hyphenation patterns." <[email protected]>
> Subject: Re: [tex-hyphen] tex patterns as lua files
> Message-ID:
>       <[email protected]>
> Content-Type: text/plain; charset=UTF-8
>
> On Tue, Apr 27, 2010 at 13:06, Manuel Pégourié-Gonnard wrote:
> > On 27/04/2010 12:26, Mojca Miklavec wrote:
> >
> >> What I would really like to know before doing the change is:
> >>
> >> 1.) Which patterns should be default for any other program
> >> (javascript, perl etc.) outside of TeX?
> >
> > I guess it's usenglishmax. The Knuthian version matters mainly in the
> > nearly-frozen part of the TeX world.
>
> OK. If others agree ...
>
> >> 2.) Do you need/want (two questions) Knuth's hyphen.tex patterns in
> >> "plain" format as well?
> >>
> > One could always special-case English since we're going to do it at some
> > point anyway, but it would be a bit easier for us if everything is
> > uniform.
> >
> > While we're at it, there are also a few other hyphenation files that are
> > not in the normal form hyph-XX.tex + loadhyph-XX.tex + all the nice txt
> > files you kindly prepared for us. Some are from hyphen-base, namely
> > dumyhyph.tex and zerohyph.tex. Again, we can special-case them in our
> > code, or you can provide .txt versions of them (with an entry in
> > language.dat.lua), which would make a bit more work for you but would
> > result in cleaner code for loading.
> >
> > It's mainly up to you to evaluate whether those files belong to
> > texhyphen or not. I don't mind doing the little additional Lua & TeX
> > coding to treat them specially if needed. (Actually, I already know how I
> > would do it for hyphen.cfg, and I didn't look too closely at etex.src yet,
> > but I know it's possible too.)
>
> As far as dummy and zero are concerned, what do you think about the
> idea of creating a separate folder with appropriate txt files for
> those two languages? LuaTeX won't care about location, and others that
> might be willing to use the repository won't have to create special
> cases for dummy/zero files in that folder.
>
> Of course the entry for those two can be added to language.dat.lua.
>
> As far as
>
> > (There are also other files that end up being mentioned in TL's full
> > language.dat but are coming from other sources. We (meaning Élie and I)
> > need to do something about that, but I propose postponing the discussion
> > about them, since we're already discussing a lot of things at the same
> > time.)
>
> - If you mean Arabic and others, it's no problem to add an entry to
>   that lua file.
> - If you mean ibycus, you probably don't want to support it in LuaTeX.
> - If you mean the Germans with their timestamped patterns, we may
>   postpone the discussion; in LuaTeX you would probably want to go for a
>   completely different route than the current approach anyway.
> - There are also Javier's ideas about different subsets of patterns in
>   LuaTeX that we might want to consider.
> - And there are some languages that have zillions of versions of
>   patterns (like Russian etc.).
>
> Anything else?
>
> >>> or it can also be done on LaTeX's side, I can
> >>> modify the table accordingly. What would be the best?
> >>
> >> I'll respond once I know the answer to the two questions above. The
> >> table will be modified in either case and will include the USenglish
> >> synonym. The question is only whether we should duplicate hyphen.tex
> >> in our repository and, if yes, which patterns should take precedence
> >> (of having no -x-something extension). The lua table will be modified
> >> accordingly from the languages.rb database.
> >>
> > IMO, for the rest of the world, usenglishmax is the canonical version for
> > US English. I guess you want to reflect that in the code/filename by
> > making it en-us, and the Knuth patterns en-US-x-knuth-original.
> >
> > What is sure is that the logical name "english" *must* be the Knuthian
> > patterns (= hyphen.tex = en-US-x-knuth-original); usenglish, USenglish
> > and american have to be synonyms of this one, and the logical name
> > "usenglishmax" needs to be ushyphmax.tex (aka en-US in the new codes if
> > you follow my suggestion).
> >
> > With the current language.dat.lua, "english" points to en-US, formerly
> > ushyphmax, which means not the Knuthian patterns, and that needs to be
> > changed regardless of what you decide for the rest.
>
> I fully agree with that. All I wanted to know was how to change that.
>
> Mojca
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Apr 2010 22:38:41 GMT
> From: [email protected] (Karl Berry)
> To: [email protected]
> Subject: Re: [tex-hyphen] tex patterns as lua files
> Message-ID: <[email protected]>
>
>     patterns/txt (or data or plaintext or raw or ...)
>
> txt seems like a nice choice here.
>
>     > 1.) Which patterns should be default for any other program
>     > (javascript, perl etc.) outside of TeX?
>     I guess it's usenglishmax.
>
> I don't disagree exactly, but what "other programs" are we talking
> about? Or are you talking about use of our patterns in completely
> different programs (e.g., FOP)?
>
>     The question is only whether we should duplicate hyphen.tex
>
> Whether you duplicate hyphen.tex in your repository is a matter for your
> convenience. In TeX Live, I think hyphen.tex should remain as part of
> hyphen-base. So if you include it, we'll just remove it when importing
> into TL (which is no problem to do).
>
>     So again something for Karl: what's the best place for the following
>     file?
>
> http://tug.org/svn/texhyphen/branches/luatex/TL/texmf/tex/generic/config/language.dat.lua
>
> Since the filenames are unique (....lua), it doesn't seem to matter much.
> tex/generic/hyph-utf8/luatex/* maybe? Manuel?
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 28 Apr 2010 16:29:54 +0700
> From: Brian Wilson <[email protected]>
> To: [email protected], [email protected]
> Subject: [tex-hyphen] Lao Word wrap
> Message-ID:
>       <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Attached is a humble attempt at Lao syllabification rules, in the hope of
> Lao integration with TeX.
>
> I am sending this to the tex-hyphen list and CCing the xetex list, as a
> lengthy discussion regarding this subject occurred there during the last
> couple of weeks.
>
> I will be happy to work with the group in tweaking this and running tests.
>
> Thank you,
>
> --
> Brian Wilson, Director
> Asia-Pacific International University Translation Center
> _____________
>
> I have a new blog!! http://tc4asia.org/wpblog
>
> "He hath shewed thee, O man, what is good; and what doth the LORD require
> of thee, but to do justly, and to love mercy, and to walk humbly with thy
> God."
> Micah 6:8
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://tug.org/pipermail/tex-hyphen/attachments/20100428/3b37e1cd/attachment-0001.html>
> -------------- next part --------------
> The following is a brief sketch of the syllabification rules in Lao. My
> apologies for not using standard conventions. Feel free to edit.
>
> On the most basic level of word-wrapping, syllables should never be split.
>
> Lao syllables consist of
> 1. Beginning Consonant (bC) [required]
> 2. Secondary Beginning Consonant (sbC) [for consonant clusters]
> 3. Vowel (V) [required]
> 4. Tone Mark (T) [the order of 3 and 4 can be reversed]
> 5. Final Consonant (fC)
> 6. Extra Final Consonant (efC)
> 7. galan (g)
>
> ##########
> ##########
> Consonants and consonant clusters that can begin a syllable.
> 1. ກ 0E81
>    (1) ກຣ 0E81 + 0EA3 [uncommon]
>    (2) ກລ 0E81 + 0EA5 [uncommon]
>    (3) ກວ 0E81 + 0EA7
>    (4) ກຼ 0E81 + 0EBC [uncommon]
>
> 2. ຂ 0E82
>    (1) ຂຣ 0E82 + 0EA3 [uncommon]
>    (2) ຂລ 0E82 + 0EA5 [uncommon]
>    (3) ຂວ 0E82 + 0EA7
>    (4) ຂຼ 0E82 + 0EBC [uncommon]
>
> 3. ຄ 0E84
>    (1) ຄຣ 0E84 + 0EA3 [uncommon]
>    (2) ຄລ 0E84 + 0EA5 [uncommon]
>    (3) ຄວ 0E84 + 0EA7
>    (4) ຄຼ 0E84 + 0EBC [uncommon]
>
> 4. ງ 0E87
>
> 5. ຈ 0E88
>
> 6. 0E89
>
> 7. 0E80
>
> 8. ດ 0E94
>    (1) ດຣ 0E94 + 0EA3 [uncommon]
>
> 9. ຕ 0E95
>    (1) ຕຣ 0E95 + 0EA3 [uncommon]
>
> 10. ຖ 0E96
>
> 11. ທ 0E97
>
> 12. ນ 0E99
>
> 13. ບ 0E9A
>    (1) ບຣ 0E9A + 0EA3 [uncommon]
>    (2) ບລ 0E9A + 0EA5 [uncommon]
>    (3) ບຼ 0E9A + 0EBC [uncommon]
>
> 14. ປ 0E9B
>    (1) ປຣ 0E9B + 0EA3 [uncommon]
>    (2) ປລ 0E9B + 0EA5 [uncommon]
>    (3) ປຼ 0E9B + 0EBC [uncommon]
>
> 15. ຜ 0E9C
>
> 16. ຝ 0E9D
>    (1) ຝຣ 0E9D + 0EA3
>    (2) ຝຼ 0E9D + 0EBC
>
> 17. ພ 0E9E
>
> 18. ຟ 0E9F
>
> 19. ມ 0EA1
>
> 20. ຢ 0EA2
>
> 21. ຣ 0EA3
>
> 22. ລ 0EA5
>
> 23. ວ 0EA7
>
> 24. ສ 0EAA
>    (1) 0E81 + 0EA3 [uncommon]
>    (2) 0E81 + 0EA5 [uncommon]
>    (3) 0E81 + 0EA7
>    (4) 0E81 + 0EBC [uncommon]
>
> 25. ຫ 0EAB
>    (1) ຫງ 0EAB + 0E87
>    (2) ຫນ 0EAB + 0E99 [This is uncommon as it has its own character, see below]
>    (3) ຫຍ 0EAB + 0E8D
>    (4) ຫມ 0EAB + 0EA1 [This is uncommon as it has its own character, see below]
>    (5) ຫຣ 0EAB + 0EA3 [uncommon]
>    (6) ຫລ 0EAB + 0EA5
>    (7) ຫວ 0EAB + 0EA7
>    (8) ຫຼ 0EAB + 0EBC
>
> 26. ອ 0EAD
>
> 27. ຮ 0EAE [my Mac is rendering this the same as 0EA3, shame on it]
>
> 28. ໜ 0EDC
>
> 29. ໝ 0EDD
>
> ############
> ############
> Consonants that commonly end a syllable
> 1. ກ 0E81
> 2. ງ 0E87
> 3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain
>    constructions that will be explained later]
> 4. ດ 0E94
> 5. ມ 0EA1
> 6. ນ 0E99
> 7. ບ 0E9A
> 8. ວ 0EA7 [This is a /w/ and acts as a semivowel in certain
>    constructions that will be explained later]
>
> ############
> ############
> Consonants that could conceivably end a syllable on rare occasions when
> transcribing certain foreign words.
>
> 1. ຂ 0E82
> 2. ຄ 0E84
> 3. ຈ 0E88
> 4. 0E89
> 5. ດ 0E94
> 6. ຕ 0E95
> 7. ຖ 0E96
> 8. ທ 0E97
> 9. ປ 0E9B
> 10. ຜ 0E9C
> 11. ຝ 0E9D
> 12. ພ 0E9E
> 13. ຟ 0E9F
> 14. ມ 0EA1
> 15. ຣ 0EA3
> 16. ລ 0EA5
> 17. ສ 0EAA
>
> ############
> ############
> Consonants that can never end a syllable [unless followed immediately by
> the silencer 0ECC]
> 1. ຫ 0EAB
> 2. ຢ 0EA2
> 3. ອ 0EAD
> 4. ຮ 0EAE
> 5. ຼ 0EBC
> 6. ໜ 0EDC
> 7. ໝ 0EDD
> ############
> ############
> Extra final consonant
> In order to type foreign words, Lao adds 0ECC to extra final consonants.
> Every consonant except
> 1. ຼ 0EBC
> 2. ໜ 0EDC
> 3. ໝ 0EDD
> is theoretically possible, with some more common than others.
>
> ############
> ############
> Vowels that are written before the beginning consonant [syllable breaks
> ALWAYS occur before these characters and NEVER occur after these characters]
> 1. ເ 0EC0
> 2. ແ 0EC1
> 3. ໂ 0EC2
> 4. ໃ 0EC3
> 5. ໄ 0EC4
>
> ############
> ############
> Vowels that are written after the beginning consonant [syllable breaks
> NEVER occur before these characters.
> Some vowels in this section and the proceeding section can be stacked.
> I can specify if necessary.]
> 1. ະ 0EB0
> 2. າ 0EB2
> 3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]
> 4. ິ 0EB4
> 5. ີ 0EB5
> 6. ຶ 0EB6
> 7. ື 0EB7
> 8. ຸ 0EB8
> 9. ູ 0EB9
> 10. ໍ 0ECD
>
> ############
> ############
> Vowels that are written between two consonants [syllable breaks NEVER occur
> before or after these characters]
>
> 1. ັ 0EB1 [The following character must be a consonant or the 0EBD
>    semivowel]
> 2. ົ 0EBB [The following character (after an optional tone mark T) must be
>    1. a consonant,
>    2. the າ 0EB2 vowel when used in the /ow/ diphthong
>       (<0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2>), or
>    3. the ວ 0EA7 semivowel when used in the /ua/ diphthong. (Note that
>       the ວ may be followed by ະ 0EB0 for the shortened version of this
>       diphthong: <bC> <sbC> <0EBB> <(T)> <0EA7> <(0EB0)>.)]
>
> ############
> ############
> Vowels that can't take a final consonant
>
> 1. ະ 0EB0 [syllable break ALWAYS occurs after this character]
> 2. ໍ 0ECD [syllable break ALWAYS occurs after this character or after
>    the optional tone mark immediately following it]
>
>
> ############
> ############
> /ia/ vowel, and in old orthography the /y/ that can replace the final
> ຍ 0E8D (see above)
>
> 1. ຽ [can NEVER break before. If it is a final /y/, then can break after]
>
> ############
> ############
> Tones. There are four tone marks that can sit on top of the initial
> consonant or on ິ ີ ຶ ື ໍ (0EB4, 0EB5, 0EB6, 0EB7, 0ECD). (Note that
> 0EB5 and 0EB7 are also part of diphthongs; see below.) Breaks can NEVER
> occur before these.
>
> 1. ່ 0EC8
> 2. ້ 0EC9
> 3. ໊ 0ECA
> 4. ໋ 0ECB
>
> ############
> ############
> The silencer: a mark placed on a consonant rendering it silent. Only used
> to write foreign words. Usually placed on the last letter of a syllable,
> although it can occur in the middle of a syllable when placed on ຣ 0EA3 or
> ລ 0EA5.
> A break can NEVER occur before the consonant upon which this character
> sits, as a consonant carrying this character (galan) cannot begin a
> syllable.
>
> 1. ໌ 0ECC
>
> ############
> ############
> The following punctuation marks can never begin a new line. Also note that
> English and French punctuation symbols and rules apply. (Lao tends to add a
> space around punctuation as in French, but not always.) Quotes can be
> written with " " or << >>.
> 1. ໆ 0EC6
> 2. ຯ 0EAF [Sorry, I can't find this on my unmarked Mac keyboard]
>
> ############
> ############
> Vowel Diphthongs. Here is where it gets hairy, as three consonant
> semivowels are involved. [See my explanation at the beginning of this
> document. Parentheses mark optional characters.]
> 1. <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)> [eua
>    vowel. Note that the beginning consonant is in the middle]
>
> [Well, that wasn't so bad. I think that the other diphthongs are taken care
> of in previous rules and notes.]
>
> ############
> ############
> Consonants used as vowels between consonants.
>
> 1. ວ 0EA7
> 2. ອ 0EAD
>
> [If ວ|ອ is preceded by a consonant (note the optional tone mark) and
> followed immediately by a consonant that is not followed by a vowel or
> tone mark, then consider C(T)ວ|ອC to be a syllable.]
>
> ############
> ############
> Yeah. The end.
>
> ------------------------------
>
> End of tex-hyphen Digest, Vol 21, Issue 10
> ******************************************
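The character-specific rules in the quoted notes (pre-posed vowels, vowel signs, tone marks, the silencer 0ECC, the punctuation marks 0EC6/0EAF, and 0EB0) are regular enough to check mechanically before anyone writes patterns. Below is a minimal Python sketch under that assumption; the name break_classes and its two-set return value are illustrative choices, not part of hyph-utf8, TeX, or any existing Lao tool. It deliberately leaves out the cluster, final-consonant, 0ECD and diphthong rules, so a position in neither set is only "undecided by these rules", not a confirmed syllable boundary.

```python
from __future__ import annotations

# Hypothetical helper (not part of hyph-utf8, TeX, or any Lao library):
# it encodes only the character-specific break rules from the notes above.

# Vowels written before the beginning consonant, U+0EC0..U+0EC4.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0EC0, 0x0EC5)}

# Characters that a break may never precede.
NO_BREAK_BEFORE = (
    {chr(cp) for cp in range(0x0EB0, 0x0EBA)}    # vowel signs U+0EB0..U+0EB9
    | {"\u0EBB", "\u0EBC", "\u0EBD", "\u0ECD"}   # 0EBB, subscript lo, /ia/ sign, 0ECD
    | {chr(cp) for cp in range(0x0EC8, 0x0ECD)}  # tone marks 0EC8..0ECB, silencer 0ECC
    | {"\u0EC6", "\u0EAF"}                       # punctuation that never begins a line
)

# Characters that a break may never follow: pre-posed vowels and the two
# vowels written between consonants (U+0EB1, U+0EBB).
NO_BREAK_AFTER = PREPOSED_VOWELS | {"\u0EB1", "\u0EBB"}

SILENCER = "\u0ECC"  # a consonant carrying it cannot begin a syllable
SIGN_A = "\u0EB0"    # a syllable break always follows it


def break_classes(text: str) -> tuple[set[int], set[int]]:
    """Classify the gaps between characters of `text`.

    Position i (0 < i < len(text)) is the gap between text[i-1] and text[i].
    Returns (forbidden, forced); every other position is left undecided.
    """
    forbidden: set[int] = set()
    forced: set[int] = set()
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # NEVER rules win over ALWAYS rules.
        if cur in NO_BREAK_BEFORE or prev in NO_BREAK_AFTER or nxt == SILENCER:
            forbidden.add(i)
        elif cur in PREPOSED_VOWELS or prev == SIGN_A:
            forced.add(i)
    return forbidden, forced


if __name__ == "__main__":
    word = "\u0EC0\u0E88\u0EBB\u0EC9\u0EB2\u0EC4\u0E9B"  # ເຈົ້າໄປ
    forbidden, forced = break_classes(word)
    print("forbidden:", sorted(forbidden))
    print("forced:", sorted(forced))
```

For ເຈົ້າໄປ the demo prints forbidden: [1, 2, 3, 4, 6] and forced: [5], i.e. the only gap these rules pin down is the one before ໄ. Checking NEVER rules first keeps a forced boundary from overriding, for example, the rule that ໆ never begins a line.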
