On Mon, May 3, 2010 at 03:34, Manuel Pégourié-Gonnard wrote:
> Hi,
>
> Now that the format and meaning of language.dat.lua seems stable, it's
> probably time to decide how to handle it. Here is a brief summary of the
> situation:
>
> 1. For maximal safety, we use a plain text version of patterns and
>    exceptions for dynamic loading; for this we need information not
>    contained in language.(dat|def), namely the names of the files
>    containing those versions.
> 2. This information is looked for in a file language.dat.lua.
>    (a) Languages without an entry in this file are dumped in the format
>    the plain old way (Knuthian usenglish is always dumped as \language0).
>    (b) It is possible to disable dynamic loading (hence loading at all)
>    of a particular language via entries in this file.
> 3. Currently, languages loadable are (a subset of) those declared in
>    language.{dat,def}.
> 4. But in the future, one can imagine languages having an entry in
>    language.dat.lua only, hence being only dynamically loadable in LuaTeX
>    (macro support yet to be written, but I have ideas for that, should
>    not be difficult now) without being dumped in other
>    (non-LuaTeX-based) formats.

I agree with all that.

> Now, to the best of my knowledge, entries in language.{dat,def} basically
> come from three sources:
> (a) package hyphen-base
> (b) tex-hyphen (hyph-utf8)
> (c) german-x
>
> The down side is,
> when german-x is updated, hyph-utf8 needs to be updated too

When german-x is updated, they'll probably want to update the patterns in
hyph-utf8 anyway.

> Another possibility is to handle language.dat.lua in the same way we
> handle language.{dat,def} in TL currently. It would only require new
> (optional) attributes for the AddHyphen postaction, and the code to handle
> it of course.
> Pro: more modular and scalable. Con: needs coding.

It's an option, but there's another big con: whenever you want some change,
you'll have to update tlmgr. I don't think that this is such a great idea.

> New attributes would be: patterns=<file with plain text patterns>,
> hyphenation=<file with plain text exceptions>, special=<code for special
> languages> (optional), and something to determine if the language should
> go to language.{dat,def} only, language.dat.lua only, or both.

I have some comments about special=<...>. It's a bit ugly in my opinion.
I would use comment=":some,arbitrary#comment%" (not so extreme of course)
and add separate fields rather than "special", particularly since the number
of options is limited anyway. There is no need to add an option to every
language. The options can be optional, so that something like
"empty_patterns"=true is used only for farsi, arabic and zerohyph (the
option names could be different).

> An intermediate possibility is to use a monolithic language.dat.lua for
> now, since it is readily available, and implement the more modular option
> later. (Pro: nothing to do now, con: now would be the best moment for me
> to implement that, since later I'll have to remember things first.)

A third option would be for your Lua scripts to read a database file from
german-x.
In my opinion that would be best in the long run. In the short term we can
add a few more definitions to the current language.dat.lua, but in the long
run I don't think that database really belongs in hyph-utf8. That can be
changed later, though. If we need to support two more files, we can add
them, but the authors of the German patterns might want a different approach
(even better LuaTeX support) at some point anyway.

Mojca
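
P.S. To make the optional-fields idea concrete, here is a rough sketch of
what two entries could look like. It is only an illustration: the table
layout, the entry name "german" and the file names are assumptions, not the
actual format of language.dat.lua. Only the field names patterns,
hyphenation, comment and empty_patterns come from the discussion above.

```lua
-- Hypothetical sketch, NOT the real language.dat.lua format.
-- File names below are placeholders chosen for illustration.
return {
  -- An ordinary language: plain text patterns and exceptions,
  -- no extra options needed.
  ["german"] = {
    patterns    = "german-patterns.txt",    -- placeholder file name
    hyphenation = "german-exceptions.txt",  -- placeholder file name
  },
  -- A language without patterns: one optional boolean field instead of
  -- a catch-all "special" code, plus a free-form comment.
  ["farsi"] = {
    empty_patterns = true,
    comment = "no hyphenation patterns",
  },
}
```

The point is simply that most entries would stay short, and the handful of
exceptional languages (farsi, arabic, zerohyph) would carry one extra,
self-explanatory field each.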