On Mon, Jun 16, 2008 at 12:51 PM, Norbert Preining wrote:
> HI all,
>
> On So, 15 Jun 2008, Karl Berry wrote:
>>     3) The supplied language.def loads only the US English hyphenation
>>     patterns. Would it be possible to include other hyphenation patterns
>>     by default (as is done with language.dat)?
>>
>> Clearly this is the intent, but it takes a lot of effort to keep
>> language.dat, fmtutil.cnf, updmap.cfg in sync now. I'm scared to try to
>> add another file of the same ilk.
>
> There is one question with this: Do we have *any* chance to
> auto-generate these lines from what is present in
>         Master/texmf/tex/generic/config/language.*.dat
> ??
>
> If yes, I could add some magic such that language.def is updated at the
> same time as language.dat (and in the same way!).

Norbert,

There have been some discussions on
    http://www.tug.org/mailman/listinfo/tex-hyphen
(maybe you should subscribe for passive reading?) and there is some
effort going on at
    http://www.tug.org/svn/texhyphen/trunk/
The files are nearly ready to be submitted to CTAN and included in
TeX Live.

I don't know the details, but I have a feeling that language.def needs
lefthyphenmin & righthyphenmin data, which is not available in
language.dat.

But: I do autogenerate language.foo.dat files for the languages in the
repository (I still need to fix some border cases: the Spanish dat file
also lists Catalan, the Greek dat file lists all kinds of Greek, etc.)
and I could just as well auto-generate language.foo.def if needed. See
    TL/texmf/tex/generic/config/
The files are generated together with the loaders for the languages.

There are some open questions concerning language.def vs. language.dat:
- the Germans want versioned patterns, so it would be nice to support
  some versioning
- language.dat contains no information about hyphenmin
- it would be nice if language.dat and language.def were unified
  (that's what I have heard; I don't know the details and I don't know
  when each of them is used)

The new scheme for loading patterns has the following idea:

1.) language.dat contains proper language codes:

uppersorbian    loadhyph-hsb.tex
swedish         loadhyph-sv.tex
turkish         loadhyph-tr.tex
serbian         loadhyph-sr-latn.tex
serbianc        loadhyph-sr-cyrl.tex
greek           loadhyph-el-polyton.tex
=polygreek
monogreek       loadhyph-el-monoton.tex
ancientgreek    loadhyph-grc.tex
bulgarian       loadhyph-bg.tex
russian         loadhyph-ru.tex
ukrainian       loadhyph-uk.tex
norsk           loadhyph-nb.tex
=norwegian
=bokmal
nynorsk         loadhyph-nn.tex

2.)
loadhyph-foo.tex takes care that:
- the Unicode patterns hyph-foo.tex are loaded for the particular
  language
- for 8-bit engines, the proper conversion from UTF-8 to ENCODING is
  done first and then the patterns are loaded (last year the same ugly
  job in the other direction was done by the xu-hyphfoo.tex wrappers,
  except that they were full of hacks)
- sometimes the conversion cannot be done 1:1; examples are Greek with
  combining accents, or German, where I do not dare to drop support for
  the OT1 encoding; in such cases the old file is loaded the usual way
- sometimes it sets some additional lccodes (apostrophe, dash etc.)

3.) Now the patterns are stored in one place and the knowledge about the
patterns (such as which encoding they are written in, what catcodes
they need) is stored in some other file, so that the TeX macros needed
to handle the patterns can be engine-specific.

I have also written a generator of tlpsrc files for the languages, but I
need some instructions (from you and Karl) about what should go into
those files. I have absolutely no insight into the TeX Live tools, but we
can coordinate to simplify things as much as possible.

We have made the effort to "purify" the patterns; what's left to be done
is packaging them properly (and adding proper copyright notes at the top
of the files).

Mojca
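
P.S. To make the language.dat -> language.def question more concrete,
here is a rough sketch (in Python, not the actual generator from the
repository) of how the lines could be derived. It assumes the
five-argument form \addlanguage{name}{file}{}{lefthyphenmin}{righthyphenmin}
for language.def. The function names, the hyphenmin table and the default
of (2, 3) are made up for illustration; the hyphenmin values are
placeholders, since language.dat does not carry them. How synonyms
should appear in language.def is an open question, so the sketch only
records them in a TeX comment.

```python
# Sketch only: derive language.def-style lines from language.dat-style text.
# parse_language_dat, to_language_def and the hyphenmin table are
# hypothetical names/inputs, not part of any existing TeX Live tool.


def parse_language_dat(text):
    """Return a list of (name, loader_file, synonyms) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop TeX comments
        if not line:
            continue
        if line.startswith("="):
            # "=synonym" lines attach to the preceding language
            if not entries:
                raise ValueError("synonym before any language: " + line)
            entries[-1][2].append(line[1:].strip())
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("expected 'name file': " + line)
        entries.append((fields[0], fields[1], []))
    return entries


def to_language_def(entries, hyphenmins, default=(2, 3)):
    """Build \\addlanguage lines; hyphenmins maps name -> (left, right)."""
    lines = []
    for name, loader, synonyms in entries:
        left, right = hyphenmins.get(name, default)  # default is an assumption
        line = "\\addlanguage{%s}{%s}{}{%d}{%d}" % (name, loader, left, right)
        if synonyms:
            line += " % synonyms: " + ", ".join(synonyms)
        lines.append(line)
    return lines


if __name__ == "__main__":
    sample = """\
swedish         loadhyph-sv.tex
greek           loadhyph-el-polyton.tex
=polygreek
norsk           loadhyph-nb.tex
=norwegian
=bokmal
"""
    # Placeholder values, not checked against the real patterns.
    placeholder_hyphenmins = {"swedish": (2, 2)}
    for out in to_language_def(parse_language_dat(sample), placeholder_hyphenmins):
        print(out)
```

The point is only that the hyphenmin data has to come from somewhere
other than language.dat; once it is available, generating both files
from the same source is straightforward.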
