Li18nux Locale Name Guideline Public Review
Hi, I found the 2nd public review of Li18nux Locale Name Guideline has started. http://www.hauN.org/ml/b-l-j/a/800/840.html http://www.li18nux.org/subgroups/sa/locnameguide/index.html The page says that comments are welcome until 14 Feb 2002. Any additions from Li18nux insiders? --- Tomohiro KUBOTA [EMAIL PROTECTED] http://www.debian.or.jp/~kubota/ Introduction to I18N http://www.debian.org/doc/manuals/intro-i18n/ -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
xterm KSYM mode encoding
On Sun, 20 Jan 2002 14:55:17 + Markus Kuhn [EMAIL PROTECTED] wrote:

What you could do is either prefix each release with some release indicator symbol, or add for instance 0x20 to a Unicode character to turn it into a release code. Both approaches allow you to use a normal UTF-8 decoder at the receiver's end.

Sounds like a good idea, but unfortunately Xutf8LookupString, XmbLookupString, and XwcLookupString are not supposed to be used with XKeyReleasedEvents. Apparently it messes up the input context. I've resorted to plain XLookupString and writing post-modifier KeySyms as 4-byte integers. Works for now, but I'd like to also provide transparent support for the Linux console and PuTTY. If I'm going to normalize on something, I should at least use Unicode to avoid table lookups in the end-user code. Ideas would be appreciated.

There is no standard for what you want to do, as this is getting very far away from the classic VT100 / ISO 6429 terminal semantics. No matter what you do, it will be your private encoding that isn't compatible with anything else. Make sure that the ESC sequence that you use to activate this private mark/break mode is as long and obscure as possible (at least 10 bytes, but still within the ECMA-48 syntax for ESC sequences!), to minimize the chance that it can ever be sent to the terminal by accident.

With a hint from Mr. Dickey, I believe the private \E[?1515h and \E[?1515l sequences are appropriate.

http://www.ecma.ch/ecma1/STAND/ECMA-048.HTM

This looks important :~)

Thanks, Mike

-- May The Source be with you.
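A minimal sketch of the first approach mentioned above (prefixing each release with an indicator symbol), so that the receiver can keep using an ordinary UTF-8 decoder. The marker U+E000 and the function names are assumptions chosen for illustration; nothing in the thread fixes a particular symbol.

```python
# Hypothetical release marker: a private-use code point, picked arbitrarily.
RELEASE_MARK = "\ue000"


def encode_key_event(char, released):
    """Encode one key event as UTF-8; a release gets the marker prefixed."""
    text = RELEASE_MARK + char if released else char
    return text.encode("utf-8")


def decode_key_events(data):
    """Decode a byte stream with a normal UTF-8 decoder.

    Returns a list of (char, released) pairs.
    """
    events = []
    released = False
    for ch in data.decode("utf-8"):
        if ch == RELEASE_MARK:
            # The marker applies to the next character only.
            released = True
            continue
        events.append((ch, released))
        released = False
    return events
```

The marker costs extra bytes per release, but press events stay byte-for-byte identical to plain UTF-8 input.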
Re: Li18nux Locale Name Guideline Public Review
Tomohiro Kubota wrote:

I found the 2nd public review of Li18nux Locale Name Guideline has started. http://www.hauN.org/ml/b-l-j/a/800/840.html http://www.li18nux.org/subgroups/sa/locnameguide/index.html The page says that comments are welcome until 14 Feb 2002. Any additions from Li18nux insiders?

Here are a few remarks from my side:

87 All of the fields (i.e. LANGUAGE, TERRITORY, CODESET and MODIFIERS)
88 shall be treated as case sensitive.

For users it's quite difficult to remember which field is uppercase or lowercase. It's easy to make a mistake. I don't see a field where there would be any confusion when case is ignored. So why not ignore case? I know this makes it a bit more difficult for developers, but that's a small price to pay for usability.

114 If a two-letter code is not available in ISO 3166-1 for a territory,
115 no standard value is defined for the territory.
116 In order not to conflict with future extension of ISO 3166-1,
117 user/implementation-defined values for the TERRITORY field shall
118 include lowercase letters or consist of more than two letters.

Why not always require a non-standard code to be three (or more) letters? That avoids a lot of confusion and will make it easy to detect a non-standard name.

-- Tips for aliens in New York: Land anywhere. Central Park, anywhere. No one will care or indeed even notice. -- Douglas Adams, The Hitchhiker's Guide to the Galaxy
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net \\\
((( Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim )))
\\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///
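The two rules quoted above can be sketched in a few lines. This assumes the usual LANGUAGE_TERRITORY.CODESET@MODIFIERS layout seen in the examples on this list; the regular expression and all function names are illustrative and are not taken from the guideline text.

```python
import re

# Assumed layout: LANGUAGE[_TERRITORY][.CODESET][@MODIFIERS]
LOCALE_RE = re.compile(
    r"(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifiers>.+))?"
)


def parse_locale(name):
    """Split a locale name into its fields, preserving case as written."""
    match = LOCALE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a locale name: {name!r}")
    return match.groupdict()


def is_user_defined_territory(territory):
    """Guideline lines 116-118: lowercase letters, or more than two letters."""
    return len(territory) > 2 or any(c.islower() for c in territory)


def is_user_defined_territory_strict(territory):
    """The stricter alternative proposed above: three or more letters."""
    return len(territory) >= 3
```

Because the fields are compared exactly as written, `de_DE.iso-8859-1` and `de_DE.ISO-8859-1` parse to different values, which is the usability concern raised above.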
Re: xterm KSYM mode encoding
On Mon, 21 Jan 2002, Michael B Allen wrote:

With a hint from Mr. Dickey, I believe the private \E[?1515h and \E[?1515l sequences are appropriate.

the pointer which Paul Williams gave (for key position mode) is probably more promising.

-- T.E.Dickey [EMAIL PROTECTED] http://invisible-island.net ftp://invisible-island.net
Re: xterm KSYM mode encoding
Thomas E. Dickey wrote:

On Mon, 21 Jan 2002, Michael B Allen wrote:

With a hint from Mr. Dickey, I believe the private \E[?1515h and \E[?1515l sequences are appropriate.

the pointer which Paul Williams gave (for key position mode) is probably more promising.

Heh heh, I'll have to correct myself now you've mentioned it! Unfortunately the VT500-series Programmer References have a typo, which I blindly copied. The sequences are CSI ? 81 h and CSI ? 81 l, not CSI 81 h, etc.

However, they should not be used to turn on a mode other than KPM. Any custom UTF-8 trickery should use a new mode code.

I'd like to hear more from the original poster about the requirement for this encoding, and why it would need to be transparent to Linux console and PuTTY. Without wishing to be rude, I think the problem should be defined before the solution.

- Paul
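For reference, the private-mode sequences discussed above can be built as follows. This is only a sketch: it uses the 7-bit form of CSI (ESC followed by "["), and the helper name is made up for illustration. Mode 81 is the key position mode named above; 1515 is the number proposed earlier in the thread.

```python
ESC = "\x1b"
CSI = ESC + "["  # 7-bit representation of the Control Sequence Introducer


def private_mode(mode, enable):
    """Return CSI ? <mode> h (set) or CSI ? <mode> l (reset)."""
    return f"{CSI}?{mode}{'h' if enable else 'l'}"


# Key position mode, per the corrected sequences above.
KPM_ON = private_mode(81, True)
KPM_OFF = private_mode(81, False)
```

Writing `private_mode(1515, True)` to the terminal would correspond to the \E[?1515h sequence mentioned earlier; whether a given terminal honours either mode is a separate question.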
Re: xterm KSYM mode encoding
On Mon, 21 Jan 2002, Paul Williams wrote:

Thomas E. Dickey wrote:

On Mon, 21 Jan 2002, Michael B Allen wrote:

With a hint from Mr. Dickey, I believe the private \E[?1515h and \E[?1515l sequences are appropriate.

the pointer which Paul Williams gave (for key position mode) is probably more promising.

Heh heh, I'll have to correct myself now you've mentioned it! Unfortunately the VT500-series Programmer References have a typo, which I blindly copied. The sequences are CSI ? 81 h and CSI ? 81 l, not CSI 81 h, etc.

good (I was a little surprised to see it as a non-private mode, but wasn't at a good point to research it - I have it as a private mode in vttest: not the first typo in DEC's manuals).

However, they should not be used to turn on a mode other than KPM. Any custom UTF-8 trickery should use a new mode code.

From what I understand of the request, KPM would be adequate, and UTF-8 is just brought in because he's still learning what is involved. Ultimately, since scan-codes are not really portable in the sense that's requested, I'm not sure how useful any of this is.

-- T.E.Dickey [EMAIL PROTECTED] http://invisible-island.net ftp://invisible-island.net
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
setenv LANG de_DE.iso-8859-1@euro
setenv LANG DE_de.ISO-8859-1@euro
setenv LANG de_DE.Iso-8859-1@EURO

Do you think an average user can guess which one of these he has to type? No GUI available!

If the average user is having to choose between those 3 possibilities, then presumably those 3 possibilities were presented by some program or included in some list. That program, or that list, should be modified to only give valid possibilities.

Edmund
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
On Mon, 21 Jan 2002, Bram Moolenaar wrote:

setenv LANG de_DE.iso-8859-1@euro
setenv LANG DE_de.ISO-8859-1@euro
setenv LANG de_DE.Iso-8859-1@EURO

Do you think an average user can guess which one of these he has to type? No GUI available!

If the user has to *type* one of those, the system is broken.

You can write a menu system, which will list the legitimate choices and ask him which one he wants, in twenty lines of shell script. It will run on any ASCII terminal. There is no need to have X and Tcl/Tk to write interfaces that are more novice-friendly than setenv.

Henry Spencer [EMAIL PROTECTED]
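The menu idea is small enough to sketch. The version below is in Python rather than the shell script suggested above, and the function name, prompt text, and sample locale list are all assumptions for illustration. Input and output are passed in as functions so the menu works on any plain terminal and is easy to exercise without one.

```python
def choose_locale(choices, read=input, write=print):
    """List the valid locale names and return the one the user picks."""
    for number, name in enumerate(choices, start=1):
        write(f"{number}) {name}")
    while True:
        answer = read("Select a locale by number: ").strip()
        try:
            index = int(answer)
        except ValueError:
            index = 0  # not a number: treated as out of range
        if 1 <= index <= len(choices):
            return choices[index - 1]
        write("Please enter one of the numbers listed above.")
```

Called as `choose_locale(["de_DE.UTF-8", "fr_FR.UTF-8"])`, it never asks the user to type a locale name, so the question of case does not arise.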
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
Henry Spencer wrote:

On Mon, 21 Jan 2002, Bram Moolenaar wrote:

setenv LANG de_DE.iso-8859-1@euro
setenv LANG DE_de.ISO-8859-1@euro
setenv LANG de_DE.Iso-8859-1@EURO

Do you think an average user can guess which one of these he has to type? No GUI available!

If the user has to *type* one of those, the system is broken.

You can write a menu system, which will list the legitimate choices and ask him which one he wants, in twenty lines of shell script. It will run on any ASCII terminal. There is no need to have X and Tcl/Tk to write interfaces that are more novice-friendly than setenv.

Nice idea. However, that this menu system shell script still has to be made is enough proof that, in practice, people will use setenv. I often use env LANG=locale gvim arguments to test message translations.

-- hundred-and-one symptoms of being an internet addict: 33. You name your children Eudora, Mozilla and Dotcom.
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net \\\
((( Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim )))
\\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
On Mon, 21 Jan 2002, Bram Moolenaar wrote:

You can write a menu system, which will list the legitimate choices and ask him which one he wants, in twenty lines of shell script...

Nice idea. However, that this menu system shell script still has to be made is enough proof that, in practice, people will use setenv...

For small values of people. :-) Only the experts will. The experts presumably can get the case of a locale name right.

Henry Spencer [EMAIL PROTECTED]
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
On Mon, 2002-01-21 at 15:31, Bram Moolenaar wrote:

Henry Spencer wrote:

On Mon, 21 Jan 2002, Bram Moolenaar wrote:

setenv LANG de_DE.iso-8859-1@euro
setenv LANG DE_de.ISO-8859-1@euro
setenv LANG de_DE.Iso-8859-1@EURO

Do you think an average user can guess which one of these he has to type? No GUI available!

If the user has to *type* one of those, the system is broken.

You can write a menu system, which will list the legitimate choices and ask him which one he wants, in twenty lines of shell script. It will run on any ASCII terminal. There is no need to have X and Tcl/Tk to write interfaces that are more novice-friendly than setenv.

Nice idea. However, that this menu system shell script still has to be made is enough proof that, in practice, people will use setenv. I often use env LANG=locale gvim arguments to test message translations.

Try locale -a. It'll give you a list of valid locales, and aliases. It reads its contents from /etc/locale.aliases

Over on debian-devel, I've been proposing changing the locale-gen command in Debian (and similar in other distributions...) to automagically edit locale.aliases to include only aliases to locales that are present on the system (e.g. in /usr/lib/locale).

Then locale -a would give you

de_DE.UTF-8@euro
german
fr_FR.UTF-8@euro
french
..

env LANG=french gvim args would work ...

In principle, I agree though, case sensitive; work should be aimed at making a GUI simple to use, and the CLI consistent and simple.

-- Alastair McKinstry [EMAIL PROTECTED]
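The proposal above, keeping only those aliases whose target locale is actually installed, amounts to a simple filter. The sketch below works on in-memory sample data; the alias table, the installed set, and the function name are hypothetical, and nothing here reads or edits the real alias file or /usr/lib/locale.

```python
def prune_aliases(aliases, installed):
    """Keep only aliases that point at a locale present on the system."""
    return {
        alias: target
        for alias, target in aliases.items()
        if target in installed
    }


# Hypothetical sample data for illustration only.
sample_aliases = {
    "german": "de_DE.UTF-8@euro",
    "french": "fr_FR.UTF-8@euro",
    "polish": "pl_PL.UTF-8",
}
sample_installed = {"de_DE.UTF-8@euro", "fr_FR.UTF-8@euro"}
usable = prune_aliases(sample_aliases, sample_installed)
```

With the sample data, `usable` keeps `german` and `french` and drops `polish`, so every name a listing tool prints would be one that works with `env LANG=...`.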
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
Kaixo!

On Mon, Jan 21, 2002 at 02:47:48PM +0100, Bram Moolenaar wrote:

Usability is provided by GUI selection tools, not by softening syntax specs. The case sensitivity makes a lot of sense as ISO's language and

setenv LANG de_DE.iso-8859-1@euro
setenv LANG DE_de.ISO-8859-1@euro
setenv LANG de_DE.Iso-8859-1@EURO

Do you think an average user can guess which one of these he has to type? No GUI available!

To use the command line you must be able to read a doc and to copy correctly what it says; you are also supposed to know the command line is case sensitive.

The underscore is sufficient to separate the language and region. Upper/lower case doesn't really help me anyway, it's only an extra thing to know.

But it's also the way things have always been done (or at least for as long as I can remember); why break it and introduce compatibility problems?

If we can agree on case insensitivity, then case differences are not aliases. You can type them any way you like and they would still be the same locale.

Note that case insensitivity is locale dependent; by introducing case insensitivity you may get some very strange behaviour, like the locale being recognized when you first define it (as you were in another locale previously), then after the change is done you start getting errors (as the new locale defines new case insensitivity rules and the string that was previously considered the same is no longer the same).

Also, remember that the filesystem is still case sensitive, which means that if you introduce case insensitivity for locale names in variables, you need to change the sources of the libc or other sensitive libraries so that the files where locale data is stored are found even if the user requests a different name than that of the actual directories...

The supposed benefits are too small compared to the troubles.

-- Ki ça vos våye bén, Pablo Saratxaga http://www.srtxg.easynet.be/ PGP Key available, key ID: 0x8F0E4975
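Both objections above can be made concrete with a small sketch. It is not something proposed in the thread: it assumes that a case-insensitive lookup would fold only the ASCII letters A-Z (so the result cannot depend on the current locale) and would then map the request back to the exact installed name (so a case-sensitive filesystem still finds the directory). The function names and the installed list are hypothetical.

```python
import string

# Fold A-Z only; every other character is left alone, so the mapping
# does not change with the current locale.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(name):
    return name.translate(_ASCII_FOLD)


def resolve_locale(requested, installed):
    """Return the installed name matching `requested`, ignoring ASCII case.

    Returns None when there is no match or when the match is ambiguous.
    """
    wanted = ascii_fold(requested)
    matches = [name for name in installed if ascii_fold(name) == wanted]
    return matches[0] if len(matches) == 1 else None
```

Even in this form, every library that opens locale data would need the same lookup step, which is the cost pointed out above.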
Re: [I18n]Re: Li18nux Locale Name Guideline Public Review
Kaixo!

On Mon, Jan 21, 2002 at 10:05:23AM -0500, Henry Spencer wrote:

On Mon, 21 Jan 2002, Bram Moolenaar wrote:

setenv LANG de_DE.iso-8859-1@euro
setenv LANG DE_de.ISO-8859-1@euro
setenv LANG de_DE.Iso-8859-1@EURO

Do you think an average user can guess which one of these he has to type? No GUI available!

If the user has to *type* one of those, the system is broken.

Well, indeed! None of the three above makes sense at all, as there is no possibility of having the euro sign in iso-8859-1 :)

-- Ki ça vos våye bén, Pablo Saratxaga http://www.srtxg.easynet.be/ PGP Key available, key ID: 0x8F0E4975
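The point about the euro sign is easy to check with any charset-aware library. The sketch below uses Python's built-in codecs; the helper name is made up for illustration.

```python
def can_encode(text, codeset):
    """Report whether `text` is representable in the named charset."""
    try:
        text.encode(codeset)
    except UnicodeEncodeError:
        return False
    return True


EURO = "\u20ac"  # EURO SIGN

in_latin1 = can_encode(EURO, "iso-8859-1")   # ISO-8859-1 has no euro sign
in_latin9 = can_encode(EURO, "iso-8859-15")  # ISO-8859-15 has it at 0xA4
```

So a name such as de_DE.ISO-8859-15@euro is at least self-consistent, whereas the three iso-8859-1 variants quoted above are not.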
Re: Li18nux Locale Name Guideline Public Review
On Mon, Jan 21, 2002 at 07:18:09PM +0900, Tomohiro KUBOTA wrote:

Hi, I found the 2nd public review of Li18nux Locale Name Guideline has started. http://www.hauN.org/ml/b-l-j/a/800/840.html http://www.li18nux.org/subgroups/sa/locnameguide/index.html The page says that comments are welcome until 14 Feb 2002.

Starting from the top:

This is Linux, not proprietary Unixes. What is the point of not standardizing on the IANA names, especially when you don't standardize on the Unix ones, either? Especially - TCA-Big5 and TCA-BIG5-HKSCS. Lovely names there.

It doesn't seem that the authors were very familiar with the IANA names, as BIG5-HKSCS, ISO-8859-13, ISO-8859-15 and TIS-620 are registered.

Why all the IBM code pages? glibc currently supports two - 1251 (be_BY, bg_BG) and 1255 (yi_US). Is there anyone who really needs all the others? They are a step backward on Unix.

They're missing a bunch of charsets currently in glibc's supported list.

glibc does not and will not support VISCII, as it puts graphic characters in the ASCII range. And I'm sure Ulrich Drepper will bite your head off for even asking.

As a final note - why does this exist? Linux has a locale standard, in the same way that Perl has a standard - it's called glibc. If you feel compelled to write a formal standard, you have to write one that defines what the standard implementation does.

-- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - Freakin' Friends
Re: [I18n]Li18nux Locale Name Guideline Public Review
Hi,

At Mon, 21 Jan 2002 19:18:09 +0900, Tomohiro KUBOTA wrote:

I found the 2nd public review of Li18nux Locale Name Guideline has started. http://www.hauN.org/ml/b-l-j/a/800/840.html http://www.li18nux.org/subgroups/sa/locnameguide/index.html

One important note: I am not a member of Li18nux. Thus, people who have opinions should send them to Li18nux. The above web page explains how to comment.

--- Tomohiro KUBOTA [EMAIL PROTECTED] http://www.debian.or.jp/~kubota/ Introduction to I18N http://www.debian.org/doc/manuals/intro-i18n/
Re: xterm KSYM mode encoding
On Mon, 21 Jan 2002 12:19:09 + Paul Williams [EMAIL PROTECTED] wrote:

I'd like to hear more from the original poster about the requirement for this encoding, and why it would need to be transparent to Linux console and PuTTY. Without wishing to be rude, I think the problem should be defined before the solution.

I'm writing a text mode application framework called Terminal MVC. It reads an XML screen definition to get a DOM tree (the Model), around which it renders Views much like the Java AWT or the BView class from BeOS, but with box and line drawing characters, and for which the root viewport is the terminal window (the View). It uses non-mnemonic, 10 key easy navigation based around Enter, Esc, and cursor keys (the Controller). If a C module is specified in a frame tag, it is dlopen'd and initialized by passing it the DOM tree, onto which event handlers can be registered. My intention is that it be the easiest way to write a UI for a C program.

The model and recursive composition of frames and basic text are stable enough for me to move on to the Controller. I want to be able to use the Print Screen key to render the current UI on the PostScript display device and send it to the printer. For the text component I want to be able to hold the shift key down while I hit the up arrow key to select entire lines of text. To do this I must be able to detect key releases.

Terminal MVC will be most useful for Linux configuration applications running on servers, or POS-like applications running remotely from cheap PCs. Servers don't have X Windows, so I want this to work on the Linux console. Admins like administering their boxes from PuTTY on their Windows workstations too.

As for the requirement of the encoding, I do not want to encode keycodes or scancodes as Thomas thought. The original 5 minute hack to xterm that I sent him did this. I want to encode UCS codes corresponding to the KeySym of the keycode. In other words, I want the X server to do as much work for me as possible, because it knows all about the user's keyboard and custom mappings etc. So I just need to convert the KeySym to Unicode. Xutf8LookupString is the way you normally do this, but it doesn't work with KeyReleasedEvents for some reason. Control keys will require additional representation. All of these UCS key syms will be augmented with an extra bit of information to indicate that the key associated with the UCS code was released as opposed to pressed. I do not believe I need to communicate modifiers, although it might be nice if there's space left over.

I have tried without success to find information on these DEC terminals Paul speaks of, but the client X server on which the end user program is being displayed has already worked out the keyboard layout and other platform specific issues, so I don't think this is the direction I want to take anyway. Same thing for PuTTY: it knows it's operating in a PC environment, so we can just convert to UCS codes there. The point being that the clients handle all the portability issues. The end user program can then just use the UCS codes and a few constants for control keys.

So ideally, I want to take a keycode, convert it to a KeySym with modifiers applied, and then send UCS codes with the high bit on if it's a release. Of course I don't know for sure if that will work, but I wouldn't be asking these fandangled questions here if I did ;-)

xterm-165-ksym2.patch and sample ksym.c program attached.

Thanks, Mike

-- May The Source be with you.
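The encoding described in the last paragraph, a UCS code with the high bit set for a release, can be sketched as a 4-byte word. The big-endian byte order, the use of bit 31 as the flag, and the function names are assumptions made for illustration; this is not taken from the attached patch or ksym.c.

```python
import struct

RELEASE_BIT = 0x80000000  # assumed: bit 31 of a 32-bit word marks a release
UCS_MASK = 0x7FFFFFFF


def pack_key_event(ucs, released):
    """Pack a UCS code and a press/release flag into 4 bytes (big-endian)."""
    if not 0 <= ucs <= UCS_MASK:
        raise ValueError("UCS code must fit in 31 bits")
    word = ucs | (RELEASE_BIT if released else 0)
    return struct.pack(">I", word)


def unpack_key_event(data):
    """Inverse of pack_key_event: returns (ucs, released)."""
    (word,) = struct.unpack(">I", data)
    return word & UCS_MASK, bool(word & RELEASE_BIT)
```

Unlike the prefix-marker approach, a stream of such words is not valid UTF-8, so the receiver needs this small decoder instead of a stock one.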
Re: Li18nux Locale Name Guideline Public Review
On Tue, Jan 22, 2002 at 12:49:56PM +0900, Stephen J. Turnbull wrote:

However, it's important to remember that a bad standard is better than no standard. It is extremely difficult to change a bad standard, it is true. But it's even harder to change no standard, and in the meantime users suffer much more.

I'm not sure I agree. A lot of programming languages and a lot of systems have done well without a formal standard - Perl, Python, Fortran prior to 1966. But a bad standard, that's hard to implement or is painful to use, will drive away users and implementers, and discourage the creation of a new standard.

Telling the relevant Li18nux/LSB working group "Debian has looked at the Li18nux proposal. However, we intend to {use the IANA names, not impose unstandardized names, deprecate IBM code pages to compatibility packages} for these reasons:" would be great. The Debian name commands a fair amount of respect because of Debian's continuing commitment to standards, both international and internal.

I can't honestly say I speak for Debian. I don't think anybody can honestly say they speak for Debian on this, besides maybe Ben Collins (libc maintainer). It's the whole herd of cats thing.

"David" == David Starner [EMAIL PROTECTED] writes:

David> Why all the IBM code pages? glibc currently supports two -
David> 1251 (be_BY, bg_BG) and 1255 (yi_US).

What do you mean by support? For code pages, I would say iconv is the relevant functionality.

I have no argument with iconv supporting any charset in use. But we're talking about locale charsets, the charsets that every program can be expected to handle, the master charsets for a user. Users should be able to expect that you can send a file from one Linux box to another in the same locale without having to recode it. While this isn't universally true, adding charsets that aren't better than ones already in use doesn't help anything. Furthermore, if possible, a charset should leave C1 free of graphical characters, like ISO-8859-1 and EUC-JP do, and UTF-8 does in a hamhanded way, and must leave C0 free of graphic characters.

What I mean by support is that it is included in the list of tested and supported locales (/usr/share/doc/locales/SUPPORTED.gz on my system) - attached to the bottom of this message.

David> As a final note - why does this exist? Linux has a locale
David> standard, in the same way that Perl has a standard

Aka, why I use Python. :-)

Does Python have a formal standard? It would surprise me.

David> If you feel compelled to write a formal standard, you have
David> to write one that defines what the standard implementation
David> does.

Note what taking that to extremes implies: forget POSIX, which doesn't describe any real OS.

Large parts of POSIX are directly based on existing implementations. Also, POSIX needed implementing; there were many diverging Unixes. There's one locale implementation used on Linux - glibc's.

While that's mostly a joke, there's something important here. And that is that if we stick consistently to the specify, then implement approach, we end up with something workable not so far from where we actually are.

This sounds an awful lot like creationism.

- Jargon File (4.3.0, 30 APR 2001) [jargon]:

creationism n. The (false) belief that large, innovative software designs can be completely specified in advance and then painlessly magicked out of the void by the normal efforts of a team of normally talented programmers. In fact, experience has shown repeatedly that good designs arise only from evolutionary, exploratory interaction between one (or at most a small handful of) exceptionally able designer(s) and an active user population -- and that the first try at a big new idea is always wrong. Unfortunately, because these truths don't fit the planning models beloved of {management}, they are generally ignored.

We've had the evolutionary, exploratory interaction, and now, for the most part, glibc supports the locales and charsets people need.

I'm not recommending knuckling under to Emerson's hobgoblin, but I hope Debian will lean toward specifying desiderata (== standards) independently of current implementations, rather than falling into the trap of making the standards overly dependent on the implementations.

Why haven't you standardized Emacs yet? What would you do with an Emacs standard that ignored many of the good points of recent Emacsen?

IMO, we have a poorly thought-out standard, in an area without multiple implementations and hence without the need for a standard.

af_ZA ISO-8859-1
ar_AE ISO-8859-6
ar_BH ISO-8859-6
ar_DZ ISO-8859-6
ar_EG ISO-8859-6
ar_IN UTF-8
ar_IQ ISO-8859-6
ar_JO ISO-8859-6
ar_KW ISO-8859-6
ar_LB ISO-8859-6
ar_LY ISO-8859-6
ar_MA ISO-8859-6
ar_OM ISO-8859-6
ar_QA ISO-8859-6
ar_SA ISO-8859-6
ar_SD ISO-8859-6
ar_SY ISO-8859-6
ar_TN ISO-8859-6
ar_YE ISO-8859-6
be_BY CP1251
bg_BG CP1251
br_FR ISO-8859-1
bs_BA ISO-8859-2
ca_ES