On 6/3/07, BJörn Lindqvist <[EMAIL PROTECTED]> wrote:

[Most deleted; Stephen Turnbull already answered better than I knew, let alone could write]
> > The same one-step-at-a-time reasoning applies to unicode identifiers.
> > Allowing IDs in your native language (or others that you explicitly
> > approve) is probably a good step. Allowing IDs in *any* language by
> > default is probably going too far.

> If you set different native languages won't you get the exact same
> problems that codepages caused and that unicode was invented to solve?

Not at all; if anything, it is the opposite.

(1) Those different code pages were mainly used for text, not
programming logic. No one has suggested (re-)limiting comments or even
(continuing to limit) strings.

(2) The biggest problem that I saw in practice was partial overlap;
people would assume WYSIWYG, and the different code pages were close
enough (mostly overlapping in ASCII) that they didn't usually need to
use the same code page -- but then when the differences did bite, they
were harder to notice.

If you happen to use both Sanskrit and Ethiopic, you can set your own
computer to accept both. The only catch is that you probably can't
share the Sanskrit with the Coptic community (or vice versa), unless at
least one of the following is true:

(2a) The code itself (not comments or strings) is in ASCII, so both
can read it. Note that this is already the recommended policy for
shared code.

or

(2b) The people you are sharing with trust you enough to add your
script as an acceptable alternate. (Again, preferably a simple
one-time step -- but an explicit decision.)

or

(2c) The people you are sharing with have already decided to accept
Sanskrit (or Coptic) because other people they trusted were using it,
and said it was safe.

Options 2b and 2c rely on the "consenting adults" policy, but they
encourage "informed consent".

I wouldn't be surprised to discover that Latin-1, Sanskrit, Coptic,
and the Japanese characters were all OK with me. That still wouldn't
mean I want to allow Cyrillic (which carries more confusable risk).
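
A rough sketch of what such a per-script allowlist could look like, in
Python 3. Everything here is hypothetical and for illustration only:
the names ALLOWED_RANGES, disallowed_chars and accept are made up, and
the code-point ranges are just the main Unicode blocks for each script,
not a complete definition of any of them. Nothing like this is part of
the current draft.

```python
# Hypothetical allowlist: script name -> (first, last) code point.
# The ranges are the main Unicode blocks only; illustrative, not exhaustive.
ALLOWED_RANGES = {
    "ascii": (0x0000, 0x007F),
    "latin-1": (0x0080, 0x00FF),
    "devanagari": (0x0900, 0x097F),
}


def disallowed_chars(identifier, allowed=ALLOWED_RANGES):
    """Return the characters that fall outside every allowed range."""
    return [
        ch
        for ch in identifier
        if not any(lo <= ord(ch) <= hi for lo, hi in allowed.values())
    ]


def accept(identifier, allowed=ALLOWED_RANGES):
    """True if the name is a legal identifier using only allowed ranges."""
    return identifier.isidentifier() and not disallowed_chars(identifier, allowed)


# Cyrillic is rejected until someone makes the explicit decision to add it.
name = "\u0438\u043c\u044f"
default_ok = accept(name)

# The explicit, one-time step from (2b): extend the allowlist.
with_cyrillic = dict(ALLOWED_RANGES, cyrillic=(0x0400, 0x04FF))
extended_ok = accept(name, with_cyrillic)
```

With the default table, default_ok is False; after the Cyrillic range
is added, extended_ok is True. The point of the design is that widening
the set is a deliberate edit to one table, made one script at a time.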

I already know I don't want to auto-allow U+FF10-U+FF19 (the fullwidth
ASCII digits[1]), simply because I don't see any good
(non-presentational) reason to use them in place of the normal ASCII
digits -- so the more likely result of using them is confusion.

Adding one script (or character range) at a time lets me add things
that I (or people I trust) think are reasonable. Turning unicode on or
off with a single blunt switch does not.

-jJ

[1] Yes, the fullwidth ASCII variants are allowed as ID characters
according to both the unicode ID_* and XID_* properties, which means
they are allowed by the current draft.