Rauli Ruohonen writes:

> On 6/5/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > I'd love to get rid of full-width ASCII and halfwidth kana (via
> > compatibility decomposition).
>
> If you do forbid compatibility characters in identifiers, then they
> should be flagged as an error, not converted silently.

No. The point is that people want to use their current tools; they may
not be able to easily specify normalization. We should provide tools
to pick this lint from programs, but the normalization should be done
inside of Python, not by the user.

Please look through the list (I've already done so; I'm speaking from
detailed examination of the data) and state what compatibility
characters you want to keep.

On reflection, I would make an exception for LATIN L WITH MIDDLE DOT
(both cases); just don't decompose it, for the sake of Catalan. (And
there possibly should be a warning for L followed by MIDDLE DOT.)

But as a native English speaker and one who lectures and deals with
the bureaucracy in Japanese, I can tell you unequivocally that I want
the fi and ffi ligatures and full-width ASCII compatibility
decomposed. As a daily user of several Japanese input methods, I can
also tell you it would be a massive pain in the ass if Python doesn't
convert those, and errors would be an on-the-minute-every-minute
annoyance.

> Unicode, and adding extra equivalences (whether it's "FoO" == "foo",
> "ｶｷ" == "カキ" or "Ａ123" == "A123") is surprising.

How many Japanese documents do you deal with on a daily basis? I live
with the half-width kana and full-width ASCII every day, and they are
simply an annoyance to me and to everybody I know. They are treated as
font variants, not different characters, by *all* users. Users are
quite happy to substitute ultra-wide ASCII fonts for JIS X 0208 ASCII,
or ultra-condensed fonts for JIS X 0201 kana.

Japanese don't expect equivalence, but that's because it's too much
effort for the programmers when nobody is asking for it; the users are
unsophisticated and don't demand it. But where equivalence is provided
on web forms and the like, people are indeed surprised: they are
*impressed*. "Wow! Gaijin magic! How'd he *do* that?!" They *hate* the
fact that some forms want the postal code entered in JIS X 0208
full-width digits while others want ASCII. (I've even seen a form that
expected the address, including the yuubin mark, to be in full-width
JIS, but the postal code itself, embedded in the address, had to be
entered in ASCII or the form couldn't parse it.)

> In short, I would like this function to return 'OK' or be a
> syntax error, but it should not fail or return something else:
>
> def test():
>     if 'A' == 'Ａ': return 'OK'
>     A = 'O'
>     Ａ = 'K' # as tested above, 'A' and 'Ａ' are not the same thing
>     return locals()['A']+locals()['Ａ']

I would like this code to return "KK". This might be an unpleasant
surprise, once, and there would need to be a warning on the box for
distribution in Japan (and other cultures with compatibility
decompositions). On the other hand, diffusion of non-ASCII identifiers
will at best be moderately paced; people will have to learn about
usage and will have time to get used to it.
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
