On 6/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:

> >> Indeed, PEP 3131 gives a predictable identifier character set.
> >> Adding per-site options to change the set of allowable characters
> >> makes it less predictable.
> > Not in practice. ...

> > By allowing site modifications, the rule becomes:

> > It will use ASCII.

[and clipped "programs intended only for local use will use ASCII
plus letters that local users recognize."]

> Not universally - only on that site.

Yes, universally. By allowing "any unicode character", you have no
reason to believe the next piece of code isn't doing something
strange, either by accident or by malice.

By allowing "ASCII + those listed in the site config", the rule
changes from

    "It will use ASCII, always" (today)

to

    "It will use ASCII if it is intended for distribution."

plus

    "Local programs can use ASCII + locally recognized letters."

That is slightly more complicated than ASCII-only, but only for those
who want to use the extended charsets -- and either rule is still
straightforward.

The rule proposed in PEP 3131 is

    "It will use something that is numerically a letter or number,
    to someone somewhere."

Given the style guide of ASCII for internationally targeted open
source, that will degrade to

    "It should use ASCII."

    "But it might not, since there will be no feedback or apparent
    downside to violating the style rule, even for distributed code."

    "In fact, it might even use something downright misleading, and
    you won't have any warning, because we thought that maybe someone,
    somewhere, might have wanted that character in a different
    context."

And no, I don't think I'm exaggerating with that last one; we aren't
proposing rules against mixed-script identifiers (or even limiting
script switches to occur only at the _ character). It will be
perfectly legitimate to apparently end a string with three consecutive
prime characters. It will be bad style, but there will be nothing to
tip off the non-paranoid.

In theory, we could solve this by limiting the non-ASCII characters,
but I don't think we can do that in practice.
The Unicode Consortium hasn't even tried; even XID + security
modifications + NFKC still includes characters that are intended to
look identical; all the security modifications do is eliminate
characters that do *not* have any expected legitimate use. (Example:
no living language uses them.)

I don't think we want to wade too deeply into the morass of
confusables detection; the Unicode Consortium itself says the problem
is neither solved nor stable. It might be a good idea to restrict
(within-a-single-ID) script switches to only occur at the "_", but I'm
not sure a 95% solution is worth doing.

By saying "Only characters you or your sysadmin expected", we at
least limit it to things the user will be expecting and can recognize.
(Unless the sysadmin decides otherwise.)

> I don't know what rule is
> in force on my buddy's machine, so predicting it becomes harder.

But you know ASCII will work. If he used the same local install
(classroom peer, member of the same user group, etc.), then your local
characters will probably work too. If he is really your buddy, he
probably trusts you enough to allow your charset if you tell him about
it.

> I just put wording in the PEP that makes it clear that, whatever
> the problem, a global flag is not an acceptable solution.

I agree that a single flag doesn't really solve the problem. But a
global configuration does go a long way.

For me personally, I would be more willing to allow Latin-1 than
Hangul, because I can recognize the Latin-1 characters. (I still
wouldn't allow them all by default; the difference between the various
lower-case i's is small enough -- to me -- that I want a warning when
one is used.) Hangul is more acceptable than Cyrillic, because at
least it is obviously foreign; I won't mistake it for something.

Someone who uses Cyrillic on a daily basis might well have the
opposite preferences. I support letting her use Cyrillic if she wants
to; I just don't want it to work on my machine without my knowing
about it.
But I would like to be able to accept é and ç (French characters)
without shutting off the warning for Cyrillic or Ogham.

Allowing ASCII plus "chars specified by the site or user through a
config file" meets that goal.

-jJ
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
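
The rule argued for in this message, ASCII plus "chars specified by the site or user through a config file", could look roughly like the following. This is only a minimal sketch to make the idea concrete: `SITE_ALLOWED`, `unexpected_chars`, and `warnings_for` are hypothetical names invented for illustration, not part of PEP 3131 or of any Python release, and a real site configuration would presumably be read from a file rather than hard-coded.

```python
import unicodedata

# Hypothetical site configuration: the extra (non-ASCII) characters this
# site accepts in identifiers. Here: é (U+00E9) and ç (U+00E7).
SITE_ALLOWED = frozenset("\u00e9\u00e7")


def unexpected_chars(identifier, allowed=SITE_ALLOWED):
    """Return the characters in `identifier` that are neither ASCII nor
    listed in `allowed`, in order of appearance.

    The identifier is NFKC-normalized first, since PEP 3131 compares
    identifiers in that form.
    """
    normalized = unicodedata.normalize("NFKC", identifier)
    return [ch for ch in normalized if ord(ch) > 127 and ch not in allowed]


def warnings_for(identifier, allowed=SITE_ALLOWED):
    """Format one human-readable warning per unexpected character."""
    return [
        f"{identifier!r}: unexpected U+{ord(ch):04X} "
        f"{unicodedata.name(ch, 'UNNAMED CHARACTER')}"
        for ch in unexpected_chars(identifier, allowed)
    ]


if __name__ == "__main__":
    # "café" uses only ASCII plus an allowed letter; the second name
    # starts with CYRILLIC SMALL LETTER ES (U+0441), which resembles "c".
    for name in ("caf\u00e9", "\u0441ount"):
        for line in warnings_for(name) or [f"{name!r}: ok"]:
            print(line)
```

With this allowlist, an identifier such as "café" passes silently, while one that starts with the Cyrillic letter U+0441 produces a warning naming the character. Passing a different `allowed` set models a site (or user) with different preferences, for example someone who works in Cyrillic daily and wants to be warned about Latin-1 letters instead.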