On 6/2/07, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> """
> If a comment in the first or second line of the Python script matches
> the regular expression coding[=:]\s*([-\w.]+), this comment is processed
> as an encoding declaration; the first group of this expression names the
> encoding of the source code file.
> """
>
> Your suggestion would unnecessarily change the semantics of the encoding
> declarations. I would call this gratuitous breakage.

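For reference, the quoted rule can be tried out directly. This is a minimal sketch, not the interpreter's actual implementation: the function name `declared_encoding` is made up for illustration, and it assumes that only whole-line comments in the first two lines count.

```python
import re

# The regular expression exactly as quoted from the documentation above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None.

    Assumption for this sketch: only lines that are entirely a comment
    are considered; trailing comments after code are ignored.
    """
    for line in source_lines[:2]:
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue
        match = CODING_RE.search(stripped)
        if match:
            return match.group(1)
    return None
```

Calling `declared_encoding(["#!/usr/bin/env python", "# -*- coding: utf-8 -*-"])` yields `"utf-8"`, while a declaration on the third line or later is ignored.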
Depending on what the regular expression for the declarations is, the
difference may not be big. Current code can also reliably be converted
with an automated tool, so this isn't a big deal for py3k.

It may be that the change is unnecessary. Reading Guido's writings, he
seems to be of the opinion that the Java way (no restrictions at all)
is right here, and anything else can be delegated to pylint and similar
tools.

> Sounds like the application of vim settings as a solution to a whole
> bunch of completely unrelated "problems" in Python (especially with 4
> space indents being the "one true way to indent" and the encoding
> declaration already being established). Please keep your vim out of my
> Python ;) .

The encoding declaration stays mostly the same, I'm just suggesting
adding similar declarations for the identifier/string character sets
and making them deception-proof.

You're probably right about the indentation stuff. If you got rid of
all indentation-related options and simply forbade mixture of tabs and
spaces, I'd just say good riddance.

> And as stated by basically everyone, the only *sane* default is ascii
> identifiers. Since the vast majority of users will have no use for
> unicode identifiers in the short or long term, making them the default
> is overzealous at best.

"Basically everyone" is not true, because it does not include Guido,
who matters the most. Some quotes from his latest posts on the topic:

Guido van Rossum (May 25):
:I still think such a command-line switch (or switches) is the wrong
:approach. What if I have *one* module that uses Cyrillic legitimately.
:A command-line switch would enable Cyrillic in *all* modules.

Guido van Rossum (May 25):
:On 5/24/07, Josiah Carlson <[EMAIL PROTECTED]> wrote:
:> Where else in Python have we made the default
:> behavior only desired or useful to 5% of our users?
:
:Where are you getting that statistic? This seems an extremely
:backwards, US-centric worldview.

Guido van Rossum (May 25):
:A more useful approach would seem to be a set of auditing tools that
:can be applied routinely to all new contributions (e.g. as a
:pre-commit hook when using a source control system), or to all code in
:a given directory, download, etc. I don't see this as all that
:different from using e.g. PyChecker of PyLint.
:
:While I routinely perform visual code inspections [...], I certainly don't see
:this as a security audit [...]. Scanning for stray non-ASCII characters is best
:left to automated tools.

Guido van Rossum (May 23):
:In particular very helpful was a couple of reports from the Java
:world, where Unicode letters in identifiers have been legal for a long
:time now. (JavaScript also supports this BTW.) The Java world has not fallen apart,

Guido van Rossum (May 17):
:As I mentioned before, I don't expect either of these will be much of
:a concern. I guess tools like pylint could optionally warn if
:non-ascii characters are used.
:
:On 5/16/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
:> (1) Security concerns.
:> (2) Obscure bugs.

Summary of what I think Guido's saying (involves some interpretation):
- always having no restrictions (the Java way) is not a problem in practice
- because having no restrictions has worked well with Java, Python should follow
- any concerns can be adequately dealt with solely by external tools
- command line switches are a bad implementation of restriction management

It is the last one of these that I was addressing, as there was some
demand for restriction management (despite Guido's leave-it-to-pylint
stance) but no adequate proposal. The defaults are easily changed in
any case.

> > # identifier_charset: fooproject.codingstyle.identifier_charset
>
> I really don't like the idea of adding a *different* import-like thing.
> We already have imports (that are evaluated at run time, not compile
> time), and due to their semantics, can't use a mechanism like the above.

I agree that import is problematic.
This part could be omitted with the rationale that it's more trouble
than it's worth, and anyone who needs something complicated can use
pylint or similar.

In the end, something like this is what you'd have most of the time in
practice when you care about character sets:

    # identifier_charset: 0-7f
    # Real code.

When you have a file with Cyrillic, then it'd allow Cyrillic too. For
quick hacks you could use this and everything would just work:

    #!/usr/bin/env python
    # Real code.

This isn't really anything more than a countermeasure against
Ka-Ping's tricky.py exploit and the addition of a real charset
restriction method instead of abusing the coding declaration for that
(which would force you to use legacy codings just to restrict the
charsets, as pointed out a lot earlier here).

One more thing which might be removed from the suggestion is the
command line option and its associated site.py default. Such checking
is more appropriate for pylint, and is probably of little use anyway.
Either you trust the files you're importing, in which case the
characters they use do not make any difference, or you don't, in which
case you shouldn't be importing them at all and checking their
character sets will not help you.

For audit purposes the comment directives are enough as they can't
deceive, and if you want to be extra paranoid you can use pylint to
catch any surreptitious patches like in Guillaume's post.
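
To make the `# identifier_charset: 0-7f` idea above concrete, here is a minimal sketch of how an external checker could honour such a directive. Everything beyond the directive line itself is an assumption made for illustration: the comma-separated list of hexadecimal ranges, the rule that the directive must appear in the leading comment block, the "no directive means no restriction" default, and the names `parse_ranges` and `find_violations`. It uses the standard `tokenize` module to look only at identifiers, not at strings or comments.

```python
import io
import re
import tokenize

# Hypothetical directive syntax, modelled on the example in the post:
#   # identifier_charset: 0-7f
# Assumed extension: several hex ranges separated by commas.
DIRECTIVE_RE = re.compile(r"#\s*identifier_charset:\s*([0-9a-fA-F,\- ]+)")


def parse_ranges(spec):
    """Turn a spec such as '0-7f, 400-4ff' into a list of (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


def find_violations(source):
    """Return (line_number, identifier) pairs outside the declared ranges."""
    ranges = None
    # Assumption: the directive lives in the leading block of comment lines.
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            break
        match = DIRECTIVE_RE.match(stripped)
        if match:
            ranges = parse_ranges(match.group(1))
            break
    if ranges is None:
        return []  # assumed default: no directive, no restriction

    violations = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        for ch in tok.string:
            if not any(low <= ord(ch) <= high for low, high in ranges):
                violations.append((tok.start[0], tok.string))
                break
    return violations
```

With this sketch, a file that declares `0-7f` and then uses a Cyrillic identifier is reported with its line number, while the same file declaring `0-7f, 400-4ff` passes. A file with no directive at all is never flagged, which corresponds to the quick-hack case.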