[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
R. David Murray rdmur...@bitdance.com added the comment:

I believe the title problem is solved by PEP 383 in py3k trunk.

--
nosy: +r.david.murray
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> pending

___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3023 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
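PEP 383 (adopted for Python 3.1) replaces the startup failure with the "surrogateescape" error handler. Each undecodable byte in an argument becomes a lone surrogate in the range U+DC80..U+DCFF, and encoding with the same handler restores the original bytes. The sketch below assumes Python 3.2 or later (for os.fsencode()) and assumes that the locale and filesystem encodings agree, as they normally do on Unix. argv_bytes and surrogateescape_roundtrip are illustrative helpers, not standard APIs.

```python
import os
import sys


def argv_bytes(argv=None):
    """Recover the original bytes of each command-line argument (Unix).

    Assumes PEP 383 decoding: undecodable bytes were mapped to lone
    surrogates, and os.fsencode() maps them back to the same bytes.
    """
    if argv is None:
        argv = sys.argv
    return [os.fsencode(arg) for arg in argv]


def surrogateescape_roundtrip(raw: bytes, encoding: str = "utf-8"):
    """Decode raw bytes the PEP 383 way, then encode them back."""
    text = raw.decode(encoding, "surrogateescape")
    return text, text.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    text, back = surrogateescape_roundtrip(b"\xff")
    print(repr(text))  # the invalid byte survives as a lone surrogate
    print(back)        # and encodes back to the original byte
    print(argv_bytes())
```

With this approach, most scripts can ignore undecodable arguments and keep working with str. Scripts that need the exact bytes, for example to open files named in a foreign charset, can still recover them.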
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Changes by Benjamin Peterson benja...@python.org:

--
status: pending -> closed
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
David Watson bai...@users.sourceforge.net added the comment:

@ Victor Stinner: Yes, the behaviour of those functions is as you describe. It's been changed since I filed this issue, and I do consider it an improvement.

By the password database, I mean /etc/passwd or replacements that are accessible via getpwnam() and friends. Users are often allowed to change things like the GECOS field, and can generally stick any old junk in there, regardless of encoding.

Now that I come to check, it seems that in the Python 3.0 release, the pwd.* functions do raise UnicodeDecodeError when the GECOS field can't be decoded (bizarrely, they try to interpret it as a Python string literal, and thus choke on invalid backslash escapes). Unfortunately, this allows a user to change their GECOS field so that system programs written in Python can't determine the username corresponding to that user's UID, or vice versa.
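As far as I know, later Python 3 releases decode passwd fields the same way as file names (filesystem encoding plus surrogateescape). If so, an undecodable GECOS field should no longer block a name/UID lookup. The sketch below shows a defensive lookup under that assumption (Unix only). username_for_uid and raw_gecos_for_uid are hypothetical helpers.

```python
import os
import pwd


def username_for_uid(uid: int):
    """Return the login name for uid, or None if there is no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def raw_gecos_for_uid(uid: int):
    """Return the GECOS field as raw bytes, or None if there is no entry.

    Assumes pwd decoded the field with the filesystem encoding and the
    surrogateescape handler, so os.fsencode() restores the stored bytes
    even if the user wrote them in some other charset.
    """
    try:
        entry = pwd.getpwuid(uid)
    except KeyError:
        return None
    return os.fsencode(entry.pw_gecos)


if __name__ == "__main__":
    uid = os.getuid()
    print(username_for_uid(uid), raw_gecos_for_uid(uid))
```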
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Martin v. Löwis mar...@v.loewis.de added the comment:

> By the password database, I mean /etc/passwd or replacements that are accessible via getpwnam() and friends.

Please only discuss one issue at a time in the bug tracker. This issue is about invalidly-encoded command-line arguments, not about the password database. If you want to report an issue with the password database, please do so in a separate report.
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
STINNER Victor victor.stin...@haypocalc.com added the comment:

> By the password database, I mean /etc/passwd or replacements that are accessible via getpwnam() and friends. Users are often allowed to change things like the GECOS field, and can generally stick any old junk in there, regardless of encoding.

I started to patch the pwd module to return bytes instead of unicode, but I didn't finish my work and then lost it :-/

Today, most Unix systems use UTF-8 as the default charset. About GECOS: is it really used?

If you have real problems, open a new issue as proposed by Martin.
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:

--
nosy: +gagenellina
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
STINNER Victor victor.stin...@haypocalc.com added the comment:

> Hmm, yes, I see that the open() builtin doesn't accept bytes filenames, though os.open() still does.

What? The open() builtin, io.open() and os.open() all accept bytes filenames.

> So what *is* os.listdir() supposed to do when it finds an unconvertible filename? Raise an exception?

os.listdir(str) -> str raises an exception on an undecodable filename, whereas os.listdir(bytes) -> bytes doesn't raise a Unicode error because the filename is not decoded!

> What if someone puts unconvertible strings in the password database?

Which database? It sounds like a different issue. It's always a good thing to reject undecodable strings, even with python2 ;-)

--
nosy: +haypo
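After PEP 383 (Python 3.1 and later), the str form of os.listdir() neither raises nor drops an undecodable name. It returns the name with the bad bytes escaped as lone surrogates, while the bytes form returns the name untouched. The sketch below lists the same directory both ways. It assumes a Unix filesystem that accepts arbitrary bytes in names, such as ext4 on Linux. list_both_ways and demo are illustrative names.

```python
import os
import tempfile


def list_both_ways(path: str):
    """List a directory once with a str path and once with a bytes path."""
    return sorted(os.listdir(path)), sorted(os.listdir(os.fsencode(path)))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        raw_name = b"caf\xe9.txt"  # Latin-1 bytes: not valid UTF-8
        # A bytes path is handed to the OS as-is, without any decoding.
        with open(os.path.join(os.fsencode(tmp), raw_name), "wb"):
            pass
        str_names, bytes_names = list_both_ways(tmp)
    return raw_name, str_names, bytes_names


if __name__ == "__main__":
    raw_name, str_names, bytes_names = demo()
    print(bytes_names)                          # the raw bytes, as stored
    print([os.fsencode(n) for n in str_names])  # same bytes, recovered
```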
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Dan Dever ded...@verizon.net added the comment:

>> What if someone puts unconvertible strings in the password database?
> Which database? It sounds like a different issue.

It's yet another special case of the more general issue, which is that Unix strings are strings of bytes that may or may not be encoded text. Bytes of any value (save NUL) are permitted in any order. There may be the occasional additional constraint: '/' is not permitted in filenames since it's the path element delimiter, for example. But you can certainly have non-text strings for file names, environment variables, command-line arguments, etc. Since Python 3 strings must be text, they cannot generally be used to represent Unix strings.

David's right, this is going to cause real problems. It has to be solved somehow, but the more obvious solutions are in some way ugly and introduce platform-to-platform inconsistencies. I occasionally skim the python-dev mailing list archive, and as far as I can tell there is as yet no consensus on how to handle this.

My use of Python is chiefly general-purpose scripting on Linux. Parameters to these scripts are more likely to be file names than anything else. So I can't personally consider moving to version 3 until this issue is resolved, which is why I added myself to the nosy list.

I'm bothered by Martin's comment:

> That os.listdir still uses bytes should be changed as well. Both file names and command line arguments are strings, from the viewpoint of Python. Nothing else is supported.

I hope that this is nothing more than his expression of dismay that such a situation should exist, and that he doesn't mean it literally.
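Python 3.2 and later expose environment variables as raw byte strings through os.environb, on platforms where os.supports_bytes_environ is true (that is, not on Windows). os.environ shows the same data decoded with the filesystem encoding and surrogateescape. A minimal sketch follows. set_raw_env and the variable name DEMO_RAW_VALUE are hypothetical.

```python
import os


def set_raw_env(name: bytes, value: bytes) -> str:
    """Store arbitrary bytes in the environment and return the str view.

    Uses os.environb, which exists only where os.supports_bytes_environ
    is true; it shares its underlying data with os.environ.
    """
    os.environb[name] = value
    return os.environ[os.fsdecode(name)]


if __name__ == "__main__":
    text = set_raw_env(b"DEMO_RAW_VALUE", b"\xff\xfe junk")
    print(repr(text))          # decoded view, bad bytes escaped
    print(os.fsencode(text))   # the original bytes again
```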
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Changes by dedded ded...@verizon.net:

--
nosy: +dedded
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
David Watson [EMAIL PROTECTED] added the comment:

Hmm, yes, I see that the open() builtin doesn't accept bytes filenames, though os.open() still does. When I saw that you could pass bytes filenames transparently from os.listdir() to os.open(), I assumed that this was intentional!

So what *is* os.listdir() supposed to do when it finds an unconvertible filename? Raise an exception? Pretend the file isn't there? What if someone puts unconvertible strings in the password database?

I think this is going to cause real problems for people.
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Martin v. Löwis [EMAIL PROTECTED] added the comment:

The issue with unrepresentable file names hasn't been decided yet. One option is to include the bytes object in that case, instead, noting that this can only occur on selected platforms. Another option is indeed to raise an exception, or exclude the file from the listing (although errors should never pass silently).
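The sketch below compares the options Martin lists on a small list of raw names. It also includes the surrogateescape approach that PEP 383 eventually adopted for os.listdir(). decode_names is a hypothetical helper, and UTF-8 is assumed as the filesystem encoding.

```python
POLICIES = ("bytes", "strict", "skip", "surrogateescape")


def decode_names(raw_names, policy="surrogateescape", encoding="utf-8"):
    """Decode raw file names under one of several candidate policies.

    "bytes":           keep undecodable names as bytes objects
    "strict":          raise UnicodeDecodeError on the first bad name
    "skip":            leave undecodable names out of the result
    "surrogateescape": PEP 383, escape bad bytes as lone surrogates
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    names = []
    for raw in raw_names:
        if policy == "surrogateescape":
            names.append(raw.decode(encoding, "surrogateescape"))
            continue
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if policy == "bytes":
                names.append(raw)  # result now mixes str and bytes
            elif policy == "skip":
                continue           # the error passes silently
            else:
                raise              # "strict"
    return names


if __name__ == "__main__":
    raw = [b"readme.txt", b"caf\xe9.txt"]
    for policy in ("bytes", "skip", "surrogateescape"):
        print(policy, decode_names(raw, policy))
```

Under the "bytes" policy, callers get a mixed str/bytes list, which is awkward to handle. The "skip" policy is the case Martin warns against, since errors should never pass silently.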
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
Martin v. Löwis [EMAIL PROTECTED] added the comment:

That os.listdir still uses bytes should be changed as well. Both file names and command line arguments are strings, from the viewpoint of Python. Nothing else is supported.

--
nosy: +loewis
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
New submission from David Watson [EMAIL PROTECTED]:

The error message has no newline at the end:

    $ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
    Could not convert argument 2 to string$

Seriously, though: is this the intended behaviour? If the interpreter just dies when it gets a non-UTF-8 (or whatever) argument, it creates an opportunity for a denial of service if some admin is running a Python script via find(1) or similar. And what if you want to run a Python script on some files named in a mixture of charsets (because, say, you just untarred an archive created in a foreign charset)?

Could sys.argv not provide bytes objects for those arguments, like os.listdir()? Or (better IMHO) have a separate sys.argv_bytes interface?

--
components: Unicode
messages: 67608
nosy: baikie
severity: normal
status: open
title: Problem with invalidly-encoded command-line arguments (Unix)
type: behavior
versions: Python 3.0
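The failure in this report can be checked end to end by starting a child interpreter with a raw $'\xff' argument. The sketch below assumes Python 3.7 or later on a POSIX system, where subprocess accepts bytes items in the argument list, and assumes PEP 383 argument decoding. The child recovers the exact bytes with os.fsencode(sys.argv[1]), which covers much of what the proposed sys.argv_bytes was meant to provide. raw_arg_seen_by_child is a hypothetical helper.

```python
import subprocess
import sys

# Child program: write the hex form of the raw bytes behind sys.argv[1].
CHILD = "import os, sys; sys.stdout.write(os.fsencode(sys.argv[1]).hex())"


def raw_arg_seen_by_child(arg: bytes) -> bytes:
    """Run a child interpreter with one raw-bytes argument (POSIX only).

    subprocess passes bytes items in the argument list to exec()
    unchanged; the child reports which bytes it recovered.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD, arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return bytes.fromhex(result.stdout)


if __name__ == "__main__":
    print(raw_arg_seen_by_child(b"\xff"))
```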