[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2009-05-18 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I believe the title problem is solved by PEP 383 in py3k trunk.
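
For readers unfamiliar with PEP 383: it defines the "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate code point so that the original bytes can be recovered later. A minimal sketch, assuming UTF-8 as the encoding (the helper names decode_arg and encode_arg are hypothetical, not part of any Python API):

```python
def decode_arg(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, carrying undecodable bytes through as lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_arg(text: str, encoding: str = "utf-8") -> bytes:
    """Reverse decode_arg(), restoring the original bytes exactly."""
    return text.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9-\xff"  # Latin-1 style bytes, not valid UTF-8
    text = decode_arg(raw)
    print(ascii(text))
    # The round trip is lossless, so the str can be handed back to the OS.
    assert encode_arg(text) == raw
```

The point of the design is that the interpreter no longer has to reject such an argument at startup; the program receives a str and can still recover the bytes it was given.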

--
nosy: +r.david.murray
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> pending

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3023
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2009-05-18 Thread Benjamin Peterson

Changes by Benjamin Peterson benja...@python.org:


--
status: pending -> closed




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2009-01-02 Thread David Watson

David Watson bai...@users.sourceforge.net added the comment:

@ Victor Stinner: Yes, the behaviour of those functions is as you
describe - it's been changed since I filed this issue.  I do
consider it an improvement.

By the password database, I mean /etc/passwd or replacements that
are accessible via getpwnam() and friends.  Users are often
allowed to change things like the GECOS field, and can generally
stick any old junk in there, regardless of encoding.  Now that I
come to check, it seems that in the Python 3.0 release, the pwd.*
functions do raise UnicodeDecodeError when the GECOS field can't
be decoded (bizarrely, they try to interpret it as a Python
string literal, and thus choke on invalid backslash escapes).
Unfortunately, this allows a user to change their GECOS field so
that system programs written in Python can't determine the
username corresponding to that user's UID or vice versa.
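
To illustrate the concern, here is a rough sketch of a parser for one passwd(5)-style line that does not fail when the GECOS field contains junk. It does not use the pwd module; PasswdEntry and parse_passwd_line are hypothetical names, and UTF-8 with the "surrogateescape" handler is an assumption made for the example:

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    gecos: str


def parse_passwd_line(raw: bytes) -> PasswdEntry:
    """Parse one passwd(5)-style line without failing on a junk GECOS field."""
    fields = raw.rstrip(b"\n").split(b":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields")
    name, _passwd, uid, gid, gecos, _home, _shell = fields
    return PasswdEntry(
        name=name.decode("utf-8", "surrogateescape"),
        uid=int(uid.decode("ascii")),
        gid=int(gid.decode("ascii")),
        # Undecodable bytes become lone surrogates instead of raising,
        # so the name/UID mapping stays usable whatever the user typed.
        gecos=gecos.decode("utf-8", "surrogateescape"),
    )
```

With this approach a user-controlled field cannot prevent the name and UID from being read, which is the failure described above.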




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2009-01-02 Thread Martin v. Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> By the password database, I mean /etc/passwd or replacements that
> are accessible via getpwnam() and friends.

Please only discuss one issue at a time in the bug tracker. This
issue is about invalidly-encoded command-line arguments, not about
the password database. If you want to report an issue with the password
database, please do so in a separate report.




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2009-01-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> By the password database, I mean /etc/passwd or replacements that
> are accessible via getpwnam() and friends.  Users are often
> allowed to change things like the GECOS field, and can generally
> stick any old junk in there, regardless of encoding.

I started to patch the pwd module to return bytes instead of unicode, but I
didn't finish my work and then lost it :-/ Today, most Unix systems use UTF-8
as the default charset. About GECOS: is it really used? If you have real
problems, open a new issue as proposed by Martin.




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-12-31 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


--
nosy: +gagenellina




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-12-30 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> Hmm, yes, I see that the open() builtin doesn't accept bytes
> filenames, though os.open() still does.

What? The open() builtin, io.open() and os.open() all accept bytes filenames.

> So what *is* os.listdir() supposed to do when it finds an
> unconvertible filename?  Raise an exception?

os.listdir(str) -> str raises an exception on an undecodable filename,
whereas os.listdir(bytes) -> bytes doesn't raise a Unicode error because
the filename is never decoded!
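
A small sketch of the bytes-in, bytes-out path described above, assuming Python 3.2 or later (where os.fsencode() is available). The function names are hypothetical. Note that the exception on the str path is the behaviour at the time of this comment; once PEP 383 landed, os.listdir(str) on Unix returns surrogate-escaped names instead of raising.

```python
import os
import tempfile


def list_raw_names(directory):
    """Return directory entries as bytes, whatever type `directory` has."""
    # Passing bytes to os.listdir() yields bytes names, so nothing is decoded.
    return sorted(os.listdir(os.fsencode(directory)))


def undecodable_names(directory, encoding="utf-8"):
    """Return the entries whose names are not valid text in `encoding`."""
    bad = []
    for name in list_raw_names(directory):
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "ok.txt"), "w").close()
        print(list_raw_names(tmp))
        print(undecodable_names(tmp))
```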

> What if someone puts unconvertible strings in the password database?

Which database? It sounds like a different issue. It's always a good
thing to reject undecodable strings, even with python2 ;-)

--
nosy: +haypo




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-12-30 Thread Dan Dever

Dan Dever ded...@verizon.net added the comment:

>> What if someone puts unconvertible strings in the password database?
>
> Which database? It sounds like a different issue.

It's yet another special case of the more general issue, which is that
Unix strings are strings of bytes that may or may not be encoded text. 
Bytes of any value (save nul) are permitted in any order.  There may be
the occasional additional constraint: '/' is not permitted in filenames
since it's the path element delimiter, for example.  But you can
certainly have non-text strings for file names, environment variables,
command-line arguments, etc.

Since Python 3 strings must be text, they cannot generally be used to
represent Unix strings.  David's right, this is going to cause real
problems.  It has to be solved somehow, but the more obvious solutions
are in some way ugly and introduce platform-to-platform inconsistencies.
I occasionally skim the python-dev mailing list archive, and as far as
I can tell there is as yet no consensus on how to handle this.

My use of Python is chiefly general-purpose scripting on Linux. 
Parameters to these scripts are more likely to be file names than
anything else.  So I can't personally consider moving to version 3 until
this issue is resolved, which is why I added myself to the nosy list.

I'm bothered by Martin's comment:

> That os.listdir still uses bytes should be changed as well. Both
> file names and command line arguments are strings, from the
> viewpoint of Python. Nothing else is supported.

I hope that this is nothing more than his expression of dismay that such
a situation should exist, and that he doesn't mean it literally.




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-12-29 Thread dedded

Changes by dedded ded...@verizon.net:


--
nosy: +dedded




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-06-02 Thread David Watson

David Watson [EMAIL PROTECTED] added the comment:

Hmm, yes, I see that the open() builtin doesn't accept bytes
filenames, though os.open() still does.  When I saw that you
could pass bytes filenames transparently from os.listdir() to
os.open(), I assumed that this was intentional!

So what *is* os.listdir() supposed to do when it finds an
unconvertible filename?  Raise an exception?  Pretend the file
isn't there?  What if someone puts unconvertible strings in the
password database?  I think this is going to cause real problems
for people.




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-06-02 Thread Martin v. Löwis

Martin v. Löwis [EMAIL PROTECTED] added the comment:

The issue with unrepresentable file names hasn't been decided yet. One
option is to include the bytes object in that case, instead, noting that
this can only occur on selected platforms. Another option is indeed to
raise an exception, or exclude the file from the listing (although
errors should never pass silently).
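
The three options can be compared side by side in a short sketch. This is only an illustration of the trade-off, not what os.listdir() does; decode_names and its policy strings are hypothetical, and UTF-8 is an assumed encoding:

```python
def decode_names(raw_names, encoding="utf-8", policy="keep-bytes"):
    """Decode a list of bytes filenames under one of three policies.

    keep-bytes: undecodable names stay as bytes objects in the result
    raise:      the first undecodable name raises UnicodeDecodeError
    skip:       undecodable names are silently left out
    """
    if policy not in ("keep-bytes", "raise", "skip"):
        raise ValueError("unknown policy: %r" % (policy,))
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if policy == "raise":
                raise
            if policy == "keep-bytes":
                result.append(raw)
            # policy == "skip": drop the name, which hides the problem
    return result
```

For example, decode_names([b"ok", b"\xff"]) returns a mixed list of str and bytes, so callers have to check the type of each entry; the "skip" policy returns only the decodable names and gives no sign that anything was omitted.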




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-06-01 Thread Martin v. Löwis

Martin v. Löwis [EMAIL PROTECTED] added the comment:

That os.listdir still uses bytes should be changed as well. Both file
names and command line arguments are strings, from the viewpoint of
Python. Nothing else is supported.

--
nosy: +loewis




[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-06-01 Thread David Watson

New submission from David Watson [EMAIL PROTECTED]:

The error message has no newline at the end:

$ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
Could not convert argument 2 to string$

Seriously, though: is this the intended behaviour?  If the
interpreter just dies when it gets a non-UTF-8 (or whatever)
argument, it creates an opportunity for a denial-of-service if
some admin is running a Python script via find(1) or similar.
And what if you want to run a Python script on some files named
in a mixture of charsets (because, say, you just untarred an
archive created in a foreign charset)?

Could sys.argv not provide bytes objects for those arguments,
like os.listdir()?  Or (better IMHO) have a separate
sys.argv_bytes interface?
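
No sys.argv_bytes attribute exists in the standard library. As a sketch of what such an interface could return, the hypothetical helper below re-encodes sys.argv with os.fsencode(). It assumes Python 3.2 or later, where arguments are decoded on Unix with the "surrogateescape" handler, and it assumes the locale encoding has not changed since interpreter startup:

```python
import os
import sys


def argv_bytes(argv=None):
    """Return the command-line arguments as bytes objects.

    Hypothetical helper: os.fsencode() reverses the decoding applied to
    sys.argv at startup, so undecodable bytes come back unchanged.
    """
    if argv is None:
        argv = sys.argv
    return [os.fsencode(arg) for arg in argv]


if __name__ == "__main__":
    for raw in argv_bytes()[1:]:
        print(repr(raw))
```

Run against the example above, a script using this helper would be expected to see the stray 0xff byte as a bytes argument rather than dying at startup.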

--
components: Unicode
messages: 67608
nosy: baikie
severity: normal
status: open
title: Problem with invalidly-encoded command-line arguments (Unix)
type: behavior
versions: Python 3.0
