Jason R. Coombs <jar...@jaraco.com> added the comment:

> > Encoding to 'utf-8' or the default file system encoding doesn't seem
> > right (as the characters end up getting stored in the gzip archive itself).
> I don’t understand.

The characters are being stored in the gzip archive as part of the gzip header. 
The comment in the Python 3 trunk indicates the encoding should be iso-8859-1: 
https://bitbucket.org/mirror/cpython/src/f3041e7f535d/Lib/tarfile.py#cl-475

My point is that the file system encoding is not relevant here. Because the 
name is being stored in a gzip blob, it should be encoded according to the 
gzip spec.
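
For illustration, here is a minimal sketch of what encoding the name per the 
gzip spec could look like. It assumes RFC 1952's rule that the FNAME header 
field is ISO 8859-1 (Latin-1) and zero-terminated. The helper name 
gzip_header_name is hypothetical, and using the "replace" error handler for 
characters outside Latin-1 is an assumption, not something settled in this 
issue:

```python
def gzip_header_name(name):
    """Return the bytes to store in the gzip FNAME header field.

    Hypothetical helper for illustration only.
    """
    # The header stores the original (uncompressed) file name,
    # so drop a trailing ".gz" if present.
    if name.endswith(".gz"):
        name = name[:-3]
    # RFC 1952: FNAME is ISO 8859-1 (Latin-1), terminated by a zero byte.
    # Assumption: characters with no Latin-1 equivalent are replaced by "?".
    return name.encode("iso-8859-1", "replace") + b"\x00"


if __name__ == "__main__":
    print(gzip_header_name("caf\u00e9.tar.gz"))
```

Note that the file system encoding never enters into it: the same bytes are 
written to the header regardless of the platform.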

> > Additionally, encoding as 'utf-8' would cause the file to be created
> > with a utf-8 filename, which would be undesirable.
> Why?

My concern here was that if we encode the string as utf-8 before passing it 
to the __builtins__.open() call, Python might encode _that_ utf-8 string using 
the file system encoding and save the file that way (so the file would be 
named with the utf-8 encoded bytes, not the unicode string intended). After 
further investigation, and based on the work that's been proposed, this is 
not a risk.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11638>
_______________________________________