[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset dc1045d08bd8 by Jason R. Coombs in branch '2.7':
Issue #11638: Adding test to ensure .tar.gz files can be generated by sdist 
command with unicode metadata, based on David Barnett's patch.
http://hg.python.org/cpython/rev/dc1045d08bd8

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13639
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-25 Thread Lars Gustäbel

Lars Gustäbel l...@gustaebel.de added the comment:

I think we should wrap this up as soon as possible, because it has already 
absorbed too much of our time. The issue we discuss here is a tiny glitch 
triggered by a corner-case. My original idea was to fix it in a minimal sort of 
way that is backwards-compatible.

There are at least 4 different solutions now:

1. Keep the patch.
2. Revert the patch, leave everything as it was, and close the issue as wontfix.
3. Don't write an FNAME field at all if the filename that is passed is a 
unicode string.
4. Rewrite the FNAME code the way Terry suggests. This seems to me like the 
most complex solution, because we have to fix gzip.py as well, because the code 
in question was originally taken from the gzip module. (BTW, both the tarfile 
and gzip module discard the FNAME field when a file is opened for reading.)

My favorites are 1 and 3 ;-)

--
assignee:  -> lars.gustaebel
priority: normal -> low




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-25 Thread Jason R. Coombs

Jason R. Coombs jar...@jaraco.com added the comment:

I also feel (1) or (3) is best for this issue. If there is a _better_ 
implementation, it should be reserved for a separate improvement to Python 
3.2+.

I lean slightly toward (3) because it would support filenames with Unicode 
characters other than latin-1 (as long as the file system allows the file to 
be saved), and I suspect it would enable tests such as this one to pass: 
https://bitbucket.org/jaraco/cpython-issue11638/changeset/9e9ea96eb0dd#chg-Lib/distutils/tests/test_archive_util.py

--
Added file: http://bugs.python.org/file24090/smime.p7s




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-25 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

As I understand the patched code, it only fixes the issue for unicode names 
that can be latin-1 encoded; other unicode names will raise the same 
exception with 'latin-1' (or equivalent) substituted for 'ascii'. So I can 
easily anticipate a new issue reporting exactly that someday.

I would prefer a more complete fix. If 3 is easier than 4, fine with me.




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-25 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

I just took a look at the 3.2 tarfile code and see that it always (because 
self.name is always unicode) does the same encoding, with 'replace', 
referencing RFC 1952. Although there are a few other differences, they appear 
inconsequential, so the code should otherwise behave the same. Reading 
further on codec error handling, I gather that my previous understanding was 
off; non-Latin-1 chars will all just appear as '?' instead of raising an 
exception. While that is normally useless, it does not matter here since the 
result is not used. So I agree to call this fixed.
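
The behaviour of the 'replace' error handler described above can be checked with a short sketch. The helper name encode_fname is hypothetical; only str.encode() and its "replace" handler come from Python itself.

```python
def encode_fname(name):
    """Encode a text filename as Latin-1, substituting '?' where needed."""
    return name.encode("iso-8859-1", "replace")

latin1_name = encode_fname(u"h\u00e9ho.tar")    # 'e' with acute is in Latin-1
other_name = encode_fname(u"\u4f60\u597d.tar")  # CJK characters are not

# A strict encode of the same name raises instead of substituting.
try:
    u"\u4f60\u597d.tar".encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here latin1_name keeps the accented character as the single byte 0xe9, while other_name comes out as "??.tar" and no exception is raised.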

--
resolution:  -> fixed
status: open -> closed




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-24 Thread Lars Gustäbel

Lars Gustäbel l...@gustaebel.de added the comment:

I thought about that myself, too. It is clearly not a new feature; it is 
really more of a fix.

Unicode pathnames given to tarfile.open() are just passed through to the open() 
function, which is why this has always worked, except in this particular 
case. There are six possible write modes: "w:", "w:gz", "w:bz2", "w|", 
"w|gz" and "w|bz2". The only one that does not work with a unicode pathname is 
"w|gz". Although admittedly tarfile.open() is not supposed to be used with a 
unicode path, people do it anyway, because they don't care, and because it 
works. The patch does not add broad new functionality; it merely harmonises 
the way the six write modes work.

Neither can we retroactively enforce using string pathnames at this point, nor 
should we let a user run into this strange error. The patch is very small and 
minimally invasive. The error message you get without the patch is completely 
incomprehensible.




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-24 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

With that explanation, that it is one case out of six that fails, for whatever 
reason, I agree.

That leaves the issue of whether the fix is the right one. I currently agree 
with Victor that we should do what the rest of Python does and what is most 
universally useful. The fact that an old standard requires a *storage* 
encoding for a nearly unused field for .gz files that (I believe) only works 
for Western Europe, does not mean we should use it for *opening* .tar files. 
WestEuro-centrism is as bad as Anglo-centrism. If the unicode filename cannot 
be Latin-1 encoded, the filename field should be left blank. But it seems to me 
that the filename should be converted to the bytes that the user wants, 
expects, and can use.

--
type:  -> behavior




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-23 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

2.7 is closed to new features. This looks like it might be one. The 2.7 doc for 
tarfile.open says "Return a TarFile object for the pathname name." Does the 
meaning of 'pathname' in 2.7 generally include unicode as well as str objects? 
(It is not in the Glossary.) 

> The error does not occur under Python 3 (even with non-ascii characters), so 
> it should be possible to create a tarfile with a unicode filename on Python 
> 2.7.

Python 3 has many new features that are not in 2.7, so 'possible' is not 
exactly the point ;-).

--
nosy: +terry.reedy




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread Lars Gustäbel

Lars Gustäbel l...@gustaebel.de added the comment:

tarfile under Python 2.x is not particularly designed to support unicode 
filenames (the gzip module does not support them either), but that should not 
be too hard to fix.

--
keywords: +patch
Added file: 
http://bugs.python.org/file24066/tarfile-stream-gzip-unicode-fix.diff




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread Jason R. Coombs

Jason R. Coombs jar...@jaraco.com added the comment:

That looks like a good patch to me. Do you want to commit it, or would you 
rather I do?




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset a60a3610a97b by Lars Gustäbel in branch '2.7':
Issue #13639: Accept unicode filenames in tarfile.open(mode="w|gz").
http://hg.python.org/cpython/rev/a60a3610a97b

--
nosy: +python-dev




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

+ self.name = self.name.encode("iso-8859-1", "replace")

Why did you choose ISO-8859-1? I think that the filesystem encoding should be 
used instead:

-self.name = self.name.encode("iso-8859-1", "replace")
+self.name = self.name.encode(ENCODING, "replace")

--
nosy: +haypo




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread Lars Gustäbel

Lars Gustäbel l...@gustaebel.de added the comment:

See http://bugs.python.org/issue11638#msg150029




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

The gzip format (defined in RFC 1952) allows storing the original filename 
(without the .gz suffix) in an additional field in the header (the FNAME 
field). Latin-1 (iso-8859-1) is required.

Hum, it looks like the author of the gzip program (on Linux Fedora 16) didn't 
read the RFC!

$ tar -cvf hého.tar README
README
$ gzip hého.tar 
$ hachoir-urwid ~/prog/python/default/hého.tar.gz 
0) file:/home/haypo/prog/python/default/hého.tar.gz: ...
   0) signature= \x1f\x8b: GZip file signature (\x1F\x8B) (2 bytes)
   2) compression= deflate: Compression method (1 byte)
   3.0) is_text= False: File content is probably ASCII text (1 bit)
   3.1) has_crc16= False: Header CRC16 (1 bit)
   3.2) has_extra= False: Extra informations (variable size) (1 bit)
   3.3) has_filename= True: Contains filename? (1 bit)
   3.4) has_comment= False: Contains comment? (1 bit)
   3.5) reserved[0]= null (3 bits)
   4) mtime= 2011-12-21 19:34:54: Modification time (4 bytes)
   8.0) reserved[1]= null (1 bit)
   8.1) slowest= False: Compressor used maximum compression (slowest) (1 bit)
   8.2) fastest= False: Compressor used the fastest compression (1 bit)
   8.3) reserved[2]= null (5 bits)
   9) os= Unix: Operating system (1 byte)
   10) filename= hého.tar: Filename (10 bytes)

Raw display:

   10) filename= h\xc3\xa9ho.tar\0: Filename (10 bytes)
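
The raw FNAME bytes in the dump above can be pulled out of a gzip header with a few lines of code. The sketch below builds a header by hand that mirrors the dump (the mtime is a zero placeholder, not the real timestamp) and then parses it following the RFC 1952 layout; read_gzip_fname is a hypothetical helper, not part of the gzip or tarfile modules.

```python
import struct

FEXTRA, FNAME = 0x04, 0x08  # RFC 1952 flag bits

def read_gzip_fname(data):
    """Return the raw FNAME bytes from a gzip header, or None if absent."""
    magic, method, flags = struct.unpack("<2sBB", data[:4])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    pos = 10                           # fixed-size part of the header
    if flags & FEXTRA:                 # skip the optional extra field
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        pos += 2 + xlen
    if not flags & FNAME:
        return None
    end = data.index(b"\x00", pos)     # FNAME is zero-terminated
    return data[pos:end]

# Hand-built header: magic, deflate, FNAME flag, mtime=0, XFL=0, OS=3 (Unix),
# followed by the file name bytes exactly as shown in the raw display.
header = (b"\x1f\x8b\x08\x08" + b"\x00" * 4 + b"\x00\x03"
          + b"h\xc3\xa9ho.tar\x00")

raw = read_gzip_fname(header)
as_utf8 = raw.decode("utf-8")          # the name as gzip evidently wrote it
as_latin1 = raw.decode("iso-8859-1")   # what an RFC-strict reader would see
```

Decoding as UTF-8 recovers the original name, whereas a reader that follows the RFC and decodes as Latin-1 gets two characters in place of the accented one.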




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-20 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +lars.gustaebel




[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-19 Thread Jason R. Coombs

New submission from Jason R. Coombs jar...@jaraco.com:

python -c "import tarfile; tarfile.open(u'hello.tar.gz', 'w|gz')"

produces

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 1687, in open
    _Stream(name, filemode, comptype, fileobj, bufsize),
  File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 431, in __init__
    self._init_write_gz()
  File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 459, in _init_write_gz
    self.__write(self.name + NUL)
  File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 475, in __write
    self.buf += s
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)


Remove the compression ('|gz') or remove the unicode name or run under Python 3 
and the command completes without error.

The error does not occur under Python 3 (even with non-ascii characters), so it 
should be possible to create a tarfile with a unicode filename on Python 2.7.

This failure is the underlying cause of #11638.
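
The byte named in the error message, 0x8b at position 1, matches the second byte of the gzip magic number (0x1f 0x8b), which suggests the buffer already held the start of the gzip header when the unicode name was appended. In Python 2, str + unicode implicitly decodes the byte string as ASCII. The sketch below performs that decode step explicitly so that it also runs on Python 3; the helper name and the sample buffer are illustrative and not taken from tarfile.py.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def implicit_ascii_decode(buf):
    """Mimic what Python 2 does to the byte string in `str + unicode`."""
    return buf.decode("ascii")

try:
    # Assumed buffer contents: magic, deflate method, FNAME flag.
    implicit_ascii_decode(GZIP_MAGIC + b"\x08\x08")
    failure = None
except UnicodeDecodeError as exc:
    # Record which codec failed, where, and on which byte.
    failure = (exc.encoding, exc.start, exc.object[exc.start:exc.end])
```

The recorded failure names the 'ascii' codec, position 1 and the byte 0x8b, the same details as in the traceback.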

--
messages: 149896
nosy: jason.coombs
priority: normal
severity: normal
status: open
title: UnicodeDecodeError when creating tar.gz with unicode name
versions: Python 2.7
