New submission from Roddy Shuler:

The GNU and USTAR formats take a special code path when a file path is longer
than 100 bytes. The check for this, however, compares the length in characters
rather than in bytes. So if a path is at most 100 characters long but contains
non-ASCII characters that push its encoded length past 100 bytes, the encoded
string is cut to 100 bytes and the file name stored in the tar file is
truncated.

For example, the path

/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg

is truncated to:

/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jp
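
The mismatch can be illustrated with a minimal sketch that does not use
tarfile at all. It assumes UTF-8 encoding and drops the leading slash, on the
assumption that tarfile strips leading slashes from member names. The constant
NAME_FIELD_BYTES is a hypothetical name for the 100-byte limit, not taken from
the patch.

```python
# Example path from this report, without the leading slash (assumption:
# tarfile strips leading slashes before storing the member name).
name = ("gt-education/Colección Educativa Guatemala/thumbs/"
        "Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg")

NAME_FIELD_BYTES = 100  # hypothetical name for the 100-byte name field limit

encoded = name.encode("utf-8")

# Check in characters: the name does not look too long.
too_long_by_chars = len(name) > NAME_FIELD_BYTES

# Check in bytes: "ó" takes two bytes in UTF-8, so the limit is exceeded.
too_long_by_bytes = len(encoded) > NAME_FIELD_BYTES

# What a plain cut to 100 bytes leaves behind.
truncated = encoded[:NAME_FIELD_BYTES].decode("utf-8")

print(len(name), len(encoded))
print(too_long_by_chars, too_long_by_bytes)
print(truncated)
```

Only the byte-based check flags the name as too long, and cutting the encoded
form to 100 bytes loses the final "g", which matches the truncated name above.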

The attached patch fixes this.  I initially found the problem on Python 3.3.
The patch was tested on Linux with Python 3.4.3-6 from Debian.  Looking at the
source code, I am pretty confident that the problem still exists upstream in
Python 3.5.
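
A round-trip check along the following lines is one way to see whether a given
interpreter is affected. It is a sketch and not part of the attached patch: the
helper name roundtrip is hypothetical, the member is an empty in-memory entry
rather than a real file, and the leading slash is omitted on the assumption
that tarfile strips it from member names.

```python
import io
import tarfile

# Member name from this report, without the leading slash.
name = ("gt-education/Colección Educativa Guatemala/thumbs/"
        "Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg")


def roundtrip(fmt):
    """Write one empty member in the given format and read its name back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding="utf-8") as tar:
        return tar.getnames()[0]


results = {
    "ustar": roundtrip(tarfile.USTAR_FORMAT),
    "gnu": roundtrip(tarfile.GNU_FORMAT),
}

for label, stored in results.items():
    # A mismatch means the stored name was altered on the way through.
    print(label, "ok" if stored == name else "mismatch: " + stored)
```

On an interpreter with the character-based check, the stored name would be
expected to come back shortened; with a byte-based check it should match.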

----------
files: fix-tarfile-path-truncation.patch
keywords: patch
messages: 248363
nosy: Roddy Shuler
priority: normal
severity: normal
status: open
title: tarfile.py: fix GNU and USTAR formats to properly handle paths with 
special characters that are encoded with more than one byte each
type: behavior
versions: Python 3.5
Added file: http://bugs.python.org/file40157/fix-tarfile-path-truncation.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24838>
_______________________________________