New submission from Roddy Shuler:

The GNU and USTAR formats use a special case if the file path is longer than 100 bytes. The detection for this, though, incorrectly checked for 100 characters rather than 100 bytes. So, if the path was close to but not exceeding 100 characters, and it included special characters such that its encoded length was greater than 100 bytes, the encoded string was truncated to 100 bytes and thus the resulting file name was truncated within the tar file.
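
To illustrate the difference between the two checks, here is a minimal sketch. It is not the code from the attached patch: the constant `NAME_FIELD_BYTES`, the helpers `fits_by_chars` and `fits_by_bytes`, and the synthetic file name are hypothetical and exist only for this illustration, and UTF-8 is assumed as the encoding.

```python
# Hypothetical names for illustration only; not taken from tarfile.py or the patch.
NAME_FIELD_BYTES = 100  # size of the name field in a tar header, in bytes


def fits_by_chars(name):
    """Flawed check: compares the number of characters to the field size."""
    return len(name) <= NAME_FIELD_BYTES


def fits_by_bytes(name, encoding="utf-8"):
    """Byte-based check: compares the encoded length to the field size."""
    return len(name.encode(encoding)) <= NAME_FIELD_BYTES


# Synthetic name: 100 characters, one of which needs two bytes in UTF-8.
name = "ó" + "a" * 99

print(len(name))                  # number of characters
print(len(name.encode("utf-8")))  # number of bytes after encoding
print(fits_by_chars(name))        # the character count says it fits
print(fits_by_bytes(name))        # the byte count says it does not
```

With a name like this, a character-based check skips the long-name handling even though the encoded name does not fit in the 100-byte field.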

For example:

/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg

is truncated as:

/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jp

The attached patch fixes this.

Initially found on Python 3.3. Patch is tested on Linux with version 3.4.3-6 from Debian. Looking at the source code, I am pretty confident that the problem still exists upstream in Python 3.5.

----------
files: fix-tarfile-path-truncation.patch
keywords: patch
messages: 248363
nosy: Roddy Shuler
priority: normal
severity: normal
status: open
title: tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each
type: behavior
versions: Python 3.5

Added file: http://bugs.python.org/file40157/fix-tarfile-path-truncation.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24838>
_______________________________________