[issue31557] tarfile: incorrectly treats regular file as directory

2017-10-04 Thread Joe Tsai

Joe Tsai added the comment:

It creates a number of nested directories only because GNU (and BSD) tar 
implicitly creates missing parent directories on extraction. If you cd into the 
bottom-most folder, you will see "foo.txt".
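Python's tarfile behaves the same way on extraction: it creates any missing parent directories for a member. Below is a minimal sketch that uses a short hypothetical path (a/b/c/foo.txt) instead of the long one in test.tar. The archive lists a single member, yet extraction leaves a tree of directories with the regular file at the bottom. The filter="data" argument is passed only when the running Python provides extraction filters.

```python
import io
import os
import tarfile
import tempfile

# Build an archive whose only member is a regular file; its parent
# directories a/, a/b/ and a/b/c/ have no entries of their own.
member = tarfile.TarInfo("a/b/c/foo.txt")  # hypothetical path, size 0
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    tf.addfile(member)
buf.seek(0)

# Extraction filters exist only in newer releases; use one when available.
extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buf, mode="r") as tf:
        names = tf.getnames()  # what the archive actually contains
        tf.extractall(dest, **extract_kwargs)
    # What ends up on disk: the parents were created implicitly.
    parent_is_dir = os.path.isdir(os.path.join(dest, "a", "b", "c"))
    leaf_is_file = os.path.isfile(os.path.join(dest, "a", "b", "c", "foo.txt"))

print(names, parent_is_dir, leaf_is_file)
```

The directory tree on disk therefore says nothing about whether the archive contains directory entries. Only the member list does.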

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31557] tarfile: incorrectly treats regular file as directory

2017-10-04 Thread Nitish

Nitish added the comment:

Try 'tar xvf test.tar'. On a Linux machine, at least, it does in fact produce a 
tree of directories, not a single file. So, in a way, what Python reports is 
correct.

--




[issue31557] tarfile: incorrectly treats regular file as directory

2017-10-03 Thread Joe Tsai

Joe Tsai added the comment:

This bug is not platform-specific.

I've attached a reproduction:
$ python
>>> import tarfile
>>> tarfile.open("test.tar", "r").next().isdir()
True

$ tar -tvf test.tar
-rw-rw-r-- 0/0   0 1969-12-31 16:00 
123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/foo.txt
$ tar --version
tar (GNU tar) 1.27.1

For some background, this bug was originally filed against the Go standard 
library (I maintain Go's implementation of tar). When I investigated the 
issue, I found that Go was doing the right thing and that the discrepancy was 
due to the check I pointed to earlier. The GNU tool also reports this entry as 
a regular file.
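Readers without the attachment can build an archive of the same shape in memory with Python's own tarfile. This sketch makes two assumptions. First, the long path is carried by a GNU long-name record; test.tar itself may use PAX instead. Second, the member's typeflag is AREGTYPE ('\0'), which is the only type the V7 trailing-slash check in the original report looks at. On interpreters that still carry that check, member.isdir() should come out True, even though member.name is the full path ending in foo.txt.

```python
import io
import tarfile

# Hypothetical stand-in for test.tar: a 257-byte path whose first
# 100 bytes end in '/'. AREGTYPE ('\0') is an assumption, not taken
# from the attached file.
path = "123456789/" * 25 + "foo.txt"
info = tarfile.TarInfo(path)
info.type = tarfile.AREGTYPE

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
    tf.addfile(info)
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r") as tf:
    member = tf.next()  # first (and only) member

# The name comes from the GNU long-name record; the type decision may
# still have been made from the truncated V7 name field.
print(member.name == path, member.isdir())
```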

--
Added file: https://bugs.python.org/file47188/test.tar




[issue31557] tarfile: incorrectly treats regular file as directory

2017-09-30 Thread Nitish

Nitish added the comment:

> This check was the source of a bug that caused tarfile to report a regular file as 
> a directory because the file path was extra long, and when the tar writer 
> truncated the path to the first 100B, it so happened to end on a slash.

AFAIK, the '/' character is not allowed in a filename on Linux systems. Is 
this bug platform-specific? Can you give the test case you are referring to?

--
nosy: +nitishch




[issue31557] tarfile: incorrectly treats regular file as directory

2017-09-30 Thread Serhiy Storchaka

Change by Serhiy Storchaka:


--
components: +Library (Lib)
nosy: +serhiy.storchaka
stage:  -> needs patch
type:  -> behavior
versions: +Python 2.7, Python 3.6, Python 3.7




[issue31557] tarfile: incorrectly treats regular file as directory

2017-09-29 Thread Terry J. Reedy

Change by Terry J. Reedy:


--
nosy: +lars.gustaebel




[issue31557] tarfile: incorrectly treats regular file as directory

2017-09-22 Thread Joe Tsai

New submission from Joe Tsai:

The original V7 header allocates only 100 bytes to store the file path. If a 
path exceeds this length, either the PAX format or the GNU format must be 
used, both of which can represent arbitrarily long file paths. When doing so, 
most tar writers still store the first 100 bytes of the path in the V7 header.
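Python's own tarfile writer appears to follow the same convention for GNU_FORMAT. The long path goes into a ././@LongLink record, and the V7 name field keeps only the first 100 bytes. The sketch below walks the raw 512-byte header blocks of an in-memory archive to show this. v7_headers is a hypothetical helper, not a tarfile API, and it assumes an uncompressed archive with octal size fields. With a path modelled on the one in this report, the 100-byte cut lands exactly on a slash.

```python
import io
import tarfile

BLOCK = 512

def v7_headers(raw: bytes):
    """Yield (typeflag, name field) for each header block of an uncompressed tar.

    Hypothetical helper for illustration; assumes octal size fields.
    """
    offset = 0
    while offset + BLOCK <= len(raw):
        header = raw[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # all-zero block: end of archive
            break
        name = header[0:100].rstrip(b"\0")  # V7 name field (100 bytes)
        typeflag = header[156:157]
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        yield typeflag, name
        # Skip this header plus its data, which is padded to whole blocks.
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK

# A path modelled on the one in this report: 257 bytes long.
path = "123456789/" * 25 + "foo.txt"
info = tarfile.TarInfo(path)
info.type = tarfile.AREGTYPE  # assumption: typeflag '\0', as an old-style writer might emit

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
    tf.addfile(info)

headers = list(v7_headers(buf.getvalue()))
for typeflag, name in headers:
    print(typeflag, name)
```

The second header's name field is exactly ten copies of "123456789/". A reader that inspects only that field sees a name ending in '/'.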

When reading, a proper reader should disregard the contents of the V7 field if 
a previous and corresponding PAX or GNU header overrode it.

This is currently not the case with the tarfile module, which has the following 
check 
(https://github.com/python/cpython/blob/c7cc14a825ec156c76329f65bed0d0bd6e03d035/Lib/tarfile.py#L1054-L1057):
# Old V7 tar format represents a directory as a regular
# file with a trailing slash.
if obj.type == AREGTYPE and obj.name.endswith("/"):
    obj.type = DIRTYPE

This check should be further constrained to activate only when no prior PAX or 
GNU record overrides the value of obj.name. It was the source of a bug that 
caused tarfile to report a regular file as a directory: the file path was extra 
long, and when the tar writer truncated it to the first 100 bytes, the 
truncated path happened to end in a slash.
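The proposed constraint can be sketched as a small standalone function. The name effective_type and its long_name parameter are hypothetical and are not part of tarfile. long_name stands for the path taken from a preceding PAX path record or GNU long-name record. It is None when the V7 name field is authoritative.

```python
from typing import Optional

# Typeflag values from the tar format (same values as tarfile.AREGTYPE,
# tarfile.REGTYPE and tarfile.DIRTYPE).
AREGTYPE = b"\0"
REGTYPE = b"0"
DIRTYPE = b"5"

def effective_type(typeflag: bytes, v7_name: str,
                   long_name: Optional[str]) -> bytes:
    """Hypothetical helper: decide a member's type from one header.

    Applies the old V7 "trailing slash means directory" rule only when
    no PAX or GNU record overrode the 100-byte V7 name field.
    """
    if long_name is not None:
        # The V7 name was overridden; a trailing '/' in it may just be
        # an artifact of truncating a long path to 100 bytes.
        return typeflag
    # Plain V7 header: a regular file whose name ends in '/' is a directory.
    if typeflag == AREGTYPE and v7_name.endswith("/"):
        return DIRTYPE
    return typeflag

path = "123456789/" * 25 + "foo.txt"
print(effective_type(AREGTYPE, path[:100], path) == AREGTYPE)
```

A different design would apply the trailing-slash rule to the final name after any override. For the path in this report, that also yields a regular file, because the full name ends in foo.txt.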

--
messages: 302778
nosy: Joe Tsai
priority: normal
severity: normal
status: open
title: tarfile: incorrectly treats regular file as directory
