[issue29612] TarFile.extract() suffers from hard links inside tarball

2020-01-19 Thread Zachary Ware


Change by Zachary Ware:


--
nosy: +ethan.furman
versions: +Python 3.7, Python 3.8, Python 3.9 -Python 2.7, Python 3.4, Python 
3.5, Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29612] TarFile.extract() suffers from hard links inside tarball

2019-10-27 Thread TROUVERIE Joachim


Change by TROUVERIE Joachim:


--
pull_requests: +16486
pull_request: https://github.com/python/cpython/pull/16958




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-08-07 Thread TROUVERIE Joachim


Change by TROUVERIE Joachim:


--
pull_requests: +8192




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-06-21 Thread Joachim Trouverie


Joachim Trouverie added the comment:

The Travis build failed for a reason unrelated to my changes. I relaunched it 
using an empty commit. 

If anyone could review my changes, I would rebase so that the work can be validated.

--




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-03-28 Thread Joachim Trouverie

Joachim Trouverie added the comment:

Could anyone review this?

--




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-02-26 Thread Joachim Trouverie

Joachim Trouverie added the comment:

I created a PR for this issue for Python 2.7 
(https://github.com/python/cpython/pull/5753/files).

I simply skip the link creation if the target path is equal to the link target. 
I don't see any corner case where this would be unwanted behavior.

I am also not sure whether I should create a unit test for this behavior.
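
The idea described in this comment can be approximated from the outside with public tarfile calls only. The following is a minimal sketch, not the code from the PR: the function name extract_skipping_self_links is hypothetical, and it compares the member name with its linkname, whereas the PR is described as comparing paths inside makelink().

```python
import os
import tarfile


def extract_skipping_self_links(tar, dest):
    """Extract every member of an open TarFile into dest, except hard links
    whose link target is the member itself.

    Returns (extracted_names, skipped_names). Hypothetical helper, not part
    of the tarfile module.
    """
    extracted = []
    skipped = []
    for member in tar:
        # A hard link entry stores its target in member.linkname, relative
        # to the archive root, just like member.name.
        if member.islnk() and (
            os.path.normpath(member.linkname) == os.path.normpath(member.name)
        ):
            skipped.append(member.name)
            continue
        tar.extract(member, dest)
        extracted.append(member.name)
    return extracted, skipped
```

Something like `extract_skipping_self_links(tarfile.open("test.tar"), "out")` would be one way to call it; whether skipping is always safe is exactly the corner-case question raised above.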

--




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-02-19 Thread TROUVERIE Joachim

Change by TROUVERIE Joachim:


--
keywords: +patch
pull_requests: +5532
stage:  -> patch review




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-02-19 Thread Joachim Trouverie

Joachim Trouverie added the comment:

Is anybody working on this issue, or can I create a branch for it?

--
nosy: +jtrouverie




[issue29612] TarFile.extract() suffers from hard links inside tarball

2018-02-16 Thread Larry Cook

Larry Cook added the comment:

I recently hit this with Python 2.7.5 and 2.7.13.  It has a very simple repro.  
Just specify the same file twice on the command line to tar (GNU 1.26):

% tar cvf test.tar test.txt test.txt
test.txt
test.txt

% tar tvf test.tar
-rw-r--r-- root/root        24 2018-02-16 09:35 test.txt
hrw-r--r-- root/root         0 2018-02-16 09:35 test.txt link to test.txt

% python2.7
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tarfile
>>> tarball = tarfile.open("test.tar")
>>> tarball.extractall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/tarfile.py", line 2047, in extractall
    self.extract(tarinfo, path)
  File "/usr/lib64/python2.7/tarfile.py", line 2084, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib64/python2.7/tarfile.py", line 2168, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/usr/lib64/python2.7/tarfile.py", line 2252, in makelink
    os.link(tarinfo._link_target, targetpath)
OSError: [Errno 2] No such file or directory
>>>
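
The same archive layout (a regular test.txt followed by a hard link entry named test.txt that links to test.txt) can probably be built without GNU tar. This is a sketch under that assumption; build_self_link_archive and describe_members are hypothetical helper names, and the payload is arbitrary.

```python
import io
import tarfile


def build_self_link_archive():
    """Return the bytes of a tar archive holding test.txt followed by a
    hard link entry test.txt -> test.txt (hypothetical helper)."""
    payload = b"hello\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        regular = tarfile.TarInfo("test.txt")
        regular.size = len(payload)
        tar.addfile(regular, io.BytesIO(payload))

        # Hard link entries carry no data, only a header with a linkname.
        link = tarfile.TarInfo("test.txt")
        link.type = tarfile.LNKTYPE
        link.linkname = "test.txt"
        tar.addfile(link)
    return buf.getvalue()


def describe_members(archive_bytes):
    """List (name, kind, linkname) for every member of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        return [
            (m.name, "hardlink" if m.islnk() else "file", m.linkname)
            for m in tar.getmembers()
        ]
```

Writing the returned bytes to test.tar and running the session above against it should exercise the same makelink() path, although that has not been verified on the Python versions named in this comment.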

--
nosy: +Larry Cook




[issue29612] TarFile.extract() suffers from hard links inside tarball

2017-03-10 Thread Thomas Guettler

Thomas Guettler added the comment:

I have the same issue on Python 2.7.12 (Ubuntu 16.04)

I tried to execute tartest.py, but I could not find a way to create the 
tarball that tartest.py needs.

--
nosy: +guettli




[issue29612] TarFile.extract() suffers from hard links inside tarball

2017-02-21 Thread Ned Deily

Changes by Ned Deily:


--
nosy: +lars.gustaebel




[issue29612] TarFile.extract() suffers from hard links inside tarball

2017-02-21 Thread Jussi Judin

New submission from Jussi Judin:

I managed to create a tarball that brought out quite nasty behavior in the 
tarfile.TarFile.extract() and tarfile.TarFile.extractall() methods when the 
tarball contains hard links that point to themselves, that is, to a file of 
the same name that is already included in the tarball. In Python 2.7 this 
leads to an exception, and with Python 3.4-3.6 the same file is extracted 
from the tarball multiple times.

First we create a tarball that causes this behavior:

$ mkdir -p tardata/1/2/3/4/5/6/7/8/9
$ dd if=/dev/zero of=tardata/1/2/3/4/5/6/7/8/9/zeros.data bs=100 count=500
# tar by default adds all directories recursively multiple times to the 
archive, but duplicates are created as hard links:
$ find tardata | xargs tar cvfz tardata.tar.gz

Then let's extract the tarball with the tarfile module. The following commands 
demonstrate what happens with the attached tartest.py file:

$ python2.7.13 tartest.py noskip tardata.tar.gz /tmp/tardata-python-2.7.13
...
tardata/1/2/3/4/5/6/7/8/9/zeros.data
...
tardata/1/2/3/4/5/6/7/8/9/zeros.data
Traceback (most recent call last):
  File "tartest.py", line 17, in <module>
    unarchive(skip, archive, dest)
  File "tartest.py", line 12, in unarchive
    tar_fd.extract(info, dest)
  File "python/2.7.13/lib/python2.7/tarfile.py", line 2118, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "python/2.7.13/lib/python2.7/tarfile.py", line 2202, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "python/2.7.13/lib/python2.7/tarfile.py", line 2286, in makelink
    os.link(tarinfo._link_target, targetpath)
OSError: [Errno 2] No such file or directory

And with Python 3.6.0 (and the earlier Python 3 releases that I have tested):

$ time python3.6.0 tartest.py noskip tardata.tar.gz /tmp/tardata-python-3.6.0
...
tardata/1/2/3/4/5/6/7/8/9/zeros.data <-- this is extracted 11 times
...
real	0m42.747s
user	0m17.564s
sys	0m6.144s

If we then make the tarfile skip extraction of hard links that point to 
themselves:

$ time python3.6.0 tartest.py skip tardata.tar.gz /tmp/tardata-python-3.6.0
...
tardata/1/2/3/4/5/6/7/8/9/zeros.data <-- this is extracted once
...
Skipping tardata/1/2/3/4/5/6/7/8/9/zeros.data <-- skipped hard links 10 times
...
real	0m2.688s
user	0m1.816s
sys	0m0.532s

From the user CPU time it's obvious that a lot of unneeded decompression is 
happening when we compare the Python 3.6 results. If I use 
TarFile.extractall(), it behaves similarly to calling TarFile.extract() 
individually on TarInfo objects. GNU tar seems to skip over the extraction of 
the actual file data when it encounters this situation.

--
components: Library (Lib)
files: tartest.py
messages: 288284
nosy: Jussi Judin
priority: normal
severity: normal
status: open
title: TarFile.extract() suffers from hard links inside tarball
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file46658/tartest.py
