[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-05-27 Thread Jeffrey Kintscher


Jeffrey Kintscher added the comment:

I recommend closing this issue since the behavior is the same as the BSD and 
GNU tar utilities.

--
type:  -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-05-17 Thread Jeffrey Kintscher


Jeffrey Kintscher added the comment:

I did some testing with BSD and GNU tar to compare with Python's behavior.

jfoo:~ jeff$ tar --version
bsdtar 2.8.3 - libarchive 2.8.3

jeff@albarino:~$ tar --version
tar (GNU tar) 1.28

Both BSD tar and GNU tar can create an empty tar file that consists of all zero 
bytes. BSD tar creates a 1 KB file:

jfoo:~ jeff$ tar -cf tarfilename.tar -T /dev/null
jfoo:~ jeff$ hexdump tarfilename.tar 
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000400
jfoo:~ jeff$ tar -tf tarfilename.tar
jfoo:~ jeff$ echo $?
0

while GNU tar creates a 10 KB file:

jeff@albarino:~$ tar -cf tarfilename.tar -T /dev/null
jeff@albarino:~$ hexdump tarfilename.tar
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0002800
jeff@albarino:~$ tar -tf tarfilename.tar 
jeff@albarino:~$ echo $?
0
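
For comparison, here is a minimal sketch of the same experiment using Python's tarfile module. It writes to an in-memory buffer instead of a file on disk, and it assumes CPython's default record size (tarfile.RECORDSIZE, 10,240 bytes), which matches the GNU tar result above:

```python
import io
import tarfile

# Create an archive with no members in an in-memory buffer.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w"):
    pass  # add nothing

data = buf.getvalue()
print(len(data))                 # size of the empty archive in bytes
print(data == bytes(len(data)))  # True if every byte is zero
```

On close, tarfile writes the two zero blocks and then pads the output up to a full record, so the empty archive consists entirely of zero bytes.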

GNU tar will also leave a tar file with 10 KB of zeros when all contents have 
been deleted (BSD tar doesn't support deletion):

jeff@albarino:~$ tar cf empty.tar tarfilename.tar 
jeff@albarino:~$ hexdump empty.tar 
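0000000 6174 6672 6c69 6e65 6d61 2e65 6174 0072
0000010 0000 0000 0000 0000 0000 0000 0000 0000
*
0000060 0000 0000 3030 3030 3636 0034 3030 3130
0000070 3537 0031 3030 3130 3537 0031 3030 3030
0000080 3030 3432 3030 0030 3331 3634 3637 3430
0000090 3331 0037 3130 3432 3637 2000 0030 0000
00000a0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100 7500 7473 7261 2020 6a00 6665 0066 0000
0000110 0000 0000 0000 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 6a00 6665 0066 0000
0000130 0000 0000 0000 0000 0000 0000 0000 0000
*
0005000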
jeff@albarino:~$ tar --delete -f empty.tar tarfilename.tar
jeff@albarino:~$ hexdump empty.tar 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0002800
jfoo:~ jeff$ tar -tf empty.tar
jfoo:~ jeff$ echo $?
0


According to the POSIX.1 standard, "[t]he last physical block shall always be 
the full size, so logical records after the two zero logical records may 
contain undefined data." 
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#).

It looks like any file starting with 1,024 bytes of zeros is a valid tar 
archive per BSD tar, GNU tar, and the POSIX.1 standard.
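
The rule described here can be sketched as a small check. This is only an illustration of the "two zero blocks, then anything" reading of the standard; the helper name starts_with_end_marker is made up for this example and is not part of tarfile:

```python
BLOCK = 512  # tar block size in bytes

def starts_with_end_marker(data: bytes) -> bool:
    """Return True if data begins with two all-zero 512-byte blocks."""
    head = data[:2 * BLOCK]
    return len(head) == 2 * BLOCK and head == bytes(2 * BLOCK)

# 1 KB of zeros: end-of-archive marker right at the start.
print(starts_with_end_marker(bytes(1024)))                  # True
# Data after the marker is ignored under this reading.
print(starts_with_end_marker(bytes(1024) + b"garbage"))     # True
# Only one zero block followed by non-zero bytes.
print(starts_with_end_marker(bytes(512) + b"\xff" * 512))   # False
```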

However, BSD tar and GNU tar disagree about files starting with 512 bytes of 
zeros followed by 512 bytes of garbage. First, I constructed such a file for 
testing (zr.tar):

jfoo:~ jeff$ dd if=/dev/zero of=zr.tar bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000060 secs (8521761 bytes/sec)
jfoo:~ jeff$ dd if=/dev/random of=zr.tar bs=512 count=1 oseek=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000056 secs (9138228 bytes/sec)
jfoo:~ jeff$ ls -l zr.tar
-rw-r--r--  1 jeff  staff  1024 May 17 13:14 zr.tar
jfoo:~ jeff$ hexdump zr.tar 
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000200 d7 56 a9 8d 26 11 a4 d8 9a 96 15 04 8d 4b 31 5d
0000210 33 2b 20 ae a2 23 09 8c 60 a1 73 12 a1 ab 73 61
0000220 69 eb 88 bf 8a 7d 6b 9a c5 79 b6 c9 9b a9 5a 6d
0000230 4b 4a 81 a7 71 da 90 24 3f 8f 43 a9 95 a0 20 bb
0000240 93 0f b2 be 7e 4d 80 49 aa 61 19 a2 6b b5 5c f4
0000250 e0 34 7f 99 a0 d3 29 08 9a 25 97 96 d4 d0 07 e4
0000260 90 1c 60 97 9a 23 d3 25 38 54 97 8b 71 a0 83 40
0000270 a6 f9 19 1b 3f 6e bc 5b 06 22 20 fc ff fe 7b eb
0000280 35 9b 52 57 14 83 90 7f d3 e8 f4 72 58 96 16 8c
0000290 09 ad 2a 2f ad fd 43 09 96 eb 7c 8f fc a6 14 d9
00002a0 18 34 38 b6 6a 5a ff 66 6d 46 cb 77 7a 5c 1e 72
00002b0 3e 27 05 3a b0 c4 52 7b c8 cc 26 b9 c3 5f 39 27
00002c0 a3 49 9e f1 3f f8 7e 46 98 df 7c 9d e3 86 c3 72
00002d0 e1 ef 98 7d a1 96 4e 4b 82 bb f4 2b f3 71 6f 16
00002e0 fe 38 2d bc 2b 70 b3 e6 db 1b ad 44 13 06 28 e5
00002f0 3d 05 07 3c 5f 09 5b 90 67 09 0b 5a db 79 b7 27
0000300 8a 4b e5 b3 66 f0 7a 9d a5 c4 e3 a8 b4 b2 d2 c8
0000310 5d d1 27 81 03 25 33 f4 fb 6f 77 b1 df 9d fa cf
0000320 01 a7 70 40 b4 7f 6b ac 04 70 5c 29 06 6a 73 64
0000330 4f 15 92 3b 5e a4 34 95 e0 4b 04 be ca 87 e9 73
0000340 1e 63 98 f3 f1 fd be 7a de fe 84 27 b7 e4 db e0
0000350 fb 04 7f 9d f0 ae af a3 8e 0f c2 a7 80 e0 32 38
0000360 17 1e 47 37 48 9b 99 35 58 9d d5 83 1b 67 d4 e8
0000370 15 0d 00 bb 79 f3 37 59 c3 5e e9 1d 87 79 96 de
0000380 6c 89 35 34 0b b1 12 b2 a8 2d 61 dd f5 9a 19 e7
0000390 c1 c5 24 46 fa 23 f0 db 72 7f a5 18 aa e2 db 04
00003a0 1e cc a6 0f 9e 4e 00 d9 2d eb f9 fc c4 d5 8e 46
00003b0 ab c3 ed 53 98 df a8 81 26 f4 b5 0f b4 7f 12 a4
00003c0 4a aa 14 4c f5 aa dd ba 69 e5 a8 d5 b3 68 0b 9f
00003d0 1a aa 34 a4 60 09 c2 30 22 32 72 dd 2e f9 7a 79
00003e0 88 a3 6a 99 13 4f f4 27 db 02 2e cb a0 ec d8 4d
00003f0 fe 68 44 0c 7b 3a 74 8d 8e cd ba 3e d8 ef cb 97
0000400


GNU tar outputs a warning message, but still returns zero:

jeff@albarino:~$ tar -tvf zr.tar 
tar: A lone zero block at 1
jeff@albarino:~$ echo $?
0

while BSD tar silently accepts the file:

jfoo:~ jeff$ tar -tvf zr.tar 
jfoo:~ jeff$ echo $?
0

Python also accepts the file as valid:

>>> tarfile.open("zr.tar", "r")
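
The same check can be reproduced without creating zr.tar on disk. This is a sketch that assumes the tarfile behavior discussed in this issue; the filler byte 0xa5 is arbitrary and stands in for the random data:

```python
import io
import tarfile

# One all-zero block followed by one block of non-zero filler, like zr.tar.
data = bytes(512) + b"\xa5" * 512

# "r:" forces uncompressed reading, so no decompressor is tried.
with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
    members = tf.getmembers()

print(members)  # an empty list means the data was accepted as an empty archive
```

No exception is raised on the versions discussed here: the leading zero block is treated as the end of the archive and the second block is never examined.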


Personally, I think that an error should be returned if the file star

[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-04-18 Thread Read Hughes


Read Hughes added the comment:

GNU description of tar file format: 
http://www.gnu.org/software/tar/manual/html_node/Standard.html

Particular quotes that are relevant:

>Physically, an archive consists of a series of file entries terminated by an 
>end-of-archive entry, which consists of two 512 blocks of zero bytes

>Each file archived is represented by a header block which describes the file, 
>followed by zero or more blocks which give the contents of the file. At the 
>end of the archive file there are two 512-byte blocks filled with binary zeros 
>as an end-of-file marker

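In the original (pre-POSIX) format the header itself is 257 bytes, padded with 
NUL bytes until it reaches 512. The ustar format adds further fields after 
that point, still within the same 512-byte block.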

No input other than this, just trying to bring any relevant information to this 
issue that may help
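
As a small illustration of the layout quoted above, the following sketch builds a single header block with tarfile and inspects it. The member name "example.txt" is made up, and USTAR_FORMAT is chosen only so that the output is a single block:

```python
import tarfile

# Build the header block for a hypothetical empty member.
info = tarfile.TarInfo("example.txt")
header = info.tobuf(format=tarfile.USTAR_FORMAT)

print(len(header))                # one 512-byte block
print(header[:11])                # member name starts at offset 0
print(header[257:263])            # the "ustar" magic begins at offset 257
print(header[500:] == bytes(12))  # the tail of the block is NUL padding
```

A block like this can never be all zeros, since at least the name and checksum fields are filled in, which is what lets an all-zero block act as the end-of-archive marker.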

--
nosy: +rthugh02




[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-04-11 Thread Eitan Adler


Change by Eitan Adler:


--
nosy: +eitan.adler




[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-04-11 Thread Carl Harris


Change by Carl Harris:


--
nosy: +hitbox




[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-04-10 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan:


--
nosy: +lars.gustaebel, serhiy.storchaka




[issue36596] tarfile module considers anything starting with 512 bytes of zero bytes to be a valid tar file

2019-04-10 Thread Chris Siebenmann


New submission from Chris Siebenmann:

The easiest reproduction of this is:

import tarfile
tarfile.open("/dev/zero", "r:")

(If you use plain "r" you get a hang in attempted lzma decoding.)

I believe this is probably due to a missing 'elif self.offset == 0:' in the 
'except EOFHeaderError' exception handling case that almost all of the other 
exception handlers have.
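
Until the library behavior changes, a caller who wants the stricter result can check the first block before handing the file to tarfile. This is only a sketch of a workaround, not the proposed patch; the helper name open_rejecting_zero_start is hypothetical, and it assumes a seekable, uncompressed file object:

```python
import io
import tarfile

def open_rejecting_zero_start(fileobj):
    """Hypothetical helper: open an uncompressed tar, rejecting a leading zero block."""
    first = fileobj.read(tarfile.BLOCKSIZE)
    fileobj.seek(0)
    if first == bytes(tarfile.BLOCKSIZE):
        raise tarfile.ReadError("archive starts with an all-zero block")
    return tarfile.open(fileobj=fileobj, mode="r:")

try:
    open_rejecting_zero_start(io.BytesIO(bytes(1024)))
except tarfile.ReadError as exc:
    print("rejected:", exc)
```

Note that a check like this also rejects archives that contain no members at all, which may or may not be what the caller wants.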

This appears to be a very long standing issue based on the history of the code.

--
components: Library (Lib)
messages: 339915
nosy: cks
priority: normal
severity: normal
status: open
title: tarfile module considers anything starting with 512 bytes of zero bytes 
to be a valid tar file
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
