New submission from Jack Lloyd:

Context: I have a script which checks out a software release (a tagged git 
revision) and builds an archive to distribute to end users. One goal of this 
script is that the archive is reproducible, i.e. if the script is run twice (at 
different times, on different machines, by different people) it produces 
bit-for-bit identical output, and thus also has the same SHA-256 hash.

Mostly this works great, using the TarInfo feature of tarfile.py to set the 
uid/gid/mtime to fixed values. Except I also want to compress the archive, and 
tarfile calls time.time() to find out the timestamp that will be embedded in 
the gzip header. This breaks my carefully deterministic output.
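
For illustration, the uncompressed half of this can be sketched as below. This 
is a minimal Python 3 sketch, not the code from my script: the helper name 
build_tar and the sample file are made up, and the fixed values (mtime 0, 
uid/gid 0, mode 0o644) are just one possible choice.

```python
import io
import tarfile


def build_tar(files):
    """Return an uncompressed tar archive as bytes, with fixed metadata.

    `files` is a list of (name, data) pairs. Hypothetical helper.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0               # fixed timestamp, not the file's mtime
            info.uid = info.gid = 0      # fixed owner and group ids
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


first = build_tar([("README", b"hello\n")])
second = build_tar([("README", b"hello\n")])
print(first == second)  # both runs yield the same bytes
```

Opening the archive with mode "w:gz" instead is where the output starts to 
depend on the clock.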

I would like it if tarfile accepted an additional keyword argument that allowed 
overriding the time value for the gzip header. As it is, I just hack around it 
with:

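import time
def null_time():
    # Always report the epoch, so the gzip header timestamp is fixed
    return 0
time.time = null_time  # global monkeypatch: affects every caller of time.time()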

which does work but is also horrible.
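
A less invasive workaround, sketched here under some assumptions, is to create 
the gzip stream by hand and pass it to tarfile as a plain file object. 
gzip.GzipFile takes an mtime argument, so nothing global needs patching. The 
helper name build_tgz and the sample file are made up for the example, and the 
code is Python 3 rather than the Python 2 my script uses.

```python
import gzip
import io
import tarfile


def build_tgz(files, mtime=0):
    """Return a .tar.gz as bytes with a fixed gzip header timestamp.

    `files` is a list of (name, data) pairs. Hypothetical helper.
    """
    buf = io.BytesIO()
    # filename="" keeps a file name out of the gzip header;
    # mtime pins the header timestamp instead of using time.time().
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        # Plain "w" mode: tarfile writes an uncompressed stream into gz.
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name, data in sorted(files):
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = 0  # fixed member timestamp
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = build_tgz([("README", b"hello\n")])
print(archive[4:8])  # the 4-byte MTIME field of the gzip header
```

This still leaves every caller to wire up the gzip layer manually, which is 
why a keyword on tarfile itself would be nicer.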

Alternatively, tarfile could just always set the timestamp header to 0 and avoid 
having its output depend on the current clock. I doubt anyone would notice.

The script in question is here:
https://github.com/randombit/botan/blob/master/src/scripts/dist.py

My script uses Python 2 for various reasons, but it seems the same problem 
affects even the tarfile.py in the latest Python 3. I would be willing to try 
writing a patch for this, if anything along these lines might be accepted.

Thanks.

----------
components: Library (Lib)
messages: 302590
nosy: randombit
priority: normal
severity: normal
status: open
title: Allow setting timestamp in gzip-compressed tarfiles
type: enhancement
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31526>
_______________________________________