Re: [Distutils] Reproducible builds (Sdist)

2017-10-02 Thread Matthias Bussonnier
Hi all,

On Fri, Sep 29, 2017 at 12:04 PM, Jakub Wilk  wrote:
> It is not enough to normalize timestamps. You need to normalize permissions and
> ownership, too.
>
> (I'm using https://pypi.python.org/pypi/distutils644 for normalizing
> permissions/ownership in my own packages.)
>
Thanks Jakub, this will be helpful for me.
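
For readers who want to see what normalizing permissions and ownership can
look like, here is a minimal sketch using only the standard-library `tarfile`
module. It is not the distutils644 implementation; the function name
`normalize_member` and the choice of 0o755/0o644 with root ownership are
assumptions made for illustration.

```python
import tarfile


def normalize_member(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Hypothetical tarfile filter: strip machine-specific owner and mode bits."""
    # Ownership: numeric 0 and empty names, so the account that ran the
    # build does not end up in the archive.
    info.uid = 0
    info.gid = 0
    info.uname = ""
    info.gname = ""
    # Permissions: directories and owner-executable files become 0o755,
    # everything else becomes 0o644 (an assumed policy, not a standard).
    if info.isdir() or info.mode & 0o100:
        info.mode = 0o755
    else:
        info.mode = 0o644
    return info
```

Such a function can be passed as the `filter=` argument of
`tarfile.TarFile.add`, which calls it for every member before writing it.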

> Yeah, I don't believe distutils honors SOURCE_DATE_EPOCH at the moment.
>
>> Second; is there a convention to store the SDE value ?
>
> In the changelog.

I'll consider that as well;


On Sun, Oct 1, 2017 at 10:31 PM, Nick Coghlan  wrote:
> On 30 September 2017 at 06:02, Thomas Kluyver  wrote:
>> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>
> For distro level reproducible build purposes, we typically treat the
> published tarball *as* the original sources, and don't really worry
> about the question of "Can we reproduce that tarball, from that VCS
> tree?".

Thanks for the detailed explanation, Nick. Even if this was not the
original goal of SDE, I would still like to have a reproducible build of
the sdist, even though my package does not have source generation like
Cython. I'll embed the timestamp in the commit for now, and see if I can
also extract the timestamp from the commit log.
AFAICT it's `git log -1 --pretty=format:%ct` if it's of interest to anyone.
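
A minimal sketch of how a build script might pick the timestamp, assuming it
prefers an explicitly set SOURCE_DATE_EPOCH and otherwise falls back to the
git command above. The function names `commit_timestamp` and
`source_date_epoch` are hypothetical, and the fallback needs `git` on the
PATH and a checkout to run in.

```python
import os
import subprocess


def commit_timestamp(repo_dir="."):
    """Return the committer date of HEAD as a Unix timestamp (requires git)."""
    out = subprocess.check_output(
        ["git", "log", "-1", "--pretty=format:%ct"], cwd=repo_dir
    )
    return int(out.decode("ascii").strip())


def source_date_epoch(environ=os.environ, fallback=commit_timestamp):
    """Prefer an explicit SOURCE_DATE_EPOCH, otherwise ask the fallback."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value:
        return int(value)
    return fallback()
```

Passing the environment and the fallback as arguments keeps the lookup easy
to exercise without a real repository.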

My interest in this is to have CI build the sdist, and make sure independent
machines can get the same artifact, in order to have a potentially distributed
agreement on what the sdist is.

Is there any plan (or would it be accepted) to try to upstream patches like
the distutils644 package Jakub linked to?

Thanks,
-- 
Matthias
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Reproducible builds (Sdist)

2017-10-01 Thread Nick Coghlan
On 30 September 2017 at 06:02, Thomas Kluyver  wrote:
> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>> Second: is there a convention for storing the SDE value? I don't seem to
>> be able to find one. It is nice to have reproducible builds, but if
>> it's a pain for reproducers to find the SDE value, that greatly decreases
>> the value of an SDE build.
>
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.

For distro level reproducible build purposes, we typically treat the
published tarball *as* the original sources, and don't really worry
about the question of "Can we reproduce that tarball, from that VCS
tree?".

This stems from the original model of open source distribution, where
publication *was* a matter of putting a tarball up on a website
somewhere, and it was an open question as to whether or not the
publisher was even using a version control system at all (timeline:
RCS=1982, CVS=1986, SVN=2000, git/hg=2005, with Linux distributions
getting their start in the early-to-mid 1990's).

So SOURCE_DATE_EPOCH gets applied *after* unpacking the original
tarball, rather than being used to *create* the tarball (we already
know when the publisher created it, since that's part of the tarball
metadata).

Python's sdists mess with that assumption a bit, since it's fairly
common to include generated C files that aren't part of the original
source tree, and Cython explicitly recommends doing so in order to
avoid requiring Cython as a build time dependency:
http://docs.cython.org/en/latest/src/reference/compilation.html#distributing-cython-modules

So in many ways, this isn't the problem that SOURCE_DATE_EPOCH on its
own is designed to solve - instead, it's asking the question of "How
do I handle the case where my nominal source archive is itself a built
artifact?", which means you not only need to record source timestamps
of the original inputs you used to build the artifact (which the
version control system will give you), you also need to record details
of the build tools used (e.g. using a different version of Cython will
generate different code, and hence different "source" archives), and
decide what to do with any timestamps on the *output* artifacts you
generate (e.g. you may decide to force them to match the commit date
from the VCS).

So saying "SOURCE_DATE_EPOCH will be set to the VCS commit date when
creating an sdist" would be a reasonable thing for an sdist creation
tool to decide to do, and combined with something like `Pipfile.lock`
in `pipenv`, or a `dev-requirements.txt` with fully pinned versions,
*would* go a long way towards giving you reproducible sdist archives.

However, it's not a problem to be solved by adding anything to the
produced sdist: it's a property of the publishing tools that create
sdists to aim to ensure that given the same inputs, on a different
machine, at a different time, you will nevertheless still get the same
result.
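
As an illustration of "same inputs, same result", here is a minimal sketch of
an archive writer whose output depends only on the file contents and a chosen
timestamp. It is not how distutils or any existing sdist tool works; the
function name `build_tgz` and the in-memory `{name: bytes}` input are
assumptions made to keep the example self-contained. The compressed bytes can
still vary between zlib implementations, which this sketch does not address.

```python
import gzip
import io
import tarfile


def build_tgz(files, epoch):
    """Pack {archive_name: bytes} into .tar.gz bytes that depend only on the inputs."""
    raw = io.BytesIO()
    # mtime= pins the timestamp in the gzip header; the empty filename
    # keeps any local path out of it.
    with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=epoch) as gz:
        # Pin the tar format so the Python version's default does not matter.
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for name in sorted(files):  # fixed member order
                data = files[name]
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = epoch  # same timestamp for every member
                # A fresh TarInfo already has uid/gid 0, empty owner names
                # and mode 0o644, so nothing from the build machine leaks in.
                tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()
```

Calling `build_tgz` twice with the same mapping and the same `epoch` returns
identical bytes, regardless of the order in which the mapping was filled.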

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Jakub Wilk

* Matthias Bussonnier, 2017-09-29, 11:16:
> I'm interested in the reproducible build of an _sdist_.
> That is to say the process of going from a given commit to the
> corresponding TGZ file. It is my understanding that setting
> SOURCE_DATE_EPOCH (SDE for short) should allow a reproducible building
> of an Sdist;

It is not enough to normalize timestamps. You need to normalize permissions
and ownership, too.

(I'm using https://pypi.python.org/pypi/distutils644 for normalizing
permissions/ownership in my own packages.)

> I cannot seem to be able to do that without unpacking and repacking the
> tgz myself;

Yeah, I don't believe distutils honors SOURCE_DATE_EPOCH at the
moment.

> Second; is there a convention to store the SDE value ?

In the changelog.

--
Jakub Wilk


Re: [Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Matthias Bussonnier
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.

That makes sense, and that would be useful, but then that means you need
to have the sdist to reproduce the sdist...
I was more thinking of a location in the source-tree/commit; for
example in pyproject.toml's tool section.
So if I give you only that you can tell me "When I build the sdist I
get this sha256", and I can do the same independently.
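
A minimal sketch of that comparison, assuming each party has built the
archive locally and only the hex digests need to be exchanged. The function
names `sha256_of_file` and `builds_agree` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(paths):
    """True if every archive in `paths` has the same SHA-256 digest."""
    return len({sha256_of_file(p) for p in paths}) == 1
```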

-- 
M

On Fri, Sep 29, 2017 at 1:02 PM, Thomas Kluyver  wrote:
> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>> Second: is there a convention for storing the SDE value? I don't seem to
>> be able to find one. It is nice to have reproducible builds, but if
>> it's a pain for reproducers to find the SDE value, that greatly decreases
>> the value of an SDE build.
>
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.
>
> Thomas


Re: [Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Thomas Kluyver
On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
> Second: is there a convention for storing the SDE value? I don't seem to
> be able to find one. It is nice to have reproducible builds, but if
> it's a pain for reproducers to find the SDE value, that greatly decreases
> the value of an SDE build.

Does it make sense to add a new optional metadata field to store the
value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
guess it could cause problems if unpacking & repacking a tarball means
that its metadata is no longer accurate, though.

Thomas


[Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Matthias Bussonnier
Hello there,

I'm going to ask questions about Reproducible Builds. A previous
thread was started in March[1], but it does not cover some of the
questions I have.

In particular I'm interested in the reproducible build of an _sdist_.
That is to say the process of going from a given commit to the
corresponding TGZ file. It is my understanding that setting
SOURCE_DATE_EPOCH (SDE for short) should allow a reproducible building
of an Sdist;
And by reproducible I mean that the tgz itself is the same byte for
byte;  (the unpacked-content being the same is a weaker form I'm less
interested in).
Is this assumption correct?

In particular I cannot seem to be able to do that without unpacking
and repacking the tgz myself, because the copy_tree, tarring and
gzipping steps by default embed the current timestamp of when these
functions were run. Am I missing something?
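
For the gzip part specifically, the timestamp lives in a 4-byte little-endian
MTIME field at offset 4 of the gzip header. A minimal sketch of writing and
reading it back with the standard library follows; the helper names
`gzip_bytes` and `header_mtime` are hypothetical, and the epoch value in the
usage note is arbitrary.

```python
import gzip
import io
import struct


def gzip_bytes(data, mtime=None):
    """Compress `data`; with mtime=None, gzip stamps the current time."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def header_mtime(blob):
    """Read the little-endian MTIME field stored at bytes 4..7 of a gzip stream."""
    return struct.unpack("<I", blob[4:8])[0]
```

With a fixed value such as `gzip_bytes(b"payload", mtime=1506711600)`, two
runs at different times yield the same header; leaving `mtime` unset is what
makes otherwise identical archives differ.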

Second: is there a convention for storing the SDE value? I don't seem to
be able to find one. It is nice to have reproducible builds, but if
it's a pain for reproducers to find the SDE value, that greatly decreases
the value of an SDE build.

Also congrats on PEP 517, and thanks to everyone who participated.

Thanks
-- 
Matthias

1: https://mail.python.org/pipermail/distutils-sig/2017-March/030284.html