Re: unexpected reproducibility of reproducible blog post?

2020-05-05 Thread Konrad Hinsen
Hi Ludo,

> Grafts are normal derivations, and they’re deterministic: it’s just
> about replacing a set of strings by another set of strings.
>
> On the implementation, see also
> .

Thanks for the clarification. That post makes it clear that grafts only
modify the build process of grafted packages.

Cheers,
  Konrad



Re: unexpected reproducibility of reproducible blog post?

2020-05-05 Thread Ludovic Courtès
Hi,

Konrad Hinsen writes:

> I looked a bit at grafts. The documentation at
>
>   https://guix.gnu.org/manual/en/html_node/Security-Updates.html
>
> isn't very explicit about the reproducibility of grafts. In particular,
> it doesn't say if a package containing patched binaries retains its
> original hash, or receives a new unique one. With a unique hash, grafts
> would just be a tweak in the build system, and no less reproducible than
> standard builds. It looks like I have to dive into the source code to
> find out!

Grafts are normal derivations, and they’re deterministic: it’s just
about replacing a set of strings by another set of strings.

On the implementation, see also
.
I’m also preparing a post on the recent (pre-1.1.0) changes in that
area.

Ludo’.



Re: unexpected reproducibility of reproducible blog post?

2020-05-04 Thread zimoun
Hi Konrad,

(add Ludo for advice :-))

On Mon, 4 May 2020 at 15:50, Konrad Hinsen wrote:

> > I will add something over there for tracking reproducibility
> > information in the future.
>
> It would actually be nice to have some external Guix reproducibility
> surveillance. A few benchmark packages that will be rebuilt regularly,
> using frozen commits via time-machine, and checked for bit-by-bit
> identity explicitly, not relying on Guix' hash mechanism. Trust but
> verify.
>
> My example is perhaps not such a bad start. Building a Docker container
> containing gcc exercises a lot of code in Guix.

Does it make sense to:

add the file "tests/guix-reproducibility.sh"?
So that reproducibility issues are detected by "make check".

Or add another rule in the Makefile?

Or test reproducibility outside the Guix tree?


All the best,
simon



>
> I looked a bit at grafts. The documentation at
>
>   https://guix.gnu.org/manual/en/html_node/Security-Updates.html
>
> isn't very explicit about the reproducibility of grafts. In particular,
> it doesn't say if a package containing patched binaries retains its
> original hash, or receives a new unique one. With a unique hash, grafts
> would just be a tweak in the build system, and no less reproducible than
> standard builds. It looks like I have to dive into the source code to
> find out!
>
> Cheers,
>   Konrad



Re: unexpected reproducibility of reproducible blog post?

2020-05-04 Thread Konrad Hinsen
Hi Simon,

> I will add something over there for tracking reproducibility
> information in the future.

It would actually be nice to have some external Guix reproducibility
surveillance. A few benchmark packages that will be rebuilt regularly,
using frozen commits via time-machine, and checked for bit-by-bit
identity explicitly, not relying on Guix' hash mechanism. Trust but
verify.

My example is perhaps not such a bad start. Building a Docker container
containing gcc exercises a lot of code in Guix.
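
A minimal sketch of what such an external check could look like, assuming
the artifact is a single file such as the docker-pack.tar tarball discussed
in this thread. The function names checksum, verify and rebuild are made up
for illustration; sha256sum, guix time-machine and guix pack are the only
real commands used, and rebuild is defined but never called here because it
needs a working Guix installation.

```shell
#!/bin/sh
# Sketch only: compare a rebuilt artifact against a checksum recorded
# earlier, without relying on Guix' own store hashes.
set -eu

# checksum FILE: print the SHA-256 of FILE, computed outside of Guix.
checksum() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# verify FILE EXPECTED: succeed only if FILE hashes to EXPECTED.
verify() {
    actual=$(checksum "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK $1"
    else
        echo "MISMATCH $1: expected $2, got $actual"
        return 1
    fi
}

# rebuild COMMIT: build the Docker pack at a frozen commit and print the
# resulting tarball path (hypothetical helper, not called in this sketch).
# Caveat: Guix does not rebuild an item that is already in the store, so a
# meaningful check needs a store that does not contain it yet.
rebuild() {
    guix time-machine --commit="$1" -- \
        pack -f docker -C none --no-substitutes gcc-toolchain
}

# Intended use, with a checksum recorded when the result was published:
#   tarball=$(rebuild 769b96b62e8c09b078f73adc09fb860505920f8f)
#   verify "$tarball" "<recorded sha256>"
```

Keeping the recorded checksums in a separate repository would make the
check independent of both the build machine and the Guix store.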

I looked a bit at grafts. The documentation at

  https://guix.gnu.org/manual/en/html_node/Security-Updates.html

isn't very explicit about the reproducibility of grafts. In particular,
it doesn't say if a package containing patched binaries retains its
original hash, or receives a new unique one. With a unique hash, grafts
would just be a tweak in the build system, and no less reproducible than
standard builds. It looks like I have to dive into the source code to
find out!

Cheers,
  Konrad



Re: unexpected reproducibility of reproducible blog post?

2020-04-30 Thread zimoun
On Wed, 29 Apr 2020 at 18:00, Konrad Hinsen wrote:

> I have also opened an issue for this:
>
>   https://github.com/khinsen/reproducibility-with-guix/issues/2

I will add something over there for tracking reproducibility
information in the future.


> > Grafts or maybe Guile 2 -> 3?
>
> With time-machine, you run the full Guix from back then, so you run
> Guile 2 if that's what it takes. What I am not so sure about is how the
> old Guix release is built. If the build uses the equivalent of "guix
> environment guix", it would start using Guile 3.

From [1] and assuming that the commit was the same, i.e.,
769b96b62e8c09b078f73adc09fb860505920f8f, there is also a mismatch
about the resulting binary.

Expected:
1be3c1b5d1e065017e4c56f725b1a692

Now:
2805a33e2e48f648307c6b913b69e41c

--8<---cut here---start->8---
guix describe # f03e5ca
guix time-machine \
 --commit=769b96b62e8c09b078f73adc09fb860505920f8f \
 -- environment --container --ad-hoc gcc-toolchain \
 -- gcc pi.c -o pi-guix
--8<---cut here---end--->8---

[1] https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00192.html

From f03e5ca, the time machine downloads the substitute:

  https://ci.guix.gnu.org/nar/lzip/ij38zh495f81xpzmp4qzqz4fprczwck2-gcc-toolchain-9.2.0


> Time travelling is not as simple as it looks, but then we should have
> expected that!

I agree, but it is annoying.
In the end, the computations are no more reproducible than with, say,
Debian if three months later we cannot reproduce them bit for bit.
I do not know. Maybe it is about 'time-machine', maybe about the exact
commit used (most probable! :-)), maybe about the Guix build toolchain
(seed) used to travel back and restore the previous build toolchain.
Who knows? :-)

Well, I will try again later on my desktop machine when I am back at
the office, hoping that I have not run the garbage collector. :-)


Cheers,
simon



Re: unexpected reproducibility of reproducible blog post?

2020-04-29 Thread Konrad Hinsen
zimoun writes:

> Argh! The author should watch the Fun MOOC about computational
> reproducibility. ;-)

That would probably help. I'll pass on the message ;-)

I have also opened an issue for this:

  https://github.com/khinsen/reproducibility-with-guix/issues/2

> Grafts or maybe Guile 2 -> 3?

With time-machine, you run the full Guix from back then, so you run
Guile 2 if that's what it takes. What I am not so sure about is how the
old Guix release is built. If the build uses the equivalent of "guix
environment guix", it would start using Guile 3.

Time travelling is not as simple as it looks, but then we should have
expected that!

Cheers,
  Konrad



Re: unexpected reproducibility of reproducible blog post?

2020-04-29 Thread zimoun
Hi Ricardo,

On Wed, 29 Apr 2020 at 14:44, Ricardo Wurmus wrote:
>
> Konrad Hinsen writes:
>
> > One question I have been wondering about is the possibility of grafts
> > being an obstacle to reproducibility. Grafts are something I don't
> > really understand yet, so I cannot answer this question. In particular,
> > does a grafted package get a different hash from a package built with
> > grafting disabled?
>
> Yes.
>
> A grafted package is a copy of the original package but with all
> references to /gnu/store/AA-… replaced with /gnu/store/BB-….
> This is done recursively starting with the direct users of the
> replaced package and for all users of those users.

Could the grafts explain the mismatch reported before?


Cheers,
simon



Re: unexpected reproducibility of reproducible blog post?

2020-04-29 Thread Ricardo Wurmus


Konrad Hinsen writes:

> One question I have been wondering about is the possibility of grafts
> being an obstacle to reproducibility. Grafts are something I don't
> really understand yet, so I cannot answer this question. In particular,
> does a grafted package get a different hash from a package built with
> grafting disabled?

Yes.

A grafted package is a copy of the original package but with all
references to /gnu/store/AA-… replaced with /gnu/store/BB-….
This is done recursively starting with the direct users of the
replaced package and for all users of those users.
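
As a toy illustration of why the hash changes, here is a minimal sketch,
assuming a graft can be modelled as a plain string substitution on a copy
of a file. This is not how Guix implements grafts: the store paths below
are invented, and the graft function is a hypothetical helper built on sed.

```shell
#!/bin/sh
# Toy model only: rewrite one store reference into another in a copy of a
# file, then show that the copy is deterministic but differs from the
# original.
set -eu

# Made-up store paths standing in for /gnu/store/AA-... and /gnu/store/BB-...
OLD_REF='/gnu/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-libfoo-1.0'
NEW_REF='/gnu/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-libfoo-1.0'

# graft SRC DST: copy SRC to DST, replacing every OLD_REF with NEW_REF.
graft() {
    sed "s|$OLD_REF|$NEW_REF|g" "$1" > "$2"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A fake "package" file that refers to the replaced dependency.
printf 'RUNPATH=%s/lib\n' "$OLD_REF" > "$workdir/original"

graft "$workdir/original" "$workdir/grafted-1"
graft "$workdir/original" "$workdir/grafted-2"

# Same input and same substitution give identical copies...
cmp "$workdir/grafted-1" "$workdir/grafted-2" && echo "graft is deterministic"
# ...but the copies differ from the original, hence a different hash.
cmp -s "$workdir/original" "$workdir/grafted-1" || echo "grafted copy differs from original"
```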

--
Ricardo



Re: unexpected reproducibility of reproducible blog post?

2020-04-29 Thread zimoun
Hi Konrad,

On Wed, 29 Apr 2020 at 11:26, Konrad Hinsen wrote:

> > Has the file 'guix-version-for-reproduction.txt' been tracked?
>
> Unfortunately not. The repository for the preparation of the post
> is at
>
>   https://github.com/khinsen/reproducibility-with-guix/
>
> but it doesn't contain the file 'guix-version-for-reproduction.txt'.

Argh! The author should watch the Fun MOOC about computational
reproducibility. ;-)


> > Was the commit 769b96b62e8c09b078f73adc09fb860505920f8f really used to
> > produce the Docker image listed in the blog post?
>
> Hard to say... I can't play with that right now because I am running
> jobs on my computer that eat all the memory.

Leo reproduced the newly observed hash.


> One question I have been wondering about is the possibility of grafts
> being an obstacle to reproducibility. Grafts are something I don't
> really understand yet, so I cannot answer this question. In particular,
> does a grafted package get a different hash from a package built with
> grafting disabled?

Grafts or maybe Guile 2 -> 3?


Cheers,
simon



Re: unexpected reproducibility of reproducible blog post?

2020-04-29 Thread Konrad Hinsen
Hi Simon,

> Based on the nice blog post [1], instead of really travelling I just
> travel in time. :-)
> If I read correctly and if I did not make any mistake, the final hash is
> not the same now as before. It is not what I was expecting.
>
> Expected output (blog post):
> /gnu/store/iqn9yyvi8im18g7y9f064lw9s9knxp0w-docker-pack.tar
>
> Returned output:
> /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar
>
> Has the file 'guix-version-for-reproduction.txt' been tracked?

Unfortunately not. The repository for the preparation of the post
is at 

  https://github.com/khinsen/reproducibility-with-guix/

but it doesn't contain the file 'guix-version-for-reproduction.txt'.

> Was the commit 769b96b62e8c09b078f73adc09fb860505920f8f really used to
> produce the Docker image listed in the blog post?

Hard to say... I can't play with that right now because I am running
jobs on my computer that eat all the memory.

One question I have been wondering about is the possibility of grafts
being an obstacle to reproducibility. Grafts are something I don't
really understand yet, so I cannot answer this question. In particular,
does a grafted package get a different hash from a package built with
grafting disabled?

Cheers,
  Konrad.



Re: unexpected reproducibility of reproducible blog post?

2020-04-27 Thread Leo Prikler
Hi zimoun,

On Monday, 27.04.2020, at 12:05 +0200, zimoun wrote:
> Hi Leo,
> 
> Thank you for testing.
> 
> 
> On Mon, 27 Apr 2020 at 00:53, Leo Prikler <leo.prik...@student.tugraz.at> wrote:
> 
> > yours: /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar
> > mine:  /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar
> 
> Nice!
> 
> What is your "guix describe"?
I used the same time-machine as you did, so I don't think it matters,
but I'm currently at 1e700c656a7a25ff2eb27fa552bc213cc50efb2a.  The
result is the same for 86081d9d7f88a7faee6fd14e8a085cb95ac1e36a (a
"random" commit in January), as long as you remember to actually use
the time-machine to jump back to the commit mentioned in the blog post.

> > I don't know what configuration exactly went into the blog post, but I
> > assume it is not the same as for the time-machine experiments before.
> > Since the prefix `guix time-machine
> > --channels=guix-version-for-reproduction.txt --` appears to be missing
> > from the command, that hash is probably not indicative of anything.
> 
> I do not know. That's why I am asking. :-)
> Because when reading the blog post, I naively assume that all had been
> run with the same version of Guix and the post mentions only one
> commit. Well, if it is not the case, it should be mentioned in the
> blog post because it is currently misleading, IMO.
I agree.  There should probably be an addendum correcting the
information.  Adding some up-to-date hashes would not be a bad idea
either.

> > I think the larger problem here is that, while Guix itself is
> > reproducible, Guix + org-mode (specifically the latter) is not.
> 
> Why?
There are too many ways to bork org-mode -- you yourself specifically
list one of them later.  I don't think org-mode not being reproducible
is a big secret.

> > Particularly, looking at the source[1,2], it appears as if all code
> > blocks were evaluated once, but evaluating them again in a new
> > environment would bring different results.
> 
> Do you mean that evaluating twice in a row leads to different results?
> By results, I mean items in '/gnu/store'.
> Because, yes, the org-babel cache should not be reproducible. But that
> is another story and should not impact the result of a source block.
That by itself is no biggie, but it becomes particularly bad when you
throw in partial evaluation.  If you don't evaluate all the code blocks
once, your results may get stale and then you'd export those stale
results.  Throw in some command that hasn't been evaluated yet and
evaluate it on export and you get yourself a recipe for deluxe bogus.

> My point is:
>  - only one Guix commit is provided by the post, so it seems
>    legitimate to assume this commit had been used for the whole post
>  - using this commit leads to a different item in the store
This assumption may appear intuitive, but by reproducing the time-
machine on many Guix systems and verifying that it is indeed
reproducible (albeit with a different hash), we can invalidate it.

> The question is why?
>  - another commit was used. Which one? Could it be mentioned in the
>    post?
>  - or there is something unexpected, and we should inspect what.
I assume it was accidental.  Had it been known earlier that a
different commit was used, the blog post would have been updated
between Jan 14th (last update in git) and Jan 24th (official post),
would it not?

All the best,
Leo




Re: unexpected reproducibility of reproducible blog post?

2020-04-27 Thread zimoun
Hi Leo,

Thank you for testing.


On Mon, 27 Apr 2020 at 00:53, Leo Prikler wrote:

> yours: /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar
> mine:  /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar

Nice!

What is your "guix describe"?


> I don't know what configuration exactly went into the blog post, but I
> assume it is not the same as for the time-machine experiments before.
> Since the prefix `guix time-machine
> --channels=guix-version-for-reproduction.txt --` appears to be missing
> from the command, that hash is probably not indicative of anything.

I do not know. That's why I am asking. :-)
Because when reading the blog post, I naively assume that all had been
run with the same version of Guix and the post mentions only one
commit. Well, if it is not the case, it should be mentioned in the
blog post because it is currently misleading, IMO.


> I think the larger problem here is that, while Guix itself is
> reproducible, Guix + org-mode (specifically the latter) is not.

Why?


> Particularly, looking at the source[1,2], it appears as if all code
> blocks were evaluated once, but evaluating them again in a new
> environment would bring different results.

Do you mean that evaluating twice in a row leads to different results?
By results, I mean items in '/gnu/store'.
Because, yes, the org-babel cache should not be reproducible. But that
is another story and should not impact the result of a source block.


> In other words, you'd have
> to use `guix time-machine` inside `guix time-machine` to get a truly
> reproducible org-mode file, or else come up with a smart way of
> dynamically updating the hash in the source blocks themselves.

I do not know, and I am not sure I follow.


My point is:
 - only one Guix commit is provided by the post, so it seems
   legitimate to assume this commit had been used for the whole post
 - using this commit leads to a different item in the store

The question is why?
 - another commit was used. Which one? Could it be mentioned in the post?
 - or there is something unexpected, and we should inspect what.


All the best,
simon



unexpected reproducibility of reproducible blog post?

2020-04-26 Thread Leo Prikler
Hi simon,

I've executed your commands, et voilà
yours: /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar
mine:  /gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar
Unsurprisingly, this did not change when adding channels -- though, if
you were to add your personal channel with some override for
gcc-toolchain, things might be different.

I don't know what configuration exactly went into the blog post, but I
assume it is not the same as for the time-machine experiments before.
Since the prefix `guix time-machine
--channels=guix-version-for-reproduction.txt --` appears to be missing
from the command, that hash is probably not indicative of anything.
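
For reference, a minimal sketch of how such a channels file could be
written and then used as a prefix, assuming the commit and repository URL
quoted elsewhere in this thread. The write_channels helper and the
temporary output directory are made up for illustration, and the final
guix command is only printed, not executed. On a real system, `guix
describe -f channels` should produce an equivalent file for the revision
currently in use.

```shell
#!/bin/sh
# Sketch only: pin the Guix revision in a channels file so that every
# command in a document can be run through the same time-machine prefix.
set -eu

# Commit quoted in this thread.
COMMIT=769b96b62e8c09b078f73adc09fb860505920f8f
outdir=$(mktemp -d)
CHANNELS="$outdir/guix-version-for-reproduction.txt"

# write_channels FILE COMMIT: write a channels specification pinning the
# 'guix channel to COMMIT (hypothetical helper).
write_channels() {
    cat > "$1" <<EOF
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "$2")))
EOF
}

write_channels "$CHANNELS" "$COMMIT"
cat "$CHANNELS"

# Every later command would then carry the same prefix, for example:
echo "guix time-machine --channels=$CHANNELS -- pack -f docker -C none gcc-toolchain"
```

Tracking this file in the same repository as the document is what makes
the prefix checkable by readers later on.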

I think the larger problem here is that, while Guix itself is
reproducible, Guix + org-mode (specifically the latter) is not. 
Particularly, looking at the source[1,2], it appears as if all code
blocks were evaluated once, but evaluating them again in a new
environment would bring different results.  In other words, you'd have
to use `guix time-machine` inside `guix time-machine` to get a truly
reproducible org-mode file, or else come up with a smart way of
dynamically updating the hash in the source blocks themselves.

All the best,
Leo

[1] https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00106.html
[2] https://github.com/khinsen/reproducibility-with-guix/blob/master/reproducibility-with-guix.org




unexpected reproducibility of reproducible blog post?

2020-04-26 Thread zimoun
Dear,

Based on the nice blog post [1], instead of really travelling I just
travel in time. :-)
If I read correctly and if I did not make any mistake, the final hash is
not the same now as before. It is not what I was expecting.

Expected output (blog post):
/gnu/store/iqn9yyvi8im18g7y9f064lw9s9knxp0w-docker-pack.tar

Returned output:
/gnu/store/klisfr3a4wxb9dc5sgibb45kky72kg65-docker-pack.tar

Has the file 'guix-version-for-reproduction.txt' been tracked?
Was the commit 769b96b62e8c09b078f73adc09fb860505920f8f really used to
produce the Docker image listed in the blog post?


Thank you in advance.

All the best,
simon


--8<---cut here---start->8---
guix describe
Generation 11   Apr 26 2020 19:24:23   (current)
  guix ca4b558
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: ca4b55882a0f6b4ba46253485afb82ec000f8fc2
--8<---cut here---end--->8---

--8<---cut here---start->8---
  guix time-machine --commit=769b96b62e8c09b078f73adc09fb860505920f8f \
   --  pack -f docker -C none \
   -S /bin=bin -S /lib=lib -S /share=share -S /etc=etc \
   gcc-toolchain
--8<---cut here---end--->8---


[1] https://guix.gnu.org/blog/2020/reproducible-computations-with-guix/