Re: Reproductibility, Data Services, guix weather

2020-10-16 Thread Ludovic Courtès
Hi!

zimoun  skribis:

> (define %prefix-url
>   "https://data.guix-patches.cbaines.net/revision")
>
> (define %suffix-url
>   "output_consistency=not-matching=none_results=on")
>
> (define %json-name "package-derivation-outputs.json")
>
>
> (define* (json-url revision
>                    #:optional
>                    (system (%current-system))
>                    (json %json-name))
>   "Return the URL corresponding to REVISION."
>   (string-append
>    %prefix-url "/" revision "/" json "?" %suffix-url "=" system))
>
> (define* (json-file revision
>                     #:optional
>                     (system (%current-system))
>                     (name %json-name)
>                     (tmp %temporary-directory))
>   "Path where the JSON is stored.
>
> By default in %TEMPORARY-DIRECTORY/REV-%JSON-NAME."
>   (let ((hash (substring revision 0 6)))
>     (string-append tmp "/" hash "-" name "-" system)))
>
> (define (fetch-json revision system)
>   "Fetch the JSON file from the Data Service corresponding to REVISION.
>
> Store the result in %TEMPORARY-DIRECTORY."
>   (let* ((out (json-file revision system))
>          (url (json-url revision system)))
>     (url-fetch url out)))

I think it’s a good idea.  My suggestion would be to do as for (guix
ci): make a (guix data-service-client) (?) module that contains proper
bindings to a subset of the Data Service APIs, using
‘define-json-mapping’.
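
For illustration, the URL-building part of such a module might look like
the sketch below.  It is only a sketch under stated assumptions: the
procedure names ‘query-string’ and ‘revision-json-url’ are hypothetical,
the "system" query parameter in the usage note is an assumption, and the
JSON record mapping itself is left out.

```scheme
(use-modules (web uri))

;; Base URL taken from the script quoted above.
(define %base-url
  "https://data.guix-patches.cbaines.net/revision")

;; Hypothetical helper: build "k1=v1&k2=v2" from an alist of strings,
;; percent-encoding keys and values with 'uri-encode' from (web uri).
(define (query-string parameters)
  "Turn PARAMETERS, an alist of (KEY . VALUE) strings, into a query string."
  (string-join
   (map (lambda (parameter)
          (string-append (uri-encode (car parameter))
                         "="
                         (uri-encode (cdr parameter))))
        parameters)
   "&"))

;; Hypothetical helper: URL of the package-derivation-outputs JSON.
(define (revision-json-url revision parameters)
  "Return the JSON URL for REVISION with PARAMETERS as its query string."
  (string-append %base-url "/" revision
                 "/package-derivation-outputs.json?"
                 (query-string parameters)))
```

It could be called as (revision-json-url REVISION '(("system" .
"x86_64-linux"))); passing the parameters as an alist keeps the query
string out of hard-coded constants.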

Once we have that, we can consider using it in ‘guix weather’ or in new
tools such as the proposed ‘guix git log’.

Ludo’.



Re: Reproductibility, Data Services, guix weather

2020-10-15 Thread zimoun
On Thu, 15 Oct 2020 at 09:45, Christopher Baines  wrote:

>> Can I expect that all the revisions are there?  Or only some?
>
> Well, definitely not all revisions, but for the patches instance of the
> Guix Data Service I'm aiming to keep recent revisions.

Well, I am going to put that in my script and will report to you if I am
not able to reach the expected revisions.

--8<---cut here---start->8---
(match (current-profile)
  (#f %guix-version)   ;for lack of a better ID
  (profile
   (let ((channel
          (find guix-channel? (profile-channels profile))))
     (channel-commit channel))))
--8<---cut here---end--->8---

Thanks,
simon



Re: Reproductibility, Data Services, guix weather

2020-10-15 Thread Christopher Baines

zimoun  writes:

> Hi Chris,
>
> On Tue, 13 Oct 2020 at 21:41, Christopher Baines  wrote:
>
>> > First, Chris could you add the fields package name and version?  Because
>> > it is hard to automatically reconstruct them by parsing the output-path.
>>
>> Done in [1], and I've updated data.guix-patches.cbaines.net.
>
> Neat!  I have updated my script.  Now, I need to add some build-system
> support to ease the triage.  I will keep you posted.
>
> Thank you.
>
>> > (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5 which
>> > does not seem to be found in the Guix repo.)
>>
>> So, that particular commit is just some revision of Guix with some
>> patches applied. I picked it because it was the most recent
>> data. There are now recent commits for the master branch itself [2] like
>> [3].
>
> Can I expect that all the revisions are there?  Or only some?

Well, definitely not all revisions, but for the patches instance of the
Guix Data Service I'm aiming to keep recent revisions.




Re: Reproductibility, Data Services, guix weather

2020-10-14 Thread zimoun
Hi Chris,

On Tue, 13 Oct 2020 at 21:41, Christopher Baines  wrote:

> > First, Chris could you add the fields package name and version?  Because
> > it is hard to automatically reconstruct them by parsing the output-path.
>
> Done in [1], and I've updated data.guix-patches.cbaines.net.

Neat!  I have updated my script.  Now, I need to add some build-system
support to ease the triage.  I will keep you posted.

Thank you.

> > (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5 which
> > does not seem to be found in the Guix repo.)
>
> So, that particular commit is just some revision of Guix with some
> patches applied. I picked it because it was the most recent
> data. There are now recent commits for the master branch itself [2] like
> [3].

Can I expect that all the revisions are there?  Or only some?


Cheers,
simon



Re: Reproductibility, Data Services, guix weather

2020-10-13 Thread Christopher Baines

zimoun  writes:

> The issue is to be able to find them.  I proposed (below) to run a cron
> task doing ’--check’ on the build farms and then report the failures by
> email.  Chris pointed me to the work they are doing [3]; instead of a
> cron task, they propose parsing the JSON.  That’s what the tiny
> script attached is doing.
>
>guix repl -L . -- weather-repro.scm
>
> For example, I run:
>
>guix repl -L . -- weather-repro.scm | sort | grep ghc
>
> to list (almost) all the unreproducible Haskell packages.  What I would
> like is to be able to filter by build system for example.
>
>
> First, Chris could you add the fields package name and version?  Because
> it is hard to automatically reconstruct them by parsing the output-path.

Done in [1], and I've updated data.guix-patches.cbaines.net.

1: 
https://git.savannah.gnu.org/cgit/guix/data-service.git/commit/?id=f15dc5ab0b48f4228a3c545052a1e4daf3e80f15

> Second, the revision of 
> does not match the Guix commit.  Is it possible to have a bridge?  In other
> words, how is this revision hash computed?
>
> (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5 which
> does not seem to be found in the Guix repo.)

So, that particular commit is just some revision of Guix with some
patches applied. I picked it because it was the most recent
data. There are now recent commits for the master branch itself [2] like
[3].

2: https://data.guix-patches.cbaines.net/repository/2/branch/master
3: 
https://data.guix-patches.cbaines.net/revision/ec82d58526c27a9ca26f6c5e39cec90a48cbc1cc




Reproductibility, Data Services, guix weather

2020-10-12 Thread zimoun
Dear,

Recently, we discovered a regression in the Haskell build system that
introduces unreproducible builds.  Well, it was a kind of luck: I was
testing ’git-annex’ because I wanted to have ’git-annex-assistant’, and
I built it several times (--check) [1].

Aside from this particular issue, ~10% of packages are not reproducible
and I am not convinced that “--check” is run by the submitter/committer
at each update or new package.  Otherwise the case of unreproducible
Mesa [2] would have been raised before June. :-)  (That’s fine, we need
packages after all and we cannot fix the whole world at once. :-))

The issue is to be able to find them.  I proposed (below) to run a cron
task doing ’--check’ on the build farms and then report the failures by
email.  Chris pointed me to the work they are doing [3]; instead of a
cron task, they propose parsing the JSON.  That’s what the tiny
script attached is doing.

   guix repl -L . -- weather-repro.scm

For example, I run:

   guix repl -L . -- weather-repro.scm | sort | grep ghc

to list (almost) all the unreproducible Haskell packages.  What I would
like is to be able to filter by build system for example.
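
A minimal sketch of such a filter, assuming a lookup from package name
to build system is available.  Here the lookup is a hard-coded table of
hypothetical sample data, and the names ‘build-system-of’ and
‘filter-by-build-system’ are made up for illustration; they are not part
of the attached script.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical sample data: (NAME . BUILD-SYSTEM) pairs standing in for
;; whatever a real lookup would return.
(define %sample-build-systems
  '(("ghc-aeson" . haskell)
    ("git-annex" . haskell)
    ("mesa" . meson)
    ("hello" . gnu)))

(define (build-system-of name)
  "Return the build system symbol recorded for NAME, or #f if unknown."
  (assoc-ref %sample-build-systems name))

(define (filter-by-build-system names build-system)
  "Return the elements of NAMES whose build system is BUILD-SYSTEM."
  (filter (lambda (name)
            (eq? (build-system-of name) build-system))
          names))
```

With the sample table, filtering a list of names on 'haskell keeps only
the names recorded with that build system and drops unknown names.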


First, Chris could you add the fields package name and version?  Because
it is hard to automatically reconstruct them by parsing the output-path.

Second, the revision of 
does not match the Guix commit.  Is it possible to have a bridge?  In other
words, how is this revision hash computed?

(A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5 which
does not seem to be found in the Guix repo.)


Third, this tiny script is better than nothing but *far far away* from
perfect.  The question about tooling is: does it make sense to include
something like that directly in “guix weather”?  For example,

  guix weather --reproducible

or maybe under “guix challenge”?


WDYT?  Feedback and ideas are very welcome. :-)


All the best,
simon

PS: Below are my question and Chris’s answer.  Both deserve to be public
as Chris told me. :-)


1: 
2: 
3: 



 Start of forwarded message 
From: zimoun 
Subject: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:19:26 +0200

Hi,

Currently, it is hard to catch:

  1. which commit breaks which package
  2. if the package builds reproducibly

Even though the Data Service helps *a lot!*, there are still a lot of
manual actions needed to spot one or the other.  And I fully agree that
the work initiated by Chris is The Right Thing©.

However, it is not ready and the manpower is limited.  For
#1, Danny has started a discussion. 


For #2, I am proposing to add a cron task on one build farm.  To be
concrete, let’s *randomly* pick 100 packages once a week, rebuild with
“--check” and send by email the unreproducible packages.

I am even proposing: 1st week 100 packages of build-system “foo”, 2nd
week 100 packages of build-system “bar”, 3rd week…
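
The weekly selection could be sketched as follows.  This is only an
illustration under assumptions: the procedure names ‘shuffle’, ‘sample’
and ‘build-system-for-week’ are hypothetical, and the actual rebuild and
email steps are left out.

```scheme
(use-modules (srfi srfi-1))

(define (shuffle lst)
  "Return a random permutation of LST (Fisher-Yates on a vector copy)."
  (let ((vec (list->vector lst)))
    (let loop ((i (- (vector-length vec) 1)))
      (when (> i 0)
        ;; Swap element I with a random element at index 0..I.
        (let* ((j (random (+ i 1)))
               (tmp (vector-ref vec i)))
          (vector-set! vec i (vector-ref vec j))
          (vector-set! vec j tmp))
        (loop (- i 1))))
    (vector->list vec)))

(define (sample lst n)
  "Return up to N elements of LST picked at random without repetition."
  (let ((shuffled (shuffle lst)))
    (take shuffled (min n (length shuffled)))))

(define (build-system-for-week week build-systems)
  "Rotate through BUILD-SYSTEMS, one per WEEK (a non-negative integer)."
  (list-ref build-systems (modulo week (length build-systems))))
```

Each week, the cron task would pick the build system for that week,
sample 100 of its packages, and rebuild each of them with “--check”.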

It is far from perfect but it seems a good heuristic to catch
regressions, spot packages with reproducibility troubles, etc.  Note that
this should not be needed, since the committer should catch the
reproducibility issue; but in practice that is not the case.
Somehow, I am proposing a workaround.


I volunteer to be the recipient of these automatic emails, then I can do
some triage (remove false positives, check what’s going on, etc.) and open
a bug report if there is an issue.

Currently, I do not have the CPU power to do so.  So I am asking if it is
possible to put something like that on one of the building machines.  I
would totally understand an answer like: « Simon, you are enthusiastic and
that’s nice but no and go to hell! » :-)

Cheers,
simon
 End of forwarded message 

 Start of forwarded message 
From: Christopher Baines 
Subject: Re: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:38:03 +0100

> Currently, I do not have the CPU power to do so.  So I am asking if it is
> possible to put something like that on one of the building machines.  I
> would totally understand an answer like: « Simon, you are enthusiastic and
> that’s nice but no and go to hell! » :-)

I too would really like to be able to identify/prevent regressions,
including with respect to build reproducibility, and although the work
I'm doing on this is going slowly I'm hoping that with the Guix Build
Coordinator now I'll be able to get something sort of working.

I've just made a few tweaks to the Guix Data Service to make the data it
has on this a little easier to use.

This URL [1] should show you package reproducibility stats for each
architecture, computed from