Re: Project direction with testing changes (branches and patches)

2021-08-11 Thread Christopher Baines

Ludovic Courtès  writes:

> Hi Chris,
>
> Christopher Baines writes:
>
>> I was thinking of using Cuirass for building derivations when testing
>> patches, but I gave up on that approach back in 2019 after trying to use
>> it (I discussed trying to use it here [1]).
>>
>> 1: https://lists.gnu.org/archive/html/guix-devel/2019-03/msg00010.html
>>
>> I was specifically thinking about testing patches when I initially
>> designed both the Guix Data Service and Guix Build Coordinator. For me
>> at least, the focus has been on this direction for the last ~3 years.
>>
>> I realise that Cuirass now has some of the functionality that the Guix
>> Data Service was written to provide, like tracking all
>> packages/derivations in each revision. But from my perspective, Cuirass
>> still lacks key things like being able to compare two revisions and
>> tracking lint warnings.
>
> Cuirass has always had to track packages/derivations in each revision;
> it’s been this way from day 1 when it started more or less as a
> revisited Hydra, which did exactly this.

I know it tracks some derivations against a revision, but it has only
been tracking all of the derivations associated with each revision since
this change [1].

1: https://git.savannah.nongnu.org/cgit/guix/guix-cuirass.git/commit/?id=bba1311478a50c837a8c70a556d308ca32ead816

>> There's also things like testing derivations on different hardware,
>> regularly testing fixed output derivations and automatically retrying
>> failed builds that I think the Guix Build Coordinator is better set up to
>> do compared to Cuirass.
>>
>> But this feedback is why I started this thread. I don't see the same
>> option as was found for improving substitutes by setting up a new
>> substitute server using the Guix Data Service and the Guix Build
>> Coordinator. There's a much stronger need to have one approach as a
>> project for testing changes, and if using the Guix Data Service and Guix
>> Build Coordinator isn't looking like a convincing option at this point,
>> that's better to know now, compared to later when more time and effort
>> has been put in.
>
> I can sympathize with the bitter feeling.  I do think though that we
> must work collectively; to me it’d be a problem if misplaced competition
> were to prevent us from moving forward.
>
> Several concrete incremental steps were proposed in this thread and
> earlier.  Instead of trying to provide a definite answer as to whether
> the grand plan you propose is a convincing option at this point, I’d
> like us to collect the low-hanging fruits, in an opportunistic way.  :-)
>
> Several easy hacks have been proposed in this thread and before: custom
> web/CLI views for the Data Service, Cuirass APIs to spin up specs on the
> fly, Data Service integration in Mumi/Gitile, Cuirass notifications sent
> to the Data Service, etc.  None of these is impressive in itself, but
> each of these can be a step making our hacker lives better, IMO.
>
> WDYT?

I don't perceive this as bitterness, just pragmatism.

I think I have an answer now as to whether there's consensus on making
use of the Guix Data Service and Guix Build Coordinator for testing
changes. Knowing there are some objections is useful when considering the
risk and managing my own expectations.

I still personally think that the general direction this work is going
is a good one. There are still some areas of uncertainty, but there's
definitely some stuff that can be done to move forward and more
discussion that can be had.

Thanks for your suggestions on next steps, I still need to get my own
thoughts in order. As you alluded to, I do like to have a plan in mind.

Thanks,

Chris




Re: core-updates-frozen on powerpc64le-linux

2021-08-11 Thread Thiago Jung Bauermann
Hello!

On Wednesday, 11 August 2021, at 07:18:27 -03, Ludovic Courtès wrote:
> Thiago Jung Bauermann writes:
> > On Wednesday, 4 August 2021, at 17:48:59 -03, Ludovic Courtès wrote:
> >> Thiago Jung Bauermann writes:
> >> Note that currently ci.guix only does emulated powerpc64le-linux because
> >> the only POWER9 machine we currently have access to (lent by OSUOSL) is
> >> not running ‘cuirass-remote-worker’.



> >> It’s a foreign distro (Debian) so
> >> setting up these things can be a bit tedious.  If you or anyone would
> >> like to help with this, we can discuss it!
> > 
> > I’d be glad to help set that up and maintain the OSUOSL machine!
> 
> Excellent!  I think Mathieu, Tobias, and guix-sysad...@gnu.org should be
> able to get you started.  Could you send us an account name and SSH
> public key?

Thanks! I sent it in private.

> >> (bordeaux.guix does have a POWER9 build machine behind, but it’s not
> >> building ‘core-updates-frozen’ currently.)
> > 
> > Nice! I’d be glad to help with that machine as well if there’s anything
> > to do on that front.
> 
> I think it’s running fine, using the Guix Build Coordinator instead of
> Cuirass, set up by Christopher Baines (this POWER9 machine is Chris’s,
> currently).

That’s great. Thanks Chris!

> >> > So next step for me is to look into the build failures above. I’ll
> >> > semi-randomly start with ‘gmp-boot’ and see what I can find out.
> >> 
> >> Neat, thank you!
> > 
> > You’re welcome. Patches on issues 49880, 49881 and 49882. :-)
> 
> Alrighty, I’ll take a look!

Thank you!

-- 
Thanks,
Thiago





Re: Hurd Security vulnerabilities, please upgrade!

2021-08-11 Thread Ludovic Courtès
Hi Samuel,

Samuel Thibault writes:

> Ricardo Wurmus, on Tue. 10 Aug. 2021 17:52:34 +0200, wrote:
>> I’m a little unclear on what this means for distributions like Guix.  Should
>> we just update to the latest version from git?  Are there specific commits
>> we should use if it’s not just the latest?
>
> Since Sergey's copyright assignment is not complete yet, it's not
> committed yet, so you have to pick up the patches from the Debian
> repository.

It would be interesting to consider dropping the copyright assignment
requirement for Hurd/Mach/MiG.  For what remains primarily a hobby
project, this looks to me like a hindrance more than anything else.

Ludo’.



Re: Substitute timeouts

2021-08-11 Thread Mathieu Othacehe


Hey Ludo,

Thanks for taking the time to read my wall of text :D.

> Yeah, it’s a double-edged sword.  If this is a problem on the main ‘guix
> publish’ server, we can lower the bypass threshold, which is currently
> 50 MiB:
>
>   
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/modules/sysadmin/services.scm#n450
>
> WDYT?

That would maybe help, but on the other hand, I would prefer to find a
more definitive solution :).

> First, in terms of UI, you’d have a command sitting there and doing
> nothing, which can be off-putting.  Second, clients have no idea how
> long they’re going to wait; it could be that the nar is going to be
> baked within seconds, or it could take 20mn if the baking queue is
> already crowded or if the user is asking for a big store item like
> libreoffice.  Third, in many cases, building locally is likely to be
> faster than waiting for substitutes to be available (the majority of
> packages build very quickly, though the few most popular leaf packages
> take a long time to build).

It would be interesting to monitor the status of the baking
workers. Could it really take 20 minutes to bake a substitute from your
experience?

Personally, I have always found the 404-while-baking and cache-bypass
behaviour a bit misleading. When substituting libreoffice, I would much
rather wait a few minutes than try to build it while there's an
almost-ready substitute. I get that this is a personal choice and maybe
it should be an optional behaviour.
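
To make the "optional behaviour" idea concrete, here is a minimal Python
sketch of a client that waits a bounded amount of time while the server
reports that a substitute is being baked, then falls back to a local
build. It is only an illustration under stated assumptions: the function
name `fetch_or_build`, its parameters, and the default timings are
hypothetical, and the use of 202 as the "being baked" status follows the
proposal discussed in this thread rather than any released behaviour.

```python
import time


def fetch_or_build(fetch, max_wait=120.0, delay=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Decide between substituting and building locally.

    `fetch` is a hypothetical callable returning an HTTP status code:
    200 means the nar is ready, 202 means it is being baked, anything
    else (such as 404) means no substitute is on its way.
    Setting max_wait to 0 disables waiting entirely.
    """
    deadline = clock() + max_wait
    while True:
        status = fetch()
        if status == 200:
            return "substitute"
        if status != 202:
            # Not available and not being baked: build right away.
            return "build"
        if clock() + delay > deadline:
            # Waiting any longer would exceed the user's budget.
            return "build"
        sleep(delay)
```

With `max_wait=0` the client never waits on a 202 and builds straight
away; a user who would rather wait for libreoffice could raise the
budget instead.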

>> It will also allow the Cuirass build farm to use directly the main guix
>> publish server, simplifying the current CI setup.
>
> The only reason why Cuirass runs its own publish server is to avoid
> overloading the main one?

No, the main reason is that with the use of a publish cache, the Cuirass
workers would probably hit 404 errors while the substitutes are being
baked. Using a publish server without cache was a way to work around it.

The motivation of the 202 waiting patch was to solve both problems at
once. Maybe I should explore the dedicated narinfo thread as a
short-term solution, while starting to think about a longer-term
solution based on Fibers/nginx.

A Cuirass-specific solution could also be to declare a build successful
only when a nar is available, and to stop using a non-caching publish
server.

Thanks,

Mathieu



Re: Substitute timeouts

2021-08-11 Thread Ludovic Courtès
Hi,

Mathieu Othacehe writes:

> I have been investigating a problem that is visible both on the main
> guix publish server at https://ci.guix.gnu.org[1] and on the Cuirass
> build farm[2].
>
> This error comes from the fact that the publish server does not accept
> the "guix substitute" connection requests within the %fetch-timeout
> duration of 5 seconds.

Thanks for getting to the bottom of this!

> The main guix publish server is using a cache. If a requested narinfo is
> not in the cache, it will be baked and the client receives a 404
> error. Since ecaa102a58ad3ab0b42e04a3d10d7c761c05ec98 and the
> introduction of the bypass mechanism, small store items are directly
> returned.
>
> This means that the "narinfo-string" procedure can be called directly in
> the main publish thread. Running perf on the main publish server reveals
> that this procedure can be really expensive under IO pressure (GC
> running for example) because it opens a lot of files. I have observed
> that the "read-derivation-from-file" call can take up to 600 ms.
>
> If multiple clients were to ask narinfo of several items not yet cached,
> under IO pressure, I think that the publish server could become
> unresponsive and cause the timeout errors.

Yeah, it’s a double-edged sword.  If this is a problem on the main ‘guix
publish’ server, we can lower the bypass threshold, which is currently
50 MiB:

  
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/modules/sysadmin/services.scm#n450

WDYT?
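
For readers following along, the cache/bypass trade-off described above
can be sketched roughly as follows. This is a simplified Python model
and not the actual ‘guix publish’ code: the names `StoreItem`, `Action`
and `decide` are hypothetical, as are the item sizes used in the
example; only the 50 MiB default comes from this thread.

```python
from dataclasses import dataclass
from enum import Enum

MIB = 1024 * 1024


class Action(Enum):
    SERVE_FROM_CACHE = "serve-from-cache"
    SERVE_DIRECTLY = "serve-directly"  # bypass: computed in the request path
    BAKE_AND_404 = "bake-and-404"      # queue baking, answer 404 for now


@dataclass(frozen=True)
class StoreItem:
    name: str
    size: int     # size in bytes
    cached: bool  # whether the cache already holds it


def decide(item, bypass_threshold=50 * MIB):
    """Pick how to answer a request for `item` (simplified model)."""
    if item.cached:
        return Action.SERVE_FROM_CACHE
    if item.size < bypass_threshold:
        # Small item: answered immediately, at the cost of doing the
        # work while the client is waiting.
        return Action.SERVE_DIRECTLY
    return Action.BAKE_AND_404


small = StoreItem("hello", 1 * MIB, cached=False)
large = StoreItem("libreoffice", 300 * MIB, cached=False)
print(decide(small), decide(large))
```

Lowering `bypass_threshold` moves more items from the "serve directly"
branch to the "bake and 404" branch, which trades fewer expensive
in-request computations for more 404 responses.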

> The fact that Cuirass triggers the baking of successfully built
> derivations probably doesn't help here.

Could be.  This threshold seemed to work fine earlier (and still does,
mostly?).

> Now regarding the timeout errors that are much more frequent on the
> Cuirass build farm, the cause varies a bit. The Cuirass publish server
> running on Berlin does not use a cache. This means that the
> "narinfo-string" procedure is called for each request, in the main
> thread.
>
> To fix those issues, a solution could be to run the "narinfo-string" in
> a separate thread, but it will make the publish server code even harder
> to understand.

True!  Though maybe it wouldn’t be that much worse.  :-)

The problem is that this thing is very much single-threaded, with
exceptions in a couple of places.  We could add one more exception like
you write, or fiberize it, or run it behind nginx, possibly with a tiny
bit of caching.
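
As a rough illustration of the "one more exception" option, the sketch
below hands a slow, blocking computation to a worker thread so that the
main loop can keep accepting requests. It is written in Python with
asyncio purely for illustration and says nothing about how the Guile
code would look; `narinfo_string` here is a hypothetical stand-in that
just sleeps, and the store paths are made up.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def narinfo_string(store_item):
    """Hypothetical stand-in for the expensive, I/O-bound computation."""
    time.sleep(0.2)  # simulate slow file reads under I/O pressure
    return f"StorePath: {store_item}\n"


async def handle_request(loop, pool, store_item):
    # The blocking call runs in a worker thread; the event loop stays
    # free to accept other connections in the meantime.
    return await loop.run_in_executor(pool, narinfo_string, store_item)


async def serve(items, workers=4):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = [handle_request(loop, pool, item) for item in items]
        # gather() returns results in the order the requests were made.
        return await asyncio.gather(*tasks)


def run_demo(items):
    return asyncio.run(serve(items))
```

Calling `run_demo` with several items processes them concurrently
instead of one after another, which is the property the main publish
thread currently lacks.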

> My proposition would be to get rid of the bypass mechanism and instead
> implement a retry when some substitutes are reported as being baked,
> as proposed by Miguel[3].
>
> I think this is the most reasonable solution. This way, users won't
> receive 404 errors and start building substitutes that are being
> baked[4].

(If I followed correctly, the bypass mechanism is not at fault regarding
timeouts on the Cuirass publish server since it’s not using a cache,
right?)

I don’t think it’s reasonable for ‘guix substitute’ to just wait upon
202 (or 404, that doesn’t matter).

First, in terms of UI, you’d have a command sitting there and doing
nothing, which can be off-putting.  Second, clients have no idea how
long they’re going to wait; it could be that the nar is going to be
baked within seconds, or it could take 20mn if the baking queue is
already crowded or if the user is asking for a big store item like
libreoffice.  Third, in many cases, building locally is likely to be
faster than waiting for substitutes to be available (the majority of
packages build very quickly, though the few most popular leaf packages
take a long time to build).

> It will also allow the Cuirass build farm to use directly the main guix
> publish server, simplifying the current CI setup.

The only reason why Cuirass runs its own publish server is to avoid
overloading the main one?

Thanks,
Ludo’.



Re: Project direction with testing changes (branches and patches)

2021-08-11 Thread Ludovic Courtès
Hi Chris,

Christopher Baines writes:

> I was thinking of using Cuirass for building derivations when testing
> patches, but I gave up on that approach back in 2019 after trying to use
> it (I discussed trying to use it here [1]).
>
> 1: https://lists.gnu.org/archive/html/guix-devel/2019-03/msg00010.html
>
> I was specifically thinking about testing patches when I initially
> designed both the Guix Data Service and Guix Build Coordinator. For me
> at least, the focus has been on this direction for the last ~3 years.
>
> I realise that Cuirass now has some of the functionality that the Guix
> Data Service was written to provide, like tracking all
> packages/derivations in each revision. But from my perspective, Cuirass
> still lacks key things like being able to compare two revisions and
> tracking lint warnings.

Cuirass has always had to track packages/derivations in each revision;
it’s been this way from day 1 when it started more or less as a
revisited Hydra, which did exactly this.

The Guix Data Service provides info not available elsewhere though, and
that’s why we need to take advantage of it.

> There's also things like testing derivations on different hardware,
> regularly testing fixed output derivations and automatically retrying
> failed builds that I think the Guix Build Coordinator is better set up to
> do compared to Cuirass.
>
> But this feedback is why I started this thread. I don't see the same
> option as was found for improving substitutes by setting up a new
> substitute server using the Guix Data Service and the Guix Build
> Coordinator. There's a much stronger need to have one approach as a
> project for testing changes, and if using the Guix Data Service and Guix
> Build Coordinator isn't looking like a convincing option at this point,
> that's better to know now, compared to later when more time and effort
> has been put in.

I can sympathize with the bitter feeling.  I do think though that we
must work collectively; to me it’d be a problem if misplaced competition
were to prevent us from moving forward.

Several concrete incremental steps were proposed in this thread and
earlier.  Instead of trying to provide a definite answer as to whether
the grand plan you propose is a convincing option at this point, I’d
like us to collect the low-hanging fruits, in an opportunistic way.  :-)

Several easy hacks have been proposed in this thread and before: custom
web/CLI views for the Data Service, Cuirass APIs to spin up specs on the
fly, Data Service integration in Mumi/Gitile, Cuirass notifications sent
to the Data Service, etc.  None of these is impressive in itself, but
each of these can be a step making our hacker lives better, IMO.

WDYT?

Thanks,
Ludo’.



Re: core-updates-frozen on powerpc64le-linux

2021-08-11 Thread Ludovic Courtès
Hi!

Thiago Jung Bauermann writes:

> On Wednesday, 4 August 2021, at 17:48:59 -03, Ludovic Courtès wrote:
>> Thiago Jung Bauermann writes:

[...]

>> Note that currently ci.guix only does emulated powerpc64le-linux because
>> the only POWER9 machine we currently have access to (lent by OSUOSL) is
>> not running ‘cuirass-remote-worker’.
>
> Ah, I didn’t realise that. I started out my investigations of
> powerpc64le-linux CI failures using emulation on my laptop (both with
> qemu-user and qemu-system), and found it to be a bit unreliable. I saw
> some failures in packages’ testsuite results which don’t happen on real
> hardware. There was one in the glib package in particular which happened
> on the master branch and prevented a `guix pull` command from succeeding.
> This is what prompted me to request the Minicloud VM instance.
>
>> It’s a foreign distro (Debian) so
>> setting up these things can be a bit tedious.  If you or anyone would
>> like to help with this, we can discuss it!
>
> I’d be glad to help set that up and maintain the OSUOSL machine!

Excellent!  I think Mathieu, Tobias, and guix-sysad...@gnu.org should be
able to get you started.  Could you send us an account name and SSH
public key?

>> (bordeaux.guix does have a POWER9 build machine behind, but it’s not
>> building ‘core-updates-frozen’ currently.)
>
> Nice! I’d be glad to help with that machine as well if there’s anything to 
> do on that front.

I think it’s running fine, using the Guix Build Coordinator instead of
Cuirass, set up by Christopher Baines (this POWER9 machine is Chris’s,
currently).

>> > So next step for me is to look into the build failures above. I’ll
>> > semi-randomly start with ‘gmp-boot’ and see what I can find out.
>> 
>> Neat, thank you!
>
> You’re welcome. Patches on issues 49880, 49881 and 49882. :-)

Alrighty, I’ll take a look!

Thanks,
Ludo’.



Re: New signing key

2021-08-11 Thread Ludovic Courtès
Hello,

Tobias Geerinckx-Rice writes:

> Question: I think committers should be trusted with discretion in how
> they prefer to manage their keys, but how about briefly documenting a
> suggested sane key-management strategy to new committers, like we
> already describe some rando's editor set-up? :-)

I had missed this message, but I think it’s a good idea!  Your message
is already a good start at that.

Thanks,
Ludo’.