Re: plz is there a roadmap for a more resilient substitutes infrastructure?

2018-11-11 Thread Giovanni Biscuolo
Hi!

sorry for my late reply

I confess I still haven't read the whole Guix/GuixSD Reference Manual,
so my apologies if I'm asking something already documented :-S

l...@gnu.org (Ludovic Courtès) writes:

[...]

> We Guix developers don’t have control over the physical hardware behind
> hydra.gnu.org; for this machine, we rely on the work of the FSF
> sysadmins for all things hardware/networking.

OK, thanks for this info

> Unfortunately in this case, this maintenance period was rather
> unprepared: it wasn’t supposed to last a whole week, rather a few hours
> or a day at most.  Most of the time it took was about copying data to a
> new disk (!).

are the minimum hardware and disk requirements for a complete GuixSD
distribution build server published somewhere?

> Had this been prepared, we could have arranged to keep
> hydra.gnu.org up until the replacement was ready.  We Guix developers
> didn’t have much visibility over what was going on though, and we just
> didn’t anticipate this.

sorry about that, I'm a sysadmin and I know how much my work is
impacting others :-)

> It is clear that this prolonged downtime was harmful to many users and
> to the project’s reputation.

GuixSD does not deserve this kind of harm :-(

> What to do from here?

I had already come across
https://git.savannah.gnu.org/cgit/guix/maintenance.git [1], which you
pointed me to (below), but did not read the entire tree

now I see we have
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/doc/1.0.org

should we add a new "super" task named "resilience of substitutes
network"?

looking at
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/machines.scm
it seems that some degree of resilience for hydra.gnu.org is already in
place but this does not seem to work as a distributed source of
substitute servers, but "just" to offload build jobs to the defined list
of build servers

could servers in "machines.scm" also be used as substitutes servers?

> Our main focus is on making berlin.guixsd.org the primary build farm of
> the project.  It has the advantage that one Guix dev has physical access
> to it (Ricardo); it’s also much more powerful than hydra.gnu.org and the
> associated build machines.

OK, I see it
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/doc/1.0.org#n30

more details could help fix related issues

IMHO a public guixsd.org Sysadmins Manual should be in the roadmap (as
MAYBE): that could help the core team's job, show the community how the
job is done *and* help others to build on our best practices

Guix/GuixSD is *the* perfect tool for IaC (infrastructure as code); it
could be *very* interesting to develop a "Literate GuixSD IaC package"
as a meta-project :-)

maybe we could (slowly) build a reproducible IaC literate devops
document, based on org-mode babel, so we'd have both tangled code and
exported documentation

> Yet, there’s more work to do: berlin has just 1T of disk space.  Ricardo
> started looking into growing it but was stuck on software issues IIRC.  I
> think fixing this should be a priority, so I think we should help
> Ricardo fix the software issues as much as we can.

I realize I'm pretty new in this community and you can't trust me since
we do not even know each other... but I could help if needed, just tell
me (in private if more appropriate) what's the hardware issue

> That alone doesn’t fix the resilience issue: berlin.guixsd.org could go
> down at some point for some time.
>
> To address that, a possibility that was discussed recently on
> guix-sysadmin is use bayfront.guixsd.org has a separate build farm

guess you meant "use bayfront.guixsd.org *as* a separate build farm"

> and/or mirror of berlin.

[...]

>> given the prolonged issue, please also consider writing an *official*
>> blog post explaining the current situation and steps adopted to prevent
>> similar issues in the future
>
> We set up the info-guix mailing list with that in mind (but too late for
> this incident).  Posting blog posts is also a good idea; we should have
> done that, with instructions on how to switch to berlin.guixsd.org.

given the impact on project reputation, please consider a "post-mortem"
blog post on what happened: something in line with Ludo's reply to me

not all interested users and observers read this (and other) mailing
list archives

>> 1. is there a method to "replicate the whole store of an official server
>> (e.g. hydra.gnu.org once healed)" so we can just "guix publish" a
>> *complete* mirror? In this case a ready to use official
>> mirror-config.scm could be useful
>
> mirror.hydra.gnu.org is a simple nginx proxy to hydra.gnu.org.  You can
> find its config here:
>
>   
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf

OK, so it's a caching proxy
I'll see if and how I can build a similar one

sorry but I still don't understand why mirror.hydra.gnu.org failed to
serve substitutes during a 0.15 installation started from the install
CD: it 

Re: plz is there a roadmap for a more resilient substitutes infrastructure?

2018-11-06 Thread Ludovic Courtès
Ciao Giovanni,

Giovanni Biscuolo writes:

> recently users and developers are facing hard-to-manage problems due to
> the maintenance of hydra.gnu.org and its proxy mirror.hydra.gnu.org [1]
> since 23 Oct 2018

We Guix developers don’t have control over the physical hardware behind
hydra.gnu.org; for this machine, we rely on the work of the FSF
sysadmins for all things hardware/networking.

Unfortunately in this case, this maintenance period was rather
unprepared: it wasn’t supposed to last a whole week, rather a few hours
or a day at most.  Most of the time it took was about copying data to a
new disk (!).  Had this been prepared, we could have arranged to keep
hydra.gnu.org up until the replacement was ready.  We Guix developers
didn’t have much visibility over what was going on though, and we just
didn’t anticipate this.

It is clear that this prolonged downtime was harmful to many users and
to the project’s reputation.


What to do from here?

Our main focus is on making berlin.guixsd.org the primary build farm of
the project.  It has the advantage that one Guix dev has physical access
to it (Ricardo); it’s also much more powerful than hydra.gnu.org and the
associated build machines.

Yet, there’s more work to do: berlin has just 1T of disk space.  Ricardo
started looking into growing it but was stuck on software issues IIRC.  I
think fixing this should be a priority, so I think we should help
Ricardo fix the software issues as much as we can.


That alone doesn’t fix the resilience issue: berlin.guixsd.org could go
down at some point for some time.

To address that, a possibility that was discussed recently on
guix-sysadmin is use bayfront.guixsd.org has a separate build farm
and/or mirror of berlin.

On top of that we could have a service like httpredir.debian.org, or
maybe even a CDN where we’d replicate substitutes, or torrents (looking
at you, Julien ;-)).
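
To make the httpredir-style idea a little more concrete, here is a
minimal sketch in Python of what such a redirector could do: pick the
first mirror that a health check reports as up and build the URL the
client should be redirected to.  Everything in it is an assumption made
for illustration: the function names, the health-check callback and the
idea of listing berlin and bayfront as mirrors are not an existing Guix
service.

```python
from typing import Callable, Iterable, Optional


def pick_mirror(mirrors: Iterable[str],
                is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first mirror that the health check reports as up."""
    for base in mirrors:
        if is_up(base):
            return base
    return None


def redirect_url(path: str,
                 mirrors: Iterable[str],
                 is_up: Callable[[str], bool]) -> Optional[str]:
    """Build the redirect target for PATH, or None if no mirror is up.

    A real redirector would answer with an HTTP 302 pointing at this
    URL; here we only compute the string.
    """
    base = pick_mirror(mirrors, is_up)
    if base is None:
        return None
    return base.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical mirror list, in order of preference.
MIRRORS = ["https://berlin.guixsd.org", "https://bayfront.guixsd.org"]
```

The health check is passed in as a function so that the selection logic
can be tried out without any network access; a real service would also
have to decide how often to probe mirrors and how to weigh them
(location, load, freshness).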


At this point, all these options are still on the table, and anyone with
expertise in these areas is very much welcome!

> given the prolonged issue, please also consider writing an *official*
> blog post explaining the current situation and steps adopted to prevent
> similar issues in the future

We set up the info-guix mailing list with that in mind (but too late for
this incident).  Posting blog posts is also a good idea; we should have
done that, with instructions on how to switch to berlin.guixsd.org.

> 1. is there a method to "replicate the whole store of an official server
> (e.g. hydra.gnu.org once healed)" so we can just "guix publish" a
> *complete* mirror? In this case a ready to use official
> mirror-config.scm could be useful

mirror.hydra.gnu.org is a simple nginx proxy to hydra.gnu.org.  You can
find its config here:

  
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf

In the past a few people set up their own mirrors using a similar
configuration.
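
As a rough illustration of what a caching proxy does, here is a minimal
in-memory sketch in Python.  It is not a transcription of the nginx
configuration linked above: the class name, the upstream callback and
the hit counter are assumptions chosen for the example.

```python
from typing import Callable, Dict


class CachingProxy:
    """Serve responses from a local cache, asking upstream only on a miss."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        # fetch_upstream stands in for an HTTP request to the origin server.
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self.upstream_hits = 0

    def get(self, path: str) -> bytes:
        """Return the body for PATH, fetching and caching it if needed."""
        if path not in self._cache:
            self.upstream_hits += 1
            self._cache[path] = self._fetch_upstream(path)
        return self._cache[path]
```

In this sketch a path that was requested once is served locally from
then on, while a path that was never requested still needs the upstream
to answer.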

> 2. is there an official mirrors directory users can look at when needed?

No.

> 3. is there a plan to build a service similar to
> http://httpredir.debian.org/? (I looked on the web but did not find any
> reference to such plan)

Like I wrote, there’s no concrete plan at this point, which means it’s
an opportunity for you and anyone else to chime in and give a hand!

Thanks,
Ludo’.



Re: plz is there a roadmap for a more resilient substitutes infrastructure?

2018-11-03 Thread Pjotr Prins
On Fri, Nov 02, 2018 at 11:51:20PM +0100, Julien Lepiller wrote:
> We could easily distribute nar files over distributed networks (IPFS, 
> bittorrent, …) but we still need a "canonical source" that builds these 
> packages, otherwise how do you know what you are looking for? Don't we always 
> need some sort of central authority?

Yes. A name service which is fed from accredited build servers. 

It is not hard to keep a few build servers in the 'air' which can be
replaced on demand - even run in the cloud or in VMs. What is hard is
to create a 100% uptime service that serves many generations of nars.
Lots of data, and the data load can be high. This is what we ought to
consider fanning out.

Guix can support both systems, existing and new. Just add a
substitute-url which resolves to an IPFS based naming scheme. Could
even be integrated with guix-publish. Anyone who would run a
guix-publish server could choose to expose an IPFS node for sharing.

But I think it can be lighter weight. If we have a name service we
could indeed just make use of any protocol that serves files. As long
as the download hash is known. So, guix-named provides pointers to nar
entities with their download hash and guix-download is capable of
querying guix-named and provides more protocols. IPFS protocol is well
defined and there exist implementations in multiple languages. 
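
A minimal sketch of that idea in Python, under loud assumptions:
guix-named and guix-download do not exist, so the registry below is just
a dictionary standing in for the name service, the pointer type is made
up, and plain SHA-256 over the downloaded bytes stands in for whatever
hash the real design would use.

```python
import hashlib
from typing import Callable, Dict, List, NamedTuple, Optional


class NarPointer(NamedTuple):
    """Where a nar can be fetched from, and the hash it must have."""
    url: str
    sha256: str


def download_verified(item: str,
                      registry: Dict[str, List[NarPointer]],
                      fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try each known location for ITEM and return the first valid copy.

    The transport behind `fetch` does not matter (HTTP, IPFS, ...):
    a copy is accepted only if its hash matches the registered one.
    """
    for pointer in registry.get(item, []):
        try:
            data = fetch(pointer.url)
        except OSError:
            continue  # source unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == pointer.sha256:
            return data
    return None
```

The design choice illustrated here is that trust sits in the name
service alone: any server or protocol may deliver the bytes, because a
tampered or corrupted copy fails the hash check and is skipped.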

Anyway, this all requires more thought and a proof-of-concept. The
point really is to design a distributed system based on existing
components.

Pj.





Re: plz is there a roadmap for a more resilient substitutes infrastructure?

2018-11-02 Thread Julien Lepiller
We could easily distribute nar files over distributed networks (IPFS, 
bittorrent, …) but we still need a "canonical source" that builds these 
packages, otherwise how do you know what you are looking for? Don't we always 
need some sort of central authority?

On 2 November 2018 22:04:51 GMT+01:00, Pjotr Prins wrote:
>On Fri, Nov 02, 2018 at 01:16:03PM +0100, Giovanni Biscuolo wrote:
>> please is there a roadmap in GNU and/or Guix devel team to address
>> these problems?
>
>I think it would be a good idea to create a more distributed approach
>for creating and finding substitutes. A simple name service would
>help. We could even use IPFS or something to fetch nar files - IPFS
>comes with a name service. That way anyone building a substitute could
>push it to IPFS and expose it to the rest of the world. Since IPFS is
>content-addressable we can prevent injections. Any change to the file
>would change its location. So the address + NAR hash is safe. And no
>key setting required.
>
>Does away with the dependency on just a few machines. Maintaining
>machines is a pain. Why not distribute the effort? I am happy to build
>some stuff and put it out there - in fact I already run my own
>substitute server, but it has only the substitutes I need. If we all
>do that we can bundle resources together. Guix can easily support
>that.
>
>If someone wants to think this through and can write a prototype it
>would make a great talk at FOSDEM. We can also discuss it at Guix
>days.
>
>Pj.



Re: plz is there a roadmap for a more resilient substitutes infrastructure?

2018-11-02 Thread Pjotr Prins
On Fri, Nov 02, 2018 at 01:16:03PM +0100, Giovanni Biscuolo wrote:
> please is there a roadmap in GNU and/or Guix devel team to address these
> problems?

I think it would be a good idea to create a more distributed approach
for creating and finding substitutes. A simple name service would
help. We could even use IPFS or something to fetch nar files - IPFS
comes with a name service. That way anyone building a substitute could
push it to IPFS and expose it to the rest of the world. Since IPFS is
content-addressable we can prevent injections. Any change to the file
would change its location. So the address + NAR hash is safe. And no
key setting required.
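
The content-addressing property can be shown with a toy store in Python.
This is a simplification made for illustration: real IPFS addresses are
content identifiers built on multihash rather than bare SHA-256 hex
digests, and the class below is hypothetical.

```python
import hashlib
from typing import Dict


def content_address(data: bytes) -> str:
    """Derive the address of DATA from its contents alone."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-check on the way out: a blob that no longer matches its
        # address has been corrupted or replaced.
        if content_address(data) != address:
            raise ValueError("content does not match its address")
        return data
```

Because the address is computed from the bytes, a modified file cannot
be served under the original address without the check failing, which is
the "any change to the file would change its location" property.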

Does away with the dependency on just a few machines. Maintaining
machines is a pain. Why not distribute the effort? I am happy to build
some stuff and put it out there - in fact I already run my own
substitute server, but it has only the substitutes I need. If we all
do that we can bundle resources together. Guix can easily support
that.

If someone wants to think this through and can write a prototype it
would make a great talk at FOSDEM. We can also discuss it at Guix
days.

Pj.



plz is there a roadmap for a more resilient substitutes infrastructure?

2018-11-02 Thread Giovanni Biscuolo
Ciao,

recently users and developers are facing hard-to-manage problems due to
the maintenance of hydra.gnu.org and its proxy mirror.hydra.gnu.org [1]
since 23 Oct 2018

unfortunately many recent reports from users on the help-guix and
guix-devel mailing lists clearly show that berlin.guixsd.org is still
not a solution, since several missing substitutes are forcing users to
"build the world" [2]

please is there a roadmap in GNU and/or Guix devel team to address these
problems?

GuixSD is now well known in the free software community, please
acknowledge that this kind of problem is detrimental to the project's
reputation

given the prolonged issue, please also consider writing an *official*
blog post explaining the current situation and steps adopted to prevent
similar issues in the future

Many others and I would be very happy to help build a more resilient
substitutes infrastructure: just tell us how

for example:

1. is there a method to "replicate the whole store of an official server
(e.g. hydra.gnu.org once healed)" so we can just "guix publish" a
*complete* mirror? In this case a ready to use official
mirror-config.scm could be useful

2. is there an official mirrors directory users can look at when needed?

3. is there a plan to build a service similar to
http://httpredir.debian.org/? (I looked on the web but did not find any
reference to such plan)

ciao
Giovanni





[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33151 

[2] personally I'm trying to install a bare-bones.scm machine in a VM and
guix is compiling many many packages, including texlive... :-S (using
berlin.guixsd.org as substitute URL)

[3] 
https://www.gnu.org/software/guix/manual/en/html_node/Invoking-guix-publish.html

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

