Re: Postmortem of service downtime

2024-05-24 Thread Maxim Cournoyer
Hi Ludovic,

Ludovic Courtès writes:

> From Sunday May 19th to Tuesday May 21st, for about 36h,
> bayfront.guix.gnu.org, the machine behind many services, went down:
>
>   https://lists.gnu.org/archive/html/info-guix/2024-05/msg0.html
>
> Affected web sites and services included:
>
>   guix.gnu.org
>   bordeaux.guix.gnu.org
>   logs.guix.gnu.org
>   hpc.guix.info
>   foundation.guix.info
>   packages.guix.gnu.org
>   qa.guix.gnu.org
>

[...]

> A large part of the slowness was due to ‘guix substitute’ reading
> all the 300K+ entries from /var/guix/substitute/cache and deleting
> them, one by one (this took several minutes).  Chris had mentioned
> that performance issue in the past; it’s not much of a problem on
> one’s laptop with an SSD, but it’s clearly a problem here where
> there are more entries than usual.  We should at least drastically
> reduce the TTL of cache entries.

Interesting!
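
To make the TTL point concrete, here is a minimal sketch of age-based
expiry over a directory tree.  It is only an illustration: judging age by
file mtime and the function name expire_cache_entries are assumptions of
this sketch, not how 'guix substitute' actually tracks or removes its
cache entries.

```python
import os
import time


def expire_cache_entries(cache_dir, ttl_seconds, now=None):
    """Delete regular files under cache_dir older than ttl_seconds.

    Returns the number of files removed.  Age is judged by mtime, which
    is an assumption of this sketch, not what 'guix substitute' does.
    """
    now = time.time() if now is None else now
    removed = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if now - os.stat(path).st_mtime > ttl_seconds:
                    os.unlink(path)
                    removed += 1
            except FileNotFoundError:
                # Another process removed the entry first; nothing to do.
                pass
    return removed
```

Deletion is still one entry at a time here; a shorter TTL only helps in
that fewer entries pile up between clean-ups.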

>   • qa-frontpage failed to build when we first reconfigured the machine,
>     so we commented it out.  This is now fixed:
>
>       https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=3fecb1e8fdea65a7440fec403c1c52da197b5dfe
>
>   • guix-packages-website (the server behind packages.guix.gnu.org)
>     still refuses to start with an Artanis error:
>
>       https://issues.guix.gnu.org/71138
>
> Ludo’, on behalf of the emergency rescue^W^W sysadmin team.

Phew!  Thanks for the detailed write-up and for the fixes/thankless work
of bringing the machine back up and running.

-- 
Maxim



XDG spec and ~/.config/guix/

2024-05-24 Thread Development of GNU Guix and the GNU System distribution.
Hi,

In ~/.config/guix/ I found a symbolic link called 'current' to
/var/guix/profiles/per-user/lechner/current-guix.  Does that comply with
the XDG spec? [1]

For background, I am ditching the home-channels-service-type and plan to
maintain the folder as a Git repo.  Since 'current' is a computational
output, I believe the link belongs in $XDG_STATE_HOME.

For me, that would be in ~/.local/state/guix.
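
As a rough illustration of the lookup the spec describes, here is a small
sketch.  The helper name guix_state_dir and the "guix" subdirectory are
made up for this example; this is not how Guix currently resolves the
location.

```python
import os
from pathlib import Path


def guix_state_dir(environ=None, home=None):
    """Return the directory where per-user Guix state would live.

    Hypothetical helper: the name and the "guix" subdirectory are
    assumptions for illustration, not something Guix provides.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    value = environ.get("XDG_STATE_HOME", "")
    # Per the spec, an unset, empty, or relative value is ignored in
    # favour of the default, $HOME/.local/state.
    if value and os.path.isabs(value):
        base = Path(value)
    else:
        base = home / ".local" / "state"
    return base / "guix"


print(guix_state_dir({}, home="/home/lechner"))
```

With XDG_STATE_HOME unset, this falls back to the ~/.local/state/guix
location mentioned above.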

Any thoughts?

Kind regards
Felix

[1] https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html



Re: `make check` fails when trying to build from Git

2024-05-24 Thread Development of GNU Guix and the GNU System distribution.
Hi Ashvith,

On Fri, May 24 2024, Ashvith Shetty wrote:

> I'll be [...]  continuing this discussion on bug-g...@gnu.org, as I
> think that it would be an apt place to discuss.

It isn't.  Any message sent to bug-g...@gnu.org will open a new,
numbered report on debbugs.gnu.org.

You could discuss your issue, after opening such a report, at
report_num...@debbugs.gnu.org.  Please make sure to carbon-copy any
additional participants.  By default, Debbugs notifies no one except when
closing a report.

Before filing a new report it is, however, customary to first check
whether a similar report is already open.  That way the relevant
conversations happen in one place.  One way to search is to browse the
same group of reports at https://issues.guix.gnu.org/

Good luck!

Kind regards
Felix

P.S. To everyone, I'm a beekeeper.  Please don't call your reports
"bugs."  Let's call them reports.  I'll try to remember, too.



Re: Upgrading Shepherd services

2024-05-24 Thread Attila Lendvai
> I see some services starting but no errors on the console. Also, there
> is absolutely nothing in /var/log/messages. Would it help to diagnose
> it using your Shepherd branch?


yep, in two ways: my branch has extensive logging (and currently its default 
level is set to debug), and i also reworked and extended the error handling.

my expectation is that your machine should both start up, and also emit some 
useful log why that specific service is failing.

if that is not the case, then i'd really love to see a self-contained 
reproducer.

if you want to dig deeper towards a reproducer, then one option is to try to 
write a guix system test that reproduces it (see gnu/tests/ for examples, and 
`make check-system`).

to use my shepherd channel:

 (channel
  (name 'shepherd)
  (url "https://codeberg.org/attila-lendvai-patches/shepherd.git")
  (branch "attila")
  (introduction
   (make-channel-introduction
    ;; note that this commit id changes whenever i rebase and force-push my commits
    "13557ba988f4976f6581149ecdc06fce031258c7"
    (openpgp-fingerprint
     "69DA 8D74 F179 7AD6 7806  EE06 FEFA 9FE5 5CF6 E3CD"))))

and in your OS definition follow the instructions that are now in the shepherd 
README.

HTH,

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Gradualism in theory is perpetuity in practice.”
— Jared Howe




Re: A different way to build GCC to overcome issues, especially with C++ for embedded systems

2024-05-24 Thread Jean-Pierre De Jesus Diaz
Hello,

Adding to this conversation: I have been making an arm-none-eabi
toolchain with Newlib included as the default C standard library, like
in the official Arm toolchain, but I'm not quite happy with the patches
yet.[1]

I haven't contributed it yet because adapting the axoloti-* packages to
a modern toolchain has been a bit hard.  I also intend to eventually
remove the old toolchains in `gnu/packages/embedded.scm' and instead use
the ones in `cross-gcc-toolchain'.

[1]: 
https://github.com/Foundation-Devices/guix-mirror/compare/master...wip/arm-none-eabi

On Fri, May 24, 2024 at 3:53 PM Sergio Pastor Pérez
 wrote:
>
> Hi both of you.
>
> I want to echo Attila's sentiments. This is a valuable contribution, and
> creating a channel would serve as a central hub for other contributors.
>
> Thanks for sharing, Stefan.
>
> Have a nice day.
> Sergio.
>



Re: [shepherd] several patches that i deem ready

2024-05-24 Thread Attila Lendvai
> i've rebased my commits on top of the devel branch, and in the process i've 
> reordered them into a least controversial order for your cherry-picking 
> convenience:
> 
> https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/various
> 
> i just started a wave of deeper testing after the rebase, so the more complex 
> commits may change, but those need further work/negotiation anyway.


Ludo, the first commit ('Replace stop with stop-service in power-off of the 
root service.') used to serve to avoid a warning, but on the 'devel' branch it 
is now essential:

# halt
halt: error: exception caught while executing 'power-off' on service 'root':
Unbound variable: stop

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Tyranny is defined as that which is legal for the government but illegal for 
the citizenry.”
— Thomas Jefferson (1743–1826)




Re: A different way to build GCC to overcome issues, especially with C++ for embedded systems

2024-05-24 Thread Sergio Pastor Pérez
Hi both of you.

I want to echo Attila's sentiments. This is a valuable contribution, and
creating a channel would serve as a central hub for other contributors.

Thanks for sharing, Stefan.

Have a nice day.
Sergio.



Re: `make check` fails when trying to build from Git

2024-05-24 Thread Ashvith Shetty
Hello again, sorry for the delay in replying. I'll be closing this
conversation and continuing this discussion on bug-g...@gnu.org, as I think
that it would be an apt place to discuss.

On Sun, May 19, 2024 at 2:18 PM Ludovic Courtès wrote:

> Hi,
>
> Ashvith Shetty skribis:
>
> > $ file /home/ashvith/Desktop/guix/test-tmp/store/a2k16z6jzwzvvg00bhf4mf9v0k65r7kq-guile-bootstrap-2.0/bin/guile
> > /home/ashvith/Desktop/guix/test-tmp/store/a2k16z6jzwzvvg00bhf4mf9v0k65r7kq-guile-bootstrap-2.0/bin/guile: empty
>
> This suggests that this file is corrupt: it shouldn’t be empty.
>
> I’d suggest starting from a clean state:
>
>   cd ~/Desktop/guix
>   chmod -R +w test-tmp
>   rm -rf test-tmp
>
> and then:
>
>   make check -j4
>
> HTH!
>
> Ludo’.
>


Call for contribution to the Guix infrastructure

2024-05-24 Thread Ludovic Courtès
Since its inception, the Guix project has always valued its autonomy,
and that reflects in its infrastructure: our servers run Guix System and
exclusively free software, none of them is hosted by one of these
transnational companies, and they’re administered by volunteers.

Of course this comes at a cost and this is why we’re sending this call
for contributions.  Our hope is to make infrastructure-related activity
more legible so that maybe you can picture yourself helping in one of
these areas.

  • Coding

We run many Guix-specific services; this is all lovely Scheme code
but it tends to receive less attention than Guix itself:

  Build Farm Front-End: https://git.cbaines.net/guix/bffe
  Cuirass: https://guix.gnu.org/cuirass/
  Goggles (IRC logger): https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/goggles.scm
  Guix Build Coordinator: https://git.savannah.gnu.org/cgit/guix/build-coordinator.git/
  Guix Data Service: https://git.savannah.gnu.org/git/guix/data-service.git/
  Guix Packages Website: https://codeberg.org/luis-felipe/guix-packages-website.git
  mumi: https://git.savannah.gnu.org/cgit/guix/mumi.git/
  nar-herder: https://git.savannah.gnu.org/cgit/guix/nar-herder.git/
  QA Frontpage: https://git.savannah.gnu.org/git/guix/qa-frontpage.git

There is no time constraint on this coding activity: any improvement
is welcome, whenever it comes.  Most of these code bases are
relatively small, which should make it easier to get started.

Prerequisites: Familiarity with Guile, HTTP, and databases.

If you wish to get started, check out the README of the project of
your choice and get in touch with guix-devel and the primary
developer(s) of the tool as per ‘git shortlog -s | sort -k1 -n’.

  • System administration

Guix System configuration for all our systems is held in this
repository:

   https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/

The two front-ends are berlin.scm (the machine behind
ci.guix.gnu.org) and bayfront.scm (the machine behind
bordeaux.guix.gnu.org, guix.gnu.org, hpc.guix.info, qa.guix.gnu.org,
and more).  Both connect to a number of build machines and helpers.

Without even having SSH access to the machine, you can help by
posting patches to improve the configuration (you can test it with
‘guix system vm’).  Here are ways you can help:

  - Improve infra monitoring: set up a dashboard to monitor all the
    infrastructure, and an out-of-band channel to communicate about
    downtime.

  - Implement web site redundancy: guix.gnu.org should be backed by
    several machines on different sites.  Get in touch with us
    and/or send a patch!

  - Implement substitute redundancy: likewise, bordeaux.guix.gnu.org
    and ci.guix.gnu.org should be backed by several head nodes.

  - Improve backup: there’s currently ad-hoc backup of selected
    pieces over rsync between the two head nodes; we can improve on
    that, for example with a dedicated backup site and proper
    testing of recoverability.

  - Support mirroring: We’d like to make it easy for others to
    mirror substitutes from ci.guix and bordeaux.guix, perhaps by
    offering public rsync access.

  - Optimize our web services: Monitor the performance of our
    services and tweak nginx config or whatever it takes to improve
    it.

There is no time constraint on this activity: any improvement is
welcome, whenever you can work on it.

Prerequisite: Familiarity with Guix System administration and
ideally with the infrastructure handbook:

.

  • Day-to-day system administration

We’re also looking for people who’d be willing to have SSH access to
some of the infrastructure to help with day-to-day maintenance:
restarting a build, restarting the occasional service that has gone
wild (that can happen :-)), reconfiguring/upgrading a machine,
rebooting, etc.

This day-to-day activity requires you to be available some of the
time (during office hours or not, during the week-end or not),
whenever is convenient for you, so you can react to issues reported
on IRC, on the mailing list, or elsewhere, and synchronize with
other sysadmins.

Prerequisite: Being a “known” member of the community, familiarity
with Guix System administration, with some of the services/web sites
being run, and with the infrastructure handbook:

.

  • On-site intervention

The first front-end is currently generously hosted by the Max
Delbrück Center (MDC), a research institute in Berlin, Germany.
Only authorized personnel can physically access it.

The second one, bordeaux.guix.gnu.org, is ho

watchdog triggered auto-rollback

2024-05-24 Thread raingloom
Since I've been experimenting with a foolproof unikernel-based static
website deployment lately, I realized I should write down this idea I've
been chewing on for a while:

It would be very nice to have automatic system rollbacks when certain
things break.
One example is a broken SSH config that makes a machine unreachable.
Local testing is useful, but as in the SSH example, some issues only
become apparent when you are deploying to the production environment.

Would others find this useful?  Where in the stack would this be solved?
Could we, for example, catch an issue in the init system and still
perform a rollback?  Or if not a full rollback, then at least a reboot
into the previous config?  (And if that is also broken, then the one
before, etc.)

Obviously there are a lot of edge cases and potential bugs in this
mechanism as well.  Sticking with the SSH example, rolling back to a
version that was kept around where the authorized keys are different
would also make the machine unreachable via SSH.
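
To make the idea a bit more concrete, here is a minimal sketch of such a
watchdog, assuming it runs on the machine itself after a reconfigure.
The names watchdog, sshd_listening and guix_roll_back are invented for
this example; the only real command used is 'guix system roll-back'.

```python
import socket
import subprocess
import time


def watchdog(check, roll_back, attempts=3, delay=10.0, sleep=time.sleep):
    """Call roll_back once if check keeps failing.

    check is a callable returning True when the system looks healthy.
    Returns True if the system was deemed healthy, False if a rollback
    was triggered after all attempts failed.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give slow services time to come up
    roll_back()
    return False


def sshd_listening(host="127.0.0.1", port=22, timeout=5.0):
    """Hypothetical health check: is something accepting TCP on the SSH port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def guix_roll_back():
    """Switch to the previous system generation (needs root)."""
    subprocess.run(["guix", "system", "roll-back"], check=True)
```

It would be wired up as watchdog(sshd_listening, guix_roll_back).  A
port check like this does not catch the authorized-keys case above; that
would need a check that actually logs in from outside.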