bug#66866: aarch64 system cross compilation + pinebook pro image broken?

2024-01-13 Thread Mathieu Othacehe


> The issue seems to be that grafting ends-up dragging the bootstrap
> packages into the closure when cross-compiling which is quite scary.

I could narrow it down a bit.

This command drags in the bootstrap packages and fails:

--8<---cut here---start->8---
guix build --target= alsa-lib 
--8<---cut here---end--->8---

while this one doesn't:

--8<---cut here---start->8---
guix build --target= alsa-lib --no-grafts
--8<---cut here---end--->8---

Not sure how to go further. Adding Ludo and Efraim.

Thanks,

Mathieu





bug#66866: aarch64 system cross compilation + pinebook pro image broken?

2024-01-13 Thread Mathieu Othacehe


Hey,

> ./pre-inst-env guix system build --target=aarch64-linux-gnu 
> gnu/system/images/pine64.scm
>
>
> which cause an issue in gawk-mesboot:
>
> checking host system type... Invalid configuration `aarch64-linux-gnu': 
> machine `aarch64' not recognized
> configure: error: 
> /gnu/store/rb75igdc6daly1mz2ivz7rs8hd85imdz-gash-boot-0.3.0/bin/bash 
> ./config.sub aarch64-linux-gnu failed
>
> Janneke, do you know what could have caused this regression?

This probably has nothing to do with the bootstrap packages. I noticed
that the CI succeeds in building the pine64 image:
https://ci.guix.gnu.org/build/3265001/details.

The difference is that the CI is building without grafting. Disabling
grafting locally seems to do the trick as well:

--8<---cut here---start->8---
./pre-inst-env guix system image gnu/system/images/pine64.scm  --no-grafts
--8<---cut here---end--->8---

The issue seems to be that grafting ends up dragging the bootstrap
packages into the closure when cross-compiling, which is quite scary.

Mathieu





bug#66866: aarch64 system cross compilation + pinebook pro image broken?

2024-01-03 Thread Mathieu Othacehe


Hello,

I can reproduce the error by running:

--8<---cut here---start->8---
./pre-inst-env guix system build --target=aarch64-linux-gnu gnu/system/images/pine64.scm
--8<---cut here---end--->8---

which causes an issue in gawk-mesboot:

--8<---cut here---start->8---
checking host system type... Invalid configuration `aarch64-linux-gnu': machine `aarch64' not recognized
configure: error: /gnu/store/rb75igdc6daly1mz2ivz7rs8hd85imdz-gash-boot-0.3.0/bin/bash ./config.sub aarch64-linux-gnu failed
--8<---cut here---end--->8---

Janneke, do you know what could have caused this regression?

Thanks,

Mathieu





bug#67109: ‘efi32-esp’ image support pulls in host-side code

2023-11-25 Thread Mathieu Othacehe


Hello,

> guix system: error: # initialize-efi32-partition root #:grub-targets # (quote ("i386-efi" . "BOOTIA32.EFI")) gnu/system/image.scm:142:28
> 7fef96f85390>:out> args)) gnu/system/image.scm:146:8 7fef96f85360>:
> invalid G-expression input

Expressed that way, I no longer have this error and `target` seems to
take the expected value, WDYT?

--8<---cut here---start->8---
(define esp32-partition
  (partition
   (inherit esp-partition)
   (initializer
    #~(lambda (root . args)
        (let ((targets '#$(let-system (system target)
                            (cond ((target-x86? (or target system))
                                   '("i386-efi" . "BOOTIA32.EFI"))
                                  ((target-arm? (or target system))
                                   '("arm-efi" . "BOOTARM.EFI"))
                                  (else #f)))))
          (apply initialize-efi32-partition root
                 #:grub-targets targets
                 args))))))
--8<---cut here---end--->8---

Thanks,

Mathieu





bug#67109: ‘efi32-esp’ image support pulls in host-side code

2023-11-14 Thread Mathieu Othacehe


Hey,

Thanks for investigating this!

> Thus I’m proposing the fix below.  How can I test it though?  I get:
>
> $ ./pre-inst-env guix system image -t efi32-raw 
> gnu/system/examples/bare-bones.tmpl
> guix system: error: EFI bootloader required with GPT partitioning

I added this check recently because we do not currently support
installing the `grub-bootloader` on a non-MBR disk.

The way to test your change is to switch the bare-bones system
bootloader to `grub-efi-bootloader`, this way:

--8<---cut here---start->8---
diff --git a/gnu/system/examples/bare-bones.tmpl b/gnu/system/examples/bare-bones.tmpl
index dc6aff5273..e11d4bd5ee 100644
--- a/gnu/system/examples/bare-bones.tmpl
+++ b/gnu/system/examples/bare-bones.tmpl
@@ -18,8 +18,9 @@
   ;; target hard disk, and "my-root" is the label of the target
   ;; root file system.
   (bootloader (bootloader-configuration
-(bootloader grub-bootloader)
-(targets '("/dev/sdX"
+(bootloader grub-efi-bootloader)
+(targets '("/boot/efi"
--8<---cut here---end--->8---

We then have the following error:

--8<---cut here---start->8---
guix system: error: #:out> 
args)) gnu/system/image.scm:146:8 7fef96f85360>: invalid G-expression input
--8<---cut here---end--->8---

Mathieu





bug#66207: Cannot boot VMs with grub-efi-bootloader

2023-10-01 Thread Mathieu Othacehe


Hey,

Some context around that.

Before recent commits, when we were producing qcow2 or raw images,
those were MBR images with an EFI partition. If the grub-bootloader
was used, then Grub was installed both in the post-MBR gap
and in the EFI partition. This means that a single image could be booted
with or without the qemu -bios option, i.e. both in legacy BIOS mode
and in EFI mode.

Commit d57cab764122af69d52d8cc9c843456044e5d7bc changed the default
behaviour and the produced images were by default MBR images, without
EFI partitions.

I changed that with e5ed1712da049b1c3dcf01e0a7e02e48a8aff012 and
dfaeaae9c7e7283b99ad10aef3e61402e9820bc7 which introduced a new image
type called mbr-hybrid-raw, which is now the default image type,
restoring the previous behaviour.

Now, it looks to me like what Ricardo is observing is not linked to any
of the changes mentioned above.  When using the grub-efi-bootloader,
Grub is never installed in the post-MBR gap. This was already the case
in 1.4.0 and is still true. Unless I'm mistaken, those images cannot be
booted without the qemu -bios option.

Hope it helps,

Mathieu







bug#48468: substitute server connection timeout

2023-01-10 Thread Mathieu Othacehe


Hey,

So the debug mechanism is in place. Requesting a non-existing derivation
on a worker gives:

--8<---cut here---start->8---
mathieu@hydra-guix-104 ~$ guix build 
/gnu/store/yd1p7069rs4xbbfwj5p7nzp9psw7d3vv-hello-2.12.1.drv
substitute: could not fetch 
http://141.80.167.131/yd1p7069rs4xbbfwj5p7nzp9psw7d3vv.narinfo 404
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation 
‘/gnu/store/yd1p7069rs4xbbfwj5p7nzp9psw7d3vv-hello-2.12.1.drv’
guix build: error: build of 
`/gnu/store/yd1p7069rs4xbbfwj5p7nzp9psw7d3vv-hello-2.12.1.drv' failed
--8<---cut here---end--->8---

as expected. The funny thing is that during tonight's test failures,
none of those traces were displayed. That would mean that the failure
is not caused by a missing narinfo.

I added the "--debug" option to the guix-daemon on the workers as well
hoping to gather more info.

Thanks,

Mathieu





bug#37513: Subject: Installer finish backtrace, umount dispatch exception /mnt device busy

2023-01-07 Thread Mathieu Othacehe


Hello,

> Therefore, I guess it would make to close.  Any objection?

Thanks for the feedback, let's close it.

Mathieu





bug#48468: substitute server connection timeout

2023-01-07 Thread Mathieu Othacehe


Hello,

> It means that upstream (i.e., ‘guix publish’) closed the connection,
> right?
>
> And it means that it closed it prematurely I guess?

Looks like it, yes.

>> However, like suggested in your hypothesis number 1, it seems instead
>> that we are replying 404 to the worker which resets the connection. As
>> we have put aside the baking thing, the question is now why are those
>> derivations not available?
>
> In that case we’re not replying at all, are we?

Well, it could be; I'm not 100% sure how to interpret those nginx logs.
If we are replying anything, it will be visible with the new traces. If,
on the other hand, the publish server is hanging up, then they won't
help much, I guess.

> Drop ‘G_’ (we don’t translate debugging messages) and use ASCII, to be
> on the safe side…

Done.

> Instead of an env. var., maybe add a ‘--debug’ command-line option and
> parameterize ‘%debug?’ accordingly?

The --debug command-line option feels better, but it involves a
guix-daemon modification, so I kept the environment variable,

> You can also have something like:
>
>   (define-syntax (debug fmt args ...)
> (when (%debug?)
>   (format #t fmt args ...)))
>

and used that macro :)
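A minimal standalone sketch of this pattern, assuming Guile 3 and using
`define-syntax-rule` instead of the quoted `define-syntax` form; it only
illustrates the idea and is not the code from the patch:

```scheme
;; Debug mode is on when GUIX_SUBSTITUTE_DEBUG is set to any value
;; (even "" or "0"): GETENV returns a string, and only #f is false.
(define %debug?
  (make-parameter (getenv "GUIX_SUBSTITUTE_DEBUG")))

;; DEBUG prints via FORMAT only when %DEBUG? is true; when it is
;; false, FMT and ARGS are not even evaluated.
(define-syntax-rule (debug fmt args ...)
  (when (%debug?)
    (format #t fmt args ...)))

;; Example: force debug mode on for a single call.
(parameterize ((%debug? #t))
  (debug "~a ~a: 404~%" 'GET "/abc.narinfo"))
```

The last expression prints "GET /abc.narinfo: 404" regardless of the
environment, because `parameterize` overrides the initial value.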

Thanks for having a look!

Mathieu





bug#48468: substitute server connection timeout

2022-12-28 Thread Mathieu Othacehe
': No such file or directory
mathieu@berlin /var/log/nginx$ ls /gnu/store/007zgflsl5xkr377wpakbsis5c2yqh1q*
ls: cannot access '/gnu/store/007zgflsl5xkr377wpakbsis5c2yqh1q*': No such file or directory
--8<---cut here---end--->8---

As I don't have much of a clue about what those derivations are, I
think we should instrument the publish server a bit, and maybe the
substitute script, as proposed in the attachments.
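To illustrate how a client could make use of the X-Baking header added
by patch 1/2, here is a sketch based on Guile's (web http) and
(web response) modules. The helper names `not-found-headers` and
`baking-response?` are hypothetical; the header list mirrors the one
built by `not-found` in the patch.

```scheme
(use-modules (web http)
             (web response))

;; Register X-Baking as an opaque header, as patch 1/2 does, so that
;; its value is kept as a raw string under the symbol x-baking.
(declare-opaque-header! "X-Baking")

;; Hypothetical helper: the headers of a 404 reply, built the same way
;; as in the patched NOT-FOUND procedure.
(define (not-found-headers ttl baking?)
  (append (if ttl
              `((cache-control (max-age . ,ttl)))
              '())
          (if baking?
              '((x-baking . "1"))
              '())))

;; Hypothetical client-side check: is RESPONSE a 404 sent while the
;; NAR is still being baked?
(define (baking-response? response)
  (and (= (response-code response) 404)
       (equal? (assq-ref (response-headers response) 'x-baking) "1")))

;; Example: a "We're baking it" reply with the 5-minute TTL.
(baking-response?
 (build-response #:code 404 #:headers (not-found-headers 300 #t)))
```

The example evaluates to #t, while a plain 404 without the header is
not reported as baking, which is what would let the substitute script
tell the two failure causes apart.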

WDYT?

Thanks,

Mathieu
From 9f9c839937ac2edd1b5901b2262c4be0954fa20c Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Wed, 28 Dec 2022 15:12:46 +0100
Subject: [PATCH 1/2] scripts: publish: Add a custom baking header.

Log the not-found responses and their reason (baking or not) to stdout. Also
send the X-Baking custom header so that the client can be informed of the
cause of the failure.

* guix/scripts/publish.scm (not-found): Add a baking? argument to add the
X-Baking HTTP header to the response if baking is in progress.  Also, log the
404 responses to stdout, indicating if it is due to baking or not.
(render-narinfo/cached): Pass the baking? argument.
---
 guix/scripts/publish.scm | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm
index 3bf3bd9c7c..11fedf092e 100644
--- a/guix/scripts/publish.scm
+++ b/guix/scripts/publish.scm
@@ -4,7 +4,7 @@
 ;;; Copyright © 2015-2022 Ludovic Courtès 
 ;;; Copyright © 2020 Maxim Cournoyer 
 ;;; Copyright © 2021 Simon Tournier 
-;;; Copyright © 2021 Mathieu Othacehe 
+;;; Copyright © 2021, 2022 Mathieu Othacehe 
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -375,14 +375,28 @@ (define* (narinfo-string store store-path
compression)))
  compressions
 
+;; Custom header to indicate that baking is in progress.
+(declare-opaque-header! "X-Baking")
+
 (define* (not-found request
-                    #:key (phrase "Resource not found")
+                    #:key
+                    baking?
+                    (phrase "Resource not found")
                     ttl)
   "Render 404 response for REQUEST."
+  (format #t (G_ "↳ ~a ~a: 404~a~%")
+          (request-method request)
+          (uri-path (request-uri request))
+          (if baking? " (baking)" ""))
   (values (build-response #:code 404
-                          #:headers (if ttl
-                                        `((cache-control (max-age . ,ttl)))
-                                        '()))
+                          #:headers
+                          (append
+                           (if ttl
+                               `((cache-control (max-age . ,ttl)))
+                               '())
+                           (if baking?
+                               '((x-baking . "1"))
+                               '())))
           (string-append phrase ": "
                          (uri-path (request-uri request)))))
 
@@ -587,6 +601,7 @@ (define (delete-entry narinfo)
#:nar-path nar-path
    #:compressions compressions)
(not-found request
+  #:baking? #t
   #:phrase "We're baking it"
   #:ttl 300)))  ;should be available within 5m
   (else
-- 
2.38.1

From 25ffc57864dbf34ca58741f89c1f790dbde6702f Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Wed, 28 Dec 2022 15:19:29 +0100
Subject: [PATCH 2/2] substitutes: Log the failing queries.

* guix/substitutes.scm (%debug?): New variable.
(handle-narinfo-response): Log the failing queries if the %debug? parameter is
set.
---
 guix/substitutes.scm | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/guix/substitutes.scm b/guix/substitutes.scm
index 9014cf61ec..819eb2c73e 100644
--- a/guix/substitutes.scm
+++ b/guix/substitutes.scm
@@ -90,6 +90,12 @@ (define %narinfo-cache-directory
   (string-append %state-directory "/substitute/cache"))
   (string-append (cache-directory #:ensure? #f) "/substitute")))
 
+(define %debug?
+  ;; Enable debug mode by setting the GUIX_SUBSTITUTE_DEBUG environment
+  ;; variable.
+  (make-parameter
+   (getenv "GUIX_SUBSTITUTE_DEBUG")))
+
 (define (narinfo-cache-file cache-url path)
   "Return the name of the local file that contains an entry for PATH.  The
 entry is stored in a sub-directory specific to CACHE-URL."
@@ -224,6 +230,15 @@ (define (handle-narinfo-response request response port result)
   (let* ((path  (uri-path (request-uri request)))
  (hash-part (basename
  (string-drop-right path 8 ;drop ".narinfo"
+;; Log the failing queries and indicate if it failed because the
+  

bug#48468: substitute server connection timeout

2022-12-27 Thread Mathieu Othacehe

Hey Ludo,

> That’s still below the 100 MiB cache bypass threshold of the main ‘guix
> publish’ instance though.

Right. Just to be on the safe side here, what about applying this patch
to have log lines when we are replying 404 due to baking?

Thanks,

Mathieu
From 725d5ba21a0fc0108b60c37bbc8d947fab6ac938 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Tue, 27 Dec 2022 10:49:04 +0100
Subject: [PATCH 1/1] scripts: publish: Add a log when replying 404 due to
 baking.

* guix/scripts/publish.scm (render-narinfo/cached): Add it.
---
 guix/scripts/publish.scm | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm
index 3bf3bd9c7c..a2048c98fd 100644
--- a/guix/scripts/publish.scm
+++ b/guix/scripts/publish.scm
@@ -586,9 +586,13 @@ (define (delete-entry narinfo)
#:ttl 300  ;temporary
#:nar-path nar-path
#:compressions compressions)
-   (not-found request
-  #:phrase "We're baking it"
-  #:ttl 300)))  ;should be available within 5m
+   (begin
+ (format #t (G_ "~a ~a: 404 (baking)~%")
+ (request-method request)
+ (uri-path (request-uri request)))
+ (not-found request
+#:phrase "We're baking it"
+#:ttl 300  ;should be available within 5m
   (else
(not-found request #:phrase "" #:ttl negative-ttl)
 
-- 
2.38.1



bug#60265: Unbootable system after guix gc

2022-12-25 Thread Mathieu Othacehe


Hello,

> do you have other channels added, or did this happen on a vanilla Guix 
> install?
>
> it's a duplicate of https://issues.guix.gnu.org/57838

Right, it is a duplicate, and it is related to the use of an extra
channel. The good news is that I found a fix.

Closing this one.

Thanks,

Mathieu





bug#59823: [1.4.0rc1] Installer fails to identify installation device on Ventoy-made images

2022-12-10 Thread Mathieu Othacehe


Hola,

> Anyway, all in all, calling out to dmsetup looks reasonable for now; I
> have a slight preference for using ‘open-pipe* OPEN_READ’, but no big
> deal.  Perhaps add a comment showing what the line we’re parsing should
> look like.

Yeah, I agree that open-pipe would be a bit clearer, but as this is
already tested as-is, and we'll switch to the ioctl after the release, I
think we can proceed.

>>  (define (eligible-devices)
>>"Return all the available devices except the install device and the 
>> devices
>>  which are smaller than %MIN-DEVICE-SIZE."
>>  
>>(define the-installer-root-partition-path
>> -(installer-root-partition-path))
>> +(let ((root-path
>> +   (installer-root-partition-path)))
>
> Just ‘root’.  :-)

Fixed, added a few comments and pushed.

Thanks for having a look!

Mathieu





bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures

2022-12-08 Thread Mathieu Othacehe


Hello,

> ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args 
> `((quote (complete-installation installer-socket)))'.
> builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' 
> failed with exit code 1
> @ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 
> 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' 
> failed with exit code 1
> cannot build derivation 
> `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv':
>  1 dependencies couldn't be built
>
> (From .)
>
> Does that ring a bell, Mathieu?

I spent days on that issue before. It used to show up on all installer
tests, and even on real hardware, then
8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.

The installation is now made in a container to make sure that we are
later on able to umount the store overlay even though some background
processes such as kmscon or udev opened files from the overlay.

Now the issue only shows up on that specific test and is intermittent as
you noticed.

To be honest, that was quite painful to debug and I'm a bit scared to
jump back in. I think I had the marionette produce some lsof reports
back then, or something like that. I very much regret not having kept
notes somewhere.

Mathieu





bug#59823: [1.4.0rc1] Installer fails to identify installation device on Ventoy-made images

2022-12-08 Thread Mathieu Othacehe

Hello,

The attached patch fixes it for me. We could maybe use libdevmapper
instead of the plain "dmsetup" call but that's not critical in my
opinion.
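For reference, the parsing step can be sketched on its own as a pure
procedure that is easy to test without a mapped device. The procedure
name and the sample output line are assumptions for illustration; the
regular expression is similar to the one used in the patch.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper (not the Guix procedure): given the text printed
;; by "dmsetup deps -o devname DEVICE", return "/dev/NAME" for the first
;; device name found between parentheses, or #f if there is none.
(define (dmsetup-deps->parent-path output)
  (let ((result (string-match "\\(([^)]+)\\)" output)))
    (and result
         (string-append "/dev/" (match:substring result 1)))))

;; Example, assuming output shaped like " 1 dependencies  : (sdb1)".
(dmsetup-deps->parent-path " 1 dependencies  : (sdb1)\n")
```

The example evaluates to "/dev/sdb1". Only the first dependency is
returned, which should be enough for the Ventoy case, where the mapping
sits on a single partition.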

Thanks,

Mathieu
From 0afda5b3ed32e73bece9db96ab970d83f9f2e74b Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Thu, 8 Dec 2022 13:24:02 +0100
Subject: [PATCH 1/1] installer: Detect mapped installation devices.

Fixes: <https://issues.guix.gnu.org/59823>

* gnu/installer/parted.scm (mapped-device?,
mapped-device->parent-partition-path): New procedures.
(eligible-devices): Detect mapped installation devices using the new
procedures.
---
 gnu/installer/parted.scm | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/gnu/installer/parted.scm b/gnu/installer/parted.scm
index 82375d29e3..058f2a8dab 100644
--- a/gnu/installer/parted.scm
+++ b/gnu/installer/parted.scm
@@ -379,12 +379,45 @@ (define (installer-root-partition-path)
 (define %min-device-size
   (* 2 GIBIBYTE-SIZE)) ;2GiB
 
+(define (mapped-device? device)
+  "Return #true if DEVICE is a mapped device, false otherwise."
+  (string-prefix? "/dev/dm-" device))
+
+(define (mapped-device->parent-partition-path device)
+  "Return the parent partition path of the mapped DEVICE."
+  (let* ((command `("dmsetup" "deps" ,device "-o" "devname"))
+         (parent #f)
+         (handler
+          (lambda (input)
+            (let ((result
+                   (string-match "\\(([^\\)]+)\\)"
+                                 (get-string-all input))))
+              (and result
+                   (set! parent
+                         (format #f "/dev/~a"
+                                 (match:substring result 1))))))))
+    (run-external-command-with-handler handler command)
+    parent))
+
 (define (eligible-devices)
   "Return all the available devices except the install device and the devices
 which are smaller than %MIN-DEVICE-SIZE."
 
   (define the-installer-root-partition-path
-    (installer-root-partition-path))
+    (let ((root-path
+           (installer-root-partition-path)))
+      (cond
+       ((mapped-device? root-path)
+        ;; If the partition is a mapped device (/dev/dm-X), locate the parent
+        ;; partition.  It is the case when Ventoy is used to host the
+        ;; installation image.
+        (let ((parent-path
+               (mapped-device->parent-partition-path root-path)))
+          (installer-log-line "mapped device ~a -> ~a"
+                              parent-path root-path)
+          parent-path))
+       (else
+        root-path))))
 
   (define (small-device? device)
 (let ((length (device-length device))
-- 
2.38.1



bug#59823: an installer dump was sent

2022-12-07 Thread Mathieu Othacehe


Hello,

It's really good that you managed to install it anyway. Thanks for
persevering :) Nevertheless we need to fix the problem.

> I already managed the installation by burning the iso directly to my
> usb drive and not via Ventoy.

Turns out Ventoy was the crux of the issue here. Ventoy allows
installing multiple ISO images just by copying them into a directory of
the drive. It then creates a device mapping that looks like this:

NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda            8:0    0 238.5G  0 disk
├─sda1         8:1    0   512M  0 part
└─sda2         8:2    0   238G  0 part
sdb            8:16   0  14.6G  0 disk
├─sdb1         8:17   0  14.6G  0 part
│ └─ventoy   253:0    0 842.9M  1 dm
└─sdb2         8:18   0    32M  0 part

This device mapping defeats our "eligible-devices" procedure, because
the UUID passed as the root=xxx argument on the Linux command line
actually refers to /dev/dm-0, which is different from /dev/sdb.

I had a look at the parted sources, and it should detect mapped devices,
but for some reason it doesn't. Figuring that out and fixing it may be
a bit risky before the release.

So I'm trying to figure out a (cheap) way to make the correlation
between /dev/dm-0 and /dev/sdb to exclude the latter in the eligible
devices procedure.

Any ideas are welcome!

Thanks,

Mathieu





bug#48468: substitute server connection timeout

2022-12-07 Thread Mathieu Othacehe


Hello,

> /gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv,
> which was marked as failed earlier today due to missing .drv.  It’s a
> 4KiB file, and the cache-bypass-threshold is ‘guix publish’ is typically
> set to something much higher than that.  So ‘guix publish’ won’t return
> 404 in that case.

Yes but that derivation also depends on other derivations, for instance
qemu-minimal and if I try:

--8<---cut here---start->8---
mathieu@berlin ~$ guix build qemu-minimal
...
/gnu/store/lwv2pl0m6dkf6bkzip755w5p71g5akq4-qemu-minimal-7.1.0
--8<---cut here---end--->8---

and then, from my machine:

--8<---cut here---start->8---
curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
--8<---cut here---end--->8---

wget exhibits the same behaviour and returns 404.

So any build that requires a heavy substitute, heavier than the cache
bypass threshold at least, will fail on the workers, as it would fail
locally.

That's not really a surprise as baking substitutes takes time and there
is a time window between the moment Cuirass triggers NAR baking and the
moment the NAR is baked, where every user will get a 404.

Mathieu





bug#48468: substitute server connection timeout

2022-12-07 Thread Mathieu Othacehe


Hello,

> You mentioned on IRC that nginx logs show that ‘guix publish’ times out.
> Looking at /var/log/nginx/error.log, I see “Connection reset by peer”
> and “Broken pipe”, which could indicate that the client closed the
> connection (which was open) prematurely, maybe due to an internal
> timeout.

Could it be that the client is receiving 404 because the baking of some
NAR was deferred to a worker, and then it closes the connection?

I think that's what I had in mind with the patch 2/2 of this patchset:
https://issues.guix.gnu.org/50040.

Thanks,

Mathieu





bug#59823: an installer dump was sent

2022-12-07 Thread Mathieu Othacehe


Hello,

So I had a closer look and we do have this strange kernel warning: 

--8<---cut here---start->8---
Dec  4 10:49:05 localhost vmunix: [ 1351.610773] device-mapper: ioctl: 
remove_all left 1 open device(s)
--8<---cut here---end--->8---

I also noticed that we do not have the following trace:

--8<---cut here---start->8---
/dev/sdX is not eligible because it is the installation device. 
--8<---cut here---end--->8---

which indicates that the "eligible-devices" procedure failed to identify
the installation device. So /dev/sdd could be the installation device
and the "with-delay-device-in-use?" always reports true because it is in
fact always in use.

My proposed patch probably won't help and the question is, why is the
installation device not detected?

Would it be possible for you to try the installation with an
instrumented installer that I would provide you with?

Thanks,

Mathieu





bug#59823: an installer dump was sent

2022-12-05 Thread Mathieu Othacehe

Hello,

Thanks for reporting! So the error is:

--8<---cut here---start->8---
mathieu@meije ~$ cat dump.2022-12-04.10.54.06/installer-backtrace
In ./gnu/installer/steps.scm:
   150:13 19 (run ((locale . "de_DE.utf8")) #:todo-steps _ #:done-steps _)
   150:13 18 (run ((welcome . #t) (locale . "de_DE.utf8")) #:todo-steps _ 
#:done-steps _)
   150:13 17 (run ((timezone . "Europe/Berlin") (welcome . #t) (locale . 
"de_DE.utf8")) #:todo-steps _ #:done-steps _)
   150:13 16 (run ((keymap "de" #f #f) (timezone . "Europe/Berlin") (welcome . 
#t) (locale . "de_DE.utf8")) #:todo-steps _ #:done-steps _)
   150:13 15 (run ((hostname . "guix-hp") (keymap "de" #f #f) (timezone . 
"Europe/Berlin") (welcome . #t) (locale . "de_DE.utf8")) #:todo-steps _ 
#:done-steps _)
   150:13 14 (run ((network (select-technology . #< name: "Wired" 
type: "ethernet" powered?: #t connected?: #t>) (power-technology . 
#) (connect-service . #<) …) …) …)
   150:13 13 (run ((substitutes #t) (network (select-technology . 
#< name: "Wired" type: "ethernet" powered?: #t connected?: #t>) 
(power-technology . #) (# . #<) …) …) …)
   150:13 12 (run ((user #< name: "root" real-name: "" group: "users" 
password:  home-directory: "/root"> #< name: "typ" real-name: 
"Typ" group: "users" password: ) …) …)
   148:23 11 (run ((services #< name: "GNOME" type: desktop 
recommended?: #f snippet: ((service gnome-desktop-service-type)) packages: ()> 
#< name: "OpenSSH …> …) …) …)
In ./gnu/installer/newt/partition.scm:
814:4 10 (run-partitioning-page)
In srfi/srfi-1.scm:
634:9  9 (for-each # _)
In ./gnu/installer/parted.scm:
  1528:22  8 (_ # #t)
In ice-9/boot-9.scm:
  1685:16  7 (raise-exception _ #:continuable? _)
  1780:13  6 (_ #< components: (#<> #< origin: 
#f> #< message: "~A"> #< irritants: ("Gerät /dev/sdd wird 
noch verwendet.")> #<…>)
In ice-9/eval.scm:
619:8  5 (_ #(#(#(#) misc-error (#f 
"~A" ("Gerät /dev/sdd wird noch verwendet.") #f)) #> # …))
   626:19  4 (_ #(#(#(#) misc-error (#f 
"~A" ("Gerät /dev/sdd wird noch verwendet.") #f)) #> # …))
In ./gnu/installer/dump.scm:
 58:4  3 (prepare-dump misc-error (#f "~A" ("Gerät /dev/sdd wird noch 
verwendet.") #f) #:result _)
In ice-9/ports.scm:
   433:17  2 (call-with-output-file _ _ #:binary _ #:encoding _)
In ./gnu/installer/dump.scm:
60:27  1 (_ #)
In unknown file:
   0 (make-stack #t)
./gnu/installer/dump.scm:62:36: Gerät /dev/sdd wird noch verwendet.
--8<---cut here---end--->8---

The error message (German for "Device /dev/sdd is still in use.")
means that the delay in the "with-delay-device-in-use?" procedure is
probably not long enough.
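The retry logic can be sketched in isolation as follows.
`busy-after-retries?` is a hypothetical stand-in for the loop in
`with-delay-device-in-use?`, with the check and the per-try delay passed
as arguments. With a count of N the check runs N + 1 times, so going
from 16 to 96 multiplies the maximum wait by 97/17, roughly 5.7, for
the same per-try delay.

```scheme
;; Hypothetical stand-in for the loop in with-delay-device-in-use?:
;; sleep DELAY-US microseconds, call BUSY?, and retry while BUSY?
;; returns true and retries remain.  With TRIES = N, BUSY? is called
;; at most N + 1 times.  Return the last result of BUSY?.
(define (busy-after-retries? busy? tries delay-us)
  (let loop ((try tries))
    (usleep delay-us)
    (let ((busy (busy?)))
      (if (and busy (> try 0))
          (loop (- try 1))
          busy))))
```

It returns #f as soon as the device is reported free, and #t only if
the device is still busy after the last try.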

Here is an attached patch that bumps its retry count from 16 to 96. I
also uploaded an image built on top of the version-1.4.0 branch with
this patch. If you are up for a retry :), you can download it this way:

--8<---cut here---start->8---
wget https://othacehe.org/files/installer.iso
--8<---cut here---end--->8---

Thanks,

Mathieu
From b53d7f0c930f029d6b17be92dfa408b74615c1a5 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Mon, 5 Dec 2022 08:56:43 +0100
Subject: [PATCH 1/1] installer: Bump the device in use retry count to 96.

Fixes: <https://issues.guix.gnu.org/59823>

* gnu/installer/parted.scm (with-delay-device-in-use): Bump it.
---
 gnu/installer/parted.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnu/installer/parted.scm b/gnu/installer/parted.scm
index 82375d29e3..518b40a7ea 100644
--- a/gnu/installer/parted.scm
+++ b/gnu/installer/parted.scm
@@ -348,7 +348,7 @@ (define (with-delay-device-in-use? file-name)
 fail. See rereadpt function in wipefs.c of util-linux for an explanation."
   ;; Kernel always return EINVAL for BLKRRPART on loopdevices.
   (and (not (string-match "/dev/loop*" file-name))
-   (let loop ((try 16))
+   (let loop ((try 96))
  (usleep 25)
  (let ((in-use? (device-in-use? file-name)))
(if (and in-use? (> try 0))
-- 
2.38.1



bug#59493: cuirass-remote-worker crash

2022-11-23 Thread Mathieu Othacehe


Hey,

> Oh I see.  It would be nice to avoid non-backward-compatible changes in
> the protocol so we can upgrade more smoothly.

Right, sorry. We should introduce a protocol version to avoid that in
the future.

> Fixed in Cuirass commit 9fb6f21d29c5398b35f4c1a77cf6c20f207c9ebb.

Awesome, thanks :)

> To me, ideally this would be either multi-threaded or Fiberized.  The
> latter would be more fruitful but what might be difficult is
> guile-simple-zmq integration with Fibers (but maybe not: zmq_getsockopt
> + ZMQ_FD lets us get the file descriptor of a socket).

I would prefer the multi-threaded approach if possible. While the
concept of fibers is nice, it adds another layer of complexity and
instability to those programs, which are already hard to debug.

Mathieu





bug#59514: Stuck builds in Cuirass

2022-11-23 Thread Mathieu Othacehe


Hello Marius,

> Cuirass has a tendency to not notice when a build is finished, leaving
> it in a "running" state.
>
> The phenomenon can be observed by going to
>  and look at builds that are running for
> a suspiciously long time.

I suspect this is caused by https://issues.guix.gnu.org/59510 which
causes the worker threads to bail out.

We can probably merge those two issues. The
/var/log/cuirass-remote-server.log file on Berlin also indicates when
the build-succeeded or build-failed message is received by the server,
and how long the fetch from the worker took.

Thanks,

Mathieu





bug#59510: cuirass-remote-server: put-char encoding failed

2022-11-23 Thread Mathieu Othacehe


Hello,

On Cuirass 1.1.0-13.1341725, the fetch workers are experiencing the
following issue:

--8<---cut here---start->8---
2022-11-22 00:28:15 In cuirass/scripts/remote-server.scm:
2022-11-22 00:28:15415:12  3 (_)
2022-11-22 00:28:15 387:7  2 (run-fetch _)
2022-11-22 00:28:15 2022-11-22T00:28:15 build succeeded: 
'/gnu/store/wbnmp70x7hcwr9h5iw0v3w7waclw277x-rust-openssl-sys-0.9.75.drv'
2022-11-22 00:28:15 In unknown file:
2022-11-22 00:28:151 (display "2022-11-22T00:28:15 fetching 
'/gnu/store/hl2dkk1ayavfxpydm5r12kjz201idk1g-rust-num-0.3.0.drv' from 
http://141.80.167.165:5558\n; #)
2022-11-22 00:28:15 2022-11-22T00:28:15 build succeeded: 
'/gnu/store/12dyhjzl0cy984jif7pp9w9hsrdgkcdf-rust-trust-dns-openssl-0.18.1.drv'
2022-11-22 00:28:15 2022-11-22T00:28:15 build succeeded: 
'/gnu/store/j44ia9xffsggflgwg29q1l89vbga2y25-rust-trust-dns-https-0.18.1.drv'
2022-11-22 00:28:15 In ice-9/boot-9.scm:
2022-11-22 00:28:15 2022-11-22T00:28:15 build succeeded: 
'/gnu/store/g9xa21wmxyk1sfra84pq3mx8hvlx10hh-rust-actix-server-config-0.1.2.drv'
2022-11-22 00:28:15   1685:16  0 (raise-exception _ #:continuable? _)
2022-11-22 00:28:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2022-11-22 00:28:15 Throw to key `encoding-error' with args `("put-char" 
"conversion to port encoding failed" 84 # #\2)'.
--8<---cut here---end--->8---

Thanks,

Mathieu





bug#55336: Graphical installer: Selecting a partition scheme always takes me back to the start

2022-11-23 Thread Mathieu Othacehe


Hello Simon,

> I just ran into this same problem with an old machine.

Interesting, thanks for the report.

> Disk image is: axygxkgkgcgbk2gjd6q521h85shp7hwf-image.iso from
> https://ci.guix.gnu.org/build/125952/details. 
>
> Please find attached some logs too: 

It looks like you experienced a crash (a segfault or similar), and
sadly the backtrace is not very helpful here.

What would be interesting is to share the core dump, either by copying
the /tmp/installer-core-dump file after the crash, or by using the dump
upload mechanism and reporting the crash id.

Thanks for your help,

Mathieu





bug#59493: cuirass-remote-worker crash

2022-11-23 Thread Mathieu Othacehe


Hello Ludo,

Thanks for gathering this information.

> 2022-11-21 14:27:24   1685:16  0 (raise-exception _ #:continuable? _)
> 2022-11-21 14:27:24
> 2022-11-21 14:27:24 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> 2022-11-21 14:27:24 Throw to key `match-error' with args `("match" "no 
> matching pattern" (#vu8()))'.

Yes this is because a new remote-server is running on Berlin and it
sends an empty sequence at every connection:
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=fc1641381d2a8a0472a71ef5ad2b64361fb4

All remote-workers must update, and I have deployed Cuirass
1.1.0-13.1341725 on all hydra workers + guix9p.

I have been trying to deploy that to overdrive1 for two days, but Berlin
offloads the builds to kreuzberg, which has some issues: a lot of
builds are timing out:

--8<---cut here---start->8---
\building of 
`/gnu/store/9jg75a8rvdz3qxcbbm95312rlc4hyi98-mrustc-0.10-2.597593a-checkout.drv'
 timed out after 3600 seconds of silence
build of 
/gnu/store/9jg75a8rvdz3qxcbbm95312rlc4hyi98-mrustc-0.10-2.597593a-checkout.drv 
failed
View build log at 
'/var/log/guix/drvs/9j/g75a8rvdz3qxcbbm95312rlc4hyi98-mrustc-0.10-2.597593a-checkout.drv.gz'.
cannot build derivation 
`/gnu/store/wavx7rl6h93fpmc46nggnhkyxm75lqa4-mrustc-0.10-2.597593a-checkout.drv':
 1 dependencies couldn't be built
--8<---cut here---end--->8---

> (Stuttering is due to the unprotected use of ‘primitive-fork’: a
> non-local exit in the child leads it to execute the same code as its
> parent.  We should fix that, but should we really fork in the first
> place?  :-))

Right, this is problematic. I can't remember why I chose to fork.

In the meantime, this should be fixed by updating to 1.1.0-13.1341725,
so I guess we can close this one.

Mathieu





bug#59405: Bug Report Screenshot on Lenovo Yoga 700 14ISK Device (which has flawed BIOS/unPatchable)

2022-11-22 Thread Mathieu Othacehe


Hello,

> FYI - Failed Installation attempt: 

Thanks for the report. Did you use the 1.3.0 release installer? If so,
you might have better luck with the latest version
(https://guix.gnu.org/en/download/latest/) that includes a dump upload
mechanism.

Thanks,

Mathieu





bug#59447: Offload fails with: Throw to key `match-error'

2022-11-22 Thread Mathieu Othacehe


> Fixed in b2b9571935f9188086b2e7b434840eeda6c42805.
>
> We’ll have to update the ‘guix’ package to deploy it, though.

Thanks for fixing that one, closing!

Mathieu





bug#59467: GUIX Installer Bugs

2022-11-22 Thread Mathieu Othacehe


Hello,

> Hello I'm trying to install Guix 1.3.0 the last release but the
> graphical installer throw an issue, attached you images.

The 1.3.0 installer is a bit old, and many issues, including the one you
are experiencing, have been fixed since.

Could you please retry with the latest installer available here:
https://guix.gnu.org/en/download/latest/

Thanks,

Mathieu





bug#59447: Offload fails with: Throw to key `match-error'

2022-11-21 Thread Mathieu Othacehe


Hello,

I'm trying to offload an aarch64-linux build on Berlin and it fails this
way:

--8<---cut here---start->8---
process 75612 acquired build slot '/var/guix/offload/10.0.0.9:22/3'
Backtrace:
In ice-9/boot-9.scm:
  1752:10 12 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
  11 (apply-smob/0 #)
In ice-9/boot-9.scm:
724:2 10 (call-with-prompt _ _ #)
In ice-9/eval.scm:
619:8  9 (_ #(#(#)))
In guix/ui.scm:
   2263:7  8 (run-guix . _)
  2226:10  7 (run-guix-command _ . _)
In guix/scripts/offload.scm:
   814:22  6 (guix-offload . _)
In ice-9/boot-9.scm:
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In guix/scripts/offload.scm:
   595:21  4 (process-request _ _ "/gnu/store/vqmlpayiwfagh6s86jwns…" …)
   514:36  3 (choose-build-machine _)
In guix/inferior.scm:
345:2  2 (port->inferior _ _)
327:2  1 (read-repl-response _ _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" #)'.
guix build: error: unexpected EOF reading a line
make: *** [Makefile:7087: release] Error 1
--8<---cut here---end--->8---

It can be reproduced 100% of the time by running `make release` or with
this command:

--8<---cut here---start->8---
guix deploy -L maintenance/hydra/modules maintenance/hydra/deploy-overdrive1.scm
--8<---cut here---end--->8---

Thanks,

Mathieu





bug#55336: Graphical installer: Selecting a partition scheme always takes me back to the start

2022-11-21 Thread Mathieu Othacehe


Hey,

> I tried again with w0wi4jvanaddk1zcvwzhlnn7fkfwab82-image.iso in the
> same machine and the issue is gone. I could install the system almost
> flawlesly. So no oportunity to try out the new mechanism :)

Thanks a lot for testing again and reporting :)

Mathieu





bug#53594: no matching pattern #

2022-11-21 Thread Mathieu Othacehe


Hey Ludo,

> If we cannot reproduce this bug, I propose that we remove it from the
> list of release blockers at .

I haven't hit this issue in a long time, so it seems fair to remove it
from the release blockers.

Thanks,

Mathieu





bug#37513: Subject: Installer finish backtrace, umount dispatch exception /mnt device busy

2022-11-12 Thread Mathieu Othacehe


Hello,

Sorry for the very late reply. We recently added a bunch of error
reporting features to the installer. Any chance you still have the
hardware around and would be able to test again using the latest
installation image available here:
https://guix.gnu.org/en/download/latest/?

That would be super useful for us :)

Thanks,

Mathieu





bug#42054: installer: "invisible" screens, partitioning step fail

2022-11-12 Thread Mathieu Othacehe


Hello,

Sorry for the very late reply. We recently added a bunch of error
reporting features to the installer. Any chance you still have the
hardware around and would be able to test again using the latest
installation image available here:
https://guix.gnu.org/en/download/latest/.

That would be super useful for us :)

Thanks,

Mathieu





bug#47053: The installer has encountered an unexpected problem

2022-11-12 Thread Mathieu Othacehe


Hello,

> Attempted to install Guix System on a Librebooted machine and this
> happened.

Sorry for the very late reply. We recently added a bunch of error
reporting features to the installer. Any chance you still have the
hardware around and would be able to test again using the latest
installation image available here:
https://guix.gnu.org/en/download/latest/.

That would be super useful for us :)

Thanks,

Mathieu





bug#52767: Unexpected problem with Guix System installer

2022-11-12 Thread Mathieu Othacehe


Hello Mathieu,

> This is not the drive that was used to boot the Guix installer in the
> first place.
>
> Thanks, hope someone can understand what the issue is.

Sorry for the very late reply. We recently added a bunch of error
reporting features to the installer. Any chance you still have the
hardware around and would be able to test again using the latest
installation image available here:
https://guix.gnu.org/en/download/latest/?

That would be super useful for us :)

Thanks,

Mathieu





bug#54966: Guix Installer Bug

2022-11-12 Thread Mathieu Othacehe


Hello,

Sorry for the very late reply.

> I tried to install Guix on the Framework laptop, and up until the
> partitioning, everything goes smoothly. However, after this is
> reached, any attempt at mounting a partition to / results in this
> error statement

We recently added several error reporting mechanisms to the installer,
any chance you still have the hardware around and would be able to test
again using the latest installer image:
https://guix.gnu.org/en/download/latest/.

That would be really helpful for us :)

Thanks,

Mathieu





bug#56485: Graphical Installer Partitioning Bug

2022-11-12 Thread Mathieu Othacehe


Hey,

> I believe this issue was with the 1.3 installer and not the latest
> version. When I used the latest version I don't think I encountered
> this. Sorry I forgot to update.

Thanks for your answer :)

Mathieu





bug#33555: test.iso-image-installer.i686 got stuck after kernel oops

2022-11-12 Thread Mathieu Othacehe


Hey Mark,

> The most recent build of 'test.iso-image-installer.i686-linux', which
> was performed on hydra.gnunet.org, hit a kernel oops, and subsequently
> got stuck.
>
>   https://hydra.gnu.org/build/3194778

All installation tests are now passing on Cuirass.

Closing,

Mathieu





bug#37264: null pointer dereference in visual installer

2022-11-12 Thread Mathieu Othacehe


Hello,

> I try to install Guix System 1.0.1 ISO on VirtualBox 6.0.10.  I just
> select default values (press Enter every time) and get error after
> choosing options for disk partitioning.

Closing this old issue, which is very likely fixed by now.

Thanks,

Mathieu





bug#53459: installer: uuid->string failure backtrace (again)

2022-11-12 Thread Mathieu Othacehe


Hello,

> ice-9/eval.scm:159:9: Throw to key 'match-error' with args '("match" "no 
> matching pattern" (#f ext4))'

This is very likely fixed by: ab974ed709976d34917c8f6f9e5cc0004547af45.

Thanks,

Mathieu





bug#53978: installer crash

2022-11-12 Thread Mathieu Othacehe


Hello,

> I have tried to install the stable version of guix using the graphical
> installer and got the error on the attached screenshot.

Thanks for the report. This is very likely fixed by:
ab974ed709976d34917c8f6f9e5cc0004547af45.

Closing,

Mathieu





bug#55011: let user drop into REPL from installer

2022-11-12 Thread Mathieu Othacehe


Hello,

> Long story short I got an error related to mkfs.btrfs, this I think has
> already been reported, but Mumi's search is not great so I haven't
> verified that yet.

The installer now reports failing partitioning commands and offers to
re-run them or to continue anyway. This should cover your request.

Closing,

Mathieu





bug#55336: Graphical installer: Selecting a partition scheme always takes me back to the start

2022-11-12 Thread Mathieu Othacehe


Hello Luis,

> Using the latest installer image
> (l6dfrnjhhjf3axjndk290qsgxj0bzpgm-image.iso) I can't install the
> system because everytime I select a partition scheme, the installer
> takes me back to the language selection step.

The latest installer offers a core-dump upload mechanism that should
help diagnose this issue. Any chance you could try again with
https://ci.guix.gnu.org/build/1689729/details (or a later build) :)?

Thanks,

Mathieu





bug#56485: Graphical Installer Partitioning Bug

2022-11-12 Thread Mathieu Othacehe


Hello,

> The bug: When I tell it to go ahead and partition the drive according
> to what I set up, the screen flashes and resets the entire installer
> to the beginning of the installation i.e. takes me back to the locale
> setting screen. However, when I don't create a swap, it seems to
> partition without complaint.

This could be a manifestation of https://issues.guix.gnu.org/58732. The
latest installer version includes a core-dump uploader that will help
diagnose such issues.

Thanks,

Mathieu





bug#59179: BUG: (./guix/base32.scm:296:65: ERROR)

2022-11-12 Thread Mathieu Othacehe


Hello,

Fixed by Andrew with: 0760a8511d512ebac388eda0b9e18fd7451ca4b3. Sorry
for the breakage!

Mathieu





bug#58923: Malformed core dumps on Guix System

2022-11-10 Thread Mathieu Othacehe


Hey,

Thanks for trying to reproduce it. It turns out that it now works both
on Berlin and on installation images, but still fails on my machine.
It's not often that this kind of issue resolves itself. I would suspect
a kernel regression here.

My machine has the following kernel:
Linux meije 5.19.15 #1 SMP PREEMPT_DYNAMIC 1 x86_64 GNU/Linux 

while Berlin and installation images have respectively:
Linux berlin.guix.gnu.org 6.0.7-gnu #1 SMP PREEMPT_DYNAMIC 1 x86_64 GNU/Linux
Linux gnu 6.0.7-gnu #1 SMP PREEMPT_DYNAMIC 1 x86_64 GNU/Linux

Anyway, we can now:

--8<---cut here---start->8---
wget https://dump.guix.gnu.org/download/installer-dump-1c97b34e
tar -xvf installer-dump-1c97b34e
cd dump.2022-10-14.17.15.30
gdb $(type -P guile) core-dump
(gdb) bt
#0  linux_destroy (dev=0x268e620) at arch/linux.c:1615
#1  0x7fee6ae9cd37 in chained_finalizer (obj=0x7fee54779370, 
data=0x7fee6790acc0) at finalizers.c:84
#2  0x7fee6adf5e3f in GC_invoke_finalizers () at extra/../finalize.c:1281
#3  0x7fee6ae9d429 in scm_run_finalizers () at finalizers.c:414
#4  0x7fee6aea4482 in finalization_thread_proc (unused=) at 
finalizers.c:244
#5  0x7fee6ae9085a in c_body (d=0x7fee69f21d80) at continuations.c:430
#6  0x7fee6af1d326 in vm_regular_engine (thread=0x7fee6a3b8b40) at 
vm-engine.c:972
#7  0x7fee6af2a5d9 in scm_call_n (proc=, argv=, nargs=2) at vm.c:1610
#8  0x7fee6ae9209a in scm_call_2 (proc=, arg1=, arg2=) at eval.c:503
#9  0x7fee6af48742 in scm_c_with_exception_handler.constprop.0 (type=#t, 
handler_data=handler_data@entry=0x7fee69f21d10, 
thunk_data=thunk_data@entry=0x7fee69f21d10, thunk=, 
handler=)
at exceptions.c:170
#10 0x7fee6af1a88f in scm_c_catch (tag=, body=, body_data=, handler=, 
handler_data=, pre_unwind_handler=, 
pre_unwind_handler_data=0x7fee6a436040) at throw.c:168
#11 0x7fee6ae92e66 in scm_i_with_continuation_barrier 
(pre_unwind_handler=0x7fee6ae92b80 , 
pre_unwind_handler_data=0x7fee6a436040, handler_data=0x7fee69f21d80, 
handler=0x7fee6ae998b0 , 
body_data=0x7fee69f21d80, body=0x7fee6ae90850 ) at 
continuations.c:368
#12 scm_c_with_continuation_barrier (func=, data=) at continuations.c:464
#13 0x7fee6af19b39 in with_guile (base=0x7fee69f21e08, data=0x7fee69f21e30) 
at threads.c:645
#14 0x7fee6adf00ba in GC_call_with_stack_base (fn=fn@entry=0x7fee6af19a60 
, arg=arg@entry=0x7fee69f21e30) at extra/../misc.c:2106
#15 0x7fee6af128b8 in scm_i_with_guile (dynamic_state=, 
data=, func=) at threads.c:688
#16 scm_with_guile (func=, data=) at threads.c:694
#17 0x7fee6adc6d7e in ?? () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0
#18 0x7fee6a9c4eff in clone () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
--8<---cut here---end--->8---

which feels quite nice.

Closing that one.

Thanks,

Mathieu





bug#58732: installer: finalizers & device destroy segfault

2022-11-10 Thread Mathieu Othacehe


Hey,

> Looking at device.c in Parted, that’s probably the right thing because
> PedDevice objects are kept in a linked list whose head is stored in the
> ‘devices’ global variable of device.c.  So you cannot just free them
> asynchronously from a finalizer thread because they might still be
> accessed from other parts of the library.  This is the explanation that
> should go in the comment, and it’s clearly a good reason not to free
> those PedDevice objects.

If the finalizer were run synchronously when a device is removed from
the weak hash table, things would be OK: the device would be removed
from the global linked list by _device_unregister, get_device would
malloc a new structure, and so on. However, finalizers are not run
synchronously, so here we are.

> Now, we could provide bindings for ‘ped_device_destroy’ that users could
> explicitly call if they want to (this would be similar to explicit calls
> to ‘close-port’).  We’d arrange to make it idempotent.

Sure.

Thanks for your help on that one. I pushed the proposed patch and updated
Guile-Parted to 0.0.7 in Guix.

Mathieu





bug#58732: installer: finalizers & device destroy segfault

2022-11-09 Thread Mathieu Othacehe

Hey,

I ran further tests, and my understanding is that the weak hash-table /
finalizer mechanism is not compatible with a C function that can return
the same allocated object multiple times.

Even if we were to introduce a set-pointer-unique-finalizer! procedure
that calls scm_i_set_finalizer instead of scm_i_add_finalizer, we would
still have double free errors, because the finalizers are registered on
SCM pointers and not on libparted C pointers when calling
GC_REGISTER_FINALIZER_NO_ORDER.

I tested it, and I ended up with several SCM pointers encapsulating the
same libparted C pointer, and thus multiple finalizers on the same
underlying C pointer.

Anyway, here is a patch that solves the issue by removing the device
finalizer. It also means that all devices persist until the end of the
program, which doesn't feel right, but I cannot think of a better
solution.
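For readers more at home in C, the idea behind the patch roughly
corresponds to the sketch below. It is hypothetical: `device_wrapper`,
`pointer_to_device` and the fixed-size table are illustrative names, not
Guile-Parted code. A strong (non-weak) table maps each C pointer to
exactly one wrapper, and no finalizer is installed, so nothing ever frees
the C object behind libparted's back.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around a C object owned by libparted
   (e.g. a PedDevice*).  Illustrative only. */
struct device_wrapper {
    void *c_ptr;
};

#define TABLE_SIZE 64

/* Strong table: entries are never removed, so wrappers are never
   collected and no finalizer can run twice on the same C object. */
static struct device_wrapper *table[TABLE_SIZE];
static size_t table_len;

/* Return the unique wrapper for C_PTR, creating it on first use. */
static struct device_wrapper *pointer_to_device(void *c_ptr)
{
    for (size_t i = 0; i < table_len; i++)
        if (table[i]->c_ptr == c_ptr)
            return table[i];            /* same C pointer, same wrapper */

    assert(table_len < TABLE_SIZE);
    struct device_wrapper *w = malloc(sizeof *w);
    assert(w != NULL);
    w->c_ptr = c_ptr;
    table[table_len++] = w;             /* kept alive until the program exits */
    return w;
}
```

The price is the one mentioned above: wrappers, and the devices they
point to, stay alive until the program exits.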

Let me know if you agree with my reasoning :)

Thanks,

Mathieu
>From 066220a75c020b818aab9c2f5c3a7db835fa871a Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Wed, 9 Nov 2022 16:12:52 +0100
Subject: [PATCH 1/1] Remove the finalizer on device pointers.

Fixes: <https://issues.guix.gnu.org/58732>

* parted/device.scm (%device-destroy): Remove it.
(pointer->device!): Do not set a finalizer.
---
 parted/device.scm | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/parted/device.scm b/parted/device.scm
index 56a774b..be7f0ac 100644
--- a/parted/device.scm
+++ b/parted/device.scm
@@ -43,20 +43,23 @@
 device-get-minimum-alignment
 device-get-optimum-alignment))
 
-;; Record all devices, so that pointer finalizers are only set once,
-;; even if get-device returns an already known pointer.  Use the
-;; pointer as key and the associated  as value.
-(define %devices (make-weak-value-hash-table))
-
-(define %device-destroy
-  (libparted->pointer "ped_device_destroy"))
-
+;; Record all devices, so that we do not end up with different 
+;; objects aliasing the same underlying C pointer. Use the pointer as key and
+;; the associated  as value.
+(define %devices (make-hash-table))
+
+;; %DEVICES was a weak hash-table and we used to set a finalizer on POINTER.
+;; This is inevitably causing double free issues for the following reason:
+;;
+;; When  goes out of scope and is removed from the %DEVICES table, the
+;; finalizer that is set on the underlying C pointer is still registered but
+;; possibly not called as finalization happens in a separate thread.  If a
+;; subsequent call to ped_device_get returns the same C pointer, another
+;; finalizer will be registered.  This means that the finalization function
+;; can be called twice on the same pointer, causing a double free issue.
 (define (pointer->device! pointer)
-  ;; Check if a finalizer is already registered for this pointer.
   (or (hash-ref %devices pointer)
   (let ((device (pointer->device pointer)))
-;; Contrary to its name, this "adds" a finalizer.
-(set-pointer-finalizer! pointer %device-destroy)
 (hash-set! %devices pointer device)
 device)))
 
-- 
2.38.0



bug#58732: installer: finalizers & device destroy segfault

2022-11-07 Thread Mathieu Othacehe

Hola,

> Finalizers are set on pointer objects, so they’re invoked when the
> pointer object goes out of scope.  But:
>
>   (eq? (make-pointer 123) (make-pointer 123))
>   => #f

I agree, but somehow this works:

--8<---cut here---start->8---
scheme@(guile-user)> ,use (parted)
scheme@(guile-user)> (eq? (get-device "/tmp/test.img") (get-device 
"/tmp/test.img"))
$3 = #t
--8<---cut here---end--->8---

which shows that the "pointer->device!" procedure works correctly and
that the underlying pointer object returned by pointer->procedure is the
same.

> So a possible mistake is to add one finalizer on each pointer object and
> have several pointer objects aliasing the same C object; that’s how you
> can get the same “free” function called several times on the same C
> object.

I don't think that's what's happening. I have closely monitored the
%devices weak hash table, and it never exceeds the total device count.

We do have multiple finalizers registered for the same C pointer, but
that's because the weak hash table may be cleaned by (gc) calls, which
leaves room for multiple finalizer registrations on the same C pointer.

I attached a reproducer that exposes the double free issue.

--8<---cut here---start->8---
sudo -E guile ~/tmp/parted-bug.scm
double free or corruption (!prev)
Aborted
--8<---cut here---end--->8---

We could record somewhere which pointers already have registered
finalizers, but that would prevent the devices from being garbage
collected, just as if %devices were a plain hash table and not a weak one.

That could well be a solution, as I cannot see at the moment how we
could preserve this mechanism and avoid multiple finalization.

Thanks,

Mathieu


parted-bug.scm
Description: Binary data


bug#58732: installer: finalizers & device destroy segfault

2022-11-06 Thread Mathieu Othacehe

Hey,

I made some progress on that one. I think this is what's going on:

1. Two new PedDevice structures, A and B, are malloc'ed by libparted
when the installer partitioning page is opened.

2. They are added to the %devices weak hash table by pointer->device!
and their respective finalizers are registered.

3. The partitioning ends and A goes out of scope. It is eventually
removed from %devices, but that does not mean its finalizer is run
immediately.

4. The partitioning is restarted using the installer menu. B is still in
the %devices hash table. However, A is now gone and is added again to
the %devices hash table by the pointer->device! procedure. Another
finalizer is registered for A.

That's because set-pointer-finalizer! does not *set* a finalizer it
*adds* one.

5. The partitioning ends and both A and B go out of scope. They are
removed from %devices and their finalizers are called. The A finalizer
is called twice, resulting in a double free.

This race condition is created by the fact that there is a time window
where the device is removed from the %devices hash table but its
finalizer is not immediately called.
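To make the timeline above concrete, here is a small C model of it. It
is only a sketch under simplifying assumptions, not Guile's finalizer
machinery: `pointer_to_device`, `drop_wrapper` and `run_finalizers` are
hypothetical stand-ins for pointer->device!, the weak-table entry
disappearing, and the asynchronous finalizer thread. Each time a C
object is wrapped anew, one more finalizer is queued for it, which is
the "adds rather than sets" behaviour described above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the scenario above; all names are illustrative
   and do not correspond to real Guile or libparted functions. */
enum { DEV_A, DEV_B, N_DEVICES };
#define MAX_PENDING 16

static int destroy_count[N_DEVICES]; /* times the C object was "destroyed" */
static int in_table[N_DEVICES];      /* 1 while the weak table holds a wrapper */
static int pending[MAX_PENDING];     /* finalizers queued but not yet run */
static int n_pending;

/* Reuse the wrapper if the table still has one; otherwise create a new
   wrapper and queue one more finalizer for the same C object. */
static void pointer_to_device(int dev)
{
    if (in_table[dev])
        return;
    in_table[dev] = 1;
    assert(n_pending < MAX_PENDING);
    pending[n_pending++] = dev;
}

/* The wrapper became unreachable: its weak-table entry vanishes at once. */
static void drop_wrapper(int dev)
{
    in_table[dev] = 0;
}

/* Queued finalizers only run later, e.g. in a separate finalizer thread. */
static void run_finalizers(void)
{
    for (int i = 0; i < n_pending; i++)
        destroy_count[pending[i]]++;
    n_pending = 0;
}

/* Replay steps 2 to 5 of the analysis above. */
static void replay_scenario(void)
{
    pointer_to_device(DEV_A);  /* step 2: A and B wrapped, two finalizers */
    pointer_to_device(DEV_B);
    drop_wrapper(DEV_A);       /* step 3: A leaves the table, finalizer pending */
    pointer_to_device(DEV_A);  /* step 4: A re-wrapped, a second finalizer queued */
    pointer_to_device(DEV_B);  /* B is still in the table: nothing queued */
    drop_wrapper(DEV_A);       /* step 5: both go out of scope */
    drop_wrapper(DEV_B);
    run_finalizers();
    printf("A: %d destroy call(s), B: %d destroy call(s)\n",
           destroy_count[DEV_A], destroy_count[DEV_B]);
}
```

Calling replay_scenario() prints `A: 2 destroy call(s), B: 1 destroy
call(s)`; the second destroy call on A corresponds to the double free in
step 5.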

If set-pointer-finalizer! actually called scm_i_set_finalizer instead of
scm_i_add_finalizer the A finalizer would be set twice but called only
once. Do you think it would be an option?

I attached the instrumentation patches (good old printf's) as well as
the syslog I based my analysis upon.

Thanks,

Mathieu
diff --git a/gnu/installer/parted.scm b/gnu/installer/parted.scm
index 82375d29e3..381e1b3ce7 100644
--- a/gnu/installer/parted.scm
+++ b/gnu/installer/parted.scm
@@ -1502,6 +1502,7 @@ (define (user-partitions->configuration user-partitions)
 
 (define (init-parted)
   "Initialize libparted support."
+  (%parted-syslog-port (syslog-port))
   (probe-all-devices!)
   ;; Remove all logical devices, otherwise "device-is-busy?" will report true
   ;; on all devices containaing active logical volumes.
diff -aur parted/libparted/arch/linux.c tmp/parted-3.5/libparted/arch/linux.c
--- parted/libparted/arch/linux.c   2022-11-04 10:14:33.551737324 +0100
+++ tmp/parted-3.5/libparted/arch/linux.c   2022-04-18 20:38:45.0 
+0200
@@ -17,7 +17,6 @@
 
 #define PROC_DEVICES_BUFSIZ 16384
 
-#include 
 #include 
 #include 
 #include 
@@ -44,7 +43,6 @@
 #include 
 #include 
 #ifdef ENABLE_DEVICE_MAPPER
-
 #include 
 #endif
 
@@ -89,8 +87,6 @@
 #define WR_MODE (O_WRONLY)
 #define RW_MODE (O_RDWR)
 
-int syslog_init;
-
 struct hd_geometry {
 unsigned char heads;
 unsigned char sectors;
@@ -1600,11 +1596,6 @@
 _("ped_device_new()  Unsupported device 
type"));
 goto error_free_arch_specific;
 }
-if (!syslog_init) {
-openlog("parted", LOG_PID, LOG_USER);
-syslog_init = 1;
-}
-syslog(LOG_INFO, "parted: new: %p\n", dev);
 return dev;
 
 error_free_arch_specific:
@@ -1620,8 +1611,6 @@
 static void
 linux_destroy (PedDevice* dev)
 {
-syslog(LOG_INFO, "parted: destroy: %p\n", dev);
-
 LinuxSpecific *arch_specific = LINUX_SPECIFIC(dev);
 void *p = arch_specific->dmtype;
 
diff --git a/parted/device.scm b/parted/device.scm
index 9f688dd..36d83f4 100644
--- a/parted/device.scm
+++ b/parted/device.scm
@@ -23,7 +23,7 @@
   #:use-module (parted geom)
   #:use-module (parted natmath)
   #:use-module (parted structs)
-  #:export (parted-syslog-port
+  #:export (%parted-syslog-port
 probe-all-devices!
 get-device
 get-device-next
@@ -44,8 +44,8 @@
 device-get-minimum-alignment
 device-get-optimum-alignment))
 
-(define parted-syslog-port
-  (make-parameter #f))
+(define %parted-syslog-port
+  (make-parameter #t))
 
 ;; Record all devices, so that pointer finalizers are only set once,
 ;; even if get-device returns an already known pointer.  Use the
@@ -58,22 +58,22 @@
 (define (pointer->device! pointer)
   ;; Check if a finalizer is already registered for this pointer.
   (format (%parted-syslog-port)
-  "guile-parted: pointer->device!: ~a" pointer)
+  "guile-parted: pointer->device!: ~a~%" pointer)
 
   (format (%parted-syslog-port)
-  "guile-parted: hash begin")
+  "guile-parted: hash begin~%")
   (hash-for-each (lambda (k v)
(format (%parted-syslog-port)
-   "guile-parted: hash: ~a -> ~a" k v))
+   "guile-parted: hash: ~a -> ~a~%" k v))
  %devices)
   (format (%parted-syslog-port)
-  "guile-parted: hash end")
+  "guile-parted: hash end~%")
 
   (or (hash-ref %devices pointer)
   (let ((device (pointer->device pointer)))
 
 (format (%parted-syslog-port)
-  "guile-parted: finalizer!: ~a" pointer)
+  "guile-parted: finalizer!: ~a~%" pointer)
 
 ;; Contrary to its name, this "adds" a finalizer.
 (set-pointer-finalizer! 

bug#40682: Installer hangs while connecting to WiFi network

2022-11-03 Thread Mathieu Othacehe


Hey,

>> Are these still in your plans?  Otherwise let's close this old,
>> high severity bug.
>
> Friendly ping. :-)

Yeah, we didn't have other reports regarding WiFi connections, so let's
close it.

Thanks,

Mathieu





bug#58732: installer: finalizers & device destroy segfault

2022-11-03 Thread Mathieu Othacehe


Hey,

Thanks for your help :)

>   1. Bindings create wrappers for C pointers—e.g., with
>  ‘pointer->device’.  If several C functions return a pointer P, you
>  must make sure to return always the same wrapper and not create a
>  new one.

Agreed.

>
>  ‘pointer->device!’ attempts to do that but I think it’s bogus: it
>  uses a weak-value hash table, where the value is the wrapper.  So
>  if the wrapper disappears before the underlying C object, then the
>  pointer is called and bad things ensue.

I'm not sure I understand how the wrapper could disappear before the
underlying C object. We only expose  records to Guile-Parted
users, so my assumption is that when  goes out of scope, the
pointer it wraps can be freed, but maybe I'm missing something?

>  ‘define-wrapped-pointer-type’ in Guile is meant to help with these
>  things (info "(guile) Void Pointers and Byte Access").  We can’t
>  use it directly here because we’re using bytestructures and all
>  that.

It turns out that the "wrap" procedure defined in
define-wrapped-pointer-type is a clone of pointer->device!, except that
it doesn't set a finalizer.

Regarding object lifetime, I wrote a small memo in 2019 here:
https://issues.guix.gnu.org/36402#11.

We have three weak hash tables in Guile-Parted:

%devices: To make sure that we do not set multiple finalizers on the
same pointers.

%disk-devices: So that a device always outlives its disks.

%partition-disks: So that a disk always outlives its partitions.

This means that, as far as I can tell, we are OK regarding your second
point about "aggregation relations".

Mathieu





bug#58923: Malformed core dumps on Guix System

2022-11-03 Thread Mathieu Othacehe


Hey zimoun,

> #0  0x00401106 in main ()
> (gdb) bt
> #0  0x00401106 in main ()
> (gdb) exit

OK, so it must somehow be related to Guix System or the Linux kernel we
ship.

Thanks for your feedback,

Mathieu





bug#58733: installer: coredump generation

2022-11-02 Thread Mathieu Othacehe


Hey,

> Both look reasonable to me, thanks!

Thanks for reviewing :)

> Now, we should probably focus on Guile-Parted…

Yes, I saw you sent a few pointers, that will be my next focus!

Mathieu





bug#58923: Malformed core dumps on Guix System

2022-11-02 Thread Mathieu Othacehe


Hey,

> Could it have to do with /proc/sys/kernel/core_pattern or with the fact
> that your /tmp file system was full or something?  What if you try to
> have the core dump on another file system?

I suspected that at first, but then I reproduced it on Berlin, so I
would rather bet on a recent regression. I'll see if this can be
reproduced on a foreign distribution.

Thanks,

Mathieu





bug#57068: Resizing mcron job in vm-image.tmpl interferes with settings

2022-11-01 Thread Mathieu Othacehe


Hey,

> Oh, I wasn’t aware of that, that should certainly be fixed.  (I fixed a
> similar issue in GNOME some years ago, and I’m confident it’ll be easier
> to fix in Xfce because it doesn’t have all those layers and daemons and
> JavaScript and DBus interfaces.  :-))

Fixing this behaviour in Xfce seems like the right thing to do to
preserve SPICE support and fix the QEMU resizing issue.

This also looks like a substantial amount of work, so I propose to
remove this ticket from the release blockers.

Thanks,

Mathieu





bug#49508: Implement --allow-insecure-transport for `guix pull`

2022-11-01 Thread Mathieu Othacehe


Hello,

> ‘verify_server_cert’ in src/streams/openssl.c is called
> unconditionally.  So it seems that the first thing to do would be to
> submit a patch upstream that would allow users to disable certificate
> checks via ‘git_libgit2_opts’.

While this seems like something that we definitely want, I think we
shouldn't block the release on a contribution that may take time to be
accepted upstream in libgit2.

Unblocking #53214.

Mathieu





bug#58733: installer: coredump generation

2022-10-31 Thread Mathieu Othacehe

> Here is an attached patch implementing the proposed mechanism.

I also prepared the attached patch as a follow-up. The idea is to hide
the backtrace page when the user chooses to "Report the failure".

Thanks,

Mathieu
>From d3f2ce83152a8ea453b407652dbee7b86a64816b Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Mon, 31 Oct 2022 16:43:09 +0100
Subject: [PATCH 1/1] installer: Skip the backtrace page on user abort.

When the user aborts the installation because a core dump is discovered or the
installation command failed, displaying the abort backtrace doesn't make much
sense. Hide it when the abort condition is  and skip directly
to the dump page.

* gnu/installer/steps.scm (): New variable.
(user-abort-error?): New procedure.
* gnu/installer/newt/final.scm (run-install-failed-page): Raise a
user-abort-error.
* gnu/installer/newt/welcome.scm (run-welcome-page): Ditto.
* gnu/installer.scm (installer-program): Hide the backtrace page and directly
propose to dump the report when a  is raised.
---
 gnu/installer.scm  | 18 ++
 gnu/installer/newt/final.scm   |  5 ++---
 gnu/installer/newt/welcome.scm |  3 +--
 gnu/installer/steps.scm|  8 +++-
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/gnu/installer.scm b/gnu/installer.scm
index 52c595b5b7..5cd1af8edf 100644
--- a/gnu/installer.scm
+++ b/gnu/installer.scm
@@ -453,11 +453,21 @@ (define results
   key args)
   (define dump-dir
 (prepare-dump key args #:result %current-result))
+
+  (define user-abort?
+(match args
+  (((? user-abort-error? obj)) #t)
+  (_ #f)))
+
   (define action
-((installer-exit-error current-installer)
- (get-string-all
-  (open-input-file
-   (string-append dump-dir "/installer-backtrace")
+(if user-abort?
+'dump
+((installer-exit-error current-installer)
+ (get-string-all
+  (open-input-file
+   (string-append dump-dir
+  "/installer-backtrace"))
+
   (match action
 ('dump
  (let* ((dump-files
diff --git a/gnu/installer/newt/final.scm b/gnu/installer/newt/final.scm
index 6e55be5067..9f950a0551 100644
--- a/gnu/installer/newt/final.scm
+++ b/gnu/installer/newt/final.scm
@@ -92,9 +92,8 @@ (define (run-install-failed-page)
 ;; Keep going, the installer will be restarted later on.
 #t)
(3 (raise
-   (condition
-(
- (message "User abort.")))
+(condition
+ ())
 (_
  (send-to-clients '(installation-failure))
  #t)))
diff --git a/gnu/installer/newt/welcome.scm b/gnu/installer/newt/welcome.scm
index 5d47591d67..326996b005 100644
--- a/gnu/installer/newt/welcome.scm
+++ b/gnu/installer/newt/welcome.scm
@@ -145,8 +145,7 @@ (define (run-welcome-page logo)
 (1 #t)
 (2 (raise
 (condition
- (
-  (message "User abort.")))
+ ())
 (run-menu-page
  (G_ "GNU Guix install")
  (G_ "Welcome to GNU Guix system installer!
diff --git a/gnu/installer/steps.scm b/gnu/installer/steps.scm
index 8b25ae97c8..0c505e40e4 100644
--- a/gnu/installer/steps.scm
+++ b/gnu/installer/steps.scm
@@ -28,7 +28,10 @@ (define-module (gnu installer steps)
   #:use-module (srfi srfi-34)
   #:use-module (srfi srfi-35)
   #:use-module (rnrs io ports)
-  #:export (
+  #:export (
+user-abort-error?
+
+
 installer-step
 make-installer-step
 installer-step?
@@ -50,6 +53,9 @@ (define-module (gnu installer steps)
 
 %current-result))
 
+(define-condition-type  
+  user-abort-error?)
+
 ;; Hash table storing the step results. Use it only for logging and debug
 ;; purposes.
 (define %current-result (make-hash-table))
-- 
2.38.0



bug#58926: Shepherd becomes unresponsive after an interrupt

2022-10-31 Thread Mathieu Othacehe


Hello,

When running the following command:

--8<---cut here---start->8---
sudo herd restart service-that-hangs-upon-restart
--8<---cut here---end--->8---

then hitting C-c, Shepherd becomes totally unresponsive:

--8<---cut here---start->8---
sudo herd status
--8<---cut here---end--->8---

and all further Shepherd commands hang forever. I was able to reproduce
it in two different configurations:

1. On my laptop, with a WireGuard service trying to reach a nonexistent
DNS server.

--8<---cut here---start->8---
(service wireguard-service-type
 (wireguard-configuration
  (addresses (list "10.0.0.2/24"))
  (dns '("10.0.0.50"))))   ; this DNS server does not exist
--8<---cut here---end--->8---

2. On Berlin, while trying to restart nginx.

In both situations, the "reboot" command hung as well.

Thanks,

Mathieu





bug#58733: installer: coredump generation

2022-10-31 Thread Mathieu Othacehe

Hello,

> Failed to read a valid object file image from memory.
> Core was generated by 
> `/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/bin/guile 
> --no-auto-com'.

This is reported as: https://issues.guix.gnu.org/58923

> I think that it would be great if we could enable coredump generation
> from the installer. This way, when a crash occurs and the installer
> restarts, it would notice that there is an existing coredump in say
> /tmp/coredump_xxx and propose to upload it using the existing dump
> mechanism.

Here is an attached patch implementing the proposed mechanism.

Mathieu
From f4d2a1bb4df2f65b650be704bffb7ea469ae0232 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Mon, 31 Oct 2022 13:03:46 +0100
Subject: [PATCH 1/1] installer: Add core dump support.

Fixes: <https://issues.guix.gnu.org/58733>

* gnu/installer.scm (installer-program): Enable core dump generation.
* gnu/installer/dump.scm (%core-dump): New variable.
(prepare-dump): Copy the core dump file.
* gnu/installer/newt/welcome.scm (run-welcome-page): Propose to report an
installation that previously generated a core dump.
---
 gnu/installer.scm  |  6 ++
 gnu/installer/dump.scm | 10 +-
 gnu/installer/newt/welcome.scm | 15 +++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/gnu/installer.scm b/gnu/installer.scm
index 8a6e604fa5..52c595b5b7 100644
--- a/gnu/installer.scm
+++ b/gnu/installer.scm
@@ -389,6 +389,12 @@ (define installer-builder
  (ice-9 match)
  (ice-9 textual-ports))
 
+;; Enable core dump generation.
+(setrlimit 'core #f #f)
+(call-with-output-file "/proc/sys/kernel/core_pattern"
+  (lambda (port)
+(format port %core-dump)))
+
 ;; Initialize gettext support so that installers can use
 ;; (guix i18n) module.
 #$init-gettext
diff --git a/gnu/installer/dump.scm b/gnu/installer/dump.scm
index daa02f205a..f91cbae021 100644
--- a/gnu/installer/dump.scm
+++ b/gnu/installer/dump.scm
@@ -28,13 +28,17 @@ (define-module (gnu installer dump)
   #:use-module (web http)
   #:use-module (web response)
   #:use-module (webutils multipart)
-  #:export (prepare-dump
+  #:export (%core-dump
+prepare-dump
 make-dump
 send-dump-report))
 
 ;; The installer crash dump type.
 (define %dump-type "installer-dump")
 
+;; The core dump file.
+(define %core-dump "/tmp/installer-core-dump")
+
 (define (result->list result)
   "Return the alist for the given RESULT."
   (hash-map->list (lambda (k v)
@@ -66,6 +70,10 @@ (define dump-dir
 ;; syslog
 (copy-file "/var/log/messages" "syslog")
 
+;; core dump
+(when (file-exists? %core-dump)
+  (copy-file %core-dump "core-dump"))
+
 ;; dmesg
 (let ((pipe (open-pipe* OPEN_READ "dmesg")))
   (call-with-output-file "dmesg"
diff --git a/gnu/installer/newt/welcome.scm b/gnu/installer/newt/welcome.scm
index 0bca44d1b2..5d47591d67 100644
--- a/gnu/installer/newt/welcome.scm
+++ b/gnu/installer/newt/welcome.scm
@@ -20,6 +20,7 @@
 (define-module (gnu installer newt welcome)
   #:use-module ((gnu build linux-modules)
 #:select (modules-loaded))
+  #:use-module (gnu installer dump)
   #:use-module (gnu installer steps)
   #:use-module (gnu installer utils)
   #:use-module (gnu installer newt page)
@@ -132,6 +133,20 @@ (define (run-welcome-page logo)
 the system does not boot, perhaps you will need to add nomodeset to the
 kernel arguments and need to configure the uvesafb kernel module.")
   (G_ "Pre-install warning")))
+(when (file-exists? %core-dump)
+  (match
+  (choice-window
+   (G_ "Previous installation failed")
+   (G_ "Continue")
+   (G_ "Report the failure")
+   (G_ "It seems that the previous installation exited unexpectedly \
+and generated a core dump.  Do you want to continue or to report the failure \
+first?"))
+(1 #t)
+(2 (raise
+(condition
+ (
+  (message "User abort.")))
 (run-menu-page
  (G_ "GNU Guix install")
  (G_ "Welcome to GNU Guix system installer!
-- 
2.38.0
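The patch enables core dumps with `(setrlimit 'core #f #f)`, which lifts both the soft and the hard limit; raising a hard limit requires privileges, which the installer has since it runs as root. For experimenting outside the installer, here is a hedged sketch using hypothetical helpers that are not part of the patch: `raise-core-limit!` only raises the soft limit up to the current hard limit, which an unprivileged process may do. `set-core-pattern!` writes the kernel pattern with `display` rather than `format`, so a `~` in the path would not be interpreted as a format directive.

```scheme
;; Guile's getrlimit returns two values, the soft and the hard limit,
;; with #f meaning "unlimited" (RLIM_INFINITY).
(define (raise-core-limit!)
  "Raise the soft core file size limit of the current process to its hard
limit and return the new soft limit (#f meaning unlimited)."
  (call-with-values (lambda () (getrlimit 'core))
    (lambda (soft hard)
      ;; Raising the soft limit up to the hard limit needs no privilege.
      (setrlimit 'core hard hard)
      hard)))

(define (set-core-pattern! pattern)
  "Write PATTERN to /proc/sys/kernel/core_pattern (requires root)."
  (call-with-output-file "/proc/sys/kernel/core_pattern"
    (lambda (port)
      (display pattern port))))
```

As root, calling `(raise-core-limit!)` followed by `(set-core-pattern! "/tmp/installer-core-dump")` would roughly mirror what the patch does at installer startup.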



bug#58923: Malformed core dumps on Guix System

2022-10-31 Thread Mathieu Othacehe


Hello,

Working on https://issues.guix.gnu.org/58733, I noticed that there is
something wrong with the core dumps we are generating on Guix System.

--8<---cut here---start->8---
mathieu@meije ~/tmp [env]$ cat test.c 
#include 
int main() {
int *t = NULL;
return *t;
}

mathieu@meije ~/tmp [env]$ gcc test.c 
mathieu@meije ~/tmp [env]$ ulimit -c unlimited
mathieu@meije ~/tmp [env]$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size  (blocks, -c) unlimited

mathieu@meije ~/tmp [env]$ echo "/tmp/my-core-%p" | sudo tee /proc/sys/kernel/core_pattern
/tmp/my-core-%p

mathieu@meije ~/tmp [env]$ ./a.out 
Segmentation fault (core dumped)

mathieu@meije ~/tmp [env]$ gdb ./a.out /tmp/my-core-5622
...
BFD: warning: /tmp/my-core-5622 has a segment extending past end of file
...
Failed to read a valid object file image from memory.
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00401102 in main ()
(gdb) bt
#0  0x00401102 in main ()
Backtrace stopped: Cannot access memory at address 0x7fff14e14168
--8<---cut here---end--->8---

The "has a segment extending past end of file" warning indicates that
the core file is shorter than its headers claim, i.e. truncated, and the
"bt" command does not work, which makes core dump generation rather
useless.

Thanks,

Mathieu





bug#53194: System test partition.img differs in size across hosts(?)

2022-10-31 Thread Mathieu Othacehe


Hello,

> FYI, I pushed this workaround in
> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

Running the openvswitch test on Berlin and on my laptop, I'm not able to
reproduce this issue, with or without the workaround. I think we can
close it for now and reopen it if someone finds a more reliable
reproducer.

Thanks,

Mathieu





bug#53541: [installer] backtrace during fresh Guix System install after during formatting

2022-10-31 Thread Mathieu Othacehe


> Fixes: .
>
> * gnu/installer/parted.scm (read-partition-uuid/retry): New procedure.
> (check-user-partitions): Use it.

Pushed as ab974ed709976d34917c8f6f9e5cc0004547af45.

Mathieu





bug#53541: [installer] backtrace during fresh Guix System install after during formatting

2022-10-24 Thread Mathieu Othacehe


Hey,

> I was able to reproduce it on real hardware, following those
> instructions. The dump is available here if people want to join the
> party: dump.guix.gnu.org/download/installer-dump-304492ff.

So the backtrace suggests that we are trying to open /dev/nvme0n1p1 to
read its superblock:

--8<---cut here---start->8---
   9 (open "/dev/nvme0n1p1" 524288 #)
--8<---cut here---end--->8---

and that it fails because the file does not exist:

--8<---cut here---start->8---
  1780:13  6 (_ #< components: (#<> #< origin: "open-fdes"> #< message: "~A"> #< irritants: ("No such file or directory")> #<…>)
--8<---cut here---end--->8---

This open call originates from check-user-partitions in (gnu installer
parted). If we arrive here, it means that the file *should* exist.

Looking at the kernel trace, the two last lines are:

--8<---cut here---start->8---
[   72.271204]  nvme0n1: p1 p2 p3 p4
[  127.415648]  nvme0n1: p1 p2
--8<---cut here---end--->8---

so the disk partition table is updated because we go from 4 to 2
partitions. Could it be that, for a brief period of time, the
/dev/nvme0n1p1 file disappears and then reappears?

I'll try to reproduce it in a VM to conduct more testing.

Mathieu





bug#58733: installer: coredump generation

2022-10-23 Thread Mathieu Othacehe


Hello,

The installer sadly segfaults sometimes, most of the time in
libparted. To catch the core dump[1], I ran these commands:

--8<---cut here---start->8---
echo /tmp/core > /proc/sys/kernel/core_pattern
prlimit --core=unlimited --pid=1234
--8<---cut here---end--->8---

The core dump I obtained did not seem usable, even though it weighs
155 MB:

--8<---cut here---start->8---
mathieu@meije ~/guix [env]$ gdb /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/bin/guile core
...
BFD: warning: /home/mathieu/guix/core has a segment extending past end of file
warning: core file may not match specified executable file.
...
Failed to read a valid object file image from memory.
Core was generated by `/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/bin/guile --no-auto-com'.
--8<---cut here---end--->8---

So I tried a different approach and ran:

--8<---cut here---start->8---
$ gdb
(gdb) attach 1234
...
(gdb) gcore
--8<---cut here---end--->8---

to get a usable core dump, and these commands to analyze it (thanks
Josselin!):

--8<---cut here---start->8---
(gdb) info sharedlibrary 
From                To                  Syms Read   Shared Object Library
...
0x7f892c59c850  0x7f892c5d3d0b  Yes (*)     /gnu/store/qz7qqrhgcs3ixv8f1k30gwiqr1prm7qs-parted-3.5/lib/libparted.so
(gdb) add-symbol-file /gnu/store/b0ymz7vjfkcvhbci49q5yk1fi0l9lq49-parted-3.5/lib/libparted.so 0x7f892c59c850
add symbol table from file "/gnu/store/b0ymz7vjfkcvhbci49q5yk1fi0l9lq49-parted-3.5/lib/libparted.so" at
.text_addr = 0x7f892c59c850
(y or n) y
Reading symbols from /gnu/store/b0ymz7vjfkcvhbci49q5yk1fi0l9lq49-parted-3.5/lib/libparted.so...
(gdb) bt
#0  linux_destroy (dev=0x1dc89e0) at arch/linux.c:1615
#1  0x7f8941aecd37 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
...
--8<---cut here---end--->8---

I think it would be great if we could enable core dump generation
from the installer. This way, when a crash occurs and the installer
restarts, it would notice an existing core dump in, say,
/tmp/coredump_xxx and offer to upload it using the existing dump
mechanism.

Thanks,

Mathieu

[1]: https://issues.guix.gnu.org/58732





bug#58734: installer: backtrace page in final step

2022-10-23 Thread Mathieu Othacehe


Hello,

When an error occurs in the pseudo-terminal displaying the "guix system
init" command output, the backtrace is not displayed correctly and the
keyboard (arrows, tab, enter keys) cannot be used to scroll down the
backtrace or dump it.

It can easily be reproduced by introducing an error in the "run-command"
procedure, like this:

--8<---cut here---start->8---
--- a/gnu/installer/utils.scm
+++ b/gnu/installer/utils.scm
@@ -184,6 +184,7 @@ (define (pause)
   (((port _ ...) _ _)
(read-line port
 
+  (error 'fake)
   (installer-log-line "running command ~s" command)
   (define result (run-external-command-with-line-hooks
   (list %display-line-hook) command
--8<---cut here---end--->8---

I suspect that we may need to run "newt-init" and "clear-screen" before
displaying the backtrace page.

Thanks,

Mathieu





bug#58734: installer: backtrace page in final step

2022-10-23 Thread Mathieu Othacehe


> +  (error 'fake)
>(installer-log-line "running command ~s" command)
>(define result (run-external-command-with-line-hooks
>(list %display-line-hook) command

Fixed with bf5e78d59fcb188d0bce02d93c93d06069178837.

Thanks,

Mathieu





bug#58732: installer: finalizers & device destroy segfault

2022-10-23 Thread Mathieu Othacehe


Hello,

I found a segfault in the installer by following these steps:

- Run an automatic partitioning with separate home and no encryption
- In the final configuration page, come back to partitioning
- Remove all partitions but the ESP one, create a new btrfs root
  partition
- Repeat until the crash occurs

Using Josselin's instructions here: https://issues.guix.gnu.org/57513, I
was able to get the following backtrace:

--8<---cut here---start->8---
Reading symbols from /gnu/store/b0ymz7vjfkcvhbci49q5yk1fi0l9lq49-parted-3.5/lib/libparted.so...
(gdb) bt
#0  linux_destroy (dev=0x1dc89e0) at arch/linux.c:1615
#1  0x7f8941aecd37 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#2  0x7f8941a45e3f in GC_invoke_finalizers () from /gnu/store/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#3  0x7f8941aed429 in scm_run_finalizers () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#4  0x7f8941af4482 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#5  0x7f8941ae085a in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#6  0x7f8941b6d336 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#7  0x7f8941b7a5e9 in scm_call_n () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#8  0x7f8941ae209a in scm_call_2 () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#9  0x7f8941b98752 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#10 0x7f8941b6a88f in scm_c_catch () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#11 0x7f8941ae2e66 in scm_c_with_continuation_barrier () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#12 0x7f8941b69b39 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#13 0x7f8941a400ba in GC_call_with_stack_base () from /gnu/store/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#14 0x7f8941b628b8 in scm_with_guile () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#15 0x7f8941a16d7e in ?? () from /gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0
#16 0x7f8941614eff in clone () from /gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
--8<---cut here---end--->8---

linux_destroy is the PedDevice destruction function. The crash occurs
when dereferencing the arch_specific pointer which is ...

--8<---cut here---start->8---
(gdb) p dev
$1 = (PedDevice *) 0x1dc89e0
(gdb) p *dev
$2 = {next = 0x1, model = 0x1645d50 "", path = 0x0, type = PED_DEVICE_UNKNOWN, 
sector_size = 0, phys_sector_size = 1, length = 23272720, open_count = 0, 
read_only = 1, external_mode = 0, dirty = 0, boot_dirty = 0, hw_geom = {
cylinders = 0, heads = 2, sectors = 0}, bios_geom = {cylinders = 23259184, 
heads = 0, sectors = 0}, host = 1, did = 0, arch_specific = 0x0}
(gdb) p dev->arch_specific 
$3 = (void *) 0x0
--8<---cut here---end--->8---

NULL! I guess this has to do with device pointer finalizers. I'm a bit
disappointed because I thought we had overcome those mistakes.

Thanks,

Mathieu





bug#53541: [installer] backtrace during fresh Guix System install after during formatting

2022-10-23 Thread Mathieu Othacehe

Hey,

> so the disk partition table is updated because we move from 4 to 2
> partitions. Could it be possible that during a brief period of time the
> /dev/nvme0n1p1 file disappears then re-appears?

Looks like that's what's happening. I'm not able to reproduce it in a
VM; I guess that's because my hardware is slower.

Anyway, retrying read-partition-uuid a few times fixes it for me. This
is a bit dirty, but that's how we usually deal with this kind of
problem. A patch is attached.

While running these tests, I hit a segmentation fault in libparted and
then in libblkid, but that's another story. I'll open a ticket about
that later on.

Thanks,

Mathieu
From 4407374ff4087772bd8226824cf4883537752f01 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Sat, 22 Oct 2022 22:27:57 +0200
Subject: [PATCH 1/1] installer: parted: Retry failing read-partition-uuid
 call.

Fixes: <https://issues.guix.gnu.org/53541>.

* gnu/installer/parted.scm (read-partition-uuid/retry): New procedure.
(check-user-partitions): Use it.
---
 gnu/installer/parted.scm | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/gnu/installer/parted.scm b/gnu/installer/parted.scm
index fcc936a391..82375d29e3 100644
--- a/gnu/installer/parted.scm
+++ b/gnu/installer/parted.scm
@@ -319,6 +319,25 @@ (define (find-user-partition-by-parted-object user-partitions
   partition))
 user-partitions))
 
+(define (read-partition-uuid/retry file-name)
+  "Call READ-PARTITION-UUID with 5 retries spaced by 1 second.  This is useful
+if the partition table is updated by the kernel at the time this function is
+called, causing the underlying /dev to be absent."
+  (define max-retries 5)
+
+  (let loop ((retry max-retries))
+(catch #t
+  (lambda ()
+(read-partition-uuid file-name))
+  (lambda _
+(if (> retry 0)
+(begin
+  (sleep 1)
+  (loop (- retry 1)))
+(error
+ (format #f (G_ "Could not open ~a after ~a retries~%.")
+ file-name max-retries)))
+
 
 ;;
 ;; Devices
@@ -1108,7 +1127,7 @@ (define (check-uuid)
(need-formatting?
 (user-partition-need-formatting? user-partition)))
(or need-formatting?
-   (read-partition-uuid file-name)
+   (read-partition-uuid/retry file-name)
(raise
 (condition
  (
-- 
2.38.0
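The retry loop in this patch could also be written as a generic helper. The sketch below uses a hypothetical name, `call-with-retries`, which is not an existing installer procedure. It retries THUNK up to RETRIES extra times, sleeping DELAY seconds between attempts. Unlike the patch, it lets the exception of the last attempt propagate instead of replacing it with a new message, so the original cause (for instance ENOENT from `open`) stays visible.

```scheme
(define* (call-with-retries thunk #:key (retries 5) (delay 1))
  "Call THUNK and return its result.  If it raises an exception, sleep DELAY
seconds and try again, at most RETRIES more times; the exception raised by
the final attempt is not caught."
  (let loop ((remaining retries))
    (if (zero? remaining)
        (thunk)                          ;last attempt: errors propagate
        (catch #t
          thunk
          (lambda _
            (sleep delay)
            (loop (- remaining 1)))))))
```

In the installer's setting, usage would look like `(call-with-retries (lambda () (read-partition-uuid file-name)))`.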



bug#52943: Cannot build guix as part of guix system reconfigure after commit 224d437fb4 on aarch64

2022-10-20 Thread Mathieu Othacehe


Hello,

> Also, substitute availability for aarch64-linux and armhf-linux is
> OK. Does that mean this issue can be closed?

The guix package for armhf-linux is not built anymore by
ci.guix.gnu.org, but guix for aarch64-linux seems to be working.

Closing,

Thanks,

Mathieu





bug#55360: bug#58375: Installer does not show what is being downloaded

2022-10-20 Thread Mathieu Othacehe


Hey,

Thanks for having a look!

> I haven’t actually tested the patch but it LGTM.  One thing to check is
> whether ‘terminal-window-size’ returns something sensible for the
> pseudo-terminal; it could be that we need an extra ioctl so the
> pseudo-terminal has the same size as the actual terminal.

Well, it returns 0 for all fields, but I tested on several screen sizes
and everything seems fine, so I went ahead.

While testing I noticed two new issues though:

1. When the disk is GPT-partitioned, there is no confirmation page in
   "run-label-page". Something I missed in #57232.

2. When there is an exception, in run-external-command-with-handler/tty
   for instance, the backtrace page is displayed in the PTY and the
   keyboard shortcuts do not work anymore.

I'll address point 1 shortly but could use some advice for point 2.

Thanks,

Mathieu





bug#58375: Installer does not show what is being downloaded

2022-10-14 Thread Mathieu Othacehe

Hey,

> If we really want to capture the output of ‘guix system init’, then we
> need to open a pseudo-terminal with ‘openpty’ & co. instead of ‘pipe’ in
> ‘run-external-command-with-handler’.  That may be relatively easy
> actually.

So I implemented your proposal. It seems to be working quite well. As
discussed on #guix, we could avoid dumping the download bars to syslog
when the "guix system init" command succeeds. However, that seems quite
tricky with the current implementation, where syslog dumping is
actually a hook (%syslog-line-hook).
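One possible way to keep download bars out of syslog on success, assuming a line hook is simply a procedure called with each output line (as the hook list passed to run-external-command-with-line-hooks suggests), is to buffer the lines and decide once the exit status is known. This is a hypothetical sketch, not part of either patch. `make-deferred-line-hook` returns the hook together with a procedure to call when the command exits.

```scheme
(define (make-deferred-line-hook)
  "Return two values: a line hook that records each line it receives, and a
finish procedure.  Calling the finish procedure with #t (success) drops the
recorded lines and returns '(); with #f (failure) it returns them in order.
Either way the buffer is then emptied."
  (let ((lines '()))
    (values (lambda (line)
              (set! lines (cons line lines)))
            (lambda (success?)
              (let ((result (if success? '() (reverse lines))))
                (set! lines '())
                result)))))
```

The trade-off is that the whole command output stays in memory and reaches syslog only after the command exits, which may matter for long "guix system init" runs.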

While fixing this issue, I also realized that when the "guix system init"
command fails, the user is only offered to resume or restart the
installation.

In cases where "guix system init" failed because of a network issue, or
because a partition was too small, restarting/resuming seems like the
right thing to do.

However, when the installer failed because "guix system init" crashed or
segfaulted, restarting or resuming probably won't help, and dumping the
crash is probably the best way to get help. That's why, in a second
patch, I added a new "Report the failure" button to
"run-install-failed-page".

Thanks,

Mathieu
From c6286404e9c4c0dc302c3d398a8f27b050cf4ce0 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Fri, 14 Oct 2022 17:28:27 +0200
Subject: [PATCH 1/2] installer: Run the "guix system init" command in a PTY.

Fixes: <https://issues.guix.gnu.org/55360>

* gnu/installer/utils.scm (run-external-command-with-handler/tty): New
procedure.
(run-external-command-with-line-hooks, run-command): Add a TTY? argument.
* gnu/installer/final.scm (install-system): Call run-command with TTY?
argument set to #true.
---
 gnu/installer/final.scm |  2 +-
 gnu/installer/utils.scm | 50 +
 2 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index 3f6dacc490..044f79372b 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -211,7 +211,7 @@ (define (assert-exit x)
 
  (setenv "PATH" "/run/current-system/profile/bin/")
 
- (set! ret (run-command install-command)))
+ (set! ret (run-command install-command #:tty? #t)))
(lambda ()
  ;; Restart guix-daemon so that it does no keep the MNT namespace
  ;; alive.
diff --git a/gnu/installer/utils.scm b/gnu/installer/utils.scm
index 5fd2e2d425..061493e6a7 100644
--- a/gnu/installer/utils.scm
+++ b/gnu/installer/utils.scm
@@ -20,6 +20,7 @@
 (define-module (gnu installer utils)
   #:use-module (gnu services herd)
   #:use-module (guix utils)
+  #:use-module ((guix build syscalls) #:select (openpty login-tty))
   #:use-module (guix build utils)
   #:use-module (guix i18n)
   #:use-module (srfi srfi-1)
@@ -45,6 +46,7 @@ (define-module (gnu installer utils)
 nearest-exact-integer
 read-percentage
 run-external-command-with-handler
+run-external-command-with-handler/tty
 run-external-command-with-line-hooks
 run-command
 run-command-in-installer
@@ -124,10 +126,37 @@ (define dummy-pipe
 (close-port input)
 (close-pipe dummy-pipe)))
 
-(define (run-external-command-with-line-hooks line-hooks command)
+(define (run-external-command-with-handler/tty handler command)
+  "Run command specified by the list COMMAND in a child operating in a
+pseudoterminal with output handler HANDLER.  HANDLER is a procedure taking an
+input port, to which the command will write its standard output and error.
+Returns the integer status value of the child process as returned by waitpid."
+  (define-values (controller inferior)
+(openpty))
+
+  (match (primitive-fork)
+(0
+ (catch #t
+   (lambda ()
+ (close-fdes controller)
+ (login-tty inferior)
+ (apply execlp (car command) command))
+   (lambda _
+ (primitive-exit 127
+(pid
+ (close-fdes inferior)
+ (let* ((port (fdopen controller "r0"))
+(result (false-if-exception
+ (handler port
+   (close-port port)
+   (cdr (waitpid pid))
+
+(define* (run-external-command-with-line-hooks line-hooks command
+   #:key (tty? #false))
   "Run command specified by the list COMMAND in a child, processing each
-output line with the procedures in LINE-HOOKS.  Returns the integer status
-value of the child process as returned by waitpid."
+output line with the procedures in LINE-HOOKS.  If TTY is set to #true, the
+COMMAND will be run in a pseudoterminal.  Returns the integer status value of
+the child process as returned by waitpid."
   (define (handler input)
 (and
  (and=> (get-line input)
@@ -136,14 +165,17 @@ (defi

bug#53480: i686 ISO image from CI is installing an x86_64 system

2022-10-13 Thread Mathieu Othacehe


Hello,

Fixed with: 84b4216e988ec6791b6df8f894d5a01cbf2e5fa5.

Thanks,

Mathieu





bug#57232: [installer] ENTER in guided partitioner destroys partition table

2022-10-13 Thread Mathieu Othacehe


Hey,

> Fixes: .
>
> * gnu/installer/newt/partition.scm (run-label-confirmation-page): New
> procedure.
> (run-label-page): Call the above procedure before proceeding.

I pushed a slightly edited version of this patch.

Thanks,

Mathieu





bug#55360: bug#58375: Installer does not show what is being downloaded

2022-10-13 Thread Mathieu Othacehe


Hey Ludo!

> I’m not sure it’s a good idea for ‘guix system init’: we’d be logging
> mostly progress bars, package names, and the likes to syslog—not super
> useful.  So I’d suggest not capturing stdout of ‘guix system init’.

In bug report https://issues.guix.gnu.org/57983, capturing the 'guix
system init' output revealed a "guix substitute" crash. So it does
seem like a useful mechanism, especially while #56005 is still open.

That said, the current situation is not really acceptable either. What
about hiding the "guix system init" output completely and displaying a
progress bar page instead?

Thanks,

Mathieu





bug#57827: Shepherd 0.9.2 possible regressions

2022-09-24 Thread Mathieu Othacehe

Hey,

> This is fixed by 6abdcef4a68e98f538ab69fde096adc5f5ca4ff4; the log
> contains extra details.

Thanks for fixing it! Turns out we still have an issue on all four
installer tests.

The error messages look like:

--8<---cut here---start->8---
Sep 22 09:49:19 localhost installer[252]: running command ("guix" "system" "init" "--fallback" "--no-grafts" "--no-substitutes" "/mnt/etc/config.scm" "/mnt")

Sep 22 09:49:21 localhost installer[252]: guix system: error: read error while loading '/mnt/etc/config.scm': /mnt/etc/config.scm:63:51: unexpected ")"
--8<---cut here---end--->8---

Looking at the configuration file (attached), the problem seems to come
from the edit-configuration-file function. Because the rewritten
configuration is shorter than the initial one (the comments are stripped
by the pretty-print function), some leftovers from the initial
configuration remain at the end of the file.

I do not really understand why, because call-with-output-file is
supposed to open the file with the O_TRUNC flag, discarding the existing
content of the configuration file. Moreover, calling
edit-configuration-file from a guix repl does the right thing.

Anyways, I pushed fe4663ae2476cb527d4f1f49ff8fa077d43f7251 which fixes
the issue by removing the file before rewriting it.
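The symptom, a shorter rewrite followed by the tail of the old content, is what a write without truncation produces. The sketch below only reproduces that symptom; it does not explain why the installer hit it, since call-with-output-file does truncate. The helper names are hypothetical. `overwrite-in-place!` opens the file in "r+" mode (read/write, no O_TRUNC), and `rewrite-after-delete!` mirrors the workaround described above.

```scheme
(use-modules (ice-9 textual-ports))     ; get-string-all

(define (read-file file)
  (call-with-input-file file get-string-all))

(define (write-file! file content)
  ;; call-with-output-file opens FILE in "w" mode, i.e. with O_TRUNC, so
  ;; the previous content is discarded.
  (call-with-output-file file
    (lambda (port) (display content port))))

(define (overwrite-in-place! file content)
  ;; "r+" opens FILE for reading and writing without truncating it: a
  ;; shorter CONTENT leaves the end of the old content in place.
  (let ((port (open-file file "r+")))
    (display content port)
    (close-port port)))

(define (rewrite-after-delete! file content)
  ;; The workaround: remove FILE first, so the new one starts empty.
  (when (file-exists? file)
    (delete-file file))
  (write-file! file content))
```

Writing "(short)" over "(long original content)" with `overwrite-in-place!` leaves "(short)riginal content)", with a stray closing parenthesis, much like the `unexpected ")"` read error above.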

Mathieu


configuration_before.scm
Description: Binary data


configuration_after.scm
Description: Binary data


bug#57983: Error installing on a Framework Laptop

2022-09-22 Thread Mathieu Othacehe


Hey,

> Here is a fix for the second issue, as well as a little bonus commit for a
> mistake that made it past my refactoring.

That's awesome. I tested by running an install with "automatic encrypted
partitioning" then killing the "guix init" process. The installer is
able to resume the final step without any issue.

I added a "Partially-Fixes" tag as this patchset fixes one of the two
issues Dan reported, and pushed. This issue is now blocked by
https://issues.guix.gnu.org/56005 resolution.

Thanks,

Mathieu





bug#57983: Error installing on a Framework Laptop

2022-09-22 Thread Mathieu Othacehe


Hello Dan,

Thanks a lot for the bug report.

> The dump was uploaded as installer-dump-11a1087c.

As you are the first person to report an installer bug using the new
dump mechanism, here is a quick note: the dump can be downloaded this
way:

--8<---cut here---start->8---
mathieu@meije ~$ wget -qO- dump.guix.gnu.org/download/installer-dump-11a1087c | tar xvz
dump.2022-09-21.05.57.10/syslog
dump.2022-09-21.05.57.10/installer-result
dump.2022-09-21.05.57.10/installer-backtrace
dump.2022-09-21.05.57.10/dmesg
--8<---cut here---end--->8---

It looks like the issue is:

--8<---cut here---start->8---
Sep 21 05:45:33 localhost installer[548]: substitute: ^Msubstitute: ^[[Kupdating substitutes from 'https://ci.guix.gnu.org'...   0.0%Backtrace:
Sep 21 05:45:33 localhost installer[548]: substitute:   14 (primitive-load "/gnu/store/hsvz87ld2q231g3pqg62a0bwr4j…")
Sep 21 05:45:33 localhost installer[548]: substitute: In guix/ui.scm:
Sep 21 05:45:33 localhost installer[548]: substitute:2263:7 13 (run-guix . _)
Sep 21 05:45:33 localhost installer[548]: substitute:   2226:10 12 (run-guix-command _ . _)
Sep 21 05:45:33 localhost installer[548]: substitute: In ice-9/boot-9.scm:
Sep 21 05:45:33 localhost installer[548]: substitute:   1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
Sep 21 05:45:33 localhost installer[548]: substitute:   1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
Sep 21 05:45:33 localhost installer[548]: substitute: In guix/scripts/substitute.scm:
Sep 21 05:45:33 localhost installer[548]: substitute:763:18  9 (_)
Sep 21 05:45:33 localhost installer[548]: substitute:348:26  8 (process-query # _ #:cache-urls _ #:acl _)
Sep 21 05:45:33 localhost installer[548]: substitute: In guix/substitutes.scm:
Sep 21 05:45:33 localhost installer[548]: substitute:365:27  7 (lookup-narinfos/diverse _ _ # …)
Sep 21 05:45:33 localhost installer[548]: substitute:322:31  6 (lookup-narinfos "https://ci.guix.gnu.org" _ # _ # _)
Sep 21 05:45:33 localhost installer[548]: substitute:245:26  5 (fetch-narinfos _ _ #:open-connection _ # _)
Sep 21 05:45:33 localhost installer[548]: substitute: In ice-9/boot-9.scm:
Sep 21 05:45:33 localhost installer[548]: substitute:   1685:16  4 (raise-exception _ #:continuable? _)
Sep 21 05:45:33 localhost installer[548]: substitute:   1685:16  3 (raise-exception _ #:continuable? _)
Sep 21 05:45:33 localhost installer[548]: substitute:   1780:13  2 (_ #< components: (#<> #<…>)
Sep 21 05:45:33 localhost installer[548]: substitute:   1685:16  1 (raise-exception _ #:continuable? _)
Sep 21 05:45:33 localhost installer[548]: substitute:   1685:16  0 (raise-exception _ #:continuable? _)
Sep 21 05:45:33 localhost installer[548]: substitute:
Sep 21 05:45:33 localhost installer[548]: substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Sep 21 05:45:33 localhost installer[548]: substitute: In procedure write_wait_fd: unimplemented
Sep 21 05:45:33 localhost installer[548]: guix system: error: `/gnu/store/hsvz87ld2q231g3pqg62a0bwr4jq5rb6-guix-command substitute' died unexpectedly
Sep 21 05:45:33 localhost installer[548]: command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") exited with value 1
--8<---cut here---end--->8---

This is an unfixed bug reported here: https://issues.guix.gnu.org/56005

Then the final phase is retried and fails this way:

--8<---cut here---start->8---
Sep 21 05:57:10 localhost vmunix: [ 1295.546730] /dev/mapper/cryptroot: Can't open blockdev
Sep 21 05:57:10 localhost installer[426]: mounting "/dev/mapper/cryptroot" on "/mnt/"
Sep 21 05:57:10 localhost installer[426]: crashing due to uncaught exception: system-error ("mount" "mount ~S on ~S: ~A" ("/dev/mapper/cryptroot" "/mnt/" "No such file or directory") (2))
--8<---cut here---end--->8---

The first issue is most likely transient and you might have better luck
just by retrying. In the meantime I'll try to understand why the final
phase retry fails.

Thanks,

Mathieu





bug#57933: Gtk is unsupported on i686-linux

2022-09-19 Thread Mathieu Othacehe


Hello,

I got the following error while trying `make release` for
`i686-linux` specifically.

--8<---cut here---start->8---
+ for example in gnu/system/examples/*.tmpl
+ case "$example" in
+ options=
+ guix system -n disk-image gnu/system/examples/desktop.tmpl
accepted connection from pid 17139, user nixbld
guix system: warning: 'disk-image' is deprecated: use 'image' instead
guix system: error: package gvfs@1.50.2 does not support i686-linux
+ rm -f t-guix-system-16656 t-guix-system-error-16656 /tmp/guix-build-guix-1.3.0.24760-34049.drv-0/t-guix-system-16656/config.scm /tmp/guix-build-guix-1.3.0.24760-34049.drv-0/t-guix-system-16656/my-torrc
+ rmdir /tmp/guix-build-guix-1.3.0.24760-34049.drv-0/t-guix-system-16656
FAIL tests/guix-system.sh (exit status: 1)
--8<---cut here---end--->8---

It turns out that GTK is unsupported on i686-linux, which is a problem for
building the desktop.tmpl image.

--8<---cut here---start->8---
mathieu@meije ~/guix [env]$ make -j8 && ./pre-inst-env guix build gtk -s i686-linux
gnu/packages/gtk.scm:1182:2: warning: package gtk@4.8.0 does not support i686-linux
--8<---cut here---end--->8---

That's because `gst-plugins-bad` and `librsvg-bootstrap` both refer to
`librsvg`, which depends on Rust, which is only supported on x86_64-linux.

There are other packages in (gnu packages gnome) relying on librsvg
directly. I'm not sure what our best option is here. Should we use
librsvg-for-system for the entire desktop.tmpl closure?

Thanks,

Mathieu





bug#57928: GNOME Calendar online account issue

2022-09-19 Thread Mathieu Othacehe


Hello,

Since the GNOME upgrade to 42.2, neither GNOME Calendar nor the calendar
widget of GNOME Shell shows my appointments from configured online
accounts.

Thanks,

Mathieu





bug#57827: Shepherd 0.9.2 possible regressions

2022-09-16 Thread Mathieu Othacehe


Hey Chris,

> Since empty files is a possibility with wait-for-file, I've sent a patch
> to [1] which prevents the eof issue, plus another change to make it
> easier to debug.

Nice! I wonder if wait-for-file should block until the file has some
content. In the cgit/gitile tests, we keep going while the PID file is
still empty, so the nginx server may not be completely up yet.
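A standalone Guile sketch of that idea could look like this. The name
wait-for-non-empty-file and its #:timeout keyword are hypothetical; this
is not the existing wait-for-file helper, only an illustration of polling
until the file has content before reading it.

```scheme
;; Hypothetical helper illustrating the idea above; it is NOT the
;; wait-for-file procedure used by the Guix system tests.
(define* (wait-for-non-empty-file file #:key (timeout 10))
  "Poll FILE once per second until it exists and is non-empty, then return
the first datum read from it.  Return #f if that does not happen within
TIMEOUT seconds."
  (let loop ((remaining timeout))
    (cond ((and (file-exists? file)
                (> (stat:size (stat file)) 0))
           ;; Only read once there is content: calling `read' on an empty
           ;; file returns the end-of-file object rather than a PID.
           (call-with-input-file file read))
          ((> remaining 0)
           (sleep 1)
           (loop (- remaining 1)))
          (else #f))))
```

Checking the size first avoids getting an end-of-file object back from a
PID file that exists but has not been written yet; a partially written
file could still be read too early, though.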

> This could be related to the Shepherd upgrade, but only indirectly, as I
> think the failure at least for cgit was also timing dependent.

Yes, it looks like it.

Thanks,

Mathieu





bug#57827: Shepherd 0.9.2 possible regressions

2022-09-15 Thread Mathieu Othacehe


Hello,

Since Shepherd 0.9.2 the following tests are failing:

* cgit: https://ci.guix.gnu.org/build/1427375/details
* gitile: https://ci.guix.gnu.org/build/1427377/details

It seems that an unexpected #<eof> object is received on the marionette
socket.

* gui-uefi-installed-os: https://ci.guix.gnu.org/build/1431041/details
* gui-installed-os: https://ci.guix.gnu.org/build/1431027/details
* gui-installed-os-encrypted: https://ci.guix.gnu.org/build/1431040/details
* gui-installed-desktop-os-encrypted: https://ci.guix.gnu.org/build/1431044/details

It seems that the Shepherd cannot be restarted in the install-system
procedure.

Thanks,

Mathieu





bug#57642: [PATCH] gnu: linux: Fix unnecessary let clause in make-linux-libre.

2022-09-07 Thread Mathieu Othacehe


Hey,

> * gnu/packages/linux.scm (make-linux-libre*)[arguments]:
> Remove unnecessary let clause in 'configure phase.

You should send such patches to "guix-patc...@gnu.org" instead, as this is
neither a bug report nor a bug fix. Pushed as 2183db8d2 :).

Thanks,

Mathieu





bug#57232: [installer] ENTER in guided partitioner destroys partition table

2022-08-16 Thread Mathieu Othacehe


Hey,

>  “You can change a disk's partition table by selecting it and pressing
> ENTER.”
>
> Er, I was… expecting that to mean it would pop up a pretty window or
> something.  Is this really a feature?  Should it be?

I'm not sure what the point is here.

> I have to be honest: I was extremely let down by the installer UX, *because* I
> read a lot of the code and can see how much effort went into it.  I hate
> pointing out that the partitioner is at once less useful and more dangerous
> than (system "fdisk").

Many people have contributed to the installer over the years. Your
remark is both harsh and unconstructive. I don't think that the fact
that you, Tobias, are extremely let down matters much to the project.

Calling "fdisk" or "parted" directly would not provide the
auto-partitioning feature, and would be less convenient when it comes to
encryption and selecting partition mount points.

It's no secret that the partitioning code can be improved, like many
other areas of Guix. If you feel you can refine it in terms of
stability and general UX, we would be glad to get your support.

Mathieu





bug#57215: ci: Fail to evaluate Guix specification

2022-08-16 Thread Mathieu Othacehe


Hey,

> So, I think there's some involvement of grafts that mean you end up
> building things when just trying to compute the derivation. But that's
> as far as I got, I don't really understand why this is the case, or what
> can be done about it.

Thanks for sharing, Chris. I tried disabling grafts, but I was not able
to get any further:

--8<---cut here---start->8---
mathieu@berlin ~$ guix pull -s powerpc64le-linux -v3  --no-grafts
Updating channel 'guix' from Git repository at 
'https://git.savannah.gnu.org/git/guix.git'...
Building from this channel:
  guix  https://git.savannah.gnu.org/git/guix.git   0598b5d
building /gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv...
@ unsupported-platform 
/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv 
powerpc64le-linux
while setting up the build environment: a `powerpc64le-linux' is required to 
build `/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv', 
but I am a `x86_64-linux'
builder for 
`/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv' failed 
with exit code 1
build of /gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv 
failed
View build log at 
'/var/log/guix/drvs/m7/ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv.gz'.
cannot build derivation 
`/gnu/store/xd1sxx07rbynhhgb693vdw1rqlqd4a93-bash-minimal-5.1.8.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/wmqmkvgfs3cab3kf269d4p9lkphkwzdm-binutils-2.37.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/8nmr09ik10j36hghg2kil51jkjnj06q6-binutils-cross-boot0-2.37.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/04ngqmmifwqmlmga1ax9sxk09xv131xk-bootstrap-binaries-0.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/f3azd8szmzhxr020jia5072kc5izxjfj-coreutils-8.32.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/gjn725x4h9skb20zgi0pa92vlxgkgavs-diffutils-3.8.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/vp5pwmylb79zni917rk8y6l35v452p6d-diffutils-boot0-3.8.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/av4s8y2s91njxxv23rai319adk65w85z-file-boot0-5.39.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/xfpd2plqyzbrj8i509rdwcl01y3jbfgd-findutils-4.8.0.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/8k74lss8j7gczv24l30vppbsbps3adra-findutils-boot0-4.8.0.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/qa8v3sfgi0832w5q524vkgqw4168f40p-gcc-10.3.0.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/8nq5sgw7qw8flawdza94i8g9ybbq2qmq-glibc-2.33.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/nkph22a5q6y6vwc5lcrn8z207wbdh3an-grep-3.6.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/5k20iw84477c3i2fjadjvx27flwxv2gg-guile-3.0.7.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/kkghmiz85q072fznybdzkhqqp8h0givf-gzip-1.10.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/7rj3x1mpw27bsngmkny8ivwqbkcxzsmb-ld-wrapper-boot0-0.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/1q67mcy2jjvfg0lbmca94inrhqwpi05r-ld-wrapper-boot3-0.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/dpxnqbrjnlj5q8l9mi0g1i51bni61m7v-libgc-8.0.4.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/6pb0ihb8y3r7skh69vdjcxc2n0h2qbx1-libunistring-0.9.10.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/5c8wan41scm0j9dgcmq0rpcsb59n0l9v-linux-libre-headers-5.10.35.drv': 
1 dependencies couldn't be built
cannot build derivation 
`/gnu/store/df9xgziafs0qpr7sxdcrxqbqd07a1h92-make-4.3.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/c0g8jj22ikdr6k53135rfrjj7mv234g1-make-boot0-4.3.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/wxqi4ibkxq2943krxf2xpxvyz0bjz1kx-patch-2.7.6.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/9wg27g47gqz9dr7grx2cv7l00kkm34bs-perl-5.34.0.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/1kq8l3m29nsaifsk1k0g1g0c6sk6ch4g-perl-boot0-5.34.0.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/1apai9zqhdc6zaj2r0fbxqipash5fnzx-sed-4.8.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/kl7934a54ykm9g3majmxlawks1xaz97y-tar-1.34.tar.xz.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/z7jij9k33bl8dm50zrhy97jxqwylx1s8-compute-guix-derivation.drv': 1 
dependencies couldn't be built
guix pull: error: build of 

bug#57229: ‘guix system image’ forces commit authentication?

2022-08-16 Thread Mathieu Othacehe


Hey,

> λ ./pre-inst-env guix system image -t iso9660 gnu/system/install.scm --disable-authentication
> guix system: error: disable-authentication: unrecognized option

That's probably a side effect of https://issues.guix.gnu.org/53210, which
includes "current-guix" in the installation image instead of the latest
Guix snapshot.

To build "current-guix", we rely on the channel-build-system and
ultimately the latest-channel-instance procedure. This procedure takes an
"authenticate?" argument.

Passing this option through will be tricky, though, as we do not call
latest-channel-instance directly, as (guix scripts pull) does, but rather
rely on the "current-guix" variable.

I'll think more about it.

Mathieu





bug#57232: [installer] ENTER in guided partitioner destroys partition table

2022-08-16 Thread Mathieu Othacehe

Hey Tobias,

> What that does is immediately and without confirmation wipe the on-disc
> partition table.  And its back-up.

Oops, glad you were able to recover; I have been bitten by this in the
past too. The attached patch adds an extra confirmation page before
wiping everything. WDYT?

Mathieu

From 4a9c1fb1fe7f9a65b2b7d1f9e4419b1d28a8082e Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe 
Date: Tue, 16 Aug 2022 10:49:07 +0200
Subject: [PATCH 1/1] installer: partition: Add a confirmation page before
 formatting.

Fixes: <https://issues.guix.gnu.org/57232>.

* gnu/installer/newt/partition.scm (run-label-confirmation-page): New
procedure.
(run-label-page): Call the above procedure before proceeding.
---
 gnu/installer/newt/partition.scm | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gnu/installer/newt/partition.scm b/gnu/installer/newt/partition.scm
index e7a97810ac..f11a644f92 100644
--- a/gnu/installer/newt/partition.scm
+++ b/gnu/installer/newt/partition.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2018, 2019 Mathieu Othacehe 
+;;; Copyright © 2018, 2019, 2022 Mathieu Othacehe 
 ;;; Copyright © 2019, 2020 Ludovic Courtès 
 ;;; Copyright © 2020 Tobias Geerinckx-Rice 
 ;;;
@@ -92,6 +92,15 @@ (define (device-items)
  (device (car result)))
 device))
 
+(define (run-label-confirmation-page callback)
+  (lambda (item)
+    (and (run-confirmation-page
+          (format #f (G_ "This will create a new ~a partition table, \
+all data on disk will be lost, are you sure you want to proceed?") item)
+          (G_ "Format disk?")
+          #:exit-button-procedure callback)
+         item)))
+
 (define (run-label-page button-text button-callback)
   "Run a page asking the user to select a partition table label."
   ;; Force the GPT label if UEFI is supported.
@@ -103,6 +112,8 @@ (define (run-label-page button-text button-callback)
        #:title (G_ "Partition table")
        #:listbox-items '("msdos" "gpt")
        #:listbox-item->text identity
+       #:listbox-callback-procedure
+       (run-label-confirmation-page button-callback)
        #:button-text button-text
        #:button-callback-procedure button-callback)))
 
-- 
2.37.1



bug#53463: ci.guix.gnu.org not building the 'guix' job

2022-08-16 Thread Mathieu Othacehe


Hello,

> https://ci.guix.gnu.org/jobset/guix

It is now fixed for the following architectures: x86_64-linux,
i686-linux and aarch64-linux. I'll try to repair it for
powerpc64le-linux soon.

We can close this one I guess.

Thanks,

Mathieu





bug#57215: ci: Fail to evaluate Guix specification

2022-08-16 Thread Mathieu Othacehe


Hello,

Some news on this one: I discovered that the evaluation failure of the
"guix" specification depends on the selected systems. With
"aarch64-linux" and "powerpc64le-linux" disabled, the evaluation
succeeds.

Now, when running:

--8<---cut here---start->8---
mathieu@berlin ~$ guix pull -s powerpc64le-linux -v3 
Updating channel 'guix' from Git repository at 
'https://git.savannah.gnu.org/git/guix.git'...
Building from this channel:
  guix  https://git.savannah.gnu.org/git/guix.git   0598b5d
building /gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv...
@ unsupported-platform 
/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv 
powerpc64le-linux
while setting up the build environment: a `powerpc64le-linux' is required to 
build `/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv', 
but I am a `x86_64-linux'
builder for 
`/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv' failed 
with exit code 1
build of /gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv 
failed
View build log at 
'/var/log/guix/drvs/m7/ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv.gz'.
--8<---cut here---end--->8---

It fails in the same way with aarch64-linux. That's because there's no
offload or binfmt mechanism on berlin at the moment.

I restored /etc/guix/machines.scm to enable offloading, and suddenly
the "aarch64-linux" system works again for the "guix" specification.
"powerpc64le-linux" is still failing, as we currently do not have a
machine available for that architecture: p9.tobias.gr is commented out.

Tobias, do you remember why?

Several points are still unclear to me:

1. Why do we need an available machine with the foreign architecture to
compute the corresponding "guix" derivation? Note that the evaluation of
package derivations for foreign systems works even though a
corresponding machine is not available:

--8<---cut here---start->8---
mathieu@berlin ~$ guix build -s powerpc64le-linux -d hello
/gnu/store/spzmh79qi21k26p15w27r3jjg95szg17-hello-2.12.1.drv
--8<---cut here---end--->8---

2. Why are the following traces not reported back by the inferior in
charge of the evaluation of the "guix" derivation?

--8<---cut here---start->8---
@ unsupported-platform 
/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv 
powerpc64le-linux
while setting up the build environment: a `powerpc64le-linux' is required to 
build `/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv', 
but I am a `x86_64-linux'
builder for 
`/gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv' failed 
with exit code 1
build of /gnu/store/m7ppw9lb65g99dajwkb56w05zqmydsdh-guile-bootstrap-2.0.drv 
failed
--8<---cut here---end--->8---

3. Why does the "cuirass evaluate" process crash when an inferior
crashes (see the backtrace in my previous email)?

I'll try to come up with simple reproducers for these different points.

In the meantime, the "guix" specification is working again for
x86_64-linux, i686-linux and aarch64-linux systems, which is good news.

Thanks,

Mathieu





bug#57215: ci: Fail to evaluate Guix specification

2022-08-15 Thread Mathieu Othacehe


With symbols:

#0  0x7f63a4b15030 in raise () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
#1  0x7f63a4aff526 in abort () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
#2  0x7f63a5091198 in scm_dynstack_unwind_1.isra.0 (dynstack=) at dynstack.c:426
#3  0x7f63a5135728 in scm_dynstack_unwind (base=, 
dynstack=) at dynstack.c:443
#4  abort_to_prompt (thread=0x7f639a182000, saved_mra=) at 
vm.c:1454
#5  0x7f637620b25f in ?? ()
#6  0x7f639cf5bacc in ?? ()
#7  0x7f63806f08a0 in ?? ()
#8  0x0006 in ?? ()
#9  0x7f63a50d3ccc in scm_jit_enter_mcode (thread=0x7f639a182000, 
mcode=0x7f637620b276 "L\213c\030I\213L$\020H\215", ) 
at jit.c:6038
#10 0x7f63a5128f3c in vm_regular_engine (thread=0x7f639a182000) at 
vm-engine.c:360
#11 0x7f63a51365e9 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608
#12 0x7f63a509aa0e in scm_call_with_unblocked_asyncs (proc=0x7f6374320f40) 
at async.c:406
#13 0x7f63a5129336 in vm_regular_engine (thread=0x7f639a182000) at 
vm-engine.c:972
#14 0x7f63a51365e9 in scm_call_n (proc=, argv=, nargs=0) at vm.c:1608
#15 0x7f63a5125be6 in really_launch (d=0x7f6374290240) at threads.c:778
#16 0x7f63a509c85a in c_body (d=0x7f6371e2bd80) at continuations.c:430
#17 0x7f63761b3482 in ?? ()
#18 0x7f639cf5b7e0 in ?? ()
#19 0x7f639b24ae30 in ?? ()
#20 0x0048 in ?? ()
#21 0x7f63a50d3ccc in scm_jit_enter_mcode (thread=0x7f639a182000, 
mcode=0x80d4f4 "\034<\003") at jit.c:6038
#22 0x7f63a5128f3c in vm_regular_engine (thread=0x7f639a182000) at 
vm-engine.c:360
#23 0x7f63a51365e9 in scm_call_n (proc=, argv=, nargs=2) at vm.c:1608
#24 0x7f63a509e09a in scm_call_2 (proc=, arg1=, arg2=) at eval.c:503
#25 0x7f63a5154752 in scm_c_with_exception_handler.constprop.0 (type=0x404, 
handler_data=handler_data@entry=0x7f6371e2bd10, 
thunk_data=thunk_data@entry=0x7f6371e2bd10, thunk=, 
handler=)
at exceptions.c:170
#26 0x7f63a512688f in scm_c_catch (tag=, body=, body_data=, handler=, 
handler_data=, pre_unwind_handler=, 
pre_unwind_handler_data=0x7f639d0e8000) at throw.c:168
#27 0x7f63a509ee66 in scm_i_with_continuation_barrier 
(pre_unwind_handler=0x7f63a509eb80 , 
pre_unwind_handler_data=0x7f639d0e8000, handler_data=0x7f6371e2bd80, 
handler=0x7f63a50a58b0 , 
body_data=0x7f6371e2bd80, body=0x7f63a509c850 ) at 
continuations.c:368
#28 scm_c_with_continuation_barrier (func=, data=) at continuations.c:464
#29 0x7f63a5125b39 in with_guile (base=0x7f6371e2be08, data=0x7f6371e2be30) 
at threads.c:645
#30 0x7f63a4ffc0ba in GC_call_with_stack_base () from 
/gnu/store/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#31 0x7f63a511e16d in scm_i_with_guile (dynamic_state=, 
data=0x7f6374290240, func=0x7f63a5125b70 ) at threads.c:688
#32 launch_thread (d=0x7f6374290240) at threads.c:787
#33 0x7f63a4fd2d7e in ?? () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0
#34 0x7f63a4bd0eff in clone () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6

Mathieu





bug#57215: ci: Fail to evaluate Guix specification

2022-08-15 Thread Mathieu Othacehe


Hello,

> [pid 74060] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=74060, 
> si_uid=997} ---
>
> The evaluation process receives a SIGPIPE when trying to write the
> evaluation result to the pipe connecting it to "cuirass register".

I think this is a false lead. The Cuirass evaluation setup looks like this:

                                                           --- pipe B ---> inferior I1
"cuirass register" <--- pipe A ---> "cuirass evaluate"  <--|
                                                           --- pipe C ---> inferior I2

In short, "cuirass register" is a daemon monitoring Git repositories and
triggering sporadic "cuirass evaluate" processes. Those processes use
inferiors to evaluate the new derivations.

The SIGPIPE observed with strace doesn't correspond to pipe A but to
pipe B or C. The issue here is probably that "cuirass evaluate" dies
unexpectedly, which causes inferior I1 or I2 to receive a SIGPIPE
when writing to a pipe that nobody reads from anymore.
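The mechanism can be reproduced in plain Guile, without Cuirass: once the
reading end of a pipe is gone, writing to the other end fails. The sketch
below ignores SIGPIPE (the strace output of the initial report shows the
process surviving the signal and getting EPIPE back), so the failure
surfaces as a system-error, similar to the "fport_write" / "Broken pipe"
exception seen in that report. The procedure name is made up for this
example.

```scheme
;; Plain-Guile illustration (no Cuirass involved): writing to a pipe whose
;; read end has been closed fails with EPIPE.
;; Ignore SIGPIPE so that the failed write raises a system-error instead
;; of killing the process.
(sigaction SIGPIPE SIG_IGN)

(define (write-to-pipe-without-reader)
  "Write to a pipe whose read end is closed; return the errno raised."
  (let* ((ends   (pipe))                ;(input-port . output-port)
         (reader (car ends))
         (writer (cdr ends)))
    (close-port reader)                 ;nobody listens anymore
    (let ((errno (catch 'system-error
                   (lambda ()
                     (display "result\n" writer)
                     (force-output writer) ;the write(2) call fails here
                     #f)
                   (lambda args
                     (system-error-errno args)))))
      ;; Closing may try to flush again; ignore any error it raises.
      (false-if-exception (close-port writer))
      errno)))
```

(write-to-pipe-without-reader) should return EPIPE, which is 32 on Linux,
the same value as the (32) in the reported exception.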

Stracing the "cuirass evaluate" process shows indeed that it dies
unexpectedly by receiving a SIGABRT signal.

--8<---cut here---start->8---
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pselect6(242, [3 227 241], [], [], NULL, NULL
 ) = ?
+++ killed by SIGABRT +++
--8<---cut here---end--->8---

I tried to use gdb to get more details:

--8<---cut here---start->8---
Thread 28 "guile" received signal SIGABRT, Aborted.
[Switching to LWP 113115]
0x7f890b618030 in raise () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
(gdb) bt
#0  0x7f890b618030 in raise () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
#1  0x7f890b602526 in abort () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
#2  0x7f890bb94198 in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#3  0x7f890bc38728 in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#4  0x7f88dcce625f in ?? ()
#5  0x7f8903a5eacc in ?? ()
#6  0x7f88e31eb8a0 in ?? ()
#7  0x0006 in ?? ()
#8  0x7f890bbd6ccc in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#9  0x7f890bc2bf3c in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#10 0x7f890bc395e9 in scm_call_n () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#11 0x7f890bb9da0e in scm_call_with_unblocked_asyncs () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#12 0x7f890bc2c336 in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#13 0x7f890bc395e9 in scm_call_n () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#14 0x7f890bc28be6 in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#15 0x7f890bb9f85a in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#16 0x7f88dcc8e482 in ?? ()
#17 0x7f8903a5e7e0 in ?? ()
#18 0x7f8901d4be30 in ?? ()
#19 0x0048 in ?? ()
#20 0x7f890bbd6ccc in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#21 0x7f890bc2bf3c in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#22 0x7f890bc395e9 in scm_call_n () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#23 0x7f890bba109a in scm_call_2 () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#24 0x7f890bc57752 in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#25 0x7f890bc2988f in scm_c_catch () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#26 0x7f890bba1e66 in scm_c_with_continuation_barrier () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#27 0x7f890bc28b39 in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#28 0x7f890baff0ba in GC_call_with_stack_base () from 
/gnu/store/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#29 0x7f890bc2116d in ?? () from 
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#30 0x7f890bad5d7e in ?? () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0
#31 0x7f890b6d3eff in clone () from 
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
--8<---cut here---end--->8---

but the backtrace doesn't help much. I can try to use Guile with symbols
to get a more detailed backtrace.

Anyone knows 

bug#57215: ci: Fail to evaluate Guix specification

2022-08-14 Thread Mathieu Othacehe


Hello,

Since commit 0565cde, Cuirass fails to evaluate the guix
specification. The error messages do not give any clue, for instance:
https://ci.guix.gnu.org/eval/504294/log/raw.

I straced an evaluation process of the guix specification; it ends
this way:

--8<---cut here---start->8---
[pid 74060] write(1, "(values (value (result (((#:job-name . guix.i686-linux) 
(#:derivation . 
\"/gnu/store/ap7sixkbikfw1xib3aisbcaw6i2005m2-guix-a5f199732.drv\") (#:inputs 
\"/gnu/store/4mdqwn29pvjqd6hs1q842g5wzamjy3dk-module-import-compiled.drv\" 
\"/gnu/store/4nv9xf9qyabcw2gz9zvxlq9ymknq7wad-guix-daemon.drv\" 
\"/gnu/store/a6f7jk08bs6clingcw3j8kmz3cbsj6iq-guix-command.drv\" 
\"/gnu/store/fl7hls4j08hp45j32xjjrjc6d49qmbmy-guix-a5f199732-modules.drv\" 
\"/gnu/store/h9aagy9mv2bqg7b8q3k6aksi7fa20qzc-guix-manual.drv\" 
\"/gnu/store/iz11170b9da1i3pkjdv6314swzvzkgvk-guile-3.0.7.drv\") (#:outputs 
(\"out\" . \"/gnu/store/ivs9raslmb77zn0hrcza0wbv3r9lspay-guix-a5f199732\")) 
(#:nix-name . \"guix-a5f199732\") (#:system . \"i686-linux\") 
(#:max-silent-time . 3600) (#:timeout . 18000))\n", 737) = -1 EPIPE (Broken 
pipe)
[pid 74060] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=74060, 
si_uid=997} ---
[pid 74060] futex(0x7f1110e6f330, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 74060] write(1, "(exception (arguments system-error (value 
\"fport_write\") (value \"~A\") (value (\"Broken pipe\")) (value (32))) (stack 
(#f (\"ice-9/boot-9.scm\" 1779 13)) (raise-exception (\"ice-9/boot-9.scm\" 1684 
16)) (newline (#f #f #f)) (#f (\"guix/repl.scm\" 104 8)) 
(with-exception-handler (\"ice-9/boot-9.scm\" 1751 10)) (with-exception-handler 
(\"ice-9/boot-9.scm\" 1746 15)) (#f (\"guix/repl.scm\" 125 7\n", 385) = -1 
EPIPE (Broken pipe)
[pid 74060] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=74060, 
si_uid=997} ---
--8<---cut here---end--->8---

The evaluation process receives a SIGPIPE when trying to write the
evaluation result to the pipe connecting it to "cuirass register".

Thanks,

Mathieu





bug#57180: Graphical Installer Failure

2022-08-14 Thread Mathieu Othacehe


Hello,

Thanks for the report. This looks like a manifestation of
https://issues.guix.gnu.org/56005 within the installer.

Mathieu





bug#54483: ‘guix system image’ chokes on host's /var

2022-08-12 Thread Mathieu Othacehe


Hey,

> However, fundamentally, ‘guix system image’ shouldn’t be reading
> /run/current-system/parameters because it has not use for it.
>
> Mathieu, do you happen to have an idea where to remove that
> ‘read-boot-parameters-file’ call?  :-)

Yes, that's because profile-boot-parameters was always evaluated in the
perform-action procedure of the (guix scripts system) module.

This has recently been fixed with
9d30cfa3372847e75038d34c4ea5b8d8b241. Tobias, you can cherry-pick
this patch on top of 1.3.0 if you'd like to generate an old installer
image. I just managed to do so successfully :).

Closing this one,

Thanks,

Mathieu





bug#53210: installer: referring to N-1 guix is problematic.

2022-08-10 Thread Mathieu Othacehe


Hey,

> Let me know if you have comments!

Thanks for taking care of this!

It looks like we have a small regression in the 'system-tests and 'guix
specifications:

https://ci.guix.gnu.org/eval/528053/log/raw
https://ci.guix.gnu.org/eval/528056/log/raw

I think this is because channel-source->package is given a raw directory
as source in (gnu ci) while this procedure expects either a channel or a
lowerable object.

Thanks,

Mathieu





bug#54786: Installation tests are failing

2022-08-09 Thread Mathieu Othacehe


Closing as all the installation tests are now fixed.

Thanks to everyone involved :)

Mathieu





bug#57037: Package `guile-newt' cannot be cross-compiled

2022-08-08 Thread Mathieu Othacehe


Hello,

> The `guile-newt' package that is used for the installation UI can't
> be cross-compiled as it tries to load the `newt' dynamic library when
> the Guile code is compiled. I've tried to find a solution/fix but I
> don't know much about how Guile byte-code compilation works.

Fixed with bde902cb78c529174155e2d46ed814123182619f.

> I think this is one of the last remaining bits before being able to
> fully cross compile an installation image.

I think that guile-parted will also be problematic.

Thanks,

Mathieu





bug#55549: Parted 3.5 update breaks installer tests

2022-08-05 Thread Mathieu Othacehe


Reported the problem upstream with:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=56996.

Mathieu





bug#55549: Parted 3.5 update breaks installer tests

2022-08-05 Thread Mathieu Othacehe


Hey Ludo,

> Mathieu, I’m guessing the cause of the problem, when using Parted 4.5,
> is that the installer partitions disks somewhat incorrectly, as reported
> by ‘grub-install’.  I wonder if it might be due to an API or ABI change
> that goes unnoticed in Guile-Parted because it uses the FFI.

This is caused by a regression in Parted 3.5 introduced by
15c49ec04f7eaff014d2e1eddd0aecf4150db63d.

A gpt_partition_set_system call can undo what a previous
gpt_partition_set_flag call did. This forces us to reverse the call
order and make sure that gpt_partition_set_system is called before
gpt_partition_set_flag.

Fixed with: 3c381af76a144a4dc3d0f9269f43ee2ec501b538. I think we can
report that one upstream.

Thanks,

Mathieu





bug#55206: Intermittent ldap test failure

2022-06-24 Thread Mathieu Othacehe


Hey Timotej,

> After increasing the memory limit for the test VM, I was able to run the
> test successfully ten times on two different machines. The patch is
> attached.

Great, this wouldn't be the first time that limited RAM causes strange
transient issues.

I'm pushing the proposed patch and closing this bug,

Thanks,

Mathieu




