Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Cyril Brulebois
Samuel Thibault  (2023-05-10):
> I usually test the netinst image. I'm surprised that netboot uses more
> memory, since it's supposed to fetch only what it really needs. And PXE
> shouldn't change much, since in the end it's supposed to be the
> same vmlinuz/initrd as non-PXE.

Oh right, I meant to include some disclaimer about my not knowing anything
about the inner workings of PXE/TFTP-booting, and my not being able to
judge whether it is possible/plausible this could be a factor here.

Given my todo list until Bookworm, it seems unlikely I'll be able to play
with that in the upcoming weeks. Knowing the installer works when bumping
the VM's memory is kind of reassuring at least: it's not completely broken
when started this way.


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Samuel Thibault
Cyril Brulebois, on Wed 10 May 2023 17:34:53 +0200, wrote:
> Moritz Muehlenhoff  (2023-05-10):
> > This turned out to be a redux of #932149: bumping the memory of the
> > netboot-installed VM to 1536M RAM fixed it. There was anecdotal
> > evidence of non-netboot installations still succeeding with 1024M, so
> > should we reassign to installation-guide to bump the documented
> > minimum RAM, at least for netboot?
> 
> Both netboot and netboot-gtk's mini.iso, booted as a CD, are just fine
> with 1G RAM (modulo cryptsetup OOMKs for a little while, #1028250), and
> have been for years. That's how all my VM testing has been done for
> years. :)
> 
> If numbers are updated for netboot, maybe make it clear it's for PXE
> booting?
> 
> > While debugging the issue I also noticed that
> > rootskel/src/lib/debian-installer/menu currently checks how much RAM
> > is present, and if it's less than 250M it exports
> > DEBCONF_DROP_TRANSLATIONS=1 to cdebconf.
> > 
> > Given that we already document 780MB as the minimum requirement for
> > Bullseye that seems obsolete, happy to create MRs to remove it from
> > rootskel and cdebconf to clean this up.
> 
> Looping in Samuel who has been bumping requirements on a regular basis,
> and who is likely to have good ideas in this area.

I usually test the netinst image. I'm surprised that netboot uses more
memory, since it's supposed to fetch only what it really needs. And PXE
shouldn't change much, since in the end it's supposed to be the
same vmlinuz/initrd as non-PXE.

I didn't know about rootskel setting DEBCONF_DROP_TRANSLATIONS, that
should probably be coordinated with lowmem's management of low-memory
heuristics.

And at any rate, 1536M of RAM really looks like a *lot* to me; it looks
to me like there is a bug somewhere.

Samuel



Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Cyril Brulebois
Moritz Muehlenhoff  (2023-05-10):
> This turned out to be a redux of #932149: bumping the memory of the
> netboot-installed VM to 1536M RAM fixed it. There was anecdotal
> evidence of non-netboot installations still succeeding with 1024M, so
> should we reassign to installation-guide to bump the documented
> minimum RAM, at least for netboot?

Both netboot and netboot-gtk's mini.iso, booted as a CD, are just fine
with 1G RAM (modulo cryptsetup OOMKs for a little while, #1028250), and
have been for years. That's how all my VM testing has been done for
years. :)

If numbers are updated for netboot, maybe make it clear it's for PXE
booting?

> While debugging the issue I also noticed that
> rootskel/src/lib/debian-installer/menu currently checks how much RAM
> is present, and if it's less than 250M it exports
> DEBCONF_DROP_TRANSLATIONS=1 to cdebconf.
> 
> Given that we already document 780MB as the minimum requirement for
> Bullseye that seems obsolete, happy to create MRs to remove it from
> rootskel and cdebconf to clean this up.

Looping in Samuel who has been bumping requirements on a regular basis,
and who is likely to have good ideas in this area.


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Moritz Muehlenhoff
On Wed, May 10, 2023 at 11:35:14AM +0200, Cyril Brulebois wrote:
> Hallo Moritz,
> 
> And thanks for the report…
> 
> Moritz Mühlenhoff  (2023-05-10):
> > Moritz Muehlenhoff wrote:
> > > call. $MENU is set to '/usr/bin/main-menu' and in fact running
> > > 
> > > "debconf -o d-i /usr/bin/main-menu" tries to emit some output (I can
> > > see the cursor moving), but drops back to the shell right away.
> > > 
> > > I'm not familiar with cdebconf; if there are some suggested steps to
> > > narrow down the failure further, I'm happy to try them.
> > 
> > Looking at dmesg, there's actually a log entry about steal-ctty segfaulting:
> > 
> > [1.945968] steal-ctty[139]: segfault at 0 ip 7f3c073b9fa0 sp 7fff38 0)b70 error 4 in libc.so.6[7f3c0730b000+155000] likely on CPU 0 (core 0, socket
> > [1.946977] Code: 2e 04 00 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 5f 84 47 01 00 00 49 89 f4 be 2f 00 00 00 48 89 fb 49 89 45 c8 31 c0 <80> 3f 00 0f
> 
> … and that follow-up. For those not following IRC, I'm wondering whether
> this could be a redux of #932149; that'd be consistent with PXE-booting
> being successful on baremetal, but not with a 1G VM. Moritz will try
> bumping that and will let us know later on.

This turned out to be a redux of #932149: bumping the memory of the
netboot-installed VM to 1536M RAM fixed it. There was anecdotal evidence
of non-netboot installations still succeeding with 1024M, so should we
reassign to installation-guide to bump the documented minimum RAM, at
least for netboot?

While debugging the issue I also noticed that
rootskel/src/lib/debian-installer/menu currently checks how much RAM is
present, and if it's less than 250M it exports
DEBCONF_DROP_TRANSLATIONS=1 to cdebconf.

Given that we already document 780MB as the minimum requirement for
Bullseye, that seems obsolete; happy to create MRs to remove it from
rootskel and cdebconf to clean this up.
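
For readers unfamiliar with this kind of check, here is a minimal sketch
of a low-memory test of the sort described above. It is not the actual
rootskel code: the function names are hypothetical, reading MemTotal from
/proc/meminfo is an assumption about how such a check could be done, and
only the 250M threshold and the DEBCONF_DROP_TRANSLATIONS variable come
from this thread.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual rootskel menu script.

# Print MemTotal in KiB from a meminfo-style file (default: /proc/meminfo).
mem_total_kib() {
    awk '/^MemTotal:/ { print $2; exit }' "${1:-/proc/meminfo}"
}

# Succeed (return 0) when total RAM is below the given threshold in MiB.
below_threshold_mib() {
    threshold_mib=$1
    meminfo=${2:-/proc/meminfo}
    total_kib=$(mem_total_kib "$meminfo")
    [ -n "$total_kib" ] && [ "$total_kib" -lt $((threshold_mib * 1024)) ]
}

# 250 is the threshold mentioned in this thread.
if below_threshold_mib 250; then
    DEBCONF_DROP_TRANSLATIONS=1
    export DEBCONF_DROP_TRANSLATIONS
fi
```

With a documented minimum of 780MB, a branch like this would presumably
never be taken on a supported system, which is the argument for dropping
it.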

Cheers,
Moritz



Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Cyril Brulebois
Hallo Moritz,

And thanks for the report…

Moritz Mühlenhoff  (2023-05-10):
> Moritz Muehlenhoff wrote:
> > call. $MENU is set to '/usr/bin/main-menu' and in fact running
> > 
> > "debconf -o d-i /usr/bin/main-menu" tries to emit some output (I can
> > see the cursor moving), but drops back to the shell right away.
> > 
> > I'm not familiar with cdebconf; if there are some suggested steps to
> > narrow down the failure further, I'm happy to try them.
> 
> Looking at dmesg, there's actually a log entry about steal-ctty segfaulting:
> 
> [1.945968] steal-ctty[139]: segfault at 0 ip 7f3c073b9fa0 sp 7fff38 0)b70 error 4 in libc.so.6[7f3c0730b000+155000] likely on CPU 0 (core 0, socket
> [1.946977] Code: 2e 04 00 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 5f 84 47 01 00 00 49 89 f4 be 2f 00 00 00 48 89 fb 49 89 45 c8 31 c0 <80> 3f 00 0f

… and that follow-up. For those not following IRC, I'm wondering whether
this could be a redux of #932149; that'd be consistent with PXE-booting
being successful on baremetal, but not with a 1G VM. Moritz will try
bumping that and will let us know later on.


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Moritz Mühlenhoff
Moritz Muehlenhoff wrote:
> call. $MENU is set to '/usr/bin/main-menu' and in fact running
> 
> "debconf -o d-i /usr/bin/main-menu" tries to emit some output (I can
> see the cursor moving), but drops back to the shell right away.
> 
> I'm not familiar with cdebconf; if there are some suggested steps to
> narrow down the failure further, I'm happy to try them.

Looking at dmesg, there's actually a log entry about steal-ctty segfaulting:

[1.945968] steal-ctty[139]: segfault at 0 ip 7f3c073b9fa0 sp 7fff38 0)b70 error 4 in libc.so.6[7f3c0730b000+155000] likely on CPU 0 (core 0, socket
[1.946977] Code: 2e 04 00 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 5f 84 47 01 00 00 49 89 f4 be 2f 00 00 00 48 89 fb 49 89 45 c8 31 c0 <80> 3f 00 0f
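
As a rough aid for reading that line, the sketch below pulls out two
things: the offset of the faulting instruction pointer within the
reported libc.so.6 mapping, and the meaning of the x86 page-fault error
code. The helper names are hypothetical. Note that the bracketed address
is the start of the mapping that contains the fault, which is not
necessarily the library's load base, so the offset is relative to that
mapping only.

```shell
#!/bin/sh
# Hypothetical helpers for reading a kernel "segfault at ..." line.

# Offset of the instruction pointer inside the reported mapping.
mapping_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# Decode the low three bits of an x86 page-fault error code.
decode_pf_error() {
    if [ $(( $1 & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $(( $1 & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( $1 & 4 )) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

# Values copied from the dmesg line above.
mapping_offset 0x7f3c073b9fa0 0x7f3c0730b000
decode_pf_error 4
```

A user-mode read at address 0 is what a NULL pointer dereference inside
a libc function would look like, but that reading is an inference from
the log line and has not been confirmed with a debugger.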

Cheers,
Moritz



Bug#1035854: Bookworm netboot image fails in VM

2023-05-10 Thread Moritz Muehlenhoff
Package: installation-reports
Severity: normal

Boot method: network
Image version: netboot daily from 2023-05-09
Date: 2023-05-10

I've successfully tested the Bookworm installer on a few Dell PowerEdge
servers (with rc1, rc2 and dailies) and it's working fine on baremetal
using the netboot image.

As an additional test I also created a VM on a Ganeti 3.0.2 cluster (as
provided by Bookworm) using KVM/qemu. This setup has no issues installing
Bullseye with the same d-i config.

The system emulated by qemu is a pretty standard pc-i440fx "hardware" model:

---
/usr/bin/qemu-system-x86_64 -name testvm2005.codfw.wmnet -m 1024 -smp 1 \
  -pidfile /var/run/ganeti/kvm-hypervisor/pid/testvm2005.codfw.wmnet \
  -device virtio-balloon -daemonize \
  -D /var/log/ganeti/kvm/testvm2005.codfw.wmnet.log \
  -machine pc-i440fx-2.8,accel=kvm -boot n \
  -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm2005.codfw.wmnet.monitor,server,nowait \
  -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm2005.codfw.wmnet.serial,server,nowait \
  -usb -display none \
  -cpu IvyBridge,+pcid,+invpcid,+spec-ctrl,+ssbd,+md-clear \
  -uuid 3386590c-84b6-4e89-8717-2aa5e05b0d4a \
  -netdev type=tap,id=nic-f510f85e-6c55-4c4e,fd=10 \
  -device virtio-net-pci,id=nic-f510f85e-6c55-4c4e,bus=pci.0,addr=0xd,netdev=nic-f510f85e-6c55-4c4e,mac=aa:00:00:f2:45:f8 \
  -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm2005.codfw.wmnet.qmp,server,nowait \
  -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm2005.codfw.wmnet.kvmd,server,nowait \
  -device virtio-blk-pci,id=disk-d1dd7417-4f71-421e,bus=pci.0,addr=0xc,drive=disk-d1dd7417-4f71-421e \
  -drive file=/var/run/ganeti/instance-disks/testvm2005.codfw.wmnet:0,format=raw,if=none,aio=threads,id=disk-d1dd7417-4f71-421e,auto-read-only=off \
  -S
---
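
For anyone wanting to reproduce this outside Ganeti, a much reduced
command line might be enough. The sketch below only prints such a command
so that the memory size can be varied between runs. It is an assumption
that this reduced setup triggers the same failure: it uses qemu's
built-in user-mode network with its TFTP server instead of the tap
device above, and the helper name, the /srv/tftp directory and the
pxelinux.0 boot file are all hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a reduced qemu PXE-boot command.
# Arguments: memory in MiB, TFTP root directory, boot file name.
qemu_pxe_cmd() {
    mem_mib=$1
    tftp_dir=${2:-/srv/tftp}      # placeholder path, adjust locally
    bootfile=${3:-pxelinux.0}     # placeholder boot file, adjust locally
    printf '%s ' qemu-system-x86_64 -machine pc,accel=kvm \
        -m "$mem_mib" -smp 1 -boot n -nographic \
        -netdev "user,id=net0,tftp=$tftp_dir,bootfile=$bootfile" \
        -device virtio-net-pci,netdev=net0
    echo
}

# Print one command per memory size to compare.
qemu_pxe_cmd 1024
qemu_pxe_cmd 1536
```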

After retrieval and bootup of the TFTPed image, the installer crashes
very early on and drops into a busybox shell with the following userspace
processes running:

---
147 root      2800 S    {debian-installe} /bin/sh /sbin/debian-installer
154 root      3396 S    /usr/bin/screen sh -c printf "\033k%s\033\\" install
155 root      4268 R    {screen} /usr/bin/SCREEN sh -c printf "\033k%s\033\\
---

Poking at /proc/$PID/cmdline it's running

/usr/bin/SCREEN sh -c printf "\033k%s\033\\" installer ; /lib/debian-installer/menu
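
The arguments in /proc/$PID/cmdline are separated by NUL bytes, so the
file is easier to read once those are turned into spaces. A minimal
sketch, with a hypothetical helper name and assuming tr is available in
the shell at hand:

```shell
#!/bin/sh
# Hypothetical helper: print a NUL-separated cmdline file on one line.
show_cmdline() {
    tr '\000' ' ' < "$1"
    echo
}

# Example (PID 155 is the SCREEN process from the listing above):
#   show_cmdline /proc/155/cmdline
```

Argument boundaries are lost in this form, so an argument that itself
contains spaces cannot be told apart from several arguments.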

Looking at /lib/debian-installer/menu I checked that it's not running
into any memory shenanigans, and it seems it's ultimately failing in the
final

exec debconf -o d-i $MENU

call. $MENU is set to '/usr/bin/main-menu' and in fact running

"debconf -o d-i /usr/bin/main-menu" tries to emit some output (I can see
the cursor moving), but drops back to the shell right away.

I'm not familiar with cdebconf; if there are some suggested steps to
narrow down the failure further, I'm happy to try them.
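
One cheap first step is to look at the exit status right after the
command returns. The sketch below uses a hypothetical helper and relies
on the common shell convention that a status above 128 means the process
was killed by signal (status - 128); a program can also return such a
value on its own, so treat the result as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: interpret a shell exit status.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}

# Example, to be run in the installer shell:
#   debconf -o d-i /usr/bin/main-menu; describe_status $?
```

A result of "killed by signal 11" would point at a segmentation fault
(SIGSEGV) in the command itself.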

Cheers,
Moritz