Bug#987008: grub LVM bug #987008: experimental package available, please test
I was not able to use the grub2 build as-is because we are running bullseye and the built image required newer versions of glibc. However, I grabbed the original source code from Debian and the Debian code patch from https://people.debian.org/~anarcat/debian/sid/ and rebuilt it myself with minor tweaks so that it would build in bullseye (changed gcc-12 to gcc-10) and loaded it onto a system. When running the test case Rogier which failed previously (fails within 120 iterations), it is now passing with >1000 iterations.
Bug#987008: Grub fails to find LVM volume after previous LV rename
On Sat, 28 Aug 2021 09:57:35 +0200 Rogier wrote:
> Additional information for the benefit of anybody who uses LVM and
> grub, and is unsure if or when this problem will affect them:
>
> Besides the system being rendered unbootable, another symptom of this
> problem being acute is that, at that time, update-grub will most
> probably(*) print messages of the following kind, one for each LVM
> filesystem affected:
>
> grub-probe: error: disk `lvmid/**------
> **/**------**' not found.
>
> The immediate workaround, if this problem occurs, would be to make
> another modification to the LVM configuration. Then run update-grub
> again. For verification, the following command can also be used:
>
> grub-probe -d /dev/mapper/ -t
> fs
>
> If the problem is acute, the error message should be printed, if it is
> not acute, then there should be no error message (no warranties :-).
>
> Kind regards,
>
> Rogier.
>
>
> (*) but not if os-prober is disabled for update-grub.

I have tested the patch and it fixes the issue we were seeing.
Bug#987008: Grub fails to find LVM volume after previous LV rename
Package: grub2
Version: 2.02+dfsg1-20+deb10u4

-- System Information:
Debian Release: 10.8
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-14-clp-eseries-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_OOT_MODULE
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

lvm2 version: 2.03.02-3
libc6 version: 2.28-10

Dear Maintainer,

We are seeing a problem with the latest buster version of grub2 when it tries to find the LVM volume we boot from.

Our host has a single VG (named 'system') with around 10 LVs. One of the LVs is system-root.active, which is what we boot from. It contains both the /boot volume and the rest of the OS. We are also using legacy boot on this host.

We are finding that if we rename an LV (not the LV we boot from), then on a reboot grub sometimes cannot find the boot LV and we end up at the grub rescue prompt:

error: disk `lvm/system-root.active' not found.
Entering rescue mode...
grub rescue>

As a test, we take an LV which is not in use, rename it to something else, and then reboot. After about 220 iterations of this, we run into the problem.

We are able to boot these hosts through a PXE server so that we can get access to the drive which contains this VG. When we do this sort of boot, we are not actually using the VG. If we then simply rename that unused LV again and reboot, the host suddenly boots up again.

We reverted to grub version 2.02+dfsg1-20+deb10u3 and the problem went away.

As part of our investigation of this matter, I have been running the grub-bios-setup command after PXE booting. I chroot into the LV system-root.active and run grub-bios-setup with appropriate parameters and verbose output enabled, as we have seen this also fail due to the same problem. Below is the output from that command when it fails:

grub-bios-setup: info: adding `hd0' -> `/dev/sda' from device.map.
grub-bios-setup: info: /dev/sda is present.
grub-bios-setup: info: Looking for /dev/sda.
grub-bios-setup: info: /dev/sda is a parent of /dev/sda.
grub-bios-setup: info: /dev/sda is present.
grub-bios-setup: info: Looking for /dev/sda.
grub-bios-setup: info: /dev/sda is a parent of /dev/sda.
grub-bios-setup: info: transformed OS device `/dev/sda' into GRUB device `hd0'.
grub-bios-setup: info: reading /boot/grub/i386-pc/boot.img.
grub-bios-setup: info: reading /boot/grub/i386-pc/core.img.
grub-bios-setup: info: root is `(null)', dest is `hd0'.
grub-bios-setup: info: Opening dest.
grub-bios-setup: info: drive = 0.
grub-bios-setup: info: the size of hd0 is 62533296.
grub-bios-setup: info: changing current directory to /dev/mapper.
grub-bios-setup: info: /dev/mapper/system-root.active is not present.
grub-bios-setup: info: changing current directory to /dev.
grub-bios-setup: info: changing current directory to system.
grub-bios-setup: info: changing current directory to mapper.
grub-bios-setup: info: changing current directory to disk.
grub-bios-setup: info: changing current directory to by-label.
grub-bios-setup: info: changing current directory to by-uuid.
grub-bios-setup: info: changing current directory to by-partuuid.
grub-bios-setup: info: changing current directory to by-id.
grub-bios-setup: info: changing current directory to by-path.
grub-bios-setup: info: changing current directory to block.
grub-bios-setup: info: /dev/sda1 is present.
grub-bios-setup: info: Looking for /dev/sda1.
grub-bios-setup: info: /dev/sda is a parent of /dev/sda1.
grub-bios-setup: info: /dev/sda1 starts from 2048.
grub-bios-setup: info: opening the device hd0.
grub-bios-setup: info: drive = 0.
grub-bios-setup: info: the size of hd0 is 62533296.
grub-bios-setup: info: drive = 0.
grub-bios-setup: info: the size of hd0 is 62533296.
grub-bios-setup: info: Scanning for DISKFILTER devices on disk hd0.
grub-bios-setup: info: Scanning for mdraid1x devices on disk hd0.
grub-bios-setup: info: Scanning for mdraid09 devices on disk hd0.
grub-bios-setup: info: Scanning for mdraid09_be devices on disk hd0.
grub-bios-setup: info: Scanning for dmraid_nv devices on disk hd0.
grub-bios-setup: info: Scanning for ldm devices on disk hd0.
grub-bios-setup: info: scanning hd0 for LDM.
grub-bios-setup: info: no LDM signature found.
grub-bios-setup: info: Scanning for lvm devices on disk hd0.
grub-bios-setup: info: no LVM signature found.
grub-bios-setup: info: Scanning for DISKFILTER devices on disk hd0.
grub-bios-setup: info: Scanning for mdraid1x devices on disk hd0.
grub-bios-setup: info: Scanning for mdraid09 devices on disk hd0.
grub-bios-setup: info: Scanning for mdraid09_be devices on disk hd0.
grub-bios-setup: info: Scanning for dmraid_nv devices on disk hd0.
grub-bios-setup: info: Scanning for ldm devices on disk hd0.
grub-bios-setup: info: scanning hd0 for LDM.
Bug#926896: sysvinit-utils: pidof is unreliable
I agree - I was just keeping the style consistent. I would argue that the check was totally unnecessary on all of these frees in the first place. I would also argue that a function should be added to perform this cleanup, so that it is not repeated in multiple places as it is now, where one free can be missed (as happened here).
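To illustrate the cleanup function suggested above, here is a minimal sketch. The struct `proc_entry` and the names `proc_entry_new` and `proc_entry_free` are hypothetical stand-ins, not identifiers from killall5.c; only the four string fields mentioned in this thread are modeled. It relies on the standard C guarantee that `free(NULL)` does nothing, which is why no per-field check is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the process record used by readproc();
 * only the heap-allocated string fields named in this thread appear. */
struct proc_entry {
	char *pathname;
	char *argv0;
	char *argv1;
	char *statname;
};

/* Hypothetical helper: allocate a record with every pointer set to NULL,
 * so it is always safe to hand to proc_entry_free(). */
static struct proc_entry *proc_entry_new(void)
{
	struct proc_entry *p = malloc(sizeof(*p));

	if (p == NULL)
		return NULL;
	p->pathname = NULL;
	p->argv0 = NULL;
	p->argv1 = NULL;
	p->statname = NULL;
	return p;
}

/* Release one record and everything it owns. free(NULL) is a no-op in
 * standard C, so the fields are freed unconditionally. */
static void proc_entry_free(struct proc_entry *p)
{
	if (p == NULL)
		return;
	free(p->pathname);
	free(p->argv0);
	free(p->argv1);
	free(p->statname);
	free(p);
}
```

With a helper of this shape, each error path would shrink to one call followed by `continue`, and adding a new field later would mean touching one function instead of every cleanup site.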
Bug#926896: sysvinit-utils: pidof is unreliable
I am not seeing how it would have skipped the zombie processes in the past, but I also did not closely review that code. I did see in the comments that skipping those processes was put in place because the stats would sometimes fail. I would argue that this should have been handled at the point where a stat was attempted. I do see a special code path lower in the code for that.

Having said all this, I made the change locally to remove the check for zombie and disk access, and now the program works every time for me. I think this needs to be reverted and a different fix put in place to handle the failed stat.

Explanations for the other changes in this diff:

1) When "pathname" was added to readproc, the spots where its memory is freed were not done consistently with the other frees. There was also one missing spot.
2) Cleaned up a code addition where spaces were used instead of tabs, which was not consistent with the rest of the code.
3) There was a block of old code which was commented out. I went ahead and removed it.

diff --git a/src/killall5.c b/src/killall5.c
index 25b333e..6f45858 100644
--- a/src/killall5.c
+++ b/src/killall5.c
@@ -503,7 +503,7 @@ int readproc(int do_stat)
 		if (p->argv0) free(p->argv0);
 		if (p->argv1) free(p->argv1);
 		if (p->statname) free(p->statname);
-		free(p->pathname);
+		if (p->pathname) free(p->pathname);
 		free(p);
 	}
 	plist = NULL;
@@ -552,7 +552,7 @@ int readproc(int do_stat)
 			if (p->argv0) free(p->argv0);
 			if (p->argv1) free(p->argv1);
 			if (p->statname) free(p->statname);
-			free(p->pathname);
+			if (p->pathname) free(p->pathname);
 			free(p);
 			continue;
 		}
@@ -568,19 +568,12 @@ int readproc(int do_stat)
 		/* Get session, startcode, endcode. */
 		startcode = endcode = 0;
-/*
-		if (sscanf(q, "%*c %*d %*d %d %*d %*d %*u %*u "
+		if (sscanf(q, "%10s %*d %*d %d %*d %*d %*u %*u "
 			"%*u %*u %*u %*u %*u %*d %*d "
 			"%*d %*d %*d %*d %*u %*u %*d "
 			"%*u %lu %lu",
-			&p->sid, &startcode, &endcode) != 3) {
-*/
-if (sscanf(q, "%10s %*d %*d %d %*d %*d %*u %*u "
-"%*u %*u %*u %*u %*u %*d %*d "
-"%*d %*d %*d %*d %*u %*u %*d "
-"%*u %lu %lu",
-process_status,
-&p->sid, &startcode, &endcode) != 4) {
+			process_status,
+			&p->sid, &startcode, &endcode) != 4) {
 			p->sid = 0;
 			nsyslog(LOG_ERR, "can't read sid from %s\n",
@@ -589,31 +582,19 @@ int readproc(int do_stat)
 			if (p->argv0) free(p->argv0);
 			if (p->argv1) free(p->argv1);
 			if (p->statname) free(p->statname);
-			free(p->pathname);
+			if (p->pathname) free(p->pathname);
 			free(p);
 			continue;
 		}
 		if (startcode == 0 && endcode == 0)
 			p->kernel = 1;
 		fclose(fp);
-if ( (strchr(process_status, 'D') != NULL) ||
- (strchr(process_status, 'Z') != NULL) ){
-	/* Ignore zombie processes or processes in
-	   disk sleep, as attempts
-	   to access the stats of these will
-	   sometimes fail. */
-	if (p->argv0) free(p->argv0);
-	if (p->argv1) free(p->argv1);
-	if (p->statname) free(p->statname);
-	free(p);
-	continue;
-}
 	} else {
 		/* Process disappeared.. */
 		if (p->argv0) free(p->argv0);
 		if (p->argv1) free(p->argv1);
 		if (p->statname) free(p->statname);
-		free(p->pathname);
+		if (p->pathname) free(p->pathname);
 		free(p);
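As a rough illustration of handling a failed stat at the point where it is attempted, rather than dropping D and Z processes up front, the sketch below keeps the entry and only records that device and inode information is unavailable. The function name `stat_proc_exe` and the `exe_lookup` enum are hypothetical and do not come from killall5.c; how a caller would fall back (for example to name-based matching) is likewise an assumption.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical result type: did we obtain device/inode information? */
enum exe_lookup {
	EXE_OK,
	EXE_UNAVAILABLE
};

/* Try to stat `path` (for example a /proc/<pid>/exe link). On failure the
 * process is NOT discarded: the outputs are zeroed and the caller is told
 * that only name-based matching is possible for this entry. */
static enum exe_lookup stat_proc_exe(const char *path, dev_t *dev, ino_t *ino)
{
	struct stat st;

	if (stat(path, &st) != 0) {
		*dev = 0;
		*ino = 0;
		return EXE_UNAVAILABLE;
	}
	*dev = st.st_dev;
	*ino = st.st_ino;
	return EXE_OK;
}
```

The design point is that a failed stat reduces what can be compared for one process, but does not remove that process from the list that pidof searches.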
Bug#926896: sysvinit-utils: pidof is unreliable
We have also been experiencing this problem since moving to Buster. We never saw this with Jessie. I believe it comes down to the following code in readproc:

	if ( (strchr(process_status, 'D') != NULL) ||
	     (strchr(process_status, 'Z') != NULL) ){
		/* Ignore zombie processes or processes in
		   disk sleep, as attempts
		   to access the stats of these will
		   sometimes fail. */
		if (p->argv0) free(p->argv0);
		if (p->argv1) free(p->argv1);
		if (p->statname) free(p->statname);
		free(p);
		continue;
	}

Our scenario is similar, although not with the same process that is described below. It seems like this makes pidof less effective for finding the pid of a process when processes in D state are skipped. Are there alternatives to pidof? (BTW, the -z option does not appear to help, since this code snippet removes the process from the process list.)

-----Original Message-----
From: Thorsten Glaser
Sent: Tuesday, October 22, 2019 6:26 PM
To: jsm...@resonatingmedia.com; 926...@bugs.debian.org
Subject: Bug#926896: sysvinit-utils: pidof is unreliable

On Tue, 22 Oct 2019, Jesse Smith wrote:

> >any ideas how it could be possible for process to be discovered by
> >ps(1), but not pidof(1)?

> I can think of a few possibilities, though they seem unlikely. One is
> that the process could be crashing and restarting, making it a zombie

or in D state, doing disc I/O… more likely even.

> for brief periods of time. Testing pidof with the "-z" flag would fill
> in the "holes" in the test output if that theory is correct.

Also: start-stop-daemon --status should probably be used instead of pidof.
(Even its manpage says so.)

bye,
//mirabilos
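For readers who want to see which processes the quoted readproc check would drop, the following sketch extracts the state letter from one line in /proc/<pid>/stat format ("pid (comm) state ...") and applies the same D/Z test. The function names `stat_line_state` and `skipped_by_check` are hypothetical. Searching for the last ')' is a common way to cope with command names that contain spaces or parentheses; it is an assumption here, not a claim about how killall5.c locates the field.

```c
#include <string.h>

/* Return the state letter from a /proc/<pid>/stat style line, or '\0'
 * if the line has no closing parenthesis or nothing follows it. */
static char stat_line_state(const char *line)
{
	const char *q = strrchr(line, ')');

	if (q == NULL)
		return '\0';
	q++;
	while (*q == ' ')
		q++;
	return *q;
}

/* Nonzero if the check quoted in this message would discard a process
 * in this state: 'D' (disk sleep) or 'Z' (zombie). */
static int skipped_by_check(char state)
{
	return state == 'D' || state == 'Z';
}
```

A line such as `1234 (my daemon) D 1 ...` yields state 'D', so that process would be invisible to pidof for as long as it stays in uninterruptible sleep.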