Fwd: Also observing #988477
I should have Cc'd debian-kernel@lists.debian.org, but failed to do so, so I am now forwarding a copy. At the very least this involves the Linux MD RAID1 functionality, but I am unsure whether this is a Linux kernel bug or a Xen bug.

Forwarded:

I am also observing #988477 occur. This machine has an AMD Zen 4 processor. The first observation was when the motherboard/processor was swapped out; the older motherboard/processor was several generations old.

The pattern which is emerging is Linux MD RAID1 plus a recent AMD processor which has full IOMMU functionality. The older machine was believed to have an IOMMU, but the BIOS wasn't creating appropriate ACPI tables (IVRS) and thus Xen was unable to utilize it.

This seems to be occurring with a small percentage of write operations. Subsequent read operations appear to be fine.

I am not convinced this is a Xen bug. I suspect this is instead a bug in the Linux MD subsystem. In particular, the DMA interface may have been designed assuming only a single device would ever access any page, while the MD RAID1 driver is reusing the same page for both devices.

IOMMU page release could be handled by marking the page unused in a device data structure, with the entry later removed by sweeping a table. In such a case, if the MD RAID1 driver were to redirect the page to another device between these two steps, the entry for a subsequent device could be wiped out when trying to invalidate an entry for a prior device.

Anyway, I'm also observing bug #988477. This could also be a kernel bug. So far no crashes or confirmed data loss have occurred, but sweeping the mirror does turn up small numbers of inconsistencies.

-- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
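For readers wanting to repeat the mirror sweep mentioned above: on Linux MD this is normally driven through sysfs, where writing `check` to `sync_action` starts a scrub and `mismatch_cnt` holds the result. The following is only a minimal sketch; the array name `md0` and the helper name `md_mismatch_count` are assumptions for illustration and do not come from the report.

```sh
#!/bin/sh
# Print the mismatch counter the MD driver recorded during its last check.
# $1: array name (md0 here is an assumed example)
# $2: sysfs block root, overridable so the helper can be tried without
#     a real array (defaults to /sys/block)
md_mismatch_count() {
    md_dir="${2:-/sys/block}/$1/md"
    if [ ! -r "$md_dir/mismatch_cnt" ]; then
        echo "no mismatch_cnt for $1" >&2
        return 1
    fi
    cat "$md_dir/mismatch_cnt"
}

# Typical use on a live system (as root):
#   echo check > /sys/block/md0/md/sync_action   # start the sweep
#   cat /proc/mdstat                              # watch until it finishes
#   md_mismatch_count md0                         # non-zero: inconsistencies found
```

A non-zero count only says that the two halves of the mirror differed somewhere; it does not say which copy is correct.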
Bug#1049450: New rpc.mountd rejects -N 2 option
On Wed, Aug 16, 2023 at 08:57:16AM +0200, Salvatore Bonaccorso wrote:
>
> On Tue, Aug 15, 2023 at 04:13:59PM -0700, Elliott Mitchell wrote:
> > Package: nfs-kernel-server
> > Version: 1:2.6.2-4
> >
> > Hopefully SSIA.
> >
> > `rpc.mountd` has a -N option to disable versions of NFS.
> >
> > I had been previously using "-N 2", but that is now broken. The error
> > message was quite non-helpful ("nfsd2" if I recall correctly). Upon
> > removing "-N 2", luckily NFSv2 didn't get enabled, but this was still
> > annoying to deal with. At worst using a deprecated setting should merely
> > generate a warning.
>
> Removal of NFSv2 support was documented with a Debian NEWS entry for
> 1:2.6.1-1~exp1, cf. #1006650.
>
> nfs-utils (1:2.6.1-1~exp1) unstable; urgency=medium
>
> Support for NFSv2 has been removed from nfs-kernel-server. It was
> previously disabled by default, but still available.
>
> -- Ben Hutchings Sun, 13 Mar 2022 19:05:02 +0100

Removing NFSv2 support shouldn't invalidate "-N 2". "-N 2" is supposed to disable NFSv2 at runtime, as such removing all NFSv2 support should merely render "-N 2" 100% redundant and at worst produce a warning.
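The behaviour being asked for (accept the redundant option, warn, carry on) can be sketched as a small option filter. This is purely illustrative and is NOT how rpc.mountd parses its options; the function name `filter_nfsv2_option` is hypothetical, and the sketch assumes no option value contains whitespace.

```sh
#!/bin/sh
# Hypothetical wrapper logic, not rpc.mountd's real parser:
# drop a redundant "-N 2" with a warning and pass everything else through.
# Assumption: no argument contains whitespace.
filter_nfsv2_option() {
    out=""
    while [ "$#" -gt 0 ]; do
        if [ "$1" = "-N" ] && [ "${2:-}" = "2" ]; then
            echo "warning: NFSv2 support was removed; ignoring '-N 2'" >&2
            shift 2
            continue
        fi
        out="${out:+$out }$1"
        shift
    done
    printf '%s\n' "$out"
}
```

With such a filter an old configuration carrying "-N 2" would keep working and the administrator would still be told the setting is obsolete.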
Bug#1049450: New rpc.mountd rejects -N 2 option
Package: nfs-kernel-server
Version: 1:2.6.2-4

Hopefully SSIA.

`rpc.mountd` has a -N option to disable versions of NFS.

I had been previously using "-N 2", but that is now broken. The error message was quite non-helpful ("nfsd2" if I recall correctly). Upon removing "-N 2", luckily NFSv2 didn't get enabled, but this was still annoying to deal with. At worst using a deprecated setting should merely generate a warning.
Bug#1034811: linux: consider CONFIG_HW_RANDOM_VIRTIO=n
Package: src:linux
Version: 6.0.3-1~bpo11+1
Severity: wishlist

Looks like someone had the idea of a virtualized HW RNG. Yet looking at the kernel source, there isn't a single actual implementation. Unless I'm missing something, having CONFIG_HW_RANDOM_VIRTIO simply wastes processor time during build and enlarges the package for no gain.

Perhaps time for Debian to quit packaging this unused idea? Looks like on-processor HW RNGs are what are taking over. Possibly also the HW RNG from the vTPM implementation.
Bug#1034463: closing 1034463
On Sun, Apr 16, 2023 at 07:08:03AM +0200, Salvatore Bonaccorso wrote:
> CONFIG_AGP is built-in in Debian, in particular for:
>
> debian/config/alpha/config:CONFIG_AGP=y
> debian/config/amd64/config:CONFIG_AGP=y
> debian/config/hppa/config.parisc64:CONFIG_AGP=y
> debian/config/ia64/config:CONFIG_AGP=y
> debian/config/kernelarch-powerpc/config:CONFIG_AGP=y
> debian/config/kernelarch-x86/config:CONFIG_AGP=y

I hadn't checked all architectures, but was well-aware it is built-in for amd64. I was suggesting it should change from being built-in to being a module.

The reason being AGP is very rare on amd64 motherboards. According to the handy reference, AGP was starting to disappear just as amd64 hardware started hitting the market.

I'm unsure where other architectures stand on the issue. Yet on amd64 it shouldn't be built-in.
Bug#1034463: linux: consider CONFIG_AGP=m
Package: src:linux
Version: 5.10.158+2
Severity: wishlist

Could AGP support be turned into a module for Debian kernels? I'm tempted to suggest it shouldn't even be built for amd64, but it does seem reasonable for i686 kernels. Given this, a module seems to make sense.
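To see how a given kernel was actually built, the installed config file can be inspected. A minimal sketch follows; the helper name `config_state` is an assumption, and it relies on Debian installing the configuration as `/boot/config-<version>`.

```sh
#!/bin/sh
# Report how a Kconfig symbol is set in a kernel config file.
# Prints "y" (built-in), "m" (module) or "n" (unset or absent).
# $1: symbol, e.g. CONFIG_AGP
# $2: config file, e.g. /boot/config-$(uname -r)
config_state() {
    value=$(sed -n "s/^$1=\([ym]\)\$/\1/p" "$2" | head -n 1)
    printf '%s\n' "${value:-n}"
}

# Example on a running system:
#   config_state CONFIG_AGP "/boot/config-$(uname -r)"
```

If the wishlist request were implemented, the same call would print "m" instead of "y" on amd64.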
Bug#1009793: linux-source 5.10.106-1 changes block device order
Package: src:linux
Version: 5.10.106-1

Between 5.10.103-1 and 5.10.106-1 (image -13) something changed which reliably causes what used to show as /dev/sda to show as /dev/sdb. Other block devices plugged into the SCSI subsystem may have swapped around, but I've yet to untangle the others.

A few utilities are still sensitive to block device order and this causes issues for those.

Nothing on the hardware explains this. The controller thinks the device has a lower number, and the device should respond much faster. The lowest level is the cciss driver.
Bug#991967: (Presently) Not in 5.10 source
Having finally gotten to test this, the issue does NOT affect 5.10.70-1. So far I've only gotten to try reboot, but that went fine.

Might have been an ACPI or Xen mismerge into 4.19. Alas this may simply disappear into history.
Bug#996608: linux-source-5.10: Mising dependency: dwarves
Package: linux-source-5.10
Version: 5.10.70-1

SSIA. Debian's 5.10 configuration will NOT build without the "dwarves" package (`pahole`). In light of this some package, likely linux-source-5.10, should recommend "dwarves".
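Until such a Recommends exists, a pre-flight check before starting a long build can save some time. The sketch below is only an illustration; the helper name `require_tools` is an assumption and is not part of any Debian tooling.

```sh
#!/bin/sh
# Succeed when every named tool is on PATH; otherwise name the missing
# ones on stderr and return non-zero.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing build tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Before building with Debian's 5.10 configuration one might run:
#   require_tools pahole || echo "install the dwarves package first"
```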
Bug#991967: Simply ACPI powerdown/reset issue?
On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
> I presume you are suggesting I try booting 4.19.181-1 on the
> current version of Xen-4.14 for bullseye as a dom0. I am not
> inclined to try it until an official Debian developer endorses
> your opinion that the bug I am seeing is distinct
> from #991967, at which point I will report the bug I am
> seeing as a new bug.

Chuck Zmudzinski you are getting rather close to my threshold for calling harassment. You're not /quite/ there, but I'm concerned.

Since the purpose of the bug reports is to find and diagnose bugs, I did a bit of experimentation and made some observations.

I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is:
e7a17db0305c8de891b366ad3528e5a43015

On top of this I cherry-picked 3 commits from Xen's main branch:
5a4087004d1adbbb223925f3306db0e5824a2bdc
0f089bbf43ecce6f27576cb548ba4341d0ec46a8
bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

(these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command)

With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were:

Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot.

As such, I believe there are in fact two distinct bugs being observed. The presence of EITHER of these is sufficient to cause hangs during powerdown or reboot.

First, some patch originally from Linux's main branch which breaks Xen reboots was backported somewhere between 4.19.181-1 and 4.19.194-3. This may either have been introduced before 5.10 diverged from main, or may also have been backported to 5.10. THIS is Debian bug #991967.

Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is valuable to ARM devices breaks reboots and powerdowns on x86. This is correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently this has no Debian bug report.

The first is presently unidentified; someone enthusiastic either needs to read git logs/source code, or bisect and build to find where it got broken.

For the second we seem to have a fix. The only question is how many patches to cherry-pick. bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-contained and needed for some highly popular ARM devices.
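The retrieval step described above can be sketched as follows. Only the URL pattern and the three hashes come from the message; the helper name `xen_patch_url` and the use of `curl` are assumptions, and the loop is left as a comment because it needs network access and a checkout of the Debian Xen source.

```sh
#!/bin/sh
# Build the gitweb "patch" URL for a Xen commit, following the URL
# pattern quoted in the message.
xen_patch_url() {
    printf 'https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=%s\n' "$1"
}

# Assumed usage from inside the Debian Xen git checkout (run by hand):
#   for h in 5a4087004d1adbbb223925f3306db0e5824a2bdc \
#            0f089bbf43ecce6f27576cb548ba4341d0ec46a8 \
#            bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b; do
#       curl -fsS "$(xen_patch_url "$h")" | git am || break
#   done
```

`git am` reads the mailbox from standard input when no file is given, which is why the patch can be piped straight in.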
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: > > On 9/20/21 7:39 PM, Diederik de Haas wrote: > > On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: > >> Merely having the path is a sufficiently strong indicator for me to > >> simply wave it past. I though would suggest Debian should instead > >> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. > >> > >> This is available as a patch at: > >> > >> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 > > You probably then also want the following commit, which is a fix on that > > patch: > > https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b > > > > Found that via the following url/query: > > https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI > > > > I don't know whether others should be used from that as well. > > I tried these two commits (adapted for the xen-4.14 branch) but this > approach did not fix the bug - with these patches applied the dom0 > did not power down. > > My advice for the Debian Xen Team is to consult with upstream and > get their advice on whether or not it is advisable for Debian to > retain the patches from the Xen-4.16 branch that have been > added to the Debian 4.14 package in an attempt to support > some arm devices that panic during on an unpatched Xen-4.14. > If upstream cannot help Debian backport fixes for arm panics > from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian > Xen team should remove aggressive patches that really have now > turned the Debian Xen-4.14 package into a Frankenstein version > that is a mixture of Xen-4.14 and Xen-4.16, and decide that support > for those arm devices must wait until Debian gets Xen 4.16 up > and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the patch you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967.

Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Mon, Sep 20, 2021 at 06:29:49PM -0400, Chuck Zmudzinski wrote:
> On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:
> >
> > On 9/20/21 12:27 AM, Elliott Mitchell wrote:
> >> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> >>
> >>> I suspect the following patch is the culprit for problems
> >>> shutting down on the amd64 architecture:
> >>>
> >>> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >>> This patch does affect amd64 acpi code, and is probably causing
> >>> the problem on my amd64 system, so my build of the xen-4.14
> >>> hypervisor without this patch fixed the problem.
> >> Of the ones listed that is the only one which has any overlap with x86
> >> code. The next reproduction step is `apt-get source xen &&
> >> patch -p1 -R <
> >> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >> && dpkg-buildpackage -b`. Then try with this to confirm that patch
> >> is what does it.
> >>
> >> Thing is that delta is rather small. I don't have a simulator, but that
> >> is rather small to be the culprit.
> >
> > I just tested the build with
> > patch -p1 -R <
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > applied before building the package and I can confirm that this is the
> > patch causing the trouble for dom0 poweroff on x86/amd64. Reverting
> > this patch fixes it on my amd64 system. But this would probably break
> > the arm build.
> >
> > I think one possible fix would require modifying
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > so it only applies at runtime to the arm architecture. I will try some
> > modifications to the patch instead of removing it, and if I get something
> > that works on amd64 and also might work on arm, I will post it
> > for Elliott to try.
>
> I have an encouraging result. I found a very simple patch
> to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
> bug on my system and it should not affect the arm patches
> at all:
> --
> This patch partially reverts previous patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
>
> This hopefully fixes #911976
>
> --- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
> +++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
> @@ -46,10 +46,6 @@
>      if ((phys + size) <= (1 * 1024 * 1024))
>          return __va(phys);
>
> -    /* No further arch specific implementation after early boot */
> -    if (system_state >= SYS_STATE_boot)
> -        return NULL;
> -
>      offset = phys & (PAGE_SIZE - 1);
>      mapped_size = PAGE_SIZE - offset;
>      set_fixmap(FIX_ACPI_END, phys);
> --
>
> Can you try this patch to src:xen and see if your
> arm devices are OK with it?

Merely having the path is a sufficiently strong indicator for me to simply wave it past. I though would suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

The other commit I would suggest being picked by src:xen is 5a4087004d1adbbb223925f3306db0e5824a2bdc

This is for device-tree funkiness which got added between linux-5.10.0 and linux-5.10.y (if the Debian kernel team wants to maintain a fix in Debian's kernel source, that works too).

BTW have I mentioned I've become rather skeptical of device-trees being a usable way of representing hardware information?
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)
>
> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.

> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.

You're referencing several software versions which are mismatches for #991967. #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3, but not Linux kernel 4.19.181.

The fact it correlates with a Linux kernel update rather strongly points to the Linux kernel. I could believe the situation is partially the fault of both though.

> I suspect the following patch is the culprit for problems
> shutting down on the amd64 architecture:
>
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

> This patch does affect amd64 acpi code, and is probably causing
> the problem on my amd64 system, so my build of the xen-4.14
> hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it.

Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit.

> I think this bug should be re-classified as a bug in src:xen.

There could be a separate bug in src:xen, but that is not #991967.

> I also would inquire with the Debian Xen Team about why they
> are backporting patches from the upstream xen unstable
> branch into Debian's 4.14 package that is currently shipping
> on Debian stable (bullseye). IMHO, the aforementioned
> patches that are not in the stable 4.14 branch upstream
> should not be included in the xen package for Debian stable.

It was requested since someone trying to have Xen operational on a device needed those for operation. Rather a lot of bugfix or very small standalone feature patches get cherry-picked.

Presently I haven't been convinced this is a Xen bug (though it does affect Xen installations). Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux kernel?

I'm suspecting something got incorrectly backported on the Linux side (alternatively the Xen project seems a bit poor at keeping needed patches in Linux).
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: > On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso > wrote: > > > > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote: > > > An experiment lead to a potential alternative explanation for #991967. > > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at > > > 4.19.194-3. Presence of Xen on the system may be unrelated. > > > > > > Failing that, it could be Xen and non-UEFI systems are effected. (Xen > > > was tried on a UEFI system and the issue wasn't observed) > > > > Following up on https://bugs.debian.org/991967#12 > > > > Did you succeeded in bisecting the issue as you seem to have it > > reproducible? > > I noticed this bug on bullseye ever since I have been > running bullseye as a dom0, but my testing indicates > there is no problem with src:linux but the problem > appeared in src:xen with the 4.14 version of xen on > bullseye. > > I ask Elliott if you are only seeing the problem on Debian's > xen-4.14 hypervisor? Also, which architecture, arm or > amd64? I only see the problem on the Debian xen-4.14 > hypervisor, and I have only tested on amd64, and I > have found a fix for my amd64 system which is as > follows: > > Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, > with a Haswell CPU (core i5-4590S) > > xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 > > linux kernel version: 5.10.46-4 (the current amd64 kernel > for bullseye) Nope. As per the report the problem appeared with kernel 4.19.194-3 and at the time using Xen 4.11. The kernel you're listing is rather more recent, which might suggest a patch which had been backported from 5.x to 4.19. I could believe a Xen security update being the trigger though (I don't recall there being one at the right time, but I wouldn't rule it out). 
> Boot system: EFI, not using secure boot, booting xen > hypervisor and dom0 bullseye with grub-efi package for > bullseye, and it boots the xen-4.14-amd64.gz file, not > the xen-4.14-amd64.efi file. > > I also tested a buster dom0 with the 4.19 series kernel > on the xen-4.14 hypervisor from bullseye and saw the > problem, but I did not see the problem with either > a buster (linux 4.19) or bullseye (linux 5.10) dom0 on > the xen-4.11 hypervisor, so I think the problem is > with the Debian version of the xen-4.14 hypervisor, > not with src:linux. Just to make sure, the kernel you were testing was 4.19.194-3? The issue didn't manifest with kernels earlier than that. Could be we're seeing distinct bugs. > This patch does affect amd64 acpi code, and is probably causing > the problem on my amd64 system, so my build of the xen-4.14 > hypervisor without this patch fixed the problem. While that commit modifies the code path the processor takes, the modified path appears identical. > I also would inquire with the Debian Xen Team about why they > are backporting patches from the upstream xen unstable > branch into Debian's 4.14 package that is currently shipping > on Debian stable (bullseye). IMHO, the aforementioned > patches that are not in the stable 4.14 branch upstream > should not be included in the xen package for Debian stable. Some people are asking for those. Those are bugfixes for an extremely popular device which panics on boot without the patches. Meanwhile turned out between 5.10.0 and 5.10.30 the ARM64 device-trees were modified in a way which broke Xen 4.14 on ARM64. The change violated Linux's own standards for device-trees, yet still appeared in a stable branch. In other news, if you see device-trees compared to ACPI tables, they're not very comparable. 99% of ACPI tables work for all versions of all OSes. Any given device-tree is only likely to work for a single version of a single OS. 
While a useful abstraction for portions of kernel code, device-trees are utter garbage compared to ACPI tables.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sat, Sep 11, 2021 at 01:29:12PM +0200, Salvatore Bonaccorso wrote:
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment led to a potential alternative explanation for #991967.
> > The issue may be ACPI (non-UEFI) powerdown/reset was broken at
> > 4.19.194-3. Presence of Xen on the system may be unrelated.
> >
> > Failing that, it could be Xen and non-UEFI systems are affected. (Xen
> > was tried on a UEFI system and the issue wasn't observed)
>
> Following up on https://bugs.debian.org/991967#12
>
> Did you succeed in bisecting the issue as you seem to have it
> reproducible?

Problem is that is rather a lot of kernel builds, which also means a lot of downtime... Right now distribution update seems worthy of greater attention.

The one notable bit is the one I sent in the last message. The system does NOT have UEFI, and a test system with UEFI seemed to have no problem.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
An experiment led to a potential alternative explanation for #991967. The issue may be that ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.

Failing that, it could be that Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)
Bug#991967: linux-src 4.19.194-3 breaks Xen Dom0 powerdown and reboot
Package: src:linux
Version: 4.19.194-3
Control: affects -1 src:xen

SSIA. Previous versions of 4.19 had no issues (4.19.181-1 according to notes), but this cropped up with 4.19.194-3 (-1 and -2 weren't tested).

When a Xen domain 0 tries to reboot or powerdown the computer, it hangs with the display off, but the power supply is active.

I'm rebuilding from source, so I imagine this also affects linux-image-4.19.0-17-amd64. Seems .194 caused multiple problems for Xen given #990642.
Bug#939633: More severe #939633 for RP4 on 5.8?
found 935456 5.9.6-1~bpo10+1
quit

After having spent several hours on kernel compiles and experimenting with the situation, I'm fairly sure this also applies to linux-source-5.9.

Odd thing is, when I booted the device using the Tianocore implementation it came right up with no problems.

I'm getting this odd suspicion someone deliberately broke the device-trees in Debian's kernel source. The goal being to force everyone onto the Tianocore/ACPI implementation and try to kill device-trees. Right now I think this is conspiracy theory territory, but I'm left wondering how such a serious bug could hang around so long...
Bug#923814: Does #923814 dup of #906225?
On Wed, Nov 25, 2020 at 02:30:30PM -0800, Elliott Mitchell wrote:
> The kernel versions are quite different, but #923814 reads suspiciously
> like it is a duplicate of #906225.

On double-checking, hit the wrong follow-up address. I was wanting to advise the maintainers these two looked to potentially be the same bug...
Bug#939633: More severe #939633 for RP4 on 5.8?
found 939633 5.8.10-1~bpo10+1
severity 939633 important
merge 935456 939633
quit

I'm left suspecting bugs #935456 and #939633 are in reality a single bug: Raspberry Pi device trees were garbled during Debian's 5.2 kernel development. They appear to remain very garbled, to the point of being pretty well useless.

I've built a kernel from Debian's 5.8 kernel source and the device tree binary produced doesn't appear to allow a Raspberry PI 4B to complete its boot. Might be USB functionality is operational, but neither the ethernet interface nor the display functions.

Ironically, the additional ACPI/EFI support DOES function. This means the Tianocore image for Raspberry PI 4B works better with the current source.

I'm unsure whether badly breaking all Raspberry PI variants quite justifies critical or grave (popular machine, but kernel issues by nature cause 10x the damage so severities should be somewhat damped). I certainly hope to see the 5.9 release since that has additional high-value improvements...
Bug#965049: linux-source-5.6 build issues for ARM64
On Tue, Jul 14, 2020 at 08:20:29PM -0700, Elliott Mitchell wrote:
> I'm speculating the build may work if I run the correct rule, but I
> haven't yet identified that.

To make things easier for others, "all" was sufficient.
Bug#965049: linux-source-5.6 build issues for ARM64
Package: src:linux
Version: 5.6.14-2~bpo10+1
Severity: important

I'm guessing this is isolated to ARM64 targets as I don't see other reports. I'm having difficulty trying to target "bindeb-pkg" with linux-source-5.6.

During the initial phase the build was terminating quickly, complaining about a missing System.map. I managed to work around this via `make vmlinux modules`. Now I'm at the error "cp: cannot stat 'arch/arm64/boot/Image.gz': No such file or directory"

I'm speculating the build may work if I run the correct rule, but I haven't yet identified that. Kind of feels like all dependencies got lost for ARM64 targets.

This may not warrant grave severity as some architectures build, but if you're on ARM64 there is a major problem.
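A quick way to see which of the two files named in the errors above are still absent before retrying `bindeb-pkg` is sketched below. The helper name `missing_artifacts` is an assumption; the two paths are the ones from the report.

```sh
#!/bin/sh
# List which of the files the packaging step complained about are absent
# from a kernel build tree. Prints nothing when both are present.
# $1: top of the build tree (defaults to the current directory)
missing_artifacts() {
    tree="${1:-.}"
    for f in System.map arch/arm64/boot/Image.gz; do
        [ -e "$tree/$f" ] || printf '%s\n' "$f"
    done
}

# Assumed sequence on an ARM64 build, per the follow-up that the "all"
# target was sufficient:
#   make all
#   missing_artifacts .      # expect no output
#   make bindeb-pkg
```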
Bug#962254: Umask ignored when mounting NFSv4.2 share of an exported ZFS (with acltype=off) (was: Re: Bug#962254: NFS(v4) broken at 4.19.118-2)
On Mon, Jun 15, 2020 at 10:50:35AM -0400, J. Bruce Fields wrote:
> Honestly I don't think I currently have a regression test for this so
> it's possible I could have missed something upstream. I haven't seen
> any reports, though
>
> ZFS's ACL implementation is very different from any in-tree
> filesystem's, and given limited time, a filesystem with no prospect of
> going upstream isn't going to get much attention, so, yes, I'd need to
> see a reproducer on xfs or ext4 or something.

Salvatore managed to reproduce it with ext4, yet every prior report where the filesystem was known involved ZFS. This seems to suggest one of two things.

First, could be enabling POSIX ACLs has been very strongly pushed by other filesystems, while ZFS hasn't pushed them as strongly.

Second, could be a substantial majority of users of NFS are using ZFS.

If the former, this simply means an additional test case is needed. If the latter, then any testing of NFS which excludes ZFS is going to have underwhelming coverage.
Bug#962254: Umask ignored when mounting NFSv4.2 share of an exported ZFS (with acltype=off) (was: Re: Bug#962254: NFS(v4) broken at 4.19.118-2)
On Sat, Jun 13, 2020 at 02:54:31PM +0200, Salvatore Bonaccorso wrote:
> indicated this was specifically observed on ZFS on Linux only. Seth
> Arnold's answer seem to be inline with that that the issue is more on
> the ZFS on Linux side and the issue keeps biting people a bit
> unexpectedly. Why does this break with ACL off settings?

I disagree with this assessment. All of the reporters have been using ZFS, but this could indicate an absence of testers using other filesystems. We need someone with an NFS server which has a 4.15+ kernel and uses a different filesystem which supports ACLs.

I am doubtful, though, that ACLs are related to the actual problem. My impression from what I've read is they're a useful tool to work around the problem, but not related to the actual cause.

> But there was at least one other (but again without further
> detail/followups) that it was observed on an export from OpenWRT, but
> no specific details here:
>
> https://bugs.openwrt.org/index.php?do=details_id=2581

This appears to be the same reporter as the RedHat bug report (comment 3 on the RedHat report). This is a report for the server portion of the reporter's setup.

Analyzing the setup, I disagree with one of the prior assessments of this report. This is OpenWRT on x86_64 hardware, which would suggest a high-end router or embedded device. Such a device might well have ECC memory and a processor fast enough to handle ZFS.

Let me add one more data point. I had been thinking I might need the additional features in Linux-ZFS 0.7.12. As such my NFS server had been running a 4.9 kernel with Debian's ZFS 0.7.12-2+debg10u1~bpo9+1 packages. Now, with the problem manifesting, my NFS server is running a 4.19 kernel with Debian's ZFS 0.7.12-2+deb10u2 packages.

I could well believe the actual root cause is a problem with the Linux-ZFS implementation. What made the problem manifest, though, seems to be a change in Linux's NFS implementation between 4.9 and 4.15.

I.e. Linux-ZFS implemented /something/ which worked when implemented, but may not have properly implemented the intended API and was broken by Linux-NFS.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#962254: NFS(v4) broken at 4.19.118-2
A bit more experimentation on this issue. I tried a very small C program meant to create files with fewer permission bits set. This succeeded, which strengthens the theory of the umask getting ignored.

I haven't seen anything hinting whether this is more a client or server issue. I can speculate that perhaps somewhere between 4.9 and 4.15 the NFS client code stepped closer to the "proper" 4.2 protocol. If a corresponding NFS server change was slow at getting merged, what we're seeing could happen.

Alternatively someone was trying to get a Linux NFS v4.2 client to work better with a different NFS v4.2 server, so they fixed Linux's NFS v4.2 client. Yet they failed to test with Linux's v4.2 server.

This though is speculation. All I can say is sometime between kernels 4.9 and 4.15, NFS v4.2 got broken. There are hints this is related to handling of umask.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#934160: Bug#962254: NFS(v4) broken at 4.19.118-2
Control: tags 962254 +security -unreproducible
Control: severity 962254 grave

On Fri, Jun 05, 2020 at 08:36:31PM +0200, Salvatore Bonaccorso wrote:
> This now let some rings bell, the described scenario is very similar
> to what was reported in https://bugs.debian.org/934160
>
> Respectively
> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736 and
> https://bugzilla.redhat.com/show_bug.cgi?id=1667761 .

Upon more experimentation I continue to favor this being a kernel bug (src:linux, bug #962254) and not a bug with nfs-common. Setting vers=4.1 works around the issue, so this is *strictly* NFSv4.2. I was able to reproduce this issue on a system with nfs-common 1:1.3.4-2.1 and a 4.19.118-2 kernel.

Based upon what I've observed I believe this requires a recent kernel on *both* NFS client and NFS server. An NFS client with 4.9 connecting to an NFS server with 4.19 does NOT experience this issue.

I suspect my earlier assessment of this appearing between 4.19.98-1 and 4.19.118-2 was erroneous. I think I was misled by the order of computers being updated, and an NFS client with 4.19 connecting to an NFS server with 4.9 also does not experience this issue.

According to https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736 this bug appeared somewhere between Linux kernels 4.9 and 4.15.

I concur with John Goerzen's assessment of this qualifying as grave due to its security implications.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
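Since the message above narrows the problem to NFSv4.2 and names vers=4.1 as a workaround, it may help to see which mounts on a client are actually negotiated at 4.2. The following is a minimal sketch, assuming the client's mount table reports the version as a `vers=4.2` option on `nfs`/`nfs4` entries; `nfs42_mounts` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a saved copy of the table.

```shell
#!/bin/sh
# Sketch: print the mount point of every NFS mount using vers=4.2.
# Reads /proc/mounts by default; pass another file to inspect a saved copy.
nfs42_mounts() {
    awk '($3 == "nfs" || $3 == "nfs4") &&
         index("," $4 ",", ",vers=4.2,") { print $2 }' "${1:-/proc/mounts}"
}

# The workaround from the message would then be a remount with vers=4.1,
# for example (server name and paths here are placeholders):
#   mount -t nfs4 -o vers=4.1 server:/export /mnt/export
```

Wrapping the option string in commas before searching avoids matching a longer option that merely contains `vers=4.2` as a substring.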
Bug#934160: Bug#962254: NFS(v4) broken at 4.19.118-2
I've run into a problem which produces the same behavior as bug #934160, but attributed it elsewhere due to other observations.

What are the version(s) of the Linux kernel being used on your server and clients?

I've confirmed using a 4.9 kernel on a client instead of a 4.19 kernel also works around this issue. In fact one client using a kernel from 4.19.98+1+deb10u1 source doesn't display the issue, but one using 4.19.118+2 source does.

This timeframe though doesn't match when you reported the issue. Could be there are several things working together to cause this. I haven't yet tried using NFS version 4.1 instead of 4.2.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#962254: NFS(v4) broken at 4.19.118-2
On Fri, Jun 05, 2020 at 08:36:31PM +0200, Salvatore Bonaccorso wrote: > This now let some rings bell, the described scenario is very similar > to what was reported in https://bugs.debian.org/934160 > > Respectively > https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736 and > https://bugzilla.redhat.com/show_bug.cgi?id=1667761 . Those do indeed seem similar and could be the same bug, but attributing the bug to a distinct package. Alternatively this is several bugs and *all* of them need to be present for the issue to occur. Seems I'll need to do some checking of the VM with the earlier kernel and see which updates cause it to break... -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#962254: NFS(v4) broken at 4.19.118-2
On Fri, Jun 05, 2020 at 08:44:26AM +0200, Salvatore Bonaccorso wrote:
>
> On Thu, Jun 04, 2020 at 10:16:07PM -0700, Elliott Mitchell wrote:
> > Somewhere between linux-image-4.19.0-8-amd64/4.19.98+1+deb10u1 and
> > linux-image-4.19.0-9-amd64/4.19.118+2 NFS, in particular v4 got broken.
> > Mounting an appropriate filesystem became unreliable, and once mounted
> > behavior is unpredictable.
> >
> > In particular in the problematic case `umask 022 ; touch foo ; ls -l foo`
> > yields a -rw-rw-rw- file.
> >
> > This occurs if *both* the server *and* client are on 4.19.118+2. I have
> > confirmed this does NOT occur if the server is on a 4.9 kernel. I have
> > also confirmed this does NOT occur if the client is on a 4.9 or
> > 4.19.98+1+deb10u1 kernel.
>
> I cannot reproducde the described behaviour. Can you give more details
> on your setup?
>
> How do you export the filesystem?
> What is the underlying filesystem exported?
> How and whith which options do clients mount the NFS share?

Presently whole directories are being exported to hosts. The filesystem on the server is ZFS. The client is mounting hard,intr. The client is using cachefilesd, but that appears unrelated to the issue. As this is NFSv4 (v2 and v3 are thoroughly disabled on the server), TCP is being used. The port is non-standard.

I'm uncertain I properly tried server on 4.9, client on 4.19.118+2 (could be this is strictly 4.19.118+2 NFSv4 client code).

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#962254: NFS(v4) broken at 4.19.118-2
Package: src:linux Version: 4.19.118+2 Severity: important Somewhere between linux-image-4.19.0-8-amd64/4.19.98+1+deb10u1 and linux-image-4.19.0-9-amd64/4.19.118+2 NFS, in particular v4 got broken. Mounting an appropriate filesystem became unreliable, and once mounted behavior is unpredictable. In particular in the problematic case `umask 022 ; touch foo ; ls -l foo` yields a -rw-rw-rw- file. This occurs if *both* the server *and* client are on 4.19.118+2. I have confirmed this does NOT occur if the server is on a 4.9 kernel. I have also confirmed this does NOT occur if the client is on a 4.9 or 4.19.98+1+deb10u1 kernel. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
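The `umask 022 ; touch foo ; ls -l foo` check from the report above can be wrapped so that it runs against any directory without disturbing the calling shell's umask. This is only a sketch: `mode_after_touch` is a hypothetical helper name, and the scratch file name is arbitrary.

```shell
#!/bin/sh
# Sketch: create a scratch file in DIR under the given umask and print the
# ten-character type/permission string that ls -l reports for it.
mode_after_touch() {
    dir=$1
    mask=$2
    (
        # The subshell keeps the umask change from leaking to the caller.
        umask "$mask" || exit 1
        f="$dir/umask-test.$$"
        rm -f "$f"
        touch "$f" || exit 1
        ls -l "$f" | cut -c1-10
        rm -f "$f"
    )
}

# Example: mode_after_touch /path/to/nfs/mount 022
# A filesystem that honours the umask should give -rw-r--r--; the report
# describes -rw-rw-rw- in the problematic case.
```

Running the same call against a local directory and against the NFS mount gives a direct side-by-side comparison.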
Bug#926046: Negotiated or default wsize causes misbehavior
Package: nfs-common Version: 1:1.3.4-2.1 I'm using NFSv4 over TCP at the moment. If I don't specify rsize and wsize on the client, either the client negotiates a wsize of 256KB or defaults to a wsize of 256KB ("wsize=262144"). When dumping large amounts of data (moving 2TB of data around, figure many 200MB files) onto the server, after a while the mount hangs and then messages start appearing in the server kernel log: "[sss.mmm] NFSD: client x testing state ID with incorrect client ID" After several minutes the mount was recovering, but having an entire machine locked up for a while is a problem. During an attempt to revert to using UDP, I discovered that explicitly setting wsize=8192 fixed the problem (this size is reasonable with UDP if you've got jumbo-frame support). I'm guessing either the default is bad or negotiation is failing to generate a working value. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
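To see what write size a client ended up with when none was specified, the `wsize=` option can be read back from the mount table. A minimal sketch follows, assuming the effective value is exposed as a `wsize=` mount option on NFS entries (as the "wsize=262144" quoted above suggests); `nfs_wsize` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "<mount point> <wsize>" for each NFS mount.
# Reads /proc/mounts by default; pass another file to inspect a saved copy.
nfs_wsize() {
    awk '$3 ~ /^nfs/ {
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] ~ /^wsize=/)
                print $2, substr(opt[i], 7)   # text after "wsize="
    }' "${1:-/proc/mounts}"
}

# The workaround described above pins the value at mount time, for example
# (server name and paths here are placeholders):
#   mount -t nfs4 -o wsize=8192 server:/export /mnt/export
```

Comparing the output before and after adding an explicit `wsize` shows whether the option took effect or was overridden during negotiation.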
Bug#801067: Concurring with #801067
I have little option, but to agree with Reuben Thomas. The bottom of README.Debian.nfsv4 has a date of "Wed, 11 Oct 2006 15:18:03 +0200", more than 10 years old. Even for Debian being in the distribution for 10 years no longer qualifies as "rather new". A 2.6 kernel is no longer "recent" in light of Debian being on 4.9 now. The lines suggested to be added to /etc/services on the client are now present in Debian's default /etc/services file. Yeah, that file needs a bit of an update or removal... -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#524458: Still a problem? NFS version?
It has been quite some time since there was last any activity on #524458. Is this problem still occurring for the submitter? Might it have been fixed in one of the update rounds?

If this is still a problem, what version of the NFS protocol is in use? In theory NFSv2 should be able to handle files under 2GB, but perhaps a limitation of Linux's NFS client or NFS server made a 176MB file a problem. Version 3 of the protocol is widely supported; I'd suggest moving to version 3 or version 4 if this mount is still on version 2.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#903914: xen_netfront broken in 4.9.110-1
Package: linux-source-4.9 Version: 4.9.110-1 Anyone who was using jumbo frames inside a Xen guest was fine with 4.9.88-1+deb9u1, but a problem suddenly showed up with 4.9.110-1. Discussion of problem: https://lists.gt.net/xen/devel/519117 Something which acts like a working patch is here: http://ubuntu.5.x6.nabble.com/Xenial-Regression-SRU-Fix-quot-Cannot-set-MTU-higher-than-1500-in-Xen-instance-quot-td5170202.html -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#832629: CONFIG_CGROUPS=n appears to have been broken in updates
Package: src:linux Version: 3.16.7-ckt25-2+deb8u3 Severity: important Unfortunately I cannot finger the exact version where it happened, but it appears one of the updates to linux-source-3.16 *broke* builds where CONFIG_CGROUPS was left unset. While this may be an unusual configuration, it certainly *was* working. I thought updates were supposed to be confined to *security* issues, the better to avoid breaking other confirmed to work setups. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#696292: Is #696292 a duplicate of #588675?
Reads like #696292 might be yet another manifestation of #588675, or perhaps #588675 is the root cause (or related to the root cause) of #696292. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#827561: Update 3.2.78 -> 3.2.81 broke builds in fs/fcntl.c
Control: tags -1 patch

On Fri, Jun 17, 2016 at 11:28:27PM +0100, Ben Hutchings wrote:
> On Fri, 2016-06-17 at 12:27 -0700, Elliott Mitchell wrote:
> > Package: linux-source-3.2
> > Version: 3.2.81-1
> > Severity: important
> >
> > SSIA:
> >
> >   CC      fs/fcntl.o
> > fs/fcntl.c: In function 'setfl':
> > fs/fcntl.c:186:31: error: dereferencing pointer to incomplete type
> > fs/fcntl.c:187:30: error: dereferencing pointer to incomplete type
> > make[2]: *** [fs/fcntl.o] Error 1
> > make[2]: *** Waiting for unfinished jobs
> >
> > That would be a problem for this update, this hunk of code is new for
> > 3.2.81.  Seems someone forgot a header (I'm not yet sure which).
>
> This code was added as part of the fix for #627782.  It builds
> successfully in Debian's own configurations.
>
> It looks like this build failure occurs if CONFIG_MODULES is disabled
> and you should be able to avoid it by enabling that.

Problem is that was a very deliberate choice on the particular computer. Unusual, but something that *should* work. I've got a partial patch for general consumption attached.

I'm pretty sure the changes done for #627782 are buggy. If someone builds a kernel with AUFS built into the kernel the test in fcntl.c will fail (the test only works if AUFS is a module).

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

--- fcntl.c.orig 2016-06-16 10:03:04.0 -0700
+++ fcntl.c 2016-06-17 17:58:26.0 -0700
@@ -182,10 +182,24 @@
 	 * Since only aufs will implement it, check that the file ops
 	 * are implemented by a version of aufs that does. (Ugh.)
 	 */
-	if (!error && filp->f_op->owner &&
-	    !strcmp(filp->f_op->owner->name, "aufs") &&
-	    strstr(filp->f_op->owner->version, "+setfl"))
+#if defined(CONFIG_MODULES)
+#if defined(CONFIG_AUFS_FS)
+#if 0
+#if CONFIG_AUFS_FS == "y"
+#error "CONFIG_AUFS_FS=y is a known problem, see #627782"
+#endif
+#endif
+#define AUFS_UNLIKELY
+#else
+#define AUFS_UNLIKELY unlikely
+#endif
+	if (likely(!error) && filp->f_op->owner &&
+	    !AUFS_UNLIKELY(strcmp(filp->f_op->owner->name, "aufs")) &&
+	    AUFS_UNLIKELY(strstr(filp->f_op->owner->version, "+setfl")))
 		error = filp->f_op->setfl(filp, arg);
+#elif defined(CONFIG_AUFS_FS)
+#error "CONFIG_MODULES=n && CONFIG_AUFS_FS=y is a known problem, see #627782"
+#endif
 	if (error)
 		return error;
Bug#827561: Update 3.2.78 -> 3.2.81 broke builds in fs/fcntl.c
Okay, looked through and not quite the problem I thought. Problem is the section added to fs/fcntl.c:setfl() depends upon CONFIG_MODULES being enabled. Certainly turning off kernel modules isn't all that common, but it is a situation that is actively used for some situations. I also note the added code is only really useful if CONFIG_AUFS_FS is enabled. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#827561: Update 3.2.78 -> 3.2.81 broke builds in fs/fcntl.c
Package: linux-source-3.2 Version: 3.2.81-1 Severity: important SSIA: CC fs/fcntl.o fs/fcntl.c: In function 'setfl': fs/fcntl.c:186:31: error: dereferencing pointer to incomplete type fs/fcntl.c:187:30: error: dereferencing pointer to incomplete type make[2]: *** [fs/fcntl.o] Error 1 make[2]: *** Waiting for unfinished jobs That would be a problem for this update, this hunk of code is new for 3.2.81. Seems someone forgot a header (I'm not yet sure which). -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#588675: Beginnings of Heading Towards a Root Cause of #588675
For some time I'd been trying to search for a cause of #588675. Looks like I finally searched for the right string (problem is "root" occurs in many places inside the Linux kernel source). Looks like the key file is linux/init/do_mounts.c:

Appears the line:

ROOT_DEV = name_to_dev_t(root_device_name);

inside prepare_namespace() resolves any specified root device into major/minor. Later at the end of mount_root(), /dev/root is created with the appropriate major/minor, but mount_root() never tries to resolve the major/minor back into a proper device name.

The two spots where I've gotten hints of potentially being able to get back the proper device name are:

Inside do_mount_root(), s->s_id is "sda1", but I'm a bit worried that may not work in cases with LVM where the proper result could have been "scsi0/target0/".

The other is that doing bdevname(bdget(ROOT_DEV), char_buffer) may give something approximating a proper name.

From looking at the current code, I suspect that while this behavior may have first appeared with SCSI devices, it may well have spread to all block devices other than MTD and UBI (commonly being embedded devices with memory completely inadequate to hold an initial ramdisk, users of MTD device roots would have screamed too loudly to ignore). So I got that wrong.

If things go well, I may have a patch soon (alas, I'm also having to fight other issues as well, so that could take a while).

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#826999: Kernel build scripts confused by entries for SCSI devices in /proc/mounts
Control: reopen 826999

On Sat, Jun 11, 2016 at 10:06:16AM -0700, Elliott Mitchell wrote:
> On Sat, Jun 11, 2016 at 01:15:18PM +0100, Ben Hutchings wrote:
> > I make no judgement about the significance of that bug.  But if you
> > refuse to answer a maintainer's reasonable questions about a report,
> > there is no way to progress and it should be closed.
>
> That is a perfectly reasonable statement. Please cite an example of
> such an unanswered question.

Can I take the lack of response to this as an admission that there are no such unanswered questions? Should I go further and suggest perhaps you didn't fully read a prior message?

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#826999: Kernel build scripts confused by entries for SCSI devices in /proc/mounts
On Sat, Jun 11, 2016 at 01:15:18PM +0100, Ben Hutchings wrote:
> Control: forcmerge 588675 -1
>
> On Fri, 2016-06-10 at 18:16 -0700, Elliott Mitchell wrote:
> > The kernel build scripts are confused by what the SCSI subsystem produces
> > in /proc/mounts:
>
> This is not controlled by the SCSI subsystem.
>
> Unlike you, I've read the relevant source code, and I'm the upstream
> maintainer of initramfs-tools.  I've seen problems like this before.
> When I ask you questions I am not just speculating.

I'm pretty sure the initramfs isn't the problem, though its presence does manage to work around bug #588675.

> > $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
> > /dev/root / ext3 ro
> > $
> >
> > A kernel build on such a system will panic on boot unless the root
> > filesystem is explicitly passed to the kernel by the bootloader.  While
> > in common configurations bootloaders generally default to telling the
> > kernel what device it should use as root, that has not been documented to
> > be required.
>
> It seems pretty obvious to me that you have to specify the root device
> somehow.

Indeed, and there are lots of ways to do that. There is the "rdev" setting (which I'm pretty sure is what underlies #826999), you or your bootloader can also specify a device on the kernel's command-line.

> > Since Ben Hutchings thinks #588675 is too insignificant to ever be worthy
> > of a single line of code to fix, this bug now needs to be fixed (along
> > with many other utilities that are broken by #588675).
>
> I make no judgement about the significance of that bug.  But if you
> refuse to answer a maintainer's reasonable questions about a report,
> there is no way to progress and it should be closed.

That is a perfectly reasonable statement. Please cite an example of such an unanswered question.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#826999: Kernel build scripts confused by entries for SCSI devices in /proc/mounts
Package: src:linux Version: 3.2.63-2 Control: found -1 linux/2.6.18 Control: found -1 linux/3.16.7-ckt25-2 Control: found -1 linux/3.16.7-ckt20-1+deb8u3 Control: found -1 linux/3.2.78-1 Control: found -1 linux/3.16.7-ckt11-1+deb8u6~bpo70+1 Control: found -1 linux/2.6.32 The kernel build scripts are confused by what the SCSI subsystem produces in /proc/mounts: $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts /dev/root / ext3 ro $ A kernel build on such a system will panic on boot unless the root filesystem is explicitly passed to the kernel by the bootloader. While in common configurations bootloaders generally default to telling the kernel what device it should use as root, that has not been documented to be required. Since Ben Hutchings thinks #588675 is too insignificant to ever be worthy of a single line of code to fix, this bug now needs to be fixed (along with many other utilities that are broken by #588675). -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#588675: Summary of observations of #588675
On Fri, Jun 10, 2016 at 10:19:47PM +0100, Ben Hutchings wrote:
> Are you using LILO?  Are you specifying the root device by name or
> UUID?

I'm quite certain that is completely irrelevant. Either of those, or even allowing the rdev field in the kernel image, should result in the device being shown for / in /proc/mounts. On the most recent boot of this machine though, "root=/dev/sda1" is in /proc/cmdline, yet the line in /proc/mounts is "/dev/root / ext3 ..."

The only common factor is the SCSI subsystem.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
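The mismatch described in the message above, where `root=/dev/sda1` is on the kernel command line but `/proc/mounts` shows `/dev/root`, can be checked mechanically. The sketch below reuses the awk filter from the report; `root_mismatch` is a hypothetical helper name, and a plain string comparison is an assumption that only makes sense when root was given as a device path rather than a UUID or label.

```shell
#!/bin/sh
# Sketch: compare the root= argument on the kernel command line with the
# device that the mount table reports for "/".
# Exit status is 0 when they match and non-zero when they differ.
root_mismatch() {
    cmdline=${1:-/proc/cmdline}
    mounts=${2:-/proc/mounts}
    # Last root= argument wins if it appears more than once.
    want=$(tr ' ' '\n' < "$cmdline" | sed -n 's/^root=//p' | tail -n 1)
    # Same filter as in the bug report: the real root, not the rootfs entry.
    have=$(awk '$2 == "/" && $1 != "rootfs" { print $1 }' "$mounts" | tail -n 1)
    printf 'cmdline root=%s, mounted as %s\n' "$want" "$have"
    [ "$want" = "$have" ]
}

# Example: root_mismatch || echo "root device name was lost"
```

Both file arguments are optional so the function can also be pointed at copies of `/proc/cmdline` and `/proc/mounts` saved from another machine.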
Bug#588675: Summary of observations of #588675
Control: retitle 588675 SCSI subsystem loses name of root device on boot Control: severity 588675 normal Control: found 588675 3.2.78-1 Control: found 588675 3.16.7-ckt20-1+deb8u3 Control: found 588675 3.16.7-ckt25-2 Control: found 588675 2.6.18 According to the advanced information on the BTS, under severity levels: wishlist for any feature request, and also for any bugs that are very difficult to fix due to major design considerations. The first condition is untrue, this is definitely a bug. While the damage may not be that major, it is pretty widespread. If the Debian kernel maintainers were to claim this wasn't a problem, then I would be forced to report another bug against src:linux since the kernel build scripts themselves are confused by this behavior! The second condition requires a judgement call to evaluate, but looking at things I'm pretty sure it is untrue. I'm guessing this is simply one crucial field that needs to be copied by the SCSI subsystem, but is not. Since many other subsystems manage to copy the value, almost certainly the change is small. I'd be surprised if it took more than 4 lines to fix (two of which being blank and one being a comment). I will concede this may need expertise on how /proc/mounts works and the interface between that and the driver subsystems (alternatively simply looking for one field which is ignored may be enough), but with that this should be a simple fix. Meanwhile the damage from this bug may not be that large, but it is rather widespread. I know of 4 reports where this is the root cause and I imagine there are others I do not know of. There may also be many utilities that already work around this bug and hundreds of scripts that are similarly forced to do so. This bug has also wasted a great deal of time trying to figure out where to attribute the issue. My earliest observations were close to a decade ago, but I didn't feel confident placing blame anywhere. 
Then more recently I had to spend time building several kernels to confirm the conditions under which the problem occurred.

Unaffected systems:

This group consists of all systems where the root filesystem is NOT on a device that directly plugs into the SCSI subsystem. It does not matter whether an initial ramdisk is used or not. This includes systems like:

root on Linux software RAID:

$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/md0 / ext3 ro 0 0
$

I recall this system being in service from around 2.6.5(?) to 2.6.18 or so. Even though the immediate driver was the MD subsystem, underlying this were SCSI devices. This is long in the past, but I'd already been observing the bug by then (and wondering where to point the finger).

root on olde IDE devices, on the olde IDE subsystem:

$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/hda1 / ext3 ro 0 0
$

I think this system managed to remain in service into the 2.6.29 timeframe, but is also no longer in service. This does give an example of the root filesystem being on a different subsystem though. Crucially this is prior to the olde IDE subsystem being retired and the driver for PATA devices which plugged into the SCSI subsystem coming into service.

root on MTD devices:

$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/mtdblock4 / jffs2 rw,relatime 0 0
$

A very different system here. Different filesystem and rather different device. This one hasn't been tried with kernels earlier than 3.2, but seems to echo other observations. This one is in active service and, due to its interesting setup, allows for testing of some interesting scenarios.

root on BLK_DEV_IDE_PMAC (olde Mac IDE subsystem?):

This is Christian Kujau's report in bug #588675. I believe BLK_DEV_IDE_PMAC would be a PowerMac analog of the x86 IDE driver which had its own subsystem and which didn't plug into the SCSI subsystem.

Affected systems:

This group consists of all systems where the root filesystem is on a device that directly plugs into the SCSI subsystem and the system directly mounts that device at boot. On such systems:

$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/root / ro,relatime 0 0
$

Most of my systems are running ext3, but Christian Kujau confirmed this with ext4 and jfs. Christian Kujau also observed this with the PATA_MACIO driver, which I believe is a Macintosh equivalent of the x86 PATA driver which plugs into the SCSI subsystem. I've observed this on many different systems with devices which plug into the SCSI subsystem; this includes a 3ware card, SATA disks, USB flash drives and genuine SCSI disks.

Workaround:

The workaround that bypasses the problem is to initially mount some other device as root, then pivot_root or such onto the real root. Using an initial ramdisk is one example of this. From the DebWRT project I'm also aware that booting onto a root on MTD and then doing a pivot_root onto a USB flash key works around the issue.

$ awk '$2 == "/" && $1 != "rootfs"' <
Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25
On Mon, Apr 11, 2016 at 01:34:56AM +0100, Ben Hutchings wrote:
> On Sun, 2016-04-10 at 14:32 -0700, Elliott Mitchell wrote:
> > On Sun, Apr 10, 2016 at 07:47:28PM +0100, Ben Hutchings wrote:
> > > That in no way contradicts what I said. :-)  When I backport the linux
> > > source package from jessie to wheezy I change it to use gcc-4.6.
> > >
> > > But the linux-source-X.Y packages (which are a different thing to the
> > > linux source package!) don't specify any particular compiler version.
> > > You can choose that with the CC variable; otherwise the default
> > > compiler (specified by the gcc package) will be used.
> > For this particular mipsel device I was unable to kexec the kernel unless
> > it was built with GCC-4.8.
>
> I see.
>
> > If the kernel was built with GCC-4.7 or
> > earlier, I got symptoms identical to the above, messages from the old
> > kernel on the console serial port that it was going away and kexec'd
> > kernel never output any messages.  I could believe this is a funky
> > compiler issue.
>
> Could it be the kernel image is close to a critical size limit?  The
> kernel typically gets slightly larger with each stable update.  Does
> gcc 4.8 generate a smaller or larger kernel image than older versions?

You win one and lose one.

I tracked down the configuration option that managed to switch from "y" to "n" (seems my base config had it as "y", but other options interfered, now it is "m"), that shrank the kernel by 40KB and the resultant kernel was successfully loaded by the 3.3 kernel.

The kernel built with GCC 4.4 was about 2% larger than the GCC 4.8 build, while a GCC 4.6 build was less than 1% smaller than the GCC 4.8 build. Neither of these kernels was able to successfully start when kexec'd by a 3.16 kernel (which *was* able to start the bigger kernel).

So this solves the problem this bug was about, it was a size issue. :-( Alas I'm expecting it to be a while before I can get the proper solution in place.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25
On Sun, Apr 10, 2016 at 07:47:28PM +0100, Ben Hutchings wrote:
> On Sun, 2016-04-10 at 11:09 -0700, Elliott Mitchell wrote:
> > On Sun, Apr 10, 2016 at 10:09:38AM +0100, Ben Hutchings wrote:
> > > On Sat, 2016-04-09 at 18:31 -0700, Elliott Mitchell wrote:
> > > > Between 3.16.7-ckt20 and 3.16.7-ckt25 the kexec functionality of
> > > > the Linux kernel was damaged. The system I'm looking at uses a 3.3
> > > > kernel to load the "real" kernel off a filesystem and kexec into
> > > > that. The 3.3 kernel was able to successfully kexec into a
> > > > 3.16.7-ckt20 kernel, but is unable to kexec into a 3.16.7-ckt25
> > > > kernel. However I found the 3.16.7-ckt20 IS able to successfully
> > > > kexec the 3.16.7-ckt25 kernel.
> > >
> > > Surely this is a bug in the built-in (3.3) kernel, not the new one?
> > > If there's something simple that can be done in the Debian kernel to
> > > work around this, we should do that, but otherwise you're stuck with
> > > this.
> >
> > This is certainly a reasonable theory. Alas, I cannot speak to which
> > of these theories is correct. All I can say for certain is that
> > something changed between ckt20 and ckt25 which made the 3.3 kernel
> > unable to kexec ckt25. I'm under the impression as of 3.3 the kexec
> > functionality was supposed to be stable on MIPS, but that could be
> > incorrect.
> >
> > I should also note, during the failed kexecs I would see the messages
> > from the 3.3 kernel saying the kexec was starting, but never see any
> > messages from the ckt25 kernel. Unless someone wants to send me a JTAG
> > decoder, that is all I can say.
>
> One of the MIPS porters may be able to help you, but I have no idea
> what to suggest.
>
> Are you using one of the linux-image packages or building from source
> with your own configuration? In the latter case, are you sure you used
> the same configuration for old and new kernels?

Building from source. The .config files started out identical, but it
looks like changes in the Kconfig files caused 4 items to switch from
'm' to 'y' (all networking, which shouldn't cause the observed bug).
There were also some patches derived from OpenWRT's patches, but those
did not change at all.

> > > > Doing a double-kexec does work around the issue, but it means I
> > > > need to hold onto that one magic kernel for the moment...
> > > >
> > > > In other news, it appears sometime between 3.3 and 3.10 there
> > > > started being a requirement for GCC 4.8 on mipsel.
> > >
> > > Packages in jessie must be buildable using compiler versions in
> > > jessie. That means either gcc-4.8 or gcc-4.9.
> >
> > linux-source-3.16 is available as a backport to wheezy, which does not
> > feature gcc-4.8.
>
> That in no way contradicts what I said. :-) When I backport the linux
> source package from jessie to wheezy I change it to use gcc-4.6.
>
> But the linux-source-X.Y packages (which are a different thing to the
> linux source package!) don't specify any particular compiler version.
> You can choose that with the CC variable; otherwise the default
> compiler (specified by the gcc package) will be used.

For this particular mipsel device I was unable to kexec the kernel
unless it was built with GCC-4.8. If the kernel was built with GCC-4.7
or earlier, I got symptoms identical to the above: the old kernel
printed messages on the serial console saying it was going away, and
the kexec'd kernel never output any messages. I could believe this is a
funky compiler issue.
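For anyone wanting to repeat the configuration check mentioned above, here is a minimal Python sketch (not from the thread) that compares two kernel .config files and lists the options whose value changed, such as 'm' becoming 'y'. The function names and the sample option names are illustrative assumptions only; the kernel tree also ships scripts/diffconfig for the same job.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def changed_options(old_text, new_text):
    """Return {option: (old, new)} for every option whose value differs.

    An option missing from one file is reported with None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Hypothetical option names, only to show the output shape.
    before = "CONFIG_IPV6=m\nCONFIG_KEXEC=y\n# CONFIG_DEBUG_FS is not set\n"
    after = "CONFIG_IPV6=y\nCONFIG_KEXEC=y\n# CONFIG_DEBUG_FS is not set\n"
    for name, (was, now) in changed_options(before, after).items():
        print(f"{name}: {was} -> {now}")
```

Feeding it the old and new .config contents would show at a glance whether anything beyond the four networking items differs between the two builds.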
Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25
On Sun, Apr 10, 2016 at 10:09:38AM +0100, Ben Hutchings wrote:
> On Sat, 2016-04-09 at 18:31 -0700, Elliott Mitchell wrote:
> > Between 3.16.7-ckt20 and 3.16.7-ckt25 the kexec functionality of the
> > Linux kernel was damaged. The system I'm looking at uses a 3.3 kernel
> > to load the "real" kernel off a filesystem and kexec into that. The
> > 3.3 kernel was able to successfully kexec into a 3.16.7-ckt20 kernel,
> > but is unable to kexec into a 3.16.7-ckt25 kernel. However I found the
> > 3.16.7-ckt20 IS able to successfully kexec the 3.16.7-ckt25 kernel.
>
> Surely this is a bug in the built-in (3.3) kernel, not the new one? If
> there's something simple that can be done in the Debian kernel to work
> around this, we should do that, but otherwise you're stuck with this.

This is certainly a reasonable theory. Alas, I cannot speak to which of
these theories is correct. All I can say for certain is that something
changed between ckt20 and ckt25 which made the 3.3 kernel unable to
kexec ckt25. I'm under the impression as of 3.3 the kexec functionality
was supposed to be stable on MIPS, but that could be incorrect.

I should also note, during the failed kexecs I would see the messages
from the 3.3 kernel saying the kexec was starting, but never see any
messages from the ckt25 kernel. Unless someone wants to send me a JTAG
decoder, that is all I can say.

> > Doing a double-kexec does work around the issue, but it means I need
> > to hold onto that one magic kernel for the moment...
> >
> > In other news, it appears sometime between 3.3 and 3.10 there started
> > being a requirement for GCC 4.8 on mipsel.
>
> Packages in jessie must be buildable using compiler versions in jessie.
> That means either gcc-4.8 or gcc-4.9.

linux-source-3.16 is available as a backport to wheezy, which does not
feature gcc-4.8.
Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25
Package: linux-source-3.16
Version: 3.16.7-ckt25-1~bpo70+1

Between 3.16.7-ckt20 and 3.16.7-ckt25 the kexec functionality of the
Linux kernel was damaged. The system I'm looking at uses a 3.3 kernel to
load the "real" kernel off a filesystem and kexec into that. The 3.3
kernel was able to successfully kexec into a 3.16.7-ckt20 kernel, but is
unable to kexec into a 3.16.7-ckt25 kernel. However I found the
3.16.7-ckt20 IS able to successfully kexec the 3.16.7-ckt25 kernel.

Doing a double-kexec does work around the issue, but it means I need to
hold onto that one magic kernel for the moment...

In other news, it appears sometime between 3.3 and 3.10 there started
being a requirement for GCC 4.8 on mipsel.
Bug#588675: / left as /dev/root with non-initrd kernel
On Thu, Dec 03, 2015 at 03:11:45PM -0800, Christian Kujau wrote:
> On 12/02/2015 04:30 PM, Elliott Mitchell wrote:
> > You're thinking of the wrong bug. #588675 is the bug # for /proc/mounts
> > having "/dev/root" listed as the device for the root filesystem. Your
>
> Indeed, I think I confused this with #656333 ("Please ignore rootfs in
> df output"), which may be related to this one.

So many bugs, so little time. :-/

This actually affected many other utilities as well. I'm a bit
surprised if this is getting fixed in the kernel; I thought the kernel
maintainers had decided "this is the way the kernel does this, userspace
needs to compensate".

> > previous mention indicated you would expect "/dev/sda6" to be there. I'm
> > guessing prior to wheezy, when you were using BLK_DEV_IDE_PMAC, you would
> > have been seeing "/dev/hda6" listed as the root device?
>
> Again, I'm afraid I don't remember what I've seen prior to wheezy. The
> system is running 24x7 but I'll try to boot a pre-wheezy image (or
> something with BLK_DEV_IDE_PMAC enabled) the next time this machine is
> rebooted and see if the actual disk or /dev/rootfs is displayed.

Thanks. I'm pretty sure it will show the actual disk, but another
confirmation will help.
Bug#588675: / left as /dev/root with non-initrd kernel
Control: found -1 3.16.7-ckt11-1+deb8u6~bpo70+1
Control: found -1 2.6.32

Could you confirm a few things about what you've seen of bug 588675?

Did you observe the behavior prior to Debian wheezy/Linux kernel 3.2?

What type of disk/controller/disk subsystem is on your powerpc system?

From your mention of /dev/sda6 in bug #588675 it is clear as of Debian
wheezy/Linux kernel 3.2 that the disk/controller plugs into the SCSI
subsystem. I'm pretty sure this is a SCSI subsystem bug; since you've
also seen the behavior, I'd like to confirm this has followed the SCSI
subsystem for you as well.
Bug#588675: / left as /dev/root with non-initrd kernel
On Wed, Dec 02, 2015 at 02:15:18PM -0800, Christian Kujau wrote:
> On 12/02/2015 01:23 PM, Elliott Mitchell wrote:
> > Could you confirm a few things about what you've seen of bug 588675?
> >
> > Did you observe the behavior prior to Debian wheezy/Linux kernel 3.2?
> >
> > What type of disk/controller/disk subsystem is on your powerpc system?
> >
> > From your mention of /dev/sda6 in bug #588675 it is clear as of Debian
> > wheezy/Linux kernel 3.2 that the disk/controller plugs into the SCSI
> > subsystem. I'm pretty sure this is a SCSI subsystem bug, since you've
> > also seen the behavior I'd like to confirm this has followed the SCSI
> > subsystem for you as well.
>
> Wow, that's an old bug :-)

Nah, some of my older bug reports would be eligible to drive in some
countries. :-)

> I had to reinstall the PowerBook with Wheezy due to a disk failure and
> after that I've upgraded from Wheezy to Jessie and the problem is gone now.
>
> I can't tell if I've seen this prior to Linux 3.2 kernels. If it helps I
> could try to boot an older Debian/wheezy live-cd and see if the rootfs
> comes up twice.
>
> The disk controller of this PowerBook G4 is:
>
> 0002:20:0d.0 Unassigned class [ff00]: Apple Inc. UniNorth/Intrepid ATA/100
>
> I've used BLK_DEV_IDE_PMAC ages ago, but have switched to PATA_MACIO for
> a while now. But again, I can't tell when the double "/" entry occurred
> first, I noticed it only at the time of my bug entry (wheezy 7.3).
>
> I'm still using a self-compiled kernel but the issue is gone now, at
> least on this system:
>
> $ uname -r; grep root /proc/mounts
> 4.3.0-11626-g5d50ac7
> /dev/root / jfs rw,nodev,relatime 0 0

You're thinking of the wrong bug. #588675 is the bug # for /proc/mounts
having "/dev/root" listed as the device for the root filesystem. Your
previous mention indicated you would expect "/dev/sda6" to be there. I'm
guessing prior to wheezy, when you were using BLK_DEV_IDE_PMAC, you
would have been seeing "/dev/hda6" listed as the root device?
Bug#588675: Narrowing on location of bug #588675
reassign 588675 linux-source 2.6.32
retitle 588675 SCSI subsystem forgets root device on boot
found 588675 2.6.18
found 588675 3.2.63-2
submitter 588675 !
quit

I suspect the list of kernel versions with this bug is rather longer,
but I'm merely including some of those I'm certain it does affect. I
suspect even the latest kernels are affected, but I haven't confirmed
this.

Thankfully I finally noticed an ingredient crucial enough to narrow down
the list of causes to a reasonable length. The bug's manifestation is
fairly simple. On an affected system:

$ head -2 /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 ro,errors=continue 0 0
$

Whereas on an unaffected system:

$ head -2 /proc/mounts
rootfs / rootfs rw 0 0
/dev/sda1 / ext3 rw,errors=continue 0 0
$

And another unaffected example:

$ head -2 /proc/mounts
rootfs / rootfs rw 0 0
/dev/mtdblock2 / jffs2 rw,relatime 0 0
$

There are two crucial ingredients for reproducing this bug: the system
must boot directly onto the root device (no initrd), and the root device
must be something that plugs into the SCSI subsystem.

This affects x86, PowerPC, MIPSel and likely other machines. This
affects systems with SATA main disks (SATA devices go through the SCSI
subsystem) as well as root on USB devices (yes, it *is* possible to get
a kernel to directly boot onto a USB key).

This does NOT affect older kernels when booting onto IDE subsystem disks
(/dev/hd*; with newer kernels IDE disks go through the SCSI subsystem
and are likely affected). This does not affect systems which initially
mount *any* other device as root, and subsequently chroot onto a SCSI
subsystem device (this explains why initrd systems are unaffected).

While this bug is mostly harmless, it is the root cause behind bugs
620465, 653073, 656333, and may very well have caused other bug reports
I'm unaware of.
Archive: https://lists.debian.org/20150105000223.ga50...@scollay.m5p.com
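The check described in the report above (look at the first lines of /proc/mounts and see what is listed for "/") can be sketched in Python. This is only an illustration, not part of the original report; the function names `root_sources` and `shows_bug` are made up for this sketch.

```python
def root_sources(mounts_text):
    """Return the device field of every entry mounted at '/', in file order."""
    sources = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[1] == "/":
            sources.append(fields[0])
    return sources


def shows_bug(mounts_text):
    """True when the root filesystem is listed as /dev/root."""
    return "/dev/root" in root_sources(mounts_text)


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        text = ""
    print(root_sources(text), "affected" if shows_bug(text) else "not affected")
```

Run against the three transcripts quoted in the report, the first would be flagged and the /dev/sda1 and /dev/mtdblock2 examples would not.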
Bug#588675: Narrowing on location of bug #588675
On Mon, Jan 05, 2015 at 03:17:28AM +, Ben Hutchings wrote:
> Control: reassign -1 src:linux 3.2.63-2
> Control: retitle -1 / left as /dev/root with non-initrd kernel
> Control: severity -1 wishlist
> Control: tag -1 upstream wontfix
>
> On Sun, 2015-01-04 at 16:02 -0800, Elliott Mitchell wrote:
> [...]
> > The two crucial ingredients for reproducing this bug, the system must
> > boot directly onto the root device (no initrd) and the root device
> > must be something that plugs into the SCSI subsystem.
> [...]
> > This does NOT affect older kernels when booting onto IDE subsystem
> > disks (/dev/hd* with newer kernels IDE disks go through the SCSI
> > subsystem and are likely affected). This does not affect systems which
> > initially mount *any* other device as root, and subsequently chroot
> > onto a SCSI subsystem device (this explains why initrd system are
> > unaffected).
> [...]
>
> I don't see why the driver would matter. Since at least the beginning
> of git history (2.6.12), when you use the root= parameter to boot
> directly from a block device, the kernel has done:

I'm also surprised about the driver making such a difference, but
observation has demonstrated it clearly does. Prior to hardware
replacement I'd been using a system with an IDE^WPATA disk which used
the olde IDE subsystem, and /dev/hda1 appeared in /proc/mounts. Note
that in my prior message I mentioned a device with a 3.2 kernel that
mounts /dev/mtdblock2 (without any initrd) as its root filesystem, and
/dev/mtdblock2 appears correctly in /proc/mounts.

Since the SCSI subsystem is common to all the observed occurrences, I
must point my finger towards *something* being messed up with the SCSI
subsystem.

> 1. Mount rootfs (which is really either tmpfs or ramfs) at /
> 2. Create directories /dev and /root, and block device /dev/console
> 3. Create block device node /dev/root for the specified block device
> 4. Mount /dev/root at /root
> 5. Move-mount /root to / (hiding the tmpfs/ramfs)
>
> What *has* changed is that /etc/mtab is now a symlink to /proc/mounts
> and therefore the root device name recorded there is not affected by
> /etc/fstab.

No, that is not where the problem occurs. Back when I was running on the
system with the IDE disk, /etc/mtab was already a symbolic link to
/proc/mounts, yet the issue did not occur. My first observation of the
problem corresponds with when I installed a PCI SATA card in a system,
using the exact same filesystem image except for rebuilding the kernel
with SATA support.

> None of this is likely to change, so if you don't want to use an
> initramfs then you'd better create a symlink called /dev/root on your
> root filesystem.

Userspace (I /think/, maybe this too is inside the kernel?) has already
been creating /dev/root for a long time.

While this only causes mild corruption of output, it causes it in *many*
programs. Either this *kernel* *bug* needs to be fixed inside the
kernel, or I'm going to have to report many bugs against the many
programs which display bad output.

Again, a computer with /dev/mtdblock2 as root device directly mounts
/dev/mtdblock2 just fine and lists /dev/mtdblock2 in /proc/mounts just
fine. That sounds very much like a (perhaps fairly minor) SCSI subsystem
bug.

Archive: https://lists.debian.org/20150105042259.ga50...@scollay.m5p.com
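Since the thread turns on userspace having to cope with "/dev/root", here is a hedged Python sketch of one way a program could recover the real device name: compare the device number of the filesystem mounted at / with the block device nodes in /dev. Nothing here comes from the thread itself. The helper names are hypothetical, and the approach assumes an ordinary block-device-backed root filesystem whose st_dev equals the device's own number, which does not hold for every filesystem type.

```python
import os
import stat


def match_device(root_dev, candidates):
    """Return the sorted device paths whose (major, minor) equals root_dev.

    root_dev   -- (major, minor) of the filesystem mounted at /
    candidates -- mapping of device path -> (major, minor)
    """
    return sorted(
        path for path, number in candidates.items()
        if number == root_dev and path != "/dev/root"
    )


def block_devices(dev_dir="/dev"):
    """Collect {path: (major, minor)} for block device nodes directly in dev_dir."""
    found = {}
    try:
        names = os.listdir(dev_dir)
    except OSError:
        return found
    for name in names:
        path = os.path.join(dev_dir, name)
        try:
            info = os.lstat(path)  # lstat: symlinks are skipped, not followed
        except OSError:
            continue
        if stat.S_ISBLK(info.st_mode):
            found[path] = (os.major(info.st_rdev), os.minor(info.st_rdev))
    return found


if __name__ == "__main__":
    root = os.stat("/").st_dev
    print(match_device((os.major(root), os.minor(root)), block_devices()))
```

On a system where /proc/mounts says /dev/root, a lookup like this would be expected to print the underlying node (for example the first partition of the first SCSI disk has major 8, minor 1), but that expectation is an assumption of the sketch rather than something the thread verified.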
Is #588675 (/ left as /dev/root) A Kernel Bug?
#588675 may not be all that severe, but it does cause issues with
multiple packages. The issue is that sometime between Debian Etch and
Debian Lenny the line for the root filesystem in /proc/mounts stopped
listing the actual device (/dev/sda1) and started listing /dev/root
instead.

One crucial ingredient I'm certain of is that this requires a kernel
which has the drivers necessary to mount the root filesystem built in
and which is not using an initrd. Both reported encounters with this bug
involve kernels that have been built from source.

I've got little idea of where the actual bug is lurking. I've got some
suspicion this may be a bug in the SCSI subsystem. The one machine I've
noticed this *doesn't* occur on is one where it boots with root on a MTD
device. I'll try to see if I can persuade that machine to boot directly
onto its USB device as root, to see whether or not it is due to being a
MIPS system (grr, emdebian.org being down is a problem).

Archive: https://lists.debian.org/20141223064816.ga8...@scollay.m5p.com
Bug#572406: users Mount Option Broken
From: Luk Claes l...@debian.org
> > The users option got broken with the latest release, despite working
> > correctly in 1:1.0.10-6+etch.1 (old stable). Non-root users can mount
> > filesystems listed in /etc/fstab that have users specified, but they
> > will be unable to unmount the filesystem (umount.nfs: You are not
> > permitted to unmount ...).
> >
> > My first thought is someone confused the user (which would need to
> > check who mounted the FS) and users options. Given bug report #501459,
> > part of which sounds like a similar issue with cifs, I'm also
> > wondering if the interface between `mount` and `mount.fstype` got
> > changed slightly.
>
> Does adding a trailing slash in /etc/fstab fix the issue for you?

Nope, no impact from adding one to either the NFS-server or the mount
point (or both at the same time). Also tried `umount` both with and
without a trailing slash in each of those combinations as well. Nothing.

Archive: http://lists.debian.org/201103212257.p2lmvzjl001...@m5p.com
Bug#572406: users Mount Option Broken
I should also add that I'm now dealing with nfs-common 1:1.2.2-4, and
mount 2.17.2-9 (current stable).

Archive: http://lists.debian.org/201103220029.p2m0tahx002...@m5p.com
Bug#589118: `rdev` setting ignored
reopen 589118
quit

From: Ben Hutchings b...@decadent.org.uk
> On Thu, 2010-07-15 at 13:48 -0700, Elliott Mitchell wrote:
> > Bzzzt! While the initrd= kernel command-line option and `rdev` kernel
> > settings are not completely orthogonal, they are mostly unrelated.
>
> You obviously haven't read the code. I have.

This is in fact true. An unrelated project may cause me to do so.

> > Unlike the kernel command-line, I don't know how the `rdev` (and
> > accompanying) setting is passed along to initial ram disks, but I do
> > know it is (or was).
>
> It isn't.

Reeeaaally? Sorry to be speculating outside my area of firm knowledge,
but I'm noting that the rdev setting was honored all the way through
Debian 3.1/Sarge, which was a 2.4 kernel. Was the rdev setting really
available to initial ramdisks all the way through 2.4, yet lost with 2.6
kernels?

> > I'm unsure whether Debian 4.0/Etch honored the `rdev` setting, but I
> > am pretty certain initial ram disks generated with Debian 3.1/Sarge
> > did honor the `rdev` setting unless overridden by the root= option.
>
> That's nice, but this feature isn't coming back.

That sounds suspiciously like wontfix, not done.

Archive: http://lists.debian.org/201007170018.o6h0ifmg076...@m5p.com
Bug#589118: `rdev` setting ignored
reopen 589118
quit

From: Ben Hutchings b...@decadent.org.uk
> On Wed, 2010-07-14 at 18:11 -0700, Elliott Mitchell wrote:
> > Package: initramfs-tools
> > Version: 0.92o
> >
> > Subject tells the story. Appears the images generated by
> > initramfs-tools completely ignore the `rdev` setting that was given to
> > the kernel. While 99% of users may be explicitly passing the root
> > device via passing root=/dev/foo through the bootloader, if that is
> > absent one would think the value from `rdev` would be honored. (yeah,
> > it's an ancient method, but not officially deprecated)
>
> If the bootloader passes an initramfs to the kernel, that overrides any
> rdev parameter. This is nothing to do with the contents of the
> initramfs.

Bzzzt! While the initrd= kernel command-line option and `rdev` kernel
settings are not completely orthogonal, they are mostly unrelated.

The initrd= option overrides the `rdev` setting in the same fashion the
initrd= option overrides the root= and all other kernel command-line
options. Namely, the initramfs can ignore any and all options and use
ones built in, or it can implement all those options. It is the root=
option that is directly related to `rdev`.

Unlike the kernel command-line, I don't know how the `rdev` (and
accompanying) setting is passed along to initial ram disks, but I do
know it is (or was).

I'm unsure whether Debian 4.0/Etch honored the `rdev` setting, but I am
pretty certain initial ram disks generated with Debian 3.1/Sarge did
honor the `rdev` setting unless overridden by the root= option.

Archive: http://lists.debian.org/201007152048.o6fkm7n7071...@m5p.com
Bug#589118: `rdev` setting ignored
Package: initramfs-tools
Version: 0.92o

Subject tells the story. Appears the images generated by initramfs-tools
completely ignore the `rdev` setting that was given to the kernel. While
99% of users may be explicitly passing the root device via passing
root=/dev/foo through the bootloader, if that is absent one would think
the value from `rdev` would be honored. (yeah, it's an ancient method,
but not officially deprecated)

Archive: http://lists.debian.org/201007150111.o6f1byyw068...@m5p.com
Bug#575154: Incorrectly assumes existence of /proc/modules
Package: initramfs-tools
Version: 0.92o

If the running kernel has had module support removed, you'll get a bunch
of errors of:

grep: /proc/modules: No such file or directory

The one place I found was in /usr/share/initramfs-tools/hook-functions,
the function manual_add_modules(). Looks like you need the -s option to
grep, or else redirect standard error.

Archive: http://lists.debian.org/201003232141.o2nlfwh3061...@m5p.com
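The report suggests `grep -s` or redirecting standard error in the shell hook. As a language-neutral illustration of the same idea (treat a missing /proc/modules as "no modules loaded" instead of an error), here is a small Python sketch. It is not the hook-functions code, and the function name `loaded_modules` is an assumption made for this example.

```python
def loaded_modules(path="/proc/modules"):
    """Return names of loaded modules, or [] when the file is absent."""
    try:
        with open(path) as handle:
            # The first whitespace-separated field of each line is the module name.
            return [line.split()[0] for line in handle if line.strip()]
    except FileNotFoundError:
        # Kernel built without loadable module support: nothing is loaded.
        return []


if __name__ == "__main__":
    print(loaded_modules())
```

The design choice is the same one the report asks for: the absence of the file is an expected condition on a non-modular kernel, so it is handled quietly rather than reported once per lookup.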
Bug#575157: Calling `cpio` can produce error messages when working correctly
Package: initramfs-tools
Version: 0.92o
Severity: minor

Thankfully pretty harmless, despite the annoyance:

cpio: ./etc/udev/RCS: Cannot stat: No such file or directory

Looks like `/usr/sbin/mkinitramfs` is the culprit. In this case,
/etc/udev/RCS is a symbolic link to ../RCS

Archive: http://lists.debian.org/201003232155.o2nltk4b061...@m5p.com
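The report does not say why cpio cannot stat the link. One plausible reading, and it is only an assumption, is that the symlink was copied into the staging tree without its target. Under that assumption, a diagnostic sketch in Python for listing symlinks whose target is missing under a directory might look like this; `dangling_symlinks` is a hypothetical helper name.

```python
import os


def dangling_symlinks(top):
    """Return sorted paths under top that are symlinks with a missing target."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # exists() follows the link, so it is False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    import sys

    for path in dangling_symlinks(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Pointing it at a copy of the initramfs staging directory would show whether etc/udev/RCS is among the links that resolve to nothing there.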