Processed: Properly reassigning ...
Processing commands for cont...@bugs.debian.org:

> reassign 1013330 src:linux 5.18.2-1
Bug #1013330 [linux-image-5.18.0-0.bpo.1-arm64] linux-image-5.18.0-0.bpo.1-arm64: kernel panic in dpaa2_eth_free_tx_fd
Bug reassigned from package 'linux-image-5.18.0-0.bpo.1-arm64' to 'src:linux'.
No longer marked as found in versions linux-signed-arm64/5.18.2+1~bpo11+1.
Ignoring request to alter fixed versions of bug #1013330 to the same values previously set
Bug #1013330 [src:linux] linux-image-5.18.0-0.bpo.1-arm64: kernel panic in dpaa2_eth_free_tx_fd
Marked as found in versions linux/5.18.2-1.
> End of message, stopping processing here.

Please contact me if you need assistance.
--
1013330: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013330
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1013330: linux-image-5.18.0-0.bpo.1-arm64: kernel panic in dpaa2_eth_free_tx_fd
Control: reassign src:linux 5.18.2-1

On Tuesday, 21 June 2022 23:31:29 CEST Harald Welte wrote:
> Package: linux-image-5.18.0-0.bpo.1-arm64
> Version: 5.18.2-1~bpo11+1
>
> today I briefly tried the backport 5.18 kernel on bullseye. It boots fine,
> but as soon as some network traffic happens, it panics with a backtrace
> indicating some kind of problem in the dpaa2_eth network driver.
>
> [ 46.451190] Unable to handle kernel paging request at virtual address fcf7fe08
> [ 46.459126] Mem abort info:
> [ 46.461937] ESR = 0x9605
> [ 46.464983] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 46.470301] SET = 0, FnV = 0
> [ 46.473347] EA = 0, S1PTW = 0
> [ 46.476491] FSC = 0x05: level 1 translation fault
> [ 46.481373] Data abort info:
> [ 46.484257] ISV = 0, ISS = 0x0005
> [ 46.488095] CM = 0, WnR = 0
> [ 46.491067] swapper pgtable: 4k pages, 48-bit VAs, pgdp=8258f000
> [ 46.497786] [fcf7fe08] pgd=102f78387003, p4d=102f78387003, pud=
> [ 46.506496] Internal error: Oops: 9605 [#1] SMP

Kernel 5.18.3 contains (at least) 2 patches related to dpaa2-eth.
Kernel 5.18.5-1 (currently in Sid) does contain quite a few fixes vs 5.18.2, so it would be useful to verify if that fixes your issue.
I don't know when or what version becomes available next in Stable Backports though.
Bug#1013330: linux-image-5.18.0-0.bpo.1-arm64: kernel panic in dpaa2_eth_free_tx_fd
Package: linux-image-5.18.0-0.bpo.1-arm64
Version: 5.18.2-1~bpo11+1
Severity: normal

Dear Maintainer,

today I briefly tried the backport 5.18 kernel on bullseye. It boots fine, but as soon as some network traffic happens, it panics with a backtrace indicating some kind of problem in the dpaa2_eth network driver.

The problem can be reproduced 100% of the time within very few seconds after system boot. One can usually still ssh into the machine, but then the first shell command producing more than a single-line output (like ls -l /etc) makes the kernel panic like below.

As soon as I downgraded back to linux-image-5.10.0-15-arm64 = 5.10.120-1 the problem disappeared. On 5.10.120-1 the network runs very stable.

[ 46.451190] Unable to handle kernel paging request at virtual address fcf7fe08
[ 46.459126] Mem abort info:
[ 46.461937] ESR = 0x9605
[ 46.464983] EC = 0x25: DABT (current EL), IL = 32 bits
[ 46.470301] SET = 0, FnV = 0
[ 46.473347] EA = 0, S1PTW = 0
[ 46.476491] FSC = 0x05: level 1 translation fault
[ 46.481373] Data abort info:
[ 46.484257] ISV = 0, ISS = 0x0005
[ 46.488095] CM = 0, WnR = 0
[ 46.491067] swapper pgtable: 4k pages, 48-bit VAs, pgdp=8258f000
[ 46.497786] [fcf7fe08] pgd=102f78387003, p4d=102f78387003, pud=
[ 46.506496] Internal error: Oops: 9605 [#1] SMP
[ 46.511364] Modules linked in: caam_jr crypto_engine rng_core aes_ce_blk aes_ce_cipher ghash_ce dpaa2_caam gf128mul caamhash_desc sha2_ce caamalg_desc sha256_arm64 authenc libdes sha1_ce dpaa2_console caam ofpart error lm90 spi_nor at24 mtd sbsa_gwdt qoriq_thermal evdev layerscape_edac_mod qoriq_cpufreq drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod dax fsl_dpaa2_ptp fsl_dpaa2_eth xhci_plat_hcd xhci_hcd usbcore nvme nvme_core ahci_qoriq t10_pi libahci_platform libahci at803x libata fsl_mc_dpio crc64_rocksoft ptp_qoriq crc64 xgmac_mdio pcs_lynx acpi_mdio phylink crc_t10dif mdio_devres rtc_pcf2127 ptp of_mdio i2c_mux_pca954x crct10dif_generic regmap_spi i2c_mux dwc3 fixed_phy pps_core fwnode_mdio scsi_mod udc_core sfp crct10dif_ce sdhci_of_esdhc crct10dif_common mdio_i2c roles sdhci_pltfm ulpi scsi_common usb_common libphy sdhci spi_nxp_fspi i2c_imx fixed gpio_keys
[ 46.591702] CPU: 7 PID: 822 Comm: sshd Not tainted 5.18.0-0.bpo.1-arm64 #1 Debian 5.18.2-1~bpo11+1
[ 46.600736] Hardware name: SolidRun LX2160A Clearfog CX (DT)
[ 46.606383] pstate: a005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 46.613332] pc : kfree+0x78/0x290
[ 46.616644] lr : dpaa2_eth_free_tx_fd.isra.0+0x308/0x3b4 [fsl_dpaa2_eth]
[ 46.623341] sp : 8aa3b2d0
[ 46.626643] x29: 8aa3b2d0 x28: 3e200d37a800 x27: 3e2005045d00
[ 46.633769] x26: 0001 x25: 0001 x24: 0002
[ 46.640895] x23: b76243dab000 x22: b76239b320e8 x21: faee1740
[ 46.648020] x20: 3dff8000 x19: fcf7fe00 x18:
[ 46.655145] x17: 86cc769fc000 x16: b762425450d0 x15: 4000
[ 46.662270] x14: x13: c2008000 x12: 0001
[ 46.669395] x11: 0004 x10: 0008 x9 : b76239b320e8
[ 46.676520] x8 : x7 : 000faee2 x6 : 3e2000ce4a00
[ 46.683645] x5 : b76243196000 x4 : 0003 x3 : 0009
[ 46.690769] x2 : x1 : 0030 x0 : fc00
[ 46.697894] Call trace:
[ 46.700328] kfree+0x78/0x290
[ 46.703286] dpaa2_eth_free_tx_fd.isra.0+0x308/0x3b4 [fsl_dpaa2_eth]
[ 46.709631] dpaa2_eth_tx_conf+0xb0/0x19c [fsl_dpaa2_eth]
[ 46.715020] dpaa2_eth_poll+0xf4/0x3b0 [fsl_dpaa2_eth]
[ 46.720149] __napi_poll+0x40/0x1dc
[ 46.723628] net_rx_action+0x2fc/0x390
[ 46.727366] __do_softirq+0x120/0x348
[ 46.731017] __irq_exit_rcu+0x10c/0x140
[ 46.734842] irq_exit_rcu+0x1c/0x30
[ 46.738320] el1_interrupt+0x38/0x54
[ 46.741885] el1h_64_irq_handler+0x18/0x24
[ 46.745970] el1h_64_irq+0x64/0x68
[ 46.749360] n_tty_poll+0x98/0x1e0
[ 46.752752] tty_poll+0x7c/0x114
[ 46.755968] do_select+0x28c/0x64c
[ 46.759361] core_sys_select+0x238/0x3a0
[ 46.763273] __arm64_sys_pselect6+0x17c/0x280
[ 46.767619] invoke_syscall+0x50/0x120
[ 46.771357] el0_svc_common.constprop.0+0x4c/0xf4
[ 46.776051] do_el0_svc+0x30/0x90
[ 46.779354] el0_svc+0x34/0xd0
[ 46.782397] el0t_64_sync_handler+0x1a4/0x1b0
[ 46.786743] el0t_64_sync+0x18c/0x190
[ 46.790396] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660)
[ 46.796478] ---[ end trace ]---
[ 46.801083] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 46.807945] SMP: stopping secondary CPUs
[ 46.811867] Kernel Offset: 0x37623a20 from 0x8800
[ 46.817947] PHYS_OFFSET: 0xc2008000
[ 46.822116] CPU features: 0x100,4b09,1086
[
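The "Mem abort info" fields in the oops above (EC, IL, FSC, WnR) are all decoded from the single ESR value. A rough sketch of that decoding in shell arithmetic follows. Note the assumptions: the log shows the ESR as 0x9605, which looks shortened by the archive, so the value 0x96000005 used here is a guess chosen only because it is consistent with the EC, IL, FSC and WnR fields printed in the same log. The helper name decode_esr is made up for illustration; the bit positions follow the Arm ESR_ELx layout for data aborts.

```shell
#!/bin/sh
# decode_esr VALUE: split an AArch64 ESR_ELx value into some of the fields
# the kernel prints under "Mem abort info". Hypothetical helper name.
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))    # exception class, bits 31:26
    il=$(( (esr >> 25) & 0x1 ))     # instruction length bit, bit 25 (1 = 32-bit)
    iss=$(( esr & 0x1ffffff ))      # instruction specific syndrome, bits 24:0
    fsc=$(( iss & 0x3f ))           # fault status code of a data abort, ISS bits 5:0
    wnr=$(( (iss >> 6) & 0x1 ))     # write-not-read of a data abort, ISS bit 6
    printf 'EC=0x%02x IL=%d FSC=0x%02x WnR=%d\n' "$ec" "$il" "$fsc" "$wnr"
}

# ASSUMPTION: 0x96000005 is not copied verbatim from the log (see above).
decode_esr 0x96000005
```

With the assumed value this prints EC=0x25 IL=1 FSC=0x05 WnR=0, matching "EC = 0x25: DABT (current EL), IL = 32 bits", "FSC = 0x05" and "WnR = 0" in the report, i.e. a read that hit a level 1 translation fault.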
Bug#1001001: linux-image-5.10.0-9-arm64: kernel BUG at include/linux/swapops.h:204!
Hi,

On Tuesday, 21 June 2022 22:31:45 CEST Paul Gevers wrote:
> On 21-06-2022 22:07, Diederik de Haas wrote:
> > Do these errors still occur? Still with 5.10.103-1 or a later one?
>
> The last occurrence of a machine hang I had is from 5 May 2022, but I'm
> not sure if I checked if it was this same issue. Normally our kernels
> are up-to-date, but I don't recall what we had at the time. We have
> recommissioned our arm64 hosts, so the install logs are lost by now.

It's good for ci.debian.net that there are such large gaps between failures, but it makes debugging a bit harder.
I think that the install logs aren't that important (anymore) as the issue/symptoms appear to be the same:
- some swap action resulting in some failure
- CPU gets stuck
- watchdog triggers a reboot

How is swap configured on these devices?

> > Is it only on arm64 machines? Or is this just an example which also
> > occurs on other arches?
>
> I'm pretty sure I haven't seen this on other arches, otherwise I'm sure
> I would have reported it to this bug.

Yeah, I _assumed_ as such, but assumptions can be dangerous ;-)

Normally I scroll (hard) by the hardware listings as that rarely says anything to me. And I did that before too, but just now I made an important discovery. I *assumed* it was running on arm64 (native) hardware and was about to ask specifics about it and then I noticed this:
Host bridge [0600]: Red Hat, Inc. QEMU PCIe Host bridge [1b36:0008]

Qemu.
Quite likely unrelated, but a while back I had an issue with qemu in building arm64 images: https://bugs.debian.org/988174
I think it would be useful to know which qemu version(s) were used.

(It's unlikely I'll be able to help find the cause/solution, mostly gathering hopefully useful bits of information for people who could)

> > If it still occurs, then the likely only way to get a possible resolve is
> > reporting it to upstream.
>
> 1.5 months is quite long for it to be gone, although, before that it was
> 2.5 months.

If the issue does occur again, I think it would be useful to bring 'upstream' into the conversation. They can likely bring much more useful input into this than (e.g.) I could.
Also, if upstream is made aware there is an issue (even an infrequent one), then they can make the most informed choice about what to do with it.

Cheers,
Diederik
Bug#1001001: linux-image-5.10.0-9-arm64: kernel BUG at include/linux/swapops.h:204!
Hi Diederik,

On 21-06-2022 22:07, Diederik de Haas wrote:
> Do these errors still occur? Still with 5.10.103-1 or a later one?

The last occurrence of a machine hang I had is from 5 May 2022, but I'm not sure if I checked if it was this same issue. Normally our kernels are up-to-date, but I don't recall what we had at the time. We have recommissioned our arm64 hosts, so the install logs are lost by now.

> Is it only on arm64 machines? Or is this just an example which also
> occurs on other arches?

I'm pretty sure I haven't seen this on other arches, otherwise I'm sure I would have reported it to this bug.

> If it still occurs, then the likely only way to get a possible resolve is
> reporting it to upstream.

1.5 months is quite long for it to be gone, although, before that it was 2.5 months.

Paul
Processed: Re: Bug#1001001: linux-image-5.10.0-9-arm64: kernel BUG at include/linux/swapops.h:204!
Processing control commands:

> found -1 linux/5.10.103-1
Bug #1001001 [src:linux] linux-image-5.10.0-9-arm64: kernel BUG at include/linux/swapops.h:204!
Marked as found in versions linux/5.10.103-1.

--
1001001: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1001001
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1001001: linux-image-5.10.0-9-arm64: kernel BUG at include/linux/swapops.h:204!
Control: found -1 linux/5.10.103-1

Hi Paul,

On Tuesday, 29 March 2022 20:58:59 CEST Paul Gevers wrote:
> On 20-02-2022 13:44, Paul Gevers wrote:
> > Sad to say, but this week we had two hangs again.
>
> And this week another two.
>
> ci-worker-arm64-07 ==
>
> Mar 26 10:15:55 ci-worker-arm64-07 kernel: kernel BUG at include/linux/swapops.h:204!
> Mar 26 10:15:55 ci-worker-arm64-07 kernel: Internal error: Oops - BUG: 0 [#1] SMP
>
> Linux kernel from before the last point release:
> Linux version 5.10.0-12-arm64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2>
>
> ci-worker-arm64-08 ==
> Mar 25 22:13:44 ci-worker-arm64-08 kernel: kernel BUG at include/linux/swapops.h:204!
> Mar 25 22:13:44 ci-worker-arm64-08 kernel: Internal error: Oops - BUG: 0 [#1] SMP

Do these errors still occur? Still with 5.10.103-1 or a later one?
Is it only on arm64 machines? Or is this just an example which also occurs on other arches?

Is it possible to try newer kernel versions from Stable-backports to see whether the issue occurs there too?

If it still occurs, then the likely only way to get a possible resolve is reporting it to upstream. For 'swapops.h' that should be this:

~/dev/kernel.org/linux$ scripts/get_maintainer.pl include/linux/swapops.h
Andrew Morton
Peter Xu
David Hildenbrand
Alistair Popple
Miaohe Lin
Naoya Horiguchi
linux-ker...@vger.kernel.org (open list)

But I'm not sure that's the right list as it is from the include directory, so the actual problem may be somewhere else. But I guess it would be a good start?

Cheers,
Diederik
Bug#1002553: firmware-amd-graphics: Memory clock always at 100% (thinkpads w/ryzen 3XXXu)
I tried on a separate partition to run Fedora with the firmware 20220310, and it works correctly there. You were right.

Have a nice day.

On 12/06/22 at 8:38, Diederik de Haas wrote:
> Control: tag -1 - moreinfo
> Control: forwarded -1 https://gitlab.freedesktop.org/drm/amd/-/issues/1455
> Control: tag -1 fixed-upstream
>
> On Sunday, 12 June 2022 06:59:12 CEST ng wrote:
> > Hi, I went on and installed the respective image from
> > https://snapshot.debian.org/package/linux/5.10.46-5/#linux-image-5.10.0-8-amd64-unsigned_5.10.46-5
> > and had no luck, no change whatsoever.
>
> Hi!
>
> Thanks for testing, then this is a separate issue.
> https://lore.kernel.org/linux-firmware/CADnq5_PYhDcR3tNYgzQ-uz80Nf++oMPsF3=huk+qcgntiy_...@mail.gmail.com/
> is where a fix has been proposed+merged upstream (date: 2022-02-28).
> An update of the firmware to version >= 20220310 _should_ fix this issue.
> If such a version becomes available, could you test whether it indeed fixes
> your issue and report your findings back to this bug report?
>
> TIA,
> Diederik
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Diederik de Haas dixit:

>I'm talking here about 4.9, not 4.19 ...

Ah sorry, I can’t keep them distinguished in my head apparently, or it’s too hot…

>> $ git tag --contains 92833e8b5db6
>> v4.19.221
>> […]
>
>Thanks for that command :-) I usually went through several manual steps to
>figure out in which release(s) a certain commit was. This is much quicker!

There is also git branch --contains. HTH ☺

>But as said before, I'm going to leave it up to the maintainers on the best
>way to go about fixing this issue.

Right.

bye,
//mirabilos
-- 
Infrastructure expert • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Phone +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Managing directors: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg
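The two commands mentioned in this exchange, git tag --contains and git branch --contains, can be wrapped to answer "which release first shipped this commit". This is a minimal sketch, assuming it is run inside a clone of the stable kernel tree with tags fetched and that GNU sort with -V (version sort) is available; the function name first_tag_containing is made up for illustration.

```shell
#!/bin/sh
# first_tag_containing COMMIT: print the lowest version tag that contains
# COMMIT. Hypothetical helper; relies on GNU sort's -V (version sort).
first_tag_containing() {
    git tag --contains "$1" | sort -V | head -n 1
}

# Usage, inside a kernel clone (not executed here):
#   first_tag_containing 92833e8b5db6
#   git branch -r --contains 92833e8b5db6   # remote branches carrying the commit
```

Going by the output quoted in the thread, the first call would be expected to report v4.19.221 for commit 92833e8b5db6.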
Re: virtio_balloon regression in 5.19-rc3
On Tue, 2022-06-21 at 17:34 +0800, Jason Wang wrote:
> On Tue, Jun 21, 2022 at 5:24 PM David Hildenbrand wrote:
> >
> > On 20.06.22 20:49, Ben Hutchings wrote:
> > > I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
> > > pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
> > >
> > > function="0x0"/>
> > >
> > > but the virtio_balloon driver fails to bind to it:
> > >
> > > virtio_balloon virtio4: init_vqs: add stat_vq failed
> > > virtio_balloon: probe of virtio4 failed with error -5
> > >
> >
> > Hmm, I don't see any recent changes to drivers/virtio/virtio_balloon.c
> >
> > virtqueue_add_outbuf() fails with -EIO if I'm not wrong. That's the
> > first call of virtqueue_add_outbuf() when virtio_balloon initializes.
> >
> >
> > Maybe something in generic virtio code changed?
>
> Yes, we introduced the IRQ hardening. That could be the root cause and
> we've received lots of reports so we decide to disable it by default.
>
> Ben, could you please try this patch: (and make sure
> CONFIG_VIRTIO_HARDEN_NOTIFICATION is not set)
>
> https://lore.kernel.org/lkml/20220620024158.2505-1-jasow...@redhat.com/T/

Yes, that patch fixes the regression for me.

Ben.

-- 
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
On Tuesday, 21 June 2022 15:34:12 CEST Thorsten Glaser wrote:
> >In branch 'linux-4.9.y' there is no qdisc_put function, so the above check
> >seems rightly in qdisc_destroy there.

I'm talking here about 4.9, not 4.19 ...

> Not any more. Since…
>
> $ git tag --contains 92833e8b5db6
> v4.19.221
> […]

Thanks for that command :-) I usually went through several manual steps to figure out in which release(s) a certain commit was. This is much quicker!

> … qdisc_destroy was renamed to qdisc_put in 4.19, breaking modules (grr).

And yes, it is broken in the 4.19 series since 4.19.221 (and not in 5.10 or upstream 'master').

> So yes, this needs to also be fixed upstream (hence me including that tag
> when reportbugging), but perhaps Debian can quickfix.

What I have observed so far is that a commit needs to be accepted upstream (but doesn't have to have gone through the whole 'chain of command') before a temporary patch is accepted to quickly fix it in Debian.
But as said before, I'm going to leave it up to the maintainers on the best way to go about fixing this issue.

Cheers,
Diederik
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Diederik de Haas dixit:

>In branch 'linux-4.9.y' there is no qdisc_put function, so the above check
>seems rightly in qdisc_destroy there.

Not any more. Since…

$ git tag --contains 92833e8b5db6
v4.19.221
[…]

… qdisc_destroy was renamed to qdisc_put in 4.19, breaking modules (grr).

So yes, this needs to also be fixed upstream (hence me including that tag when reportbugging), but perhaps Debian can quickfix.

bye,
//mirabilos
-- 
16:47⎜«mika:#grml» .oO(mira is simply good)
23:22⎜«mikap:#grml» mirabilos: and your bootloader is awesome :)
23:29⎜«mikap:#grml» and I think it's really cool that I have a BSD to boot with grml, I'll have to install that on a USB stick right away
	-- Michael Prokop on MirOS bsd4grml
Processed: Re: Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Processing control commands:

> tag -1 help
Bug #1013299 [src:linux] linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Added tag(s) help.

--
1013299: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013299
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Control: tag -1 help

On Tuesday, 21 June 2022 13:35:09 CEST Thorsten Glaser wrote:
> >So I'm inclined to think that 92833e8b5db6c209e9311ac8c6a44d3bf1856659 is
> >the commit which brought the bug back.
>
> Yes, definitely. The lines…
>
> 	if (!qdisc)
> 		return;
>
> … from near the beginning of the now-static qdisc_destroy must
> be moved to the beginning of the new qdisc_put function.

Agreed. That would bring it in line with 'master' and 'linux-5.10.y'.
In branch 'linux-4.9.y' there is no qdisc_put function, so the above check seems rightly in qdisc_destroy there.

This should be reported upstream, but I don't know what the best or most appropriate way to do that is, so I'm referring that to the actual Debian maintainers, hence the 'help' tag.
Processed: Re: Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Processing control commands:

> tag -1 - moreinfo
Bug #1013299 [src:linux] linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Removed tag(s) moreinfo.

--
1013299: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013299
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Control: tag -1 - moreinfo

Diederik de Haas dixit:

>In Debian, the release before 4.19.235-1 was 4.19.232-1 which should also have
>this bug. The release before that was 4.19.208-1, which shouldn't.
>Can you verify that?

Not easily any more, but I know it worked some weeks ago, and I *think* I particularly remember 208 as working. But I do have a clone of linux on another box and so I can look at ↓

>So I'm inclined to think that 92833e8b5db6c209e9311ac8c6a44d3bf1856659 is
>the commit which brought the bug back.

Yes, definitely. The lines…

	if (!qdisc)
		return;

… from near the beginning of the now-static qdisc_destroy must be moved to the beginning of the new qdisc_put function.

bye,
//mirabilos
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Diederik de Haas dixit:

>It's a bit 'above my paygrade', but if qdisc_put() can accept a NULL pointer
>then I'm curious whether that would be allowed for other functions in that file
>as well ... there are several having "struct Qdisc *qdisc" as (only)
>parameter, but only qdisc_put() checks for NULL.
>That is also true for the current 'master' branch ...

AIUI the check was added because qdisc_destroy() could accept one, and several qdiscs are using that; it’s like free(3) in that regard, while the other functions aren’t.

bye,
//mirabilos
Re: virtio_balloon regression in 5.19-rc3
On Tue, Jun 21, 2022 at 5:24 PM David Hildenbrand wrote:
>
> On 20.06.22 20:49, Ben Hutchings wrote:
> > I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
> > pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
> >
> > function="0x0"/>
> >
> > but the virtio_balloon driver fails to bind to it:
> >
> > virtio_balloon virtio4: init_vqs: add stat_vq failed
> > virtio_balloon: probe of virtio4 failed with error -5
> >
>
> Hmm, I don't see any recent changes to drivers/virtio/virtio_balloon.c
>
> virtqueue_add_outbuf() fails with -EIO if I'm not wrong. That's the
> first call of virtqueue_add_outbuf() when virtio_balloon initializes.
>
>
> Maybe something in generic virtio code changed?

Yes, we introduced the IRQ hardening. That could be the root cause, and we've received lots of reports, so we decided to disable it by default.

Ben, could you please try this patch: (and make sure CONFIG_VIRTIO_HARDEN_NOTIFICATION is not set)

https://lore.kernel.org/lkml/20220620024158.2505-1-jasow...@redhat.com/T/

Thanks

>
> --
> Thanks,
>
> David / dhildenb
>
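One way to check the second half of that request, that CONFIG_VIRTIO_HARDEN_NOTIFICATION is not set, is to look it up in the kernel's config file. This is a minimal sketch, assuming the config is available as a plain-text file such as /boot/config-$(uname -r), which is the usual location on Debian; config_enabled is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# config_enabled FILE OPTION: succeed if OPTION is set to y or m in FILE.
# Lines of the form "# CONFIG_FOO is not set" do not match.
config_enabled() {
    grep -Eq "^$2=(y|m)\$" "$1"
}

cfg="/boot/config-$(uname -r)"   # ASSUMPTION: Debian-style config location
if [ -r "$cfg" ] && config_enabled "$cfg" CONFIG_VIRTIO_HARDEN_NOTIFICATION; then
    echo "CONFIG_VIRTIO_HARDEN_NOTIFICATION is enabled"
else
    echo "CONFIG_VIRTIO_HARDEN_NOTIFICATION is not enabled (or config not readable)"
fi
```

For a kernel that is built but not yet installed, the same function can be pointed at the .config file in the build tree instead.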
Re: virtio_balloon regression in 5.19-rc3
On 20.06.22 20:49, Ben Hutchings wrote:
> I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
> pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
>
> function="0x0"/>
>
> but the virtio_balloon driver fails to bind to it:
>
> virtio_balloon virtio4: init_vqs: add stat_vq failed
> virtio_balloon: probe of virtio4 failed with error -5
>

Hmm, I don't see any recent changes to drivers/virtio/virtio_balloon.c

virtqueue_add_outbuf() fails with -EIO if I'm not wrong. That's the first call of virtqueue_add_outbuf() when virtio_balloon initializes.

Maybe something in generic virtio code changed?

-- 
Thanks,

David / dhildenb
Re: virtio_balloon regression in 5.19-rc3
[TLDR: I'm adding this regression report to the list of tracked regressions; all text from me you find below is based on a few template paragraphs you might have encountered already in similar form.]

On 20.06.22 20:49, Ben Hutchings wrote:
> I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
> pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
>
> function="0x0"/>
>
> but the virtio_balloon driver fails to bind to it:
>
> virtio_balloon virtio4: init_vqs: add stat_vq failed
> virtio_balloon: probe of virtio4 failed with error -5
>
> On a 5.18 kernel with similar configuration, it binds successfully.
>
> I've attached the kernel config for 5.19-rc3.

CCing the regression mailing list, as it should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

Thanks for the report. To be sure the issue below doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.18..v5.19-rc3
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply -- ideally with also telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags pointing to the report (the mail this one replies to), as explained in the Linux kernel's documentation; the above webpage explains why this is important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
On Tuesday, 21 June 2022 11:49:26 CEST you wrote:
> https://lore.kernel.org/all/20190921063607.ga1083...@kroah.com/ is about the
> 4.19.75 release and that contains that change in commit
> 7a1bad565cebfbf6956f9bb36dba734a48fa31d4 titled "net_sched: let qdisc_put()
> accept NULL pointer" which actually modifies the qdisc_destroy function
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/sched/sch_generic.c?h=linux-4.19.y#n1004
> OTOH, does indeed not have that NULL check.

It's a bit 'above my paygrade', but if qdisc_put() can accept a NULL pointer then I'm curious whether that would be allowed for other functions in that file as well ... there are several having "struct Qdisc *qdisc" as (only) parameter, but only qdisc_put() checks for NULL.
That is also true for the current 'master' branch ...
Processed: Re: Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Processing control commands:

> tag -1 moreinfo
Bug #1013299 [src:linux] linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Added tag(s) moreinfo.

--
1013299: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013299
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Control: tag -1 moreinfo

On Tuesday, 21 June 2022 08:10:54 CEST Thorsten Glaser wrote:
> Package: src:linux
> Version: 4.19.235-1
> Severity: critical
> Tags: upstream
> Justification: breaks the whole system
>
> A recent upstream “stable” upgrade backported the removal of the
> qdisc_destroy() function (which, in itself, is questionable enough
> already and caused no small amount of fun) using qdisc_put() instead.
>
> However, qdisc_put() does not accept NULL pointers, causing oopses
> in several qdiscs that can be configured on a system. This breaks
> sudo (su works), networking and even deconfiguration is not possible,
> only /proc/sysrq-trigger makes it possible to recover.
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg314288.html
> fixes this but was not backported along.

https://lore.kernel.org/all/20190921063607.ga1083...@kroah.com/ is about the 4.19.75 release and that contains that change in commit 7a1bad565cebfbf6956f9bb36dba734a48fa31d4 titled "net_sched: let qdisc_put() accept NULL pointer" which actually modifies the qdisc_destroy function.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/sched/sch_generic.c?h=linux-4.19.y#n1004 OTOH, does indeed not have that NULL check.

Commit 92833e8b5db6c209e9311ac8c6a44d3bf1856659, part of v4.19.221, titled "net: sched: rename qdisc_destroy() to qdisc_put()" wasn't actually a straight rename: some code was moved around into a new qdisc_put function, which doesn't have the NULL check.

In Debian, the release before 4.19.235-1 was 4.19.232-1 which should also have this bug. The release before that was 4.19.208-1, which shouldn't.
Can you verify that?

So I'm inclined to think that 92833e8b5db6c209e9311ac8c6a44d3bf1856659 is the commit which brought the bug back.
Processed: Fixing fixed version
Processing commands for cont...@bugs.debian.org:

> notfixed 918097 0.141
Bug #918097 {Done: Diederik de Haas } [initramfs-tools-core] mkinitramfs prints error messages when no modules are needed in the initramfs
No longer marked as fixed in versions 0.141.
> fixed 918097 initramfs-tools/0.141
Bug #918097 {Done: Diederik de Haas } [initramfs-tools-core] mkinitramfs prints error messages when no modules are needed in the initramfs
Marked as fixed in versions initramfs-tools/0.141.
> End of message, stopping processing here.

Please contact me if you need assistance.
--
918097: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918097
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#918097: marked as done (mkinitramfs prints error messages when no modules are needed in the initramfs)
Your message dated Tue, 21 Jun 2022 10:54:26 +0200 with message-id <12000826.O9o76ZdvQC@bagend> and subject line Re: Bug#918097: initramfs-tools-core: Error while building DKMS modules when kernel has all its modules built in has caused the Debian Bug report #918097, regarding mkinitramfs prints error messages when no modules are needed in the initramfs to be marked as done. This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith. (NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact ow...@bugs.debian.org immediately.) -- 918097: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918097 Debian Bug Tracking System Contact ow...@bugs.debian.org with problems --- Begin Message --- Package: initramfs-tools-core Version: 0.132 Severity: normal -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Dear Maintainer, I wanted to build a custom kernel using the linux-source package from the Debian repository. I haven't changed much, the only thing I wanted to do is to build in all the modules my machine needs and remove all the others. Such kernels don't have modules in *.ko files and it looks like mkinitramfs has some issues with that. Basically, the problem is in the /usr/share/initramfs-tools/hook-functions script file with the following line: find "${DESTDIR}/lib/modules/${version}/kernel" -name '*.ko*' Since none of the modules listed above in the file were copied to the temp destination folder, the kernel/ subdir doesn't exist and hence the above command returns error. The initramfs image is building fine, but DKMS packages return something similar to the following: # dpkg --configure -a Setting up sysdig-dkms (0.24.1-1) ... Removing old sysdig-0.24.1 DKMS files... 
- Uninstall Beginning Module: sysdig Version: 0.24.1 Kernel: 4.19.13-amd64-morficzny (x86_64) - - Status: Before uninstall, this module version was ACTIVE on this kernel. sysdig-probe.ko: - Uninstallation - Deleting from: /lib/modules/4.19.13-amd64-morficzny/updates/dkms/ - Original module - No original module was found for this module on this kernel. - Use the dkms install command to reinstall any previous module version. depmod... DKMS: uninstall completed. - -- Deleting module version: 0.24.1 completely from the DKMS tree. - -- Done. Loading new sysdig-0.24.1 DKMS files... Building for 4.19.13-amd64-morficzny 4.20.0-amd64-morficzny Building initial module for 4.19.13-amd64-morficzny Error! Bad return status for module build on kernel: 4.19.13-amd64-morficzny (x86_64) Consult /var/lib/dkms/sysdig/0.24.1/build/make.log for more information. dpkg: error processing package sysdig-dkms (--configure): installed sysdig-dkms package post-installation script subprocess returned error exit status 10 Errors were encountered while processing: sysdig-dkms I changed my kernel config a little bit: # egrep \=m /boot/config-4.19.13-amd64-morficzny CONFIG_BTRFS_FS=m CONFIG_XOR_BLOCKS=m CONFIG_RAID6_PQ=m And after rebuilding the kernel when I want to created the initramfs image I can see the the kernel/ subdir: # tree /var/tmp/mkinitramfs_p7rDVj/usr/lib/modules/4.19.13-amd64-morficzny/kernel/ /var/tmp/mkinitramfs_p7rDVj/usr/lib/modules/4.19.13-amd64-morficzny/kernel/ ├── crypto │ └── xor.ko ├── fs │ └── btrfs │ └── btrfs.ko └── lib └── raid6 └── raid6_pq.ko 5 directories, 3 files Will this qualify as bug or should I fix this in some way myself since it's not the Debian distribution's kernel? 
-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (130, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.13-amd64-morficzny (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages initramfs-tools-core depends on:
ii  coreutils    8.30-1
ii  cpio         2.12+dfsg-6
ii  e2fsprogs    1.44.5-1
ii  klibc-utils  2.0.4-14
ii  kmod         25-2
ii  udev         240-2

Versions of packages initramfs-tools-core recommends:
ii  busybox  1:1.27.2-3

Versions of packages initramfs-tools-core suggests:
ii  bash-completion  1:2.8-5
Bug#1013299: linux-image-4.19.0-20-amd64: NULL pointer deref in qdisc_put() due to missing backport
Package: src:linux
Version: 4.19.235-1
Severity: critical
Tags: upstream
Justification: breaks the whole system

A recent upstream “stable” upgrade backported the removal of the qdisc_destroy() function (which, in itself, is questionable enough already and caused no small amount of fun), using qdisc_put() instead. However, qdisc_put() does not accept NULL pointers, causing oopses in several qdiscs that can be configured on a system.

This breaks sudo (su works) and networking; even deconfiguration is not possible, and only /proc/sysrq-trigger makes it possible to recover.

https://www.mail-archive.com/netdev@vger.kernel.org/msg314288.html fixes this but was not backported along with it.

-- Package-specific info:
** Version:
Linux version 4.19.0-20-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.235-1 (2022-03-17)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.19.0-20-amd64 root=/dev/vda2 ro net.ifnames=0 nomodeset

** Not tainted

** Kernel log:
Unable to read kernel log; any relevant messages should be attached

** Model information
sys_vendor: QEMU
product_name: Standard PC (i440FX + PIIX, 1996)
product_version: pc-i440fx-2.8
chassis_vendor: QEMU
chassis_version: pc-i440fx-2.8
bios_vendor: SeaBIOS
bios_version: 1.14.0-2

** Loaded modules:
ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter nft_chain_nat_ipv4 nf_nat_ipv4 xt_addrtype nft_compat xt_conntrack x_tables nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc nf_tables devlink nfnetlink overlay nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc loop kvm_intel ttm kvm drm_kms_helper irqbypass virtio_rng joydev drm evdev rng_core virtio_balloon serio_raw pcspkr button qemu_fw_cfg ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 hid_generic usbhid hid virtio_net net_failover failover virtio_blk ata_generic ata_piix uhci_hcd libata ehci_hcd usbcore psmouse usb_common virtio_pci virtio_ring i2c_piix4 crc32c_intel scsi_mod virtio floppy

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
    Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Kernel driver in use: virtio-pci Kernel modules: virtio_pci

00:04.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]
    Subsystem: Red Hat, Inc Virtio block device [1af4:0002]
    Physical Slot: 4
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
    Kernel driver in use: virtio-pci
    Kernel modules: virtio_pci

00:05.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon [1af4:1002]
    Subsystem: Red Hat, Inc Virtio memory balloon [1af4:0005]
    Physical Slot: 5
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
    Kernel driver in use: virtio-pci
    Kernel modules: virtio_pci

00:06.0 Unclassified device [00ff]: Red Hat, Inc Virtio RNG [1af4:1005]
    Subsystem: Red Hat, Inc Virtio RNG [1af4:0004]
    Physical Slot: 6
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
    Kernel driver in use: virtio-pci
    Kernel modules: virtio_pci

00:07.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
    Subsystem: Red Hat, Inc Virtio network device [1af4:0001]
    Physical Slot: 7
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
    Kernel driver in use: virtio-pci
    Kernel modules: virtio_pci

** USB devices:
not available

-- System Information:
Debian Release: 10.12
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-20-amd64 (SMP w/3 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/lksh
Init: sysvinit (via /sbin/init)

Versions of packages linux-image-4.19.0-20-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.133+deb10u1
ii  kmod                                    26-1
ii  linux-base                              4.6

Versions of packages