Bug#1072004: linux: regression in the 9p protocol in 6.8 breaks autopkgtest qemu jobs (affecting debci)
On 27.05.24 14:22, Luca Boccassi wrote:
>> https://bugs.launchpad.net/ubuntu/+source/autopkgtest/+bug/2056461
>
> This has been reported upstream 3 weeks ago, but so far it seems no
> action has been taken:
>
> https://lore.kernel.org/all/Zj0ErxVBE3DYT2Ea@gpd/

Hmmm, that thread is strange, why are David's replies not where they are
supposed to be? Whatever. The last thing from just a few days ago seems
to be an inquiry from David to Andrea that was not yet answered afaics:

https://lore.kernel.org/all/531994.1716450...@warthog.procyon.org.uk/

It would also help a lot to know if this is a 6.8.y only thing, or happens
with 6.9 and mainline as well, as 6.8.y will likely be EOLed soon.

Ciao, Thorsten
Bug#1071420: linux-image-6.8.9-1-amd64: cannot mount btrfs root partition
TWIMC, the problem systemd is facing due to the removal of an obsolete
option (that might or might not lead to the problem this bug is about)
was finally properly reported upstream now – and from the first reply it
sounds like a workaround is likely to be expected:

https://lore.kernel.org/all/ZkxZT0J-z0GYvfy8@gardel-login/
Bug#1071420: linux-image-6.8.9-1-amd64: cannot mount btrfs root partition
On Sat, 18 May 2024 22:25:14 +0200 Matteo Settenvini wrote:
>
> booting kernel 6.8.9-1 with dracut, systemd, and btrfs as the root device
> fails to mount the root partition. I just tried the kernel from sid and it
> seems indeed affected. The 6.7 kernel from trixie is instead booting fine
> even after regenerating all initrds.
>
> According to bl...@debian.org, this is likely due to
> https://github.com/torvalds/linux/commit/a1912f712188291f9d7d434fba155461f1ebef66

Would be great to know what the actual problem is. Are there any error
messages from systemd or the kernel? The upstream bug
(https://github.com/systemd/systemd/pull/32892 ) about this also does
not state what goes wrong (either in general or in certain situations).

Such details would likely be needed to convince the btrfs upstream devs
to revert the change or apply a workaround -- especially as I'm pretty
sure there are already a lot of btrfs systems with systemd and 6.8
(released upstream 2+ months ago and regularly used in Arch, Fedora and
Tumbleweed for weeks now) out there and working just fine (including the
Fedora machine I write from).

Thorsten
Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait
On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.

Thx for working on this.

> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link:
> https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info

Nitpicking: that ideally should be pointing to
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/ , as
that is the report and not just a reply to prod things.

Ciao, Thorsten
Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu
On 27.01.24 14:14, Salvatore Bonaccorso wrote:
>
> In Debian (https://bugs.debian.org/1061449) we got the following
> quoted report:
>
> On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
>>
>> Giving a try to 6.7, here is a message extracted from dmesg:
>> [4.177226] [ cut here ]
>> [4.177227] WARNING: CPU: 6 PID: 248 at
>> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
>> construct_phy+0xb26/0xd60 [amdgpu]
> [...]

Not my area of expertise, but this looks a lot like a duplicate of
https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835

Mario (now CCed) already prepared a patch for that issue that seems to work.

HTH, Ciao, Thorsten
Bug#1054514: linux-image-6.1.0-13-amd64: Debian VM with qxl graphics freezes frequently
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Gerd, it seems this regression[1] fell through the cracks. Could you
please take a look? Or is there a good reason why this can't be
addressed? Or was it dealt with and I just missed it?

[1] apparently caused by 5a838e5d5825c8 ("drm/qxl: simplify
qxl_fence_wait") [v5.13-rc1] from Gerd; for details see
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 24.10.23 23:39, Timo Lindfors wrote:
> Hi,
>
> On Tue, 24 Oct 2023, Salvatore Bonaccorso wrote:
>> Thanks for the excellently constructed report! I think it's best to
>> forward this directly to upstream including the people for the
>> bisected commit to get some idea.
>
> Thanks for the quick reply!
>
>> Can you reproduce the issue with 6.5.8-1 in unstable as well?
>
> Unfortunately yes:
>
> ansible@target:~$ uname -r
> 6.5.0-3-amd64
> ansible@target:~$ time sudo ./reproduce.bash
> Wed 25 Oct 2023 12:27:00 AM EEST starting round 1
> Wed 25 Oct 2023 12:27:24 AM EEST starting round 2
> Wed 25 Oct 2023 12:27:48 AM EEST starting round 3
> bug was reproduced after 3 tries
>
> real 0m48.838s
> user 0m1.115s
> sys 0m45.530s
>
> I also tested upstream tag v6.6-rc6:
>
> ...
> + detected_version=6.6.0-rc6
> + '[' 6.6.0-rc6 '!=' 6.6.0-rc6 ']'
> + exec ssh target sudo ./reproduce.bash
> Wed 25 Oct 2023 12:37:16 AM EEST starting round 1
> Wed 25 Oct 2023 12:37:42 AM EEST starting round 2
> Wed 25 Oct 2023 12:38:10 AM EEST starting round 3
> Wed 25 Oct 2023 12:38:36 AM EEST starting round 4
> Wed 25 Oct 2023 12:39:01 AM EEST starting round 5
> Wed 25 Oct 2023 12:39:27 AM EEST starting round 6
> bug was reproduced after 6 tries
>
>
> For completeness, here is also the grub_set_default_version.bash script
> that I had to write to automate this (maybe these could be in debian
> wiki?):
>
> #!/bin/bash
> set -x
>
> version="$1"
>
> idx=$(expr $(grep "menuentry " /boot/grub/grub.cfg | sed 1d |grep -n
> "'Debian GNU/Linux, with Linux $version'"|cut -d: -f1) - 1)
> exec sudo grub-set-default "1>$idx"
>
>
>
> -Timo
>
>
>
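The quoted grub_set_default_version.bash derives the position of a kernel entry inside the GRUB submenu and passes it to grub-set-default as "1>$idx". A minimal sketch of the same pipeline wrapped in a function follows. The function name grub_submenu_index and the optional config-path parameter are assumptions made for illustration (they are not in Timo's script); the parameter only exists so the logic can be tried against a copy of grub.cfg, and the match uses a fixed string instead of a regular expression.

```shell
#!/bin/bash
# Sketch only: same idea as the quoted script, wrapped in a function.
# grub_submenu_index is a hypothetical name; the config path parameter is an
# assumption so the logic can be tried against a copy of grub.cfg.
grub_submenu_index() {
    local version="$1" cfg="${2:-/boot/grub/grub.cfg}" line
    # Drop the first "menuentry " line (the top-level entry), then find the
    # 1-based position of the wanted entry among the remaining ones.
    line=$(grep "menuentry " "$cfg" | sed 1d \
        | grep -n -F "'Debian GNU/Linux, with Linux $version'" \
        | head -n 1 | cut -d: -f1)
    # No match: fail instead of computing an index from an empty string.
    [ -n "$line" ] || return 1
    # grub-set-default counts submenu entries from zero.
    echo $((line - 1))
}
```

Under those assumptions it would be used like the original, e.g. sudo grub-set-default "1>$(grub_submenu_index 6.5.0-3-amd64)", with the difference that an unknown version makes the function fail instead of producing a bogus index.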
Bug#1051592: Regression: Commit "netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID" breaks ruleset loading in linux-stable
On 12.09.23 00:57, Pablo Neira Ayuso wrote:
> On Mon, Sep 11, 2023 at 11:37:50PM +0200, Timo Sigurdsson wrote:
>>
>> recently, Debian updated their stable kernel from 6.1.38 to 6.1.52
>> which broke nftables ruleset loading on one of my machines with lots
>> of "Operation not supported" errors. I've reported this to the
>> Debian project (see link below) and Salvatore Bonaccorso and I
>> identified "netfilter: nf_tables: disallow rule addition to bound
>> chain via NFTA_RULE_CHAIN_ID" (0ebc1064e487) as the offending commit
>> that introduced the regression. Salvatore also found that this issue
>> affects the 5.10 stable tree as well (observed in 5.10.191), but he
>> cannot reproduce it on 6.4.13 and 6.5.2.
>>
>> The issue only occurs with some rulesets. While I can't trigger it
>> with simple/minimal rulesets that I use on some machines, it does
>> occur with a more complex ruleset that has been in use for months
>> (if not years, for large parts of it). I'm attaching a somewhat
>> stripped down version of the ruleset from the machine I originally
>> observed this issue on. It's still not a small or simple ruleset,
>> but I'll try to reduce it further when I have more time.
>>
>> The error messages shown when trying to load the ruleset don't seem
>> to be helpful. Just two simple examples: Just to give two simple
>> examples from the log when nftables fails to start:
>> /etc/nftables.conf:99:4-44: Error: Could not process rule: Operation not
>> supported
>> tcp option maxseg size 1-500 counter drop
>> ^
>> /etc/nftables.conf:308:4-27: Error: Could not process rule: Operation not
>> supported
>> tcp dport sip-tls accept
>>
>
> I can reproduce this issue with 5.10.191 and 6.1.52 and nftables v1.0.6,
> this is not reproducible with v1.0.7 and v1.0.8.
>
>> Since the issue only affects some stable trees, Salvatore thought it
>> might be an incomplete backport that causes this.
>>
>> If you need further information, please let me know.
>
> Userspace nftables v1.0.6 generates incorrect bytecode that hits a new
> kernel check that rejects adding rules to bound chains. The incorrect
> bytecode adds the chain binding, attach it to the rule and it adds the
> rules to the chain binding. I have cherry-picked these three patches
> for nftables v1.0.6 userspace and your ruleset restores fine.
> [...]

Hmmm. Well, this sounds like a kernel regression to me that normally
should be dealt with on the kernel level, as users after updating the
kernel should never have to update any userspace stuff to continue what
they have been doing before the kernel update.

Can't the kernel somehow detect the incorrect bytecode and do the right
thing(tm)?

But yes, don't worry, I know that reality is not black and white and
that it's crucial that things like packet filtering do exactly what the
user expects them to do; that's why this might be one of those rare
situations where "user has to update userspace components to support
newer kernels" might be the better of two bad choices. But I had to ask
to ensure it's something like that.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
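Pablo's observation in the thread above (reproducible with nftables v1.0.6, not with v1.0.7 and v1.0.8) suggests a quick first check of the userspace version on an affected machine. The following is a rough sketch, not a definitive test: the function name nft_predates_fix is hypothetical, the output format "nftables vX.Y.Z (...)" of nft --version is assumed, and sort -V (GNU coreutils) is assumed to be available. A distribution may also ship v1.0.6 with the fixes cherry-picked, as Pablo did, in which case the version string alone says nothing.

```shell
# Sketch only: nft_predates_fix is a hypothetical helper name.
# $1 is the text printed by `nft --version` (format assumed, see above).
# Returns 0 if the version sorts before 1.0.7, 1 if not, 2 if unparsable.
nft_predates_fix() {
    local ver
    ver=$(printf '%s\n' "$1" | sed -n 's/^nftables v\([0-9][0-9.]*\).*/\1/p')
    [ -n "$ver" ] || return 2
    # Equal to 1.0.7 is not "older".
    [ "$ver" != "1.0.7" ] || return 1
    # sort -V puts the lower version first; older means $ver comes first.
    [ "$(printf '%s\n%s\n' "$ver" "1.0.7" | sort -V | head -n 1)" = "$ver" ]
}
```

A possible use would be: if nft_predates_fix "$(nft --version)"; then echo "userspace older than 1.0.7, may generate the rejected bytecode"; fi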
Bug#1042753: nouveau bug in linux/6.1.38-2
Hi! On 02.08.23 23:28, Olaf Skibbe wrote: > Dear Maintainers, > > Hereby I would like to report an apparent bug in the nouveau driver in > linux/6.1.38-2. Thx for your report. Maybe your problem is caused by a incomplete backport. I Cced the maintainers for the drivers (and the regressions and the stable list), maybe one of them has an idea, as they know the driver. If they don't reply in the next few days, please check if the problem is also present in mainline. If not, check if the latest 6.1.y. release already fixes this. If not, try to check which of the four patches you reverted to make things going is actually causing this (e.g. first only revert the one that was applied last; then the two last ones; ...). > Running a current debian stable on a Dell Latitude E6510 with a > "NVIDIA Corporation GT218M" graphic card, the monitor turns black > after the grub screen. Also switching to a console (Strg-Alt-F2) shows > just a black screen. Access via ssh is possible. > > ~# uname -r > 6.1.0-10-amd64 > > demesg shows the following error message: > > [ 3.560153] WARNING: CPU: 0 PID: 176 at > drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 > nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft > cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi > i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm > scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci > ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul > crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core > crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery > video wmi button > [ 3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted > 6.1.0-10-amd64 #1 Debian 6.1.38-2 > [ 3.560325] Hardware name: Dell Inc. 
Latitude E6510/0N5KHN, BIOS A17 > 05/12/2017 > [ 3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau] > [ 3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 > 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc > cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26 > [ 3.560541] RSP: 0018:9899c048bd60 EFLAGS: 00010246 > [ 3.560542] RAX: 00041eb0 RBX: 88e0209d2600 RCX: > 00041eb0 > [ 3.560544] RDX: c079f760 RSI: RDI: > 9899c048bcf0 > [ 3.560545] RBP: 0001 R08: 9899c048bc64 R09: > 5b76 > [ 3.560546] R10: 000d R11: 9899c048bde0 R12: > ffea > [ 3.560548] R13: 88e00b39e480 R14: 00044d45 R15: > > [ 3.560549] FS: () GS:88e123c0() > knlGS: > [ 3.560551] CS: 0010 DS: ES: CR0: 80050033 > [ 3.560552] CR2: 7f57f4e90451 CR3: 00018141 CR4: > 06f0 > [ 3.560554] Call Trace: > [ 3.560558] > [ 3.560560] ? __warn+0x7d/0xc0 > [ 3.560566] ? nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 3.560671] ? report_bug+0xe6/0x170 > [ 3.560675] ? handle_bug+0x41/0x70 > [ 3.560679] ? exc_invalid_op+0x13/0x60 > [ 3.560681] ? asm_exc_invalid_op+0x16/0x20 > [ 3.560685] ? init_reset_begun+0x20/0x20 [nouveau] > [ 3.560769] ? nvkm_dp_acquire+0x26a/0x490 [nouveau] > [ 3.560888] nv50_disp_super_2_2+0x70/0x430 [nouveau] > [ 3.560997] nv50_disp_super+0x113/0x210 [nouveau] > [ 3.561103] process_one_work+0x1c7/0x380 > [ 3.561109] worker_thread+0x4d/0x380 > [ 3.561113] ? rescuer_thread+0x3a0/0x3a0 > [ 3.561116] kthread+0xe9/0x110 > [ 3.561120] ? 
kthread_complete_and_exit+0x20/0x20 > [ 3.561122] ret_from_fork+0x22/0x30 > [ 3.561130] > > Further information: > > $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }') > 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] > (rev a2) (prog-if 00 [VGA controller]) > Subsystem: Dell Latitude E6510 > Flags: bus master, fast devsel, latency 0, IRQ 27 > Memory at e200 (32-bit, non-prefetchable) [size=16M] > Memory at d000 (64-bit, prefetchable) [size=256M] > Memory at e000 (64-bit, prefetchable) [size=32M] > I/O ports at 7000 [size=128] > Expansion ROM at 000c [disabled] [size=128K] > Capabilities: > Kernel driver in use: nouveau > Kernel modules: nouveau > > I reported this bug to debian already, see > https://bugs.debian.org/1042753 for context. > > With support (thanks Diederik!) I managed to figure out that the cause > was a regression between upstream kernel version 6.1.27 and 6.1.38. > > I build a new 6.1.38 kernel with these commits reverted: > > 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL > fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode > 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA
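The narrowing-down suggested in the reply above (first revert only the patch that was applied last, then the last two, and so on) can be written down as a small plan generator. This is a sketch under assumptions: revert_plan is a hypothetical helper, it only prints the git revert commands instead of running them, and it expects the commit ids newest first (the order git log shows them). The order in which the reverted nouveau patches were applied is not stated in the report, so the ids are deliberately left as placeholders.

```shell
# Sketch only: revert_plan is a hypothetical helper that prints, per round,
# the command to run on a clean checkout of the affected tag.
# Arguments: commit ids, newest first.
revert_plan() {
    local so_far="" round=1 c
    for c in "$@"; do
        # Each round reverts one more commit than the previous one,
        # always starting with the newest.
        so_far="$so_far $c"
        echo "round $round: git revert --no-edit$so_far"
        round=$((round + 1))
    done
}
```

Called as revert_plan NEWEST SECOND_NEWEST OLDEST (placeholders), it prints one line per round; the first round whose kernel works again points at the commit added to the revert set in that round.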
Bug#1036530: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
On 27.06.23 00:34, Nick Hastings wrote: > * Linux regression tracking (Thorsten Leemhuis) > [230626 21:09]: >> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting >> for once, to make this easily accessible to everyone. >> >> Nick, what's the status/was there any progress? Did you do what Mario >> suggested and file a nouveau bug? > > It was not apparent that the suggestion to open "a Nouveau drm bug" was > addressed to me. I wish things were earlier for reporters, but from what I can see this is the only way forward if you or some silent bystander cares. >> I ask, as I still have this on my list of regressions and it seems there >> was no progress in three+ weeks now. > > I have not pursued this further since as far as I could tell I already > provided all requested information and I don't actually use nouveau, so > I blacklisted it. I doubt any developer cares enough to take a closer look[1] without a proper nouveau bug and some help & prodding from someone affected. And looks to me like reverting the culprit now might create even bigger problems for users. Hence I guess then this won't be fixed in the end. In a ideal world this would not happen, but we don't live in one and all have just 24 hours in a day. :-/ Nevertheless: thx for your report your help through this thread. [1] some points on the following page kinda explain this https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/ Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. 
#regzbot inconclusive: reporting deadlock (see thread for details) >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) >> -- >> Everything you wanna know about Linux kernel regression tracking: >> https://linux-regtracking.leemhuis.info/about/#tldr >> If I did something stupid, please tell me, as explained on that page. >> >> #regzbot backburner: slow progress, likely just affects one machine >> #regzbot poke >> >> >> On 02.06.23 02:57, Limonciello, Mario wrote: >>> [AMD Official Use Only - General] >>> >>>> -Original Message- >>>> From: Nick Hastings >>>> Sent: Thursday, June 1, 2023 7:02 PM >>>> To: Karol Herbst >>>> Cc: Limonciello, Mario ; Lyude Paul >>>> ; Lukas Wunner ; Salvatore >>>> Bonaccorso ; 1036...@bugs.debian.org; Rafael J. >>>> Wysocki ; Len Brown ; linux- >>>> a...@vger.kernel.org; linux-ker...@vger.kernel.org; >>>> regressi...@lists.linux.dev >>>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI >>>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system) >>>> >>>> Hi, >>>> >>>> * Karol Herbst [230602 03:10]: >>>>> On Thu, Jun 1, 2023 at 7:21 PM Limonciello, Mario >>>>> wrote: >>>>>>> -Original Message- >>>>>>> From: Karol Herbst >>>>>>> Sent: Thursday, June 1, 2023 12:19 PM >>>>>>> To: Limonciello, Mario >>>>>>> Cc: Nick Hastings ; Lyude Paul >>>>>>> ; Lukas Wunner ; Salvatore >>>>>>> Bonaccorso ; 1036...@bugs.debian.org; Rafael J. >>>>>>> Wysocki ; Len Brown ; linux- >>>>>>> a...@vger.kernel.org; linux-ker...@vger.kernel.org; >>>>>>> regressi...@lists.linux.dev >>>>>>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI >>>>>>> string"? 
(was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of >>>> system) >>>>>>> >>>>>>> On Thu, Jun 1, 2023 at 6:54 PM Limonciello, Mario >>>>>>> wrote: >>>>>>>> >>>>>>>> [AMD Official Use Only - General] >>>>>>>> >>>>>>>>> -Original Message- >>>>>>>>> From: Karol Herbst >>>>>>>>> Sent: Thursday, June 1, 2023 11:33 AM >>>>>>>>> To: Limonciello, Mario >>>>>>>>> Cc: Nick Hastings ; Lyude Paul >>>>>>>>> ; Lukas Wunner ; Salvatore >>>>>>>>> Bonaccorso ; 1036...@bugs.debian.org; Rafael >>>> J. >>>>>>>>> Wy
Bug#1036530: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting for once, to make this easily accessible to everyone. Nick, what's the status/was there any progress? Did you do what Mario suggested and file a nouveau bug? I ask, as I still have this on my list of regressions and it seems there was no progress in three+ weeks now. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot backburner: slow progress, likely just affects one machine #regzbot poke On 02.06.23 02:57, Limonciello, Mario wrote: > [AMD Official Use Only - General] > >> -Original Message- >> From: Nick Hastings >> Sent: Thursday, June 1, 2023 7:02 PM >> To: Karol Herbst >> Cc: Limonciello, Mario ; Lyude Paul >> ; Lukas Wunner ; Salvatore >> Bonaccorso ; 1036...@bugs.debian.org; Rafael J. >> Wysocki ; Len Brown ; linux- >> a...@vger.kernel.org; linux-ker...@vger.kernel.org; >> regressi...@lists.linux.dev >> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI >> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system) >> >> Hi, >> >> * Karol Herbst [230602 03:10]: >>> On Thu, Jun 1, 2023 at 7:21 PM Limonciello, Mario >>> wrote: > -Original Message- > From: Karol Herbst > Sent: Thursday, June 1, 2023 12:19 PM > To: Limonciello, Mario > Cc: Nick Hastings ; Lyude Paul > ; Lukas Wunner ; Salvatore > Bonaccorso ; 1036...@bugs.debian.org; Rafael J. > Wysocki ; Len Brown ; linux- > a...@vger.kernel.org; linux-ker...@vger.kernel.org; > regressi...@lists.linux.dev > Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI > string"? 
(was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of >> system) > > On Thu, Jun 1, 2023 at 6:54 PM Limonciello, Mario > wrote: >> >> [AMD Official Use Only - General] >> >>> -Original Message- >>> From: Karol Herbst >>> Sent: Thursday, June 1, 2023 11:33 AM >>> To: Limonciello, Mario >>> Cc: Nick Hastings ; Lyude Paul >>> ; Lukas Wunner ; Salvatore >>> Bonaccorso ; 1036...@bugs.debian.org; Rafael >> J. >>> Wysocki ; Len Brown ; linux- >>> a...@vger.kernel.org; linux-ker...@vger.kernel.org; >>> regressi...@lists.linux.dev >>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video >> _OSI >>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of > system) >>> >>> On Thu, Jun 1, 2023 at 6:18 PM Limonciello, Mario Lyude, Lukas, Karol This thread is in relation to this commit: 24867516f06d ("ACPI: OSI: Remove Linux-Dell-Video _OSI string") Nick has found that runtime PM is *not* working for nouveau. >>> >>> keep in mind we have a list of PCIe controllers where we apply a >>> workaround: >>> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers >>> /gpu/drm/nouveau/nouveau_drm.c?h=v6.4-rc4#n682 >>> >>> And I suspect there might be one or two more IDs we'll have to add >>> there. Do we have any logs? >> >> There's some archived onto the distro bug. Search this page for > "journalctl.log.gz" >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036530 >> > > interesting.. It seems to be the same controller used here. I wonder > if the pci topology is different or if the workaround is applied at > all. I didn't see the message in the log about the workaround being applied in that log, so I guess PCI topology difference is a likely suspect. >>> >>> yeah, but I also couldn't see a log with the usual nouveau messages, >>> so it's kinda weird. 
>>> >>> Anyway, the output of `lspci -tvnn` would help >> >> % lspci -tvnn >> -[:00]-+-00.0 Intel Corporation Device [8086:3e20] >>+-01.0-[01]00.0 NVIDIA Corporation TU117M [GeForce GTX 1650 >> Mobile / Max-Q] [10de:1f91] > > So the bridge it's connected to is the same that the quirk *should have been* > triggering. > > May 29 15:02:42 xps kernel: pci :00:01.0: [8086:1901] type 01 class > 0x060400 > > Since the quirk isn't working and this is still a problem in 6.4-rc4 I > suggest opening a > Nouveau drm bug to figure out why. > >>+-02.0 Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630] >> [8086:3e9b] >>+-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core >> Processor Thermal Subsystem [8086:1903] >>+-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / >> 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911] >>+-12.0 Intel Corporation Cannon Lake PCH Thermal Controller >> [8086:a379] >>+-14.0
Re: virtio_balloon regression in 5.19-rc3 #forregzbot
On 10.07.22 10:06, Thorsten Leemhuis wrote:
> On 04.07.22 11:40, Thorsten Leemhuis wrote:
>> TWIMC: this mail is primarily sent for documentation purposes and for
>> regzbot, my Linux kernel regression tracking bot. These mails usually
>> contain '#forregzbot' in the subject, to make them easy to spot and filter.
>>
>> On 21.06.22 11:35, Thorsten Leemhuis wrote:
>>> On 20.06.22 20:49, Ben Hutchings wrote:
>>>> I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
>>>> pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
>>>>
>>>>
>>> function="0x0"/>
>>>>
>>>>
>>>> but the virtio_balloon driver fails to bind to it:
>>>>
>>>> virtio_balloon virtio4: init_vqs: add stat_vq failed
>>>> virtio_balloon: probe of virtio4 failed with error -5
>>>>
>>> [...]
>>> #regzbot ^introduced v5.18..v5.19-rc3
>>> #regzbot ignore-activity
>>
>> #regzbot introduced 8b4ec69d7e09
>> #regzbot monitor
>> https://lore.kernel.org/all/20220622012940.21441-1-jasow...@redhat.com/
>
> #regzbot fixed-by: 6a9720576c
> #regzbot ignore-activity

For the record: the fix was merged through a different branch and thus
got a different commit id:

#regzbot fixed-by: ebe797f25f68f28581f46a9cb9c1997ac15c39a0
Re: virtio_balloon regression in 5.19-rc3 #forregzbot
On 04.07.22 11:40, Thorsten Leemhuis wrote:
> TWIMC: this mail is primarily sent for documentation purposes and for
> regzbot, my Linux kernel regression tracking bot. These mails usually
> contain '#forregzbot' in the subject, to make them easy to spot and filter.
>
> On 21.06.22 11:35, Thorsten Leemhuis wrote:
>> [TLDR: I'm adding this regression report to the list of tracked
>> regressions; all text from me you find below is based on a few template
>> paragraphs you might have encountered already in similar form.]
>>
>> On 20.06.22 20:49, Ben Hutchings wrote:
>>> I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
>>> pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
>>>
>>>
>>>
>> function="0x0"/>
>>>
>>>
>>> but the virtio_balloon driver fails to bind to it:
>>>
>>> virtio_balloon virtio4: init_vqs: add stat_vq failed
>>> virtio_balloon: probe of virtio4 failed with error -5
>>>
>> [...]
>> #regzbot ^introduced v5.18..v5.19-rc3
>> #regzbot ignore-activity
>
> #regzbot introduced 8b4ec69d7e09
> #regzbot monitor
> https://lore.kernel.org/all/20220622012940.21441-1-jasow...@redhat.com/

#regzbot fixed-by: 6a9720576c
#regzbot ignore-activity

For details see:
https://lore.kernel.org/all/cacgkmeu8eecpamy__oqqnf7iuku7nho_-mij2zwulfv2rv+...@mail.gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
Re: virtio_balloon regression in 5.19-rc3 #forregzbot
TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 21.06.22 11:35, Thorsten Leemhuis wrote:
> [TLDR: I'm adding this regression report to the list of tracked
> regressions; all text from me you find below is based on a few template
> paragraphs you might have encountered already in similar form.]
>
> On 20.06.22 20:49, Ben Hutchings wrote:
>> I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
>> pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
>>
>>
>>
> function="0x0"/>
>>
>>
>> but the virtio_balloon driver fails to bind to it:
>>
>> virtio_balloon virtio4: init_vqs: add stat_vq failed
>> virtio_balloon: probe of virtio4 failed with error -5
>>
> [...]
> #regzbot ^introduced v5.18..v5.19-rc3
> #regzbot ignore-activity

#regzbot introduced 8b4ec69d7e09
#regzbot monitor
https://lore.kernel.org/all/20220622012940.21441-1-jasow...@redhat.com/
Re: virtio_balloon regression in 5.19-rc3
[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

On 20.06.22 20:49, Ben Hutchings wrote:
> I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
> pc-q35-5.2. It has a virtio balloon device defined in libvirt as:
>
>
>function="0x0"/>
>
>
> but the virtio_balloon driver fails to bind to it:
>
> virtio_balloon virtio4: init_vqs: add stat_vq failed
> virtio_balloon: probe of virtio4 failed with error -5
>
> On a 5.18 kernel with similar configuration, it binds successfully.
>
> I've attached the kernel config for 5.19-rc3.

CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

Thanks for the report. To be sure the issue below doesn't fall through
the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
regression tracking bot:

#regzbot ^introduced v5.18..v5.19-rc3
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally while also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained in
the Linux kernel's documentation; the webpage above explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
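The mails in this thread steer regzbot through lines that begin with '#regzbot' (for example '#regzbot ^introduced v5.18..v5.19-rc3' or '#regzbot ignore-activity'). When reading such a thread it can help to list only the commands a mail itself issues, leaving out the ones that merely appear in quoted text. The following is a reading aid sketched under assumptions: regzbot_commands is a hypothetical helper, it expects a mail body with its original line breaks on standard input, and it is not a description of how regzbot itself parses mails.

```shell
# Sketch only: regzbot_commands is a hypothetical helper.
# Reads a mail body on stdin and prints the unquoted '#regzbot' lines,
# i.e. those that start at the beginning of a line without a '>' prefix.
regzbot_commands() {
    grep '^#regzbot ' | sed 's/[[:space:]]*$//'
}
```

For a saved copy of a mail this would be run as regzbot_commands < mail.txt (mail.txt being a placeholder file name); quoted commands such as '> #regzbot ignore-activity' are skipped because they do not start the line.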
Bug#1005005: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?
On 21.03.22 19:49, Dominique Dumont wrote:
> On Monday, 21 March 2022 09:57:59 CET Thorsten Leemhuis wrote:
>> Dominique/Salvatore/Eric, what's the status of this regression?
>> According to the debian bug tracker the problem is solved with 5.16 and
>> 5.17, but was 5.15 ever fixed?
>
> I don't think so.
>
> On kernel side, the commit fixing this issue is
> e55a3aea418269266d84f426b3bd70794d3389c8 .
>
> According to the logs of [1] , this commit landed in v5.17-rc3
>
> HTH
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

And from there it, among others, got backported to 5.15.22:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=8a15ac1786c92dce6ecbeb4e4c237f5f80c2c703
https://lwn.net/Articles/884107/

Another indicator that Eric's problem is something else.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.
Bug#1005005: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?
On 21.03.22 13:07, Éric Valette wrote:
> My problem has never been fixed.
>
> The proposed patch has been applied to 5.15. I do not remember which version,
> 28 maybe.
>
> I still have a RIP in pm_suspend. Did not test the last two 15 versions.
>
> I can live with 5.10 and using own compiled kernels.
>
> Thanks for asking.

This thread/the debian bug report
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005 ) is getting
long, which makes things hard to grasp. But to me it looks a lot like the
problem you are facing is different from the problem that others ran
into and bisected -- but I might be totally wrong there. Have you ever
tried reverting 3c196f05 to see if it helps (sorry if that's mentioned
in the bug report somewhere, as I said, it became long)?

I guess a bisection from your side really would help a lot; but before
you go down that route you might want to give 5.17 and the latest 5.15.y
kernel a try.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.
Bug#1005005: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?
Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone. Dominique/Salvatore/Eric, what's the status of this regression? According to the Debian bug tracker the problem is solved with 5.16 and 5.17, but was 5.15 ever fixed? Ciao, Thorsten On 21.02.22 15:16, Alex Deucher wrote: > On Mon, Feb 21, 2022 at 3:29 AM Eric Valette wrote: >> >> On 20/02/2022 16:48, Dominique Dumont wrote: >>> On Monday, 14 February 2022 22:52:27 CET Alex Deucher wrote: >>>> Does the system actually suspend? >>> >>> Not really. The screen looks like it's going to suspend, but it does come >>> back after 10s or so. The light mounted in the middle of the power button >>> does >>> not switch off. >> >> >> As I have a very similar problem and also commented on the original >> Debian bug report >> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005), I will add >> some information here on another AMD-only laptop (Renoir AMD Ryzen 7 >> 4800H with Radeon Graphics + Radeon RX 5500/5500M / Pro 5500M). >> >> For me the suspend works once, but after the first resume (I do not >> know if it is in the suspend path or the resume path) I see a RIP in the >> dmesg (see additional info in the Debian bug) and later suspends do not >> work: it only goes to the KDE login screen. >> >> I was unable due to network connectivity to do a full bisect but tested >> with the patch I had on my laptop: >> >> 5.10.101 works, 5.10 from Debian works >> 5.11 works >> 5.12 works >> 5.13 suspend works but when resuming the PC is dead, I have to reboot >> 5.14 seems to work but looking at dmesg it is full of RIP messages at >> various places. >> 5.15.24 behaves as described; 5.15 from Debian behaves identically >> 5.16 from Debian behaves identically. >> >>>> Is this system S0i3 or regular S3? >> >> For me it is real S3. >> >> The proposed patch is intended for Intel + Intel GPU + amdgpu but I have >> dual AMD GPUs. 
> > It doesn't really matter what the platform is, it could still > potentially help on your system, it depends on the BIOS implementation > for your platform and how it handles suspend. You can try the patch, > but I don't think you are hitting the same issue. A bisect would be > helpful in your case. > > Alex
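For readers unfamiliar with what a bisect does: it is a binary search for the first bad point in an ordered history. The following is a rough sketch of that idea only, using the per-release results quoted above as input. `first_bad` is a hypothetical helper, not part of git; a real `git bisect` walks individual commits rather than releases. Lumping "dead on resume" and "RIP messages in dmesg" together as "bad" is a simplifying assumption made for this example.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad entry in an ordered sequence.

    Assumes versions[0] is good, versions[-1] is bad, and that every
    entry after the first bad one is bad as well.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # first bad entry is at mid or earlier
        else:
            lo = mid  # first bad entry is after mid
    return versions[hi]


# Release series as tested in the quoted report, oldest first.
releases = ["5.10", "5.11", "5.12", "5.13", "5.14", "5.15", "5.16"]
# Assumption for this sketch: any reported misbehaviour counts as "bad".
reported_bad = {"5.13", "5.14", "5.15", "5.16"}

print(first_bad(releases, lambda version: version in reported_bad))
```

With seven releases this needs only three test boots instead of seven. A full bisect between the last good and the first bad release would then narrow the result down to a single commit in the same way.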
Bug#1005005: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?
[TLDR: I'm adding the regression report below to regzbot, the Linux kernel regression tracking bot; all text you find below is compiled from a few template paragraphs you might have encountered already in similar mails.] Hi, this is your Linux kernel regression tracker speaking. CCing the regression mailing list, as it should be in the loop for all regressions, as explained here: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html To be sure this issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot: #regzbot ^introduced 3c196f05 #regzbot title amdgfx: suspend stopped working #regzbot ignore-activity #regzbot link: https://bugs.debian.org/1005005 Reminder for developers: when fixing the issue, please add 'Link:' tags pointing to the report (the mail quoted above) using lore.kernel.org/r/, as explained in 'Documentation/process/submitting-patches.rst' and 'Documentation/process/5.Posting.rst'. This allows the bot to connect the report with any patches posted or committed to fix the issue; this again allows the bot to show the current status of regressions and automatically resolve the issue when the fix hits the right tree. I'm sending this to everyone that got the initial report, to make them aware of the tracking. I also hope that messages like this motivate people to directly get at least the regression mailing list and ideally even regzbot involved when dealing with regressions, as messages like this wouldn't be needed then. Don't worry, I'll send further messages wrt this regression just to the lists (with a tag in the subject so people can filter them away), if they are relevant just for regzbot. With a bit of luck no such messages will be needed anyway. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
On 12.02.22 19:23, Salvatore Bonaccorso wrote: > Hi Alex, hi all > > In Debian we got a regression report from Dominique Dumont, CC'ed in > https://bugs.debian.org/1005005 that after an update to a 5.15.15-based > kernel, his machine no longer suspends correctly: after the screen goes > black as usual it comes back. The Debian bug above contains a trace. > > Dominique confirmed that this issue persisted after updating to 5.16.7; > furthermore he bisected the issue and found > > 3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit > commit 3c196f05666610912645c7c5d9107706003f67c3 > Author: Alex Deucher > Date: Fri Nov 12 11:25:30 2021 -0500 > > drm/amdgpu: always reset the asic in suspend (v2) > > [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ] > > If the platform suspend happens to fail and the power rail > is not turned off, the GPU will be in an unknown state on > resume, so reset the asic so that it will be in a known > good state on resume even if the platform suspend failed. > > v2: handle s0ix > > Acked-by: Luben Tuikov > Acked-by: Evan Quan > Signed-off-by: Alex Deucher > Signed-off-by: Sasha Levin > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > to be the first bad commit, see https://bugs.debian.org/1005005#34 . > > Does this ring any bell? Any idea on the problem? 
> > Regards, > Salvatore -- Additional information about regzbot: If you want to know more about regzbot, check out its web-interface, the getting started guide, and the reference documentation: https://linux-regtracking.leemhuis.info/regzbot/ https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md The last two documents will explain how you can interact with regzbot yourself if you want to. Hint for reporters: when reporting a regression it's in your interest to CC the regression list and tell regzbot about the issue, as that ensures the regression makes it onto the radar of the Linux kernel's regression tracker and won't fall through the cracks unnoticed. Hint for developers: you normally don't need to care about regzbot once it's involved. Fix the issue as you normally would, just remember to include 'Link:' tags in the patch descriptions pointing to all reports about the issue. This has been expected from developers even before regzbot showed up for reasons explained in