Bug#1071378: [REGRESSION] commit fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6 (Linux 6.7+) crashes during boot

2024-05-30 Thread Linux regression tracking (Thorsten Leemhuis)
On 30.05.24 10:45, Jörn Heusipp wrote:
> 
> On 30/05/2024 09:27, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 30.05.24 08:55, Jörn Heusipp wrote:
>>> commit fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6 ("x86/sev-es: Set
>>> x86_virt_bits to the correct value straight away, instead of a two-phase
>>> approach") crashes during boot for me on this 32bit x86 system.
>>
>> FWIW, not my area of expertise, but there is a patch from Dave with a
>> Fixes: tag for your culprit up for review:
>> https://lore.kernel.org/all/20240517200534.8ec5f...@davehans-spike.ostc.intel.com/
> 
> That did not apply cleanly to 6.10-rc1,

Maybe something changed since then.

> but I figured it out manually. I
> can confirm that it fixes the issue.

Cool. Guess Dave in that case might be happy about a "Tested-by" tag
from you:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot dup:
https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevche...@linux.intel.com/
#regzbot fix: x86/cpu: Provide default cache line size if not enumerated
#regzbot related:
https://lore.kernel.org/all/20240517200534.8ec5f...@davehans-spike.ostc.intel.com/



Bug#1071378: [REGRESSION] commit fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6 (Linux 6.7+) crashes during boot

2024-05-30 Thread Linux regression tracking (Thorsten Leemhuis)
On 30.05.24 08:55, Jörn Heusipp wrote:
> 
> Hello x86 maintainers!
> 
> commit fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6 ("x86/sev-es: Set
> x86_virt_bits to the correct value straight away, instead of a two-phase
> approach") crashes during boot for me on this 32bit x86 system.

FWIW, not my area of expertise, but there is a patch from Dave with a
Fixes: tag for your culprit up for review:
https://lore.kernel.org/all/20240517200534.8ec5f...@davehans-spike.ostc.intel.com/

Ciao, Thorsten

> Updating a Debian testing system resulted in a hang during boot before
> printing anything, with any 6.7 or later kernel. With 'earlyprintk=vga',
> I managed to capture the crash on video and stitched it together as an
> image [1].
> Trimmed transcription (might contain typos) of the crash from Debian
> kernel 6.7.12-1:
> ===
> BUG: kernel NULL pointer dereference, address: 0010
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> [...]
> EIP: __ring_buffer_alloc+0x32/0x194
> [...]
> show_regs
> __die
> page_fault_oops
> kernelmode_fixup_or_oops.constprop
> __bad_area_nosemaphore.constprop
> bad_area_nosemaphore
> do_user_addr_fault
> prb_read_valid
> exc_page_fault
> pvclock_clocksource_read_nowd
> handle_exception
> pvclock_clocksource_read_nowd
> __ring_buffer_alloc
> pvclock_clocksource_read_nowd
> __ring_buffer_alloc
> early_trace_init
> start_kernel
> i386_start_kernel
> startup_32_smp
> [...]
> ===
> I could transcribe all of it or capture it again from latest git and
> decode the symbols, if truly needed, but I figured the type of
> crash and the trace itself could maybe be sufficient. It looks identical
> to me for all later crashing kernel versions.
> 
> I bisected this down to commit fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6.
> 
> The kernel config [2] I used is 'make olddefconfig' based on Debian's
> config-6.8.11-686-pae [3].
> 
> I also tested 6.9.2 and 6.10-rc1, both also still crash in the same way.
> 
> cpuinfo:
> ===
> manx@caesar:~$ cat /proc/cpuinfo
> processor   : 0
> vendor_id   : AuthenticAMD
> cpu family  : 6
> model   : 8
> model name  : AMD Duron(tm)
> stepping    : 1
> cpu MHz : 1798.331
> cache size  : 64 KB
> physical id : 0
> siblings    : 1
> core id : 0
> cpu cores   : 1
> apicid  : 0
> initial apicid  : 0
> fdiv_bug    : no
> f00f_bug    : no
> coma_bug    : no
> fpu : yes
> fpu_exception   : yes
> cpuid level : 1
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow cpuid
> 3dnowprefetch vmmcall
> bugs    : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2
> spec_store_bypass
> bogomips    : 3596.66
> clflush size    : 32
> cache_alignment : 32
> address sizes   : 34 bits physical, 32 bits virtual
> power management: ts
> ===
> 
> dmesg from a successful boot (Debian kernel 6.6.15-2) is here [4].
> 
> This particular system has been running all Debian testing kernels since
> at least the 2.6.32 days and is currently running 6.6.15-2 completely
> fine, thus this is an obvious regression.
> 
> The original Debian bug is #1071378 [5].
> 
> 
> #regzbot introduced: fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6
> 
> [1] https://manx.datengang.de/temp/linux-6.7-crash/6.7.12-1-crash.png
> [2] https://manx.datengang.de/temp/linux-6.7-crash/config
> [3] https://manx.datengang.de/temp/linux-6.7-crash/config-6.8.11-686-pae
> [4] https://manx.datengang.de/temp/linux-6.7-crash/dmesg-6.6.15-2.txt
> [5] https://bugs.debian.org/1071378
> 
> 
> Best regards,
> Jörn
> 
> 



Bug#1071420: linux-image-6.8.9-1-amd64: cannot mount btrfs root partition

2024-05-19 Thread Linux regression tracking (Thorsten Leemhuis)
On Sat, 18 May 2024 22:25:14 +0200 Matteo Settenvini wrote:
> 
> booting kernel 6.8.9-1 with dracut, systemd, and btrfs as the root device
> fails to mount the root partition. I just tried the kernel from sid and it
> seems indeed affected. The 6.7 kernel from trixie is instead booting fine
> even after regenerating all initrds.
> 
> According to bl...@debian.org, this is likely due to
> https://github.com/torvalds/linux/commit/a1912f712188291f9d7d434fba155461f1ebef66

Would be great to know what the actual problem is. Are there any error
messages from systemd or the kernel?

The upstream bug (https://github.com/systemd/systemd/pull/32892) about
this also does not state what goes wrong (either in general or in certain
situations).

Such details would likely be needed to convince the btrfs upstream devs
to revert the change or apply a workaround -- especially as I'm pretty
sure there are already a lot of btrfs systems with systemd and 6.8
(released upstream 2+ months ago and regularly used in Arch, Fedora and
Tumbleweed for weeks now) out there and working just fine (including the
Fedora machine I write from).

Thorsten



Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-20 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.
> 
> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info
> Reported-by: Timo Lindfors 
> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
> Signed-off-by: Alex Constantino 
> ---
>  drivers/gpu/drm/qxl/qxl_release.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)

Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two
I just added to the CC), it seems to me this regression fix did not
make any progress since it was posted. Did I miss something, is it just
"we are busy with the merge window", or is there some other reason?
Just wondering, I just saw someone on a Fedora IRC channel complaining
about the regression, that's why I'm asking. Would be really good to
finally get this resolved...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 368d26da0d6a..51c22e7f9647 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -20,8 +20,6 @@
>   * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>  
> -#include 
> -
>  #include 
>  
>  #include "qxl_drv.h"
> @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>  {
>   struct qxl_device *qdev;
>   unsigned long cur, end = jiffies + timeout;
> + signed long iterations = 1;
> + signed long timeout_fraction = timeout;
>  
>   qdev = container_of(fence->lock, struct qxl_device, release_lock);
>  
> - if (!wait_event_timeout(qdev->release_event,
> + // using HZ as a factor since it is used in ttm_bo_wait_ctx too
> + if (timeout_fraction > HZ) {
> + iterations = timeout_fraction / HZ;
> + timeout_fraction = HZ;
> + }
> + for (int i = 0; i < iterations; i++) {
> + if (wait_event_timeout(
> + qdev->release_event,
>   (dma_fence_is_signaled(fence) ||
> -  (qxl_io_notify_oom(qdev), 0)),
> - timeout))
> - return 0;
> + (qxl_io_notify_oom(qdev), 0)),
> + timeout_fraction))
> + break;
> + }
>  
>   cur = jiffies;
>   if (time_after(cur, end))
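
The slicing arithmetic in the patch above can be tried outside the kernel.
The following is only a rough Python sketch of that arithmetic, not kernel
code: the function name split_timeout and the HZ value of 250 are assumptions
made for the illustration.

```python
HZ = 250  # assumed tick rate for illustration; the real value is a kernel config choice


def split_timeout(timeout, hz=HZ):
    """Mirror the patch's arithmetic: return (iterations, timeout_fraction)."""
    iterations = 1
    timeout_fraction = timeout
    if timeout_fraction > hz:
        # Integer division, matching C semantics for positive values.
        iterations = timeout_fraction // hz
        timeout_fraction = hz
    return iterations, timeout_fraction


if __name__ == "__main__":
    for timeout in (100, 250, 625, 1000):
        iterations, fraction = split_timeout(timeout)
        print(timeout, "->", iterations, "x", fraction, "jiffies")
```

One thing the sketch makes visible, if the diff is read this way: the integer
division drops any remainder, so a timeout of 625 jiffies with HZ=250 becomes
two slices of 250, which is shorter than the requested total.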



Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-28 Thread Linux regression tracking (Thorsten Leemhuis)
On 27.01.24 14:14, Salvatore Bonaccorso wrote:
>
> In Debian (https://bugs.debian.org/1061449) we got the following
> quoted report:
> 
> On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
>>
>> Giving a try to 6.7, here is a message extracted from dmesg:
>> [4.177226] ------------[ cut here ]------------
>> [4.177227] WARNING: CPU: 6 PID: 248 at
>> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
>> construct_phy+0xb26/0xd60 [amdgpu]
> [...]

Not my area of expertise, but looks a lot like a duplicate of
https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835

Mario (now CCed) already prepared a patch for that issue that seems to work.

HTH, Ciao, Thorsten



Bug#1054514: linux-image-6.1.0-13-amd64: Debian VM with qxl graphics freezes frequently

2023-12-06 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Gerd, it seems this regression[1] fell through the cracks. Could you
please take a look? Or is there a good reason why this can't be
addressed? Or was it dealt with and I just missed it?

[1] apparently caused by 5a838e5d5825c8 ("drm/qxl: simplify
qxl_fence_wait") [v5.13-rc1] from Gerd; for details see
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

On 24.10.23 23:39, Timo Lindfors wrote:
> Hi,
> 
> On Tue, 24 Oct 2023, Salvatore Bonaccorso wrote:
>> Thanks for the excellently constructed report! I think it's best to
>> forward this directly to upstream including the people for the
>> bisected commit to get some idea.
> 
> Thanks for the quick reply!
> 
>> Can you reproduce the issue with 6.5.8-1 in unstable as well?
> 
> Unfortunately yes:
> 
> ansible@target:~$ uname -r
> 6.5.0-3-amd64
> ansible@target:~$ time sudo ./reproduce.bash
> Wed 25 Oct 2023 12:27:00 AM EEST starting round 1
> Wed 25 Oct 2023 12:27:24 AM EEST starting round 2
> Wed 25 Oct 2023 12:27:48 AM EEST starting round 3
> bug was reproduced after 3 tries
> 
> real    0m48.838s
> user    0m1.115s
> sys 0m45.530s
> 
> I also tested upstream tag v6.6-rc6:
> 
> ...
> + detected_version=6.6.0-rc6
> + '[' 6.6.0-rc6 '!=' 6.6.0-rc6 ']'
> + exec ssh target sudo ./reproduce.bash
> Wed 25 Oct 2023 12:37:16 AM EEST starting round 1
> Wed 25 Oct 2023 12:37:42 AM EEST starting round 2
> Wed 25 Oct 2023 12:38:10 AM EEST starting round 3
> Wed 25 Oct 2023 12:38:36 AM EEST starting round 4
> Wed 25 Oct 2023 12:39:01 AM EEST starting round 5
> Wed 25 Oct 2023 12:39:27 AM EEST starting round 6
> bug was reproduced after 6 tries
> 
> 
> For completeness, here is also the grub_set_default_version.bash script
> that I had to write to automate this (maybe these could be in debian
> wiki?):
> 
> #!/bin/bash
> set -x
> 
> version="$1"
> 
> idx=$(expr $(grep "menuentry " /boot/grub/grub.cfg | sed 1d | grep -n "'Debian GNU/Linux, with Linux $version'" | cut -d: -f1) - 1)
> exec sudo grub-set-default "1>$idx"
> 
> 
> 
> -Timo
> 
> 
> 
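
The index lookup done by the shell pipeline in the script above can also be
written out step by step. This is a hedged Python sketch of the same idea, not
a replacement for Timo's script: the function name grub_default_for and the
sample configuration text are made up for the illustration, and it assumes (as
the script does) that the first menuentry is the top-level one and that the
versioned entries live in submenu number 1.

```python
SAMPLE_CFG = """\
menuentry 'Debian GNU/Linux' --class debian {
submenu 'Advanced options for Debian GNU/Linux' {
\tmenuentry 'Debian GNU/Linux, with Linux 6.6.0-rc6' --class debian {
\tmenuentry 'Debian GNU/Linux, with Linux 6.6.0-rc6 (recovery mode)' --class debian {
\tmenuentry 'Debian GNU/Linux, with Linux 6.5.0-3-amd64' --class debian {
"""


def grub_default_for(cfg_text, version):
    """Return the 'submenu>entry' string for grub-set-default, or None if not found."""
    wanted = f"'Debian GNU/Linux, with Linux {version}'"
    # Equivalent of: grep "menuentry " | sed 1d
    entries = [line for line in cfg_text.splitlines() if "menuentry " in line][1:]
    # enumerate() is 0-based, so the "- 1" from the shell version is not needed.
    for idx, line in enumerate(entries):
        if wanted in line:
            return f"1>{idx}"
    return None


if __name__ == "__main__":
    print(grub_default_for(SAMPLE_CFG, "6.5.0-3-amd64"))
```

The closing quote in the search string is what keeps the "(recovery mode)"
entries from matching, both here and in the original grep.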



Bug#1051592: Regression: Commit "netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID" breaks ruleset loading in linux-stable

2023-09-29 Thread Linux regression tracking (Thorsten Leemhuis)
On 12.09.23 12:27, Florian Westphal wrote:
> Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 12.09.23 00:57, Pablo Neira Ayuso wrote:
>>> Userspace nftables v1.0.6 generates incorrect bytecode that hits a new
>>> kernel check that rejects adding rules to bound chains. The incorrect
>>> bytecode adds the chain binding, attaches it to the rule, and then adds
>>> the rules to the chain binding. I have cherry-picked these three patches
>>> for nftables v1.0.6 userspace and your ruleset restores fine.
>>> [...]
>>
>> Hmm. Well, this sounds like a kernel regression to me that normally
>> should be dealt with on the kernel level, as users after updating the
>> kernel should never have to update any userspace stuff to continue what
>> they have been doing before the kernel update.
> 
> This is a combo of a userspace bug and this new sanity check that
> rejects the incorrect ordering (adding rules to the already-bound
> anonymous chain).
> 
> nf_tables uses an all-or-nothing transaction model; this means that any
> error that occurs during a transaction has to reverse/undo all the
> pending changes.  This has caused a myriad of bugs already.
> 
> So while this can be theoretically fixed in the kernel I don't see
> a sane way to do it.  Error unwinding / recovery from deeply nested
> errors is already too complex for my taste.
> 
>> Can't the kernel somehow detect the incorrect bytecode and do the right
>> thing(tm) somehow?
> 
> Theoretically yes, but I don't feel competent enough to do it, just look
> at all the UaF bugs of the past month.
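
The all-or-nothing model described in the quoted text can be illustrated with
a toy undo log. The Python below is only a sketch of the general technique and
has nothing to do with the actual nf_tables implementation: the names add_rule
and commit, and the dictionary standing in for a ruleset, are assumptions made
for the example.

```python
def add_rule(chain, rule):
    """Build an operation that appends `rule` to `chain` and returns its own undo."""
    def apply(ruleset):
        if chain not in ruleset:
            raise KeyError(f"no such chain: {chain}")
        ruleset[chain].append(rule)
        return lambda: ruleset[chain].pop()
    return apply


def commit(ruleset, operations):
    """Apply every operation or none: on any error, undo in reverse order."""
    undo_log = []
    try:
        for operation in operations:
            undo_log.append(operation(ruleset))
    except Exception:
        for undo in reversed(undo_log):
            undo()
        raise


if __name__ == "__main__":
    ruleset = {"input": ["accept ssh"]}
    try:
        commit(ruleset, [add_rule("input", "drop telnet"), add_rule("missing", "x")])
    except KeyError as err:
        print("transaction rejected:", err)
    print(ruleset)
```

Even in this toy form, every operation needs a matching undo that is correct
at the point of failure, which hints at why unwinding deeply nested errors
gets complicated.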

Thx for the answer. FWIW, as this was a judgement call I mentioned this
in my last regression report to Linus; he didn't reply, so I guess it is
fine -- and will remove this issue from my tracking:

#regzbot resolve: can be solved by a nftables userspace update; not
nice, but likely best solution in this case
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)



Bug#1051592: Regression: Commit "netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID" breaks ruleset loading in linux-stable

2023-09-12 Thread Linux regression tracking (Thorsten Leemhuis)
On 12.09.23 00:57, Pablo Neira Ayuso wrote:
> On Mon, Sep 11, 2023 at 11:37:50PM +0200, Timo Sigurdsson wrote:
>>
>> recently, Debian updated their stable kernel from 6.1.38 to 6.1.52
>> which broke nftables ruleset loading on one of my machines with lots
>> of "Operation not supported" errors. I've reported this to the
>> Debian project (see link below) and Salvatore Bonaccorso and I
>> identified "netfilter: nf_tables: disallow rule addition to bound
>> chain via NFTA_RULE_CHAIN_ID" (0ebc1064e487) as the offending commit
>> that introduced the regression. Salvatore also found that this issue
>> affects the 5.10 stable tree as well (observed in 5.10.191), but he
>> cannot reproduce it on 6.4.13 and 6.5.2.
>>
>> The issue only occurs with some rulesets. While I can't trigger it
>> with simple/minimal rulesets that I use on some machines, it does
>> occur with a more complex ruleset that has been in use for months
>> (if not years, for large parts of it). I'm attaching a somewhat
>> stripped down version of the ruleset from the machine I originally
>> observed this issue on. It's still not a small or simple ruleset,
>> but I'll try to reduce it further when I have more time.
>>
>> The error messages shown when trying to load the ruleset don't seem
>> to be helpful. Just to give two simple examples from the log when
>> nftables fails to start:
>> /etc/nftables.conf:99:4-44: Error: Could not process rule: Operation not supported
>> tcp option maxseg size 1-500 counter drop
>> ^
>> /etc/nftables.conf:308:4-27: Error: Could not process rule: Operation not supported
>> tcp dport sip-tls accept
>> 
> 
> I can reproduce this issue with 5.10.191 and 6.1.52 and nftables v1.0.6,
> this is not reproducible with v1.0.7 and v1.0.8.
> 
>> Since the issue only affects some stable trees, Salvatore thought it
>> might be an incomplete backport that causes this.
>>
>> If you need further information, please let me know.
> 
> Userspace nftables v1.0.6 generates incorrect bytecode that hits a new
> kernel check that rejects adding rules to bound chains. The incorrect
> bytecode adds the chain binding, attaches it to the rule, and then adds
> the rules to the chain binding. I have cherry-picked these three patches
> for nftables v1.0.6 userspace and your ruleset restores fine.
> [...]

Hmm. Well, this sounds like a kernel regression to me that normally
should be dealt with on the kernel level, as users after updating the
kernel should never have to update any userspace stuff to continue what
they have been doing before the kernel update.

Can't the kernel somehow detect the incorrect bytecode and do the right
thing(tm) somehow?

But yes, don't worry, I know that reality is not black and white and
that it's crucial that things like packet filtering do exactly what the
user expects; that's why this might be one of those rare
situations where "user has to update userspace components to support
newer kernels" might be the better of two bad choices. But I had to ask
to ensure it's something like that.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)



Bug#1036530: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)

2023-06-26 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Nick, what's the status/was there any progress? Did you do what Mario
suggested and file a nouveau bug?

I ask, as I still have this on my list of regressions and it seems there
was no progress in three+ weeks now.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot backburner: slow progress, likely just affects one machine
#regzbot poke


On 02.06.23 02:57, Limonciello, Mario wrote:
> [AMD Official Use Only - General]
> 
>> -Original Message-
>> From: Nick Hastings 
>> Sent: Thursday, June 1, 2023 7:02 PM
>> To: Karol Herbst 
>> Cc: Limonciello, Mario ; Lyude Paul
>> ; Lukas Wunner ; Salvatore
>> Bonaccorso ; 1036...@bugs.debian.org; Rafael J.
>> Wysocki ; Len Brown ; linux-
>> a...@vger.kernel.org; linux-ker...@vger.kernel.org;
>> regressi...@lists.linux.dev
>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI
>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
>>
>> Hi,
>>
>> * Karol Herbst  [230602 03:10]:
>>> On Thu, Jun 1, 2023 at 7:21 PM Limonciello, Mario
>>>  wrote:
> -Original Message-
> From: Karol Herbst 
> Sent: Thursday, June 1, 2023 12:19 PM
> To: Limonciello, Mario 
> Cc: Nick Hastings ; Lyude Paul
> ; Lukas Wunner ; Salvatore
> Bonaccorso ; 1036...@bugs.debian.org; Rafael J.
> Wysocki ; Len Brown ; linux-
> a...@vger.kernel.org; linux-ker...@vger.kernel.org;
> regressi...@lists.linux.dev
> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI
> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of
>> system)
>
> On Thu, Jun 1, 2023 at 6:54 PM Limonciello, Mario
>  wrote:
>>
>> [AMD Official Use Only - General]
>>
>>> -Original Message-
>>> From: Karol Herbst 
>>> Sent: Thursday, June 1, 2023 11:33 AM
>>> To: Limonciello, Mario 
>>> Cc: Nick Hastings ; Lyude Paul
>>> ; Lukas Wunner ; Salvatore
>>> Bonaccorso ; 1036...@bugs.debian.org; Rafael
>> J.
>>> Wysocki ; Len Brown ; linux-
>>> a...@vger.kernel.org; linux-ker...@vger.kernel.org;
>>> regressi...@lists.linux.dev
>>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video
>> _OSI
>>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of
> system)
>>>
>>> On Thu, Jun 1, 2023 at 6:18 PM Limonciello, Mario

 Lyude, Lukas, Karol

 This thread is in relation to this commit:

 24867516f06d ("ACPI: OSI: Remove Linux-Dell-Video _OSI string")

 Nick has found that runtime PM is *not* working for nouveau.

>>>
>>> keep in mind we have a list of PCIe controllers where we apply a
>>> workaround:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_drm.c?h=v6.4-rc4#n682
>>>
>>> And I suspect there might be one or two more IDs we'll have to add
>>> there. Do we have any logs?
>>
>> There's some archived onto the distro bug.  Search this page for
> "journalctl.log.gz"
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036530
>>
>
> interesting.. It seems to be the same controller used here. I wonder
> if the pci topology is different or if the workaround is applied at
> all.

 I didn't see the message in the log about the workaround being applied
 in that log, so I guess PCI topology difference is a likely suspect.

>>>
>>> yeah, but I also couldn't see a log with the usual nouveau messages,
>>> so it's kinda weird.
>>>
>>> Anyway, the output of `lspci -tvnn` would help
>>
>> % lspci -tvnn
>> -[:00]-+-00.0  Intel Corporation Device [8086:3e20]
>>+-01.0-[01]00.0  NVIDIA Corporation TU117M [GeForce GTX 1650
>> Mobile / Max-Q] [10de:1f91]
> 
> So the bridge it's connected to is the same that the quirk *should have
> been* triggering.
> 
> May 29 15:02:42 xps kernel: pci :00:01.0: [8086:1901] type 01 class 0x060400
> 
> Since the quirk isn't working and this is still a problem in 6.4-rc4, I
> suggest opening a Nouveau drm bug to figure out why.
> 
>>+-02.0  Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
>> [8086:3e9b]
>>+-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core
>> Processor Thermal Subsystem [8086:1903]
>>+-08.0  Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 /
>> 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
>>+-12.0  Intel Corporation Cannon Lake PCH Thermal Controller
>> [8086:a379]
>>+-14.0