Re: [REGRESSION] v6.9-rc7: nouveau: init failed, no display output from kernel; successfully bisected

2024-05-09 Thread Linux regression tracking (Thorsten Leemhuis)
On 06.05.24 20:23, Dan Moulding wrote:
> After upgrading to rc7 from rc6 on a system with NVIDIA GP104 using
> the nouveau driver, I get no display output from the kernel (only the
> output from GRUB shows on the primary display). Nonetheless, I was
> able to SSH to the system and get the kernel log from dmesg. I found
> errors from nouveau in it. Grepping it for nouveau gives me this:
> 
> [    0.367379] nouveau 0000:01:00.0: NVIDIA GP104 (134000a1)
> [    0.474499] nouveau 0000:01:00.0: bios: version 86.04.50.80.13
> [    0.474620] nouveau 0000:01:00.0: pmu: firmware unavailable
> [    0.474977] nouveau 0000:01:00.0: fb: 8192 MiB GDDR5
> [    0.484371] nouveau 0000:01:00.0: sec2(acr): mbox 0001
> [    0.484377] nouveau 0000:01:00.0: sec2(acr):load: boot failed: -5
> [    0.484379] nouveau 0000:01:00.0: acr: init failed, -5
> [    0.484466] nouveau 0000:01:00.0: init failed with -5
> [    0.484468] nouveau: DRM-master::0080: init failed with -5
> [    0.484470] nouveau 0000:01:00.0: DRM-master: Device allocation failed: -5
> [    0.485078] nouveau 0000:01:00.0: probe with driver nouveau failed with error -50
> 
> I bisected between v6.9-rc6 and v6.9-rc7 and that identified commit
> 52a6947bf576 ("drm/nouveau/firmware: Fix SG_DEBUG error with
> nvkm_firmware_ctor()") as the first bad commit.

Lyude, that's a commit of yours.

Given that 6.9 is due soon, a quick question: I assume there is no easy fix
for this in sight? Or is a quick revert something that might be
appropriate to prevent this from entering 6.9?

Ciao, Thorsten

> I then rebuilt
> v6.9-rc7 with just that commit reverted and the problem does not
> occur.
> 
> Please let me know if there are any additional details I can provide
> that would be helpful, or if I should reproduce the failure with
> additional debugging options enabled, etc.
> 
> Cheers,
> 
> -- Dan
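For readers unfamiliar with bisection: git bisect narrows down the first bad commit by repeatedly testing the midpoint between a known-good and a known-bad commit. Below is a minimal C sketch of that binary search, assuming a linear history (real histories contain merges, which git handles itself) and a single switch from good to bad. `first_bad()`, `demo_is_bad()` and `demo_culprit` are hypothetical names, not part of git or the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Binary search over a linear list of commits. Assumes commit 0 is
 * known good, commit n - 1 is known bad (n >= 2), and that every commit
 * after the first bad one is bad too. is_bad() stands in for
 * "build, boot and test this commit".
 */
static size_t first_bad(size_t n, bool (*is_bad)(size_t idx))
{
	size_t good = 0, bad = n - 1;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Hypothetical test oracle: every commit from demo_culprit on is bad. */
static size_t demo_culprit;

static bool demo_is_bad(size_t idx)
{
	return idx >= demo_culprit;
}
```

With ten commits and the culprit at index 5, this search tests only indices 4, 6 and 5, where a linear scan could need up to eight test builds.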


Re: nouveau: r535.c:1266:3: error: label at end of compound statement default: with gcc-8

2024-04-29 Thread Linux regression tracking (Thorsten Leemhuis)



On 29.04.24 17:06, Naresh Kamboju wrote:
> The following build warnings / errors were noticed on the Linux next-20240429 tag
> for arm64, arm and riscv with gcc-8; gcc-13 builds pass.
> 
> Reported-by: Linux Kernel Functional Testing 
> 
> Commit id:
>  b58a0bc904ff nouveau: add command-line GSP-RM registry support
> 
> Builds:
> --
>   gcc-8-arm64-defconfig - Fail
>   gcc-8-arm-defconfig - Fail
>   gcc-8-riscv-defconfig - Fail
> 
> Build log:
> 
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c: In function 'build_registry':
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1266:3: error: label at end of compound statement
>    default:
>    ^~~
> make[7]: *** [scripts/Makefile.build:244: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.o] Error 1

TWIMC, there is another report about this in this thread (sadly some of
its posts did not make it to lore):

https://lore.kernel.org/all/162ef3c0-1d7b-4220-a21f-b0008657f...@redhat.com/

Ciao, Thorsten

> metadata:
>   git_describe: next-20240429
>   git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git_short_log: b0a2c79c6f35 ("Add linux-next specific files for 20240429")
>   arch: arm64, arm, riscv
>   toolchain: gcc-8
> 
> Steps to reproduce:
> 
> # tuxmake --runtime podman --target-arch arm64 --toolchain gcc-8 --kconfig defconfig
> 
> Links:
>  - https://storage.tuxsuite.com/public/linaro/lkft/builds/2flcoOuqVJfhTvX4AOYsWMd5hqe/
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240429/testrun/23704376/suite/build/test/gcc-8-defconfig/history/
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240429/testrun/23705756/suite/build/test/gcc-8-defconfig/details/
> 
> 
> --
> Linaro LKFT
> https://lkft.linaro.org
> 
> 
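The error above comes from a `default:` label placed directly before the closing brace of a `switch`. C23 allows a label at the end of a compound statement and newer GCC releases accept it (the report says the gcc-13 builds pass), while older compilers such as gcc-8 reject it. A minimal C sketch of the usual fix, assuming nothing about the real `build_registry()` code: `enum reg_type` and `entry_size()` are hypothetical stand-ins, and the point is only that the label is followed by a statement.

```c
#include <string.h>

/* Hypothetical stand-in for a switch like the one in build_registry();
 * the real r535.c code is not reproduced here. */
enum reg_type { REG_DWORD, REG_STRING, REG_UNKNOWN };

static int entry_size(enum reg_type type, const char *str)
{
	int size = 0;

	switch (type) {
	case REG_DWORD:
		size = 4;
		break;
	case REG_STRING:
		size = (int)strlen(str) + 1;
		break;
	default:
		/*
		 * A bare "default:" right before the closing brace is what
		 * gcc-8 rejects with "label at end of compound statement".
		 * Following the label with a statement, here "break;",
		 * keeps the code valid for older compilers as well.
		 */
		break;
	}
	return size;
}
```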


Re: [PATCH] drm/nouveau: keep DMA buffers required for suspend/resume

2024-03-03 Thread Linux regression tracking (Thorsten Leemhuis)
[adding a bunch of lists and people as well as Timur Tabi, who authored
the culprit]

Sid Pranjale, thx for the report. FWIW, I'm just replying to add this to
the regression tracking to ensure it does not fall through the cracks.
Nevertheless let me mention two things while at it:

On 29.02.24 18:58, Sid Pranjale wrote:
> Nouveau deallocates a few buffers post GPU init which are required for GPU 
> suspend/resume to function correctly.
> This is likely not as big an issue on systems where the NVGPU is the only 
> GPU, but on multi-GPU set ups it leads to a regression where the kernel 
> module errors and results in a system-wide rendering freeze.

These lines are too long, see
Documentation/process/submitting-patches.rst for details.

> This commit addresses that regression by moving the two buffers required for 
> suspend and resume to be deallocated at driver unload instead of post init.
> 
> Fixes: 042b5f8 ("drm/nouveau: fix several DMA buffer leaks")

And that should be:

Fixes: 042b5f83841fbf ("drm/nouveau: fix several DMA buffer leaks")
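Documentation/process/submitting-patches.rst asks for at least the first 12 characters of the commit ID in a Fixes: tag, followed by the one-line summary in parentheses and double quotes. As a rough illustration of that rule, here is a hypothetical C checker, `fixes_tag_ok()`; it is a sketch of the format only and not how checkpatch.pl or any kernel tool implements the check.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical checker: true if line looks like
 *   Fixes: <12 or more hex digits> ("<summary>")
 */
static bool fixes_tag_ok(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t n = 0;
	size_t len;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;
	line += sizeof(prefix) - 1;

	/* Count the abbreviated commit ID. */
	while (isxdigit((unsigned char)line[n]))
		n++;
	if (n < 12)
		return false;
	line += n;

	/* Expect a space, then ("summary") ending the line. */
	if (strncmp(line, " (\"", 3) != 0)
		return false;
	len = strlen(line);
	return len > 5 && strcmp(line + len - 2, "\")") == 0;
}
```

With this check, the seven-character `Fixes: 042b5f8 (...)` from the patch is rejected, while the fourteen-character form is accepted.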

> Signed-off-by: Sid Pranjale 
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index a64c81385..a73a5b589 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1054,8 +1054,6 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
>   /* Release the DMA buffers that were needed only for boot and init */
>   nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
>   nvkm_gsp_mem_dtor(gsp, &gsp->libos);
> - nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> - nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>  
>   return ret;
>  }
> @@ -2163,6 +2161,8 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
>  
>   r535_gsp_dtor_fws(gsp);
>  
> + nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> + nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>   nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
>   nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
>   nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
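The patch keeps the two buffers that, per its description, suspend/resume still needs (rmargs and wpr_meta) alive until driver unload, and frees only the boot-only buffers after init. A minimal C model of that lifetime split, assuming plain heap buffers: `struct gsp_model` and all helpers are hypothetical and only mirror the nouveau field names, not the real `struct nvkm_gsp` or `nvkm_gsp_mem_dtor()`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; field names mirror nouveau, the types do not. */
struct gsp_model {
	void *boot_fw;   /* needed only for boot and init */
	void *libos;     /* needed only for boot and init */
	void *rmargs;    /* still needed for suspend/resume */
	void *wpr_meta;  /* still needed for suspend/resume */
};

static void mem_dtor(void **mem)
{
	free(*mem);      /* free(NULL) is a no-op */
	*mem = NULL;
}

/* On failure the caller is expected to call gsp_dtor() to clean up. */
static int gsp_init(struct gsp_model *gsp)
{
	gsp->boot_fw = malloc(64);
	gsp->libos = malloc(64);
	gsp->rmargs = malloc(64);
	gsp->wpr_meta = malloc(64);
	if (!gsp->boot_fw || !gsp->libos || !gsp->rmargs || !gsp->wpr_meta)
		return -1;
	return 0;
}

/* Post-init: release only what boot needed (the patched behaviour). */
static void gsp_postinit(struct gsp_model *gsp)
{
	mem_dtor(&gsp->boot_fw);
	mem_dtor(&gsp->libos);
}

/* Resume can only work if the buffers it relies on still exist. */
static bool gsp_can_resume(const struct gsp_model *gsp)
{
	return gsp->rmargs != NULL && gsp->wpr_meta != NULL;
}

/* Driver unload: everything still allocated is released here. */
static void gsp_dtor(struct gsp_model *gsp)
{
	mem_dtor(&gsp->boot_fw);
	mem_dtor(&gsp->libos);
	mem_dtor(&gsp->rmargs);
	mem_dtor(&gsp->wpr_meta);
}
```

Before the patch, the equivalent of `gsp_postinit()` also freed rmargs and wpr_meta, so a later resume found them gone.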

To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced 042b5f83841fbf
#regzbot title drm/nouveau: rendering freezes with multi-GPU setup
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2024-01-16 Thread Thorsten Leemhuis
Karol, Lyude, and Daniel:

On 29.11.23 01:37, Owen T. Heisler wrote:
> On 11/21/23 14:23, Owen T. Heisler wrote:
>> On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 15.11.23 07:19, Owen T. Heisler wrote:
>>>> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>>>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>>>>> #regzbot link:
>>>>>> https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>>>>
>>>>>> 3. Suddenly the secondary Nvidia-connected display turns off and X
>>>>>> stops responding to keyboard/mouse input.
> 
>> I am currently testing v6.6 with the culprit commit reverted.
> 
> - v6.6: fails
> - v6.6 with the culprit commit reverted: works
> 
> See <https://gitlab.freedesktop.org/drm/nouveau/-/issues/180> for full
> details including a decoded kernel log.

Not sure about the others, but it's kind of confusing that you update
the issue description all the time and never add a comment to that ticket.

Anyway: Nouveau maintainers, could any of you at least comment on this?
Sure, the regression is caused by an old commit (6eaa1f3c59a707 was
merged for v5.14-rc7) and reverting it likely is not an option, but
nevertheless it would be great if this could be solved somehow.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke




Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-12-11 Thread Linux regression tracking #update (Thorsten Leemhuis)
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 28.10.23 04:46, Owen T. Heisler wrote:
> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
> #regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
> 
> ## Problem
> 
> 1. Connect external display to DVI port on dock and run X with both
>    displays in use.
> 2. Wait hours or days.
> 3. Suddenly the secondary Nvidia-connected display turns off and X stops
>    responding to keyboard/mouse input. In *some* cases it is possible to
>    switch to a virtual TTY with Ctrl+Alt+Fn and log in there. In any
>    case, shutdown/reboot after this happens is *usually* not successful
>    (forced power-off is required).
> [...]

It turned out to be a problem that also happens in mainline, so I'm
updating the tracking:

#regzbot introduced: 6eaa1f3c59a707332e921e32782ffcad49915c5e
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Re: [Nouveau] [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 15.11.23 07:19, Owen T. Heisler wrote:
> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>> #regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>
>>> ## Problem
>>>
>>> 1. Connect external display to DVI port on dock and run X with both
>>>     displays in use.
>>> 2. Wait hours or days.
>>> 3. Suddenly the secondary Nvidia-connected display turns off and X stops
>>>     responding to keyboard/mouse input. In *some* cases it is
>>> possible to
>>>     switch to a virtual TTY with Ctrl+Alt+Fn and log in there.
> 
>> You thus might want to check if the problem occurs with 6.6 -- and
>> ideally also check if reverting the culprit there fixes things for you.
> 
> The problem also occurs with v6.6.

Meanwhile, you might want to give 6.7-rc a try as well on the off chance that
it improves things, even if that is unlikely.

> Here is a decoded kernel log from an
> untainted kernel:
> 
> https://gitlab.freedesktop.org/drm/nouveau/uploads/c120faf09da46f9c74006df9f1d14442/async-wait-on-fence-180.log
> 
> The culprit commit does not revert cleanly on v6.6. I have not yet
> attempted to resolve the conflicts.
> 
> I have also updated the bug description at
> <https://gitlab.freedesktop.org/drm/nouveau/-/issues/180>.

Maybe one of the nouveau developers can take a quick look at
d386a4b54607cf and suggest a simple way to revert it in the latest mainline.
Maybe just removing the main chunk of code that is added is all that it
takes.

Ciao, Thorsten


Re: [Nouveau] Fwd: System (Xeon Nvidia) hangs at boot terminal after kernel 6.4.7

2023-11-01 Thread Linux regression tracking #update (Thorsten Leemhuis)
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 10.08.23 06:19, Thorsten Leemhuis wrote:
> On 10.08.23 05:03, Bagas Sanjaya wrote:
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>
>> [...]
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217776

#regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/255
#regzbot fix: 6eb4a83e612af65bab8492957cba
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.



Re: [Nouveau] [REGRESSION]: nouveau: Asynchronous wait on fence

2023-10-31 Thread Linux regression tracking (Thorsten Leemhuis)
On 28.10.23 04:46, Owen T. Heisler wrote:
> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
> #regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
> 
> ## Problem
> 
> 1. Connect external display to DVI port on dock and run X with both
>    displays in use.
> 2. Wait hours or days.
> 3. Suddenly the secondary Nvidia-connected display turns off and X stops
>    responding to keyboard/mouse input. In *some* cases it is possible to
>    switch to a virtual TTY with Ctrl+Alt+Fn and log in there. In any
>    case, shutdown/reboot after this happens is *usually* not successful
>    (forced power-off is required).
> 
> This started happening after the upgrade to Debian bullseye, and the
> problem remains with Debian bookworm.
> [...] 

Thanks for your report. With a bit of luck someone will look into this,
but I doubt it, as this report has some aspects that might make people
ignore it. Mainly: (a) the report was about a stable/longterm kernel and
(b) it's afaics unclear if the problem even happens with the latest
mainline kernel. For details about these aspects, see:
https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/

You thus might want to check if the problem occurs with 6.6 -- and
ideally also check if reverting the culprit there fixes things for you.

That might help getting things rolling, but it's a pretty old
regression, which complicates things.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.



Re: [Nouveau] nouveau bug in linux/6.1.38-2

2023-08-31 Thread Linux regression tracking #update (Thorsten Leemhuis)
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 04.08.23 14:02, Thorsten Leemhuis wrote:
> On 02.08.23 23:28, Olaf Skibbe wrote:
>> Dear Maintainers,
>>
>> Hereby I would like to report an apparent bug in the nouveau driver in
>> linux/6.1.38-2.
> 
> Thx for your report. Maybe your problem is caused by an incomplete
> backport. I CCed the maintainers of the driver (and the regressions
> and stable lists), maybe one of them has an idea, as they know the
> driver.

#regzbot fix: 98e470dc73a9b3539e5a7a3c72f6b7c01c98
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.




Re: [Nouveau] Fwd: System (Xeon Nvidia) hangs at boot terminal after kernel 6.4.7

2023-08-09 Thread Thorsten Leemhuis
On 10.08.23 05:03, Bagas Sanjaya wrote:
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
>> Kernel 6.4.6 compiled from source worked AOK on my desktop with Intel Xeon 
>> cpu and Nvidia graphics - see below for system specs.
>>
>> Kernels 6.4.7 & 6.4.8 also compiled from source with identical configs hang 
>> with a frozen boot terminal screen after a significant way through the boot 
>> sequence (e.g. whilst running /etc/profile). The system may still be running 
>> as a sound is emitted when the power button is pressed (only way to escape 
>> from the system hang).
> [...]
>> Computer Profile:
>>  MachineDell Inc. Precision WorkStation T5400   
>> (version: Not Specified)
>>  Mainboard  Dell Inc. 0RW203 (version: NA)
>>  • BIOS Dell Inc. A11 | Date: 04/30/2012 | Type: Legacy
>>  • CPU  Intel(R) Xeon(R) CPU E5450 @ 3.00GHz (4 cores)
>>  • RAM  Total: 7955 MB | Used: 1555 MB (19.5%) | Actual 
>> Used: 775 MB (9.7%)
>>  Graphics   Resolution: 1366x768 pixels | Display Server: 
>> X.Org 21.1.8
>>  • device-0 NVIDIA Corporation GT218 [NVS 300] [10de:10d8] 
>> (rev a2)
>>  Audio  ALSA
>>  • device-0 Intel Corporation 631xESB/632xESB High 
>> Definition Audio Controller [8086:269a] (rev 09)
>>  • device-1 NVIDIA Corporation High Definition Audio 
>> Controller [10de:0be3] (rev a1)
>>  Networkwlan1
>>  • device-0 Ethernet: Broadcom Inc. and subsidiaries 
>> NetXtreme BCM5754 Gigabit Ethernet PCI Express [14e4:167a] (rev 02)
> 
> See Bugzilla for the full thread.
> [...]
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217776

Not my area of expertise, but nevertheless pretty sure this is the same
issue already discussed here, as it's a GT218 there as well and 6.4.7 is
the version that commit was backported to:

https://lore.kernel.org/all/20230806213107.GFZNARG6moWpFuSJ9W@fat_crate.local/

No final solution ready yet, but looks like the culprit will be reverted.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: [Nouveau] 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

2023-08-09 Thread Thorsten Leemhuis
On 09.08.23 15:13, Takashi Iwai wrote:
> 
> If this can't be fixed quickly, I suppose it's safer to revert it from
> 6.4.y for now.  6.5 is still being cooked, but 6.4.x is already in
> wide deployment, hence the regression has to be addressed quickly.

Good luck with that. To quote
https://docs.kernel.org/process/handling-regressions.html :

```
Regarding stable and longterm kernels:

[...]

* Whenever you want to swiftly resolve a regression that recently also
made it into a proper mainline, stable, or longterm release, fix it
quickly in mainline; when appropriate thus involve Linus to fast-track
the fix (see above). That's because the stable team normally does
neither revert nor fix any changes that cause the same problems in mainline.
```

Note the "normally" in there, so there is a chance.

Ciao, Thorsten


Re: [Nouveau] 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

2023-08-07 Thread Thorsten Leemhuis
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 06.08.23 23:31, Borislav Petkov wrote:
> 
> the patch in $Subject

Side note, in case anyone cares: it was also included in 6.4.7.

> breaks booting here on one of my test boxes, see
> below.
> 
> Reverting it on top of -rc4 fixes the issue.
> 
> Thx.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 2b5d1c29f6c4
#regzbot title drm/nouveau: stopped booting
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

> [    3.580535] ACPI: \_PR_.CP04: Found 4 idle states
> [    3.585694] ACPI: \_PR_.CP05: Found 4 idle states
> [    3.590852] ACPI: \_PR_.CP06: Found 4 idle states
> [    3.596037] ACPI: \_PR_.CP07: Found 4 idle states
> [    3.644065] Freeing initrd memory: 6740K
> [    3.742932] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    3.750409] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> [    3.762111] serial 0000:00:16.3: enabling device (0000 -> 0003)
> [    3.771589] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> [    3.782503] Linux agpgart interface v0.103
> [    3.787805] ACPI: bus type drm_connector registered
> 
> <--- boot stops here.
> 
> It should continue with this:
> 
> [    3.795491] Console: switching to colour dummy device 80x25
> [    3.801933] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> [    3.808303] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> [    3.931002] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> [    3.941731] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> [    4.110348] tsc: Refined TSC clocksource calibration: 3591.349 MHz
> [    4.116627] clocksource: tsc: mask: 0x max_cycles: 0x33c466a1ab5, max_idle_ns: 440795209767 ns
> [    4.126871] clocksource: Switched to clocksource tsc
> [    4.252013] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> [    4.257088] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> [    4.262501] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> [    4.268333] nouveau 0000:03:00.0: DRM: DCB version 4.0
> [    4.273561] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360
> [    4.280104] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> [    4.286630] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> [    4.293176] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380
> [    4.299711] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> [    4.306243] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> [    4.312772] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> [    4.318520] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> [    4.329488] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> [    4.336261] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> ...
> 
> 


Re: [Nouveau] nouveau bug in linux/6.1.38-2

2023-08-04 Thread Thorsten Leemhuis
Hi!

On 02.08.23 23:28, Olaf Skibbe wrote:
> Dear Maintainers,
> 
> Hereby I would like to report an apparent bug in the nouveau driver in
> linux/6.1.38-2.

Thx for your report. Maybe your problem is caused by an incomplete
backport. I CCed the maintainers of the driver (and the regressions
and stable lists), maybe one of them has an idea, as they know the
driver.

If they don't reply in the next few days, please check if the problem is
also present in mainline. If not, check if the latest 6.1.y release
already fixes this. If not, try to find out which of the four patches you
reverted to make things work is actually causing this (e.g. first only
revert the one that was applied last; then the last two; ...).
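The narrowing-down procedure suggested above can be sketched in C as a simple loop, assuming a single culprit among the reverted patches and numbering them in the order they were applied. `find_culprit()`, `demo_works()`, `demo_n` and `demo_bad_patch` are hypothetical; the callback stands in for building and testing a kernel with the last k patches reverted.

```c
#include <stdbool.h>

/*
 * Patches are numbered 0..n-1 in the order they were applied.
 * works_with_last_reverted(k) stands in for "build a kernel with the
 * last k patches reverted and test it". Assumes a single culprit;
 * returns its number, or -1 if reverting all n patches does not help.
 */
static int find_culprit(int n, bool (*works_with_last_reverted)(int k))
{
	for (int k = 1; k <= n; k++) {
		/* Patch n - k is the one newly added to the revert set. */
		if (works_with_last_reverted(k))
			return n - k;
	}
	return -1;
}

/* Hypothetical test oracle for a stack of demo_n patches. */
static int demo_n, demo_bad_patch;

static bool demo_works(int k)
{
	/* Reverting the last k patches removes patches n-k..n-1. */
	return demo_bad_patch >= demo_n - k;
}
```

With four patches, this needs at most four test builds; with many more patches, a bisection would be cheaper.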

> Running a current debian stable on a Dell Latitude E6510 with a
> "NVIDIA Corporation GT218M" graphic card, the monitor turns black
> after the grub screen. Also switching to a console (Ctrl-Alt-F2) shows
> just a black screen. Access via ssh is possible.
> 
> ~# uname -r
> 6.1.0-10-amd64
> 
> dmesg shows the following error message:
> 
> [    3.560153] WARNING: CPU: 0 PID: 176 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> video wmi button
> [    3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> 6.1.0-10-amd64 #1  Debian 6.1.38-2
> [    3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> 05/12/2017
> [    3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [    3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [    3.560541] RSP: 0018:9899c048bd60 EFLAGS: 00010246
> [    3.560542] RAX: 00041eb0 RBX: 88e0209d2600 RCX:
> 00041eb0
> [    3.560544] RDX: c079f760 RSI:  RDI:
> 9899c048bcf0
> [    3.560545] RBP: 0001 R08: 9899c048bc64 R09:
> 5b76
> [    3.560546] R10: 000d R11: 9899c048bde0 R12:
> ffea
> [    3.560548] R13: 88e00b39e480 R14: 00044d45 R15:
> 
> [    3.560549] FS:  () GS:88e123c0()
> knlGS:
> [    3.560551] CS:  0010 DS:  ES:  CR0: 80050033
> [    3.560552] CR2: 7f57f4e90451 CR3: 00018141 CR4:
> 06f0
> [    3.560554] Call Trace:
> [    3.560558]  <TASK>
> [    3.560560]  ? __warn+0x7d/0xc0
> [    3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560671]  ? report_bug+0xe6/0x170
> [    3.560675]  ? handle_bug+0x41/0x70
> [    3.560679]  ? exc_invalid_op+0x13/0x60
> [    3.560681]  ? asm_exc_invalid_op+0x16/0x20
> [    3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
> [    3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [    3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
> [    3.561103]  process_one_work+0x1c7/0x380
> [    3.561109]  worker_thread+0x4d/0x380
> [    3.561113]  ? rescuer_thread+0x3a0/0x3a0
> [    3.561116]  kthread+0xe9/0x110
> [    3.561120]  ? kthread_complete_and_exit+0x20/0x20
> [    3.561122]  ret_from_fork+0x22/0x30
> [    3.561130]  </TASK>
> 
> Further information:
> 
> $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> (rev a2) (prog-if 00 [VGA controller])
> Subsystem: Dell Latitude E6510
> Flags: bus master, fast devsel, latency 0, IRQ 27
> Memory at e200 (32-bit, non-prefetchable) [size=16M]
> Memory at d000 (64-bit, prefetchable) [size=256M]
> Memory at e000 (64-bit, prefetchable) [size=32M]
> I/O ports at 7000 [size=128]
> Expansion ROM at 000c [disabled] [size=128K]
> Capabilities: <access denied>
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> I reported this bug to debian already, see
> https://bugs.debian.org/1042753 for context.
> 
> With support (thanks Diederik!) I managed to figure out that the cause
> was a regression between upstream kernel version 6.1.27 and 6.1.38.
> 
> I built a new 6.1.38 kernel with these commits reverted:
> 
> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA 

Re: [Nouveau] Fwd: absent both plymouth, and video= on linu lines, vtty[1-6] framebuffers produce vast raster right and bottom borders on the larger resolution of two displays

2023-05-25 Thread Thorsten Leemhuis
On 25.05.23 12:55, Bagas Sanjaya wrote:
> On 5/25/23 17:52, Bagas Sanjaya wrote:
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>> [...]
>> Anyway, I'm adding it to regzbot:
>>
>> #regzbot introduced: v6.1.12..v6.2.12
>> #regzbot title: vast raster right and bottom borders on larger display (two 
>> displays with inequal resolution) unless forcing resolution with video= 
>> parameter

Bagas, thx again for your efforts, much appreciated. But I guess for drm
drivers that have a line like

B: https://gitlab.freedesktop.org/drm/[...]

in MAINTAINERS (which includes all the popular drm drivers) this just
creates a lot of confusion for everyone, as one issue will likely end up
being discussed in two or three places in parallel (bugzilla,
freedesktop, email). Better tell reporters to move their issue to the
freedesktop drm tracker and close the ticket in bugzilla. And don't get
regzbot involved, as for now it sadly is unable to monitor the
freedesktop drm tracker (sooner or later I'll fix that, but for now it's
a blind spot :-/).
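The "B:" lines mentioned above are how a MAINTAINERS entry names the place where bugs should be filed. A minimal C sketch of looking that up for one MAINTAINERS section, assuming the section's lines have already been split out; `bug_tracker()` is a hypothetical helper, not part of scripts/get_maintainer.pl, and the sample lines in the test are illustrative.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the URL from the first "B:" line of one
 * MAINTAINERS section, or NULL if the section has none.
 */
static const char *bug_tracker(const char *const *section, size_t nlines)
{
	for (size_t i = 0; i < nlines; i++) {
		const char *p = section[i];

		if (strncmp(p, "B:", 2) != 0)
			continue;
		p += 2;
		/* MAINTAINERS separates the tag from its value with a tab. */
		while (isspace((unsigned char)*p))
			p++;
		return *p ? p : NULL;
	}
	return NULL;
}
```

If this returns a gitlab.freedesktop.org URL, that tracker rather than bugzilla.kernel.org is the place to move the report to.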

Pretty sure none of the DRM developers will disagree, but if I'm wrong,
please holler.

> Oops, I forget to add bugzilla link:
> 
> #regzbot introduced: v6.1.12..v6.2.12 
> https://bugzilla.kernel.org/show_bug.cgi?id=217479
> #regzbot from: Felix Miata 

Side note: that currently does not work with regzbot. :-/ Whatever, I'll
remove it from the tracking due to above reasons:

#regzbot inconclusive: sadly not tracked for now

Ciao, Thorsten


Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-03-12 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.03.23 11:20, Karol Herbst wrote:
> On Fri, Mar 10, 2023 at 10:26 AM Chris Clayton  
> wrote:
>>
>> Is it likely that this fix will be submitted to mainline during the ongoing
>> 6.3 development cycle?
>>
> 
> yes, it's already pushed to drm-misc-fixes, which then will go into
> the current devel cycle. I just don't know when it's the next time it
> will be pushed upwards, but it should get there eventually. 

FWIW, the fix landed now as 1b9b4f922f96; sadly without a Link: tag to
the report, hence I have to mark this manually as resolved:

#regzbot fix: 1b9b4f922f96108da3bb5d87b2d603f5dfbc5650

> And
> because it also contains a Fixes tag it will be backported to older
> branches as well.

FWIW, nope, that's not enough: you have to tag those explicitly (e.g.
with 'Cc: stable@vger.kernel.org') to ensure backporting, as explained in
Documentation/process/stable-kernel-rules.rst. Greg points that out every
few weeks, recently here for example:

https://lore.kernel.org/all/y6bwpo9s9qbns...@kroah.com/
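Documentation/process/stable-kernel-rules.rst describes explicit ways to mark a fix for stable, the most common being a "Cc: stable@vger.kernel.org" line in the sign-off area; a Fixes: tag on its own does not guarantee a backport. A minimal C sketch of that distinction, with `tagged_for_stable()` as a hypothetical helper that does not model the stable team's actual tooling:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if any trailer line is a Cc: to the stable
 * list (with or without angle brackets or a trailing "# version"
 * comment). A "Fixes:" line alone deliberately does not count.
 */
static bool tagged_for_stable(const char *const *lines, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (strncmp(lines[i], "Cc:", 3) == 0 &&
		    strstr(lines[i], "stable@vger.kernel.org") != NULL)
			return true;
	}
	return false;
}
```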

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>> Chris
>>
>> On 20/02/2023 22:16, Ben Skeggs wrote:
>>> On Mon, 20 Feb 2023 at 21:27, Karol Herbst  wrote:
>>>>
>>>> On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 20/02/2023 05:35, Ben Skeggs wrote:
>>>>>> On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 18/02/2023 15:19, Chris Clayton wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18/02/2023 12:25, Karol Herbst wrote:
>>>>>>>>> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton 
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 15/02/2023 11:09, Karol Herbst wrote:
>>>>>>>>>>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>>>>>>>>>>> (Thorsten Leemhuis)  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 13.02.23 10:14, Chris Clayton wrote:
>>>>>>>>>>>>> On 13/02/2023 02:57, Dave Airlie wrote:
>>>>>>>>>>>>>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten 
>>>>>>>>>>>>>>> Leemhuis) wrote:
>>>>>>>>>>>>>>>> On 10.02.23 20:01, Karol Herbst wrote:
>>>>>>>>>>>>>>>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
>>>>>>>>>>>>>>>>> (Thorsten
>>>>>>>>>>>>>>>>> Leemhuis)  wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 08.02.23 09:48, Chris Clayton wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm assuming  that we are not going to see a fix for this 
>>>>>>>>>>>>>>>>>>> regression before 6.2 is released.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yeah, looks like it. That's unfortunate, but happens. But 
>>>>>>>>>>>>>>>>>> there is still
>>>>>>>>>>>>>>>>>> time to fix it and there is one thing I wonder:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Did any of the nouveau developers look at the netconsole 
>>>>>>>>>>>>>>>>>> captures Chris

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-15 Thread Linux regression tracking #update (Thorsten Leemhuis)
On 13.02.23 10:14, Chris Clayton wrote:
> On 13/02/2023 02:57, Dave Airlie wrote:
>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>>
>>>
>>>
>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> On 10.02.23 20:01, Karol Herbst wrote:
>>>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>>>>> Leemhuis)  wrote:
>>>>>>
>>>>>> On 08.02.23 09:48, Chris Clayton wrote:
>>>>>>>
>>>>>>> I'm assuming  that we are not going to see a fix for this regression 
>>>>>>> before 6.2 is released.
>>>>>>
>>>>>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>>>>>> time to fix it and there is one thing I wonder:
>>>>>>
>>>>>> Did any of the nouveau developers look at the netconsole captures Chris
>>>>>> posted more than a week ago to check if they somehow help to track down
>>>>>> the root of this problem?
>>>>>
>>>>> I did now and I can't spot anything. I think at this point it would
>>>>> make sense to dump the active tasks/threads via sysrq keys to see if
>>>>> any is in a weird state preventing the machine from shutting down.
>>>>
>>>> Many thx for looking into it!
>>>
>>> Yes, thanks Karol.
>>>
>>> Attached is the output from dmesg when this block of code:
>>>
>>> /bin/mount /dev/sda7 /mnt/sda7
>>> /bin/mountpoint /proc || /bin/mount /proc
>>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>>> /bin/echo t > /proc/sysrq-trigger
>>> /bin/sleep 1
>>> /bin/sync
>>> /bin/sleep 1
>>> kill $(pidof dmesg)
>>> /bin/umount /mnt/sda7
>>>
>>> is executed immediately before /sbin/reboot is called as the final step of 
>>> rebooting my system.
>>>
>>> I hope this is what you were looking for, but if not, please let me know 
>>> what you need
> 
> Thanks Dave. [...]
FWIW, in case anyone ends up here in the archives: the message was
truncated. The full post can be found in a new thread:

https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/

Sadly, it seems the information "With runpm=0, both reboot and poweroff
work on my laptop." didn't bring us much closer to a solution. :-/ I
don't really like it, but for regression tracking I'm now putting this
on the back burner, as a fix is not in sight.

#regzbot monitor:
https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
#regzbot backburner: hard to debug and apparently rare
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this regression before 
>>> 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures Chris
>> posted more than a week ago to check if they somehow help to track down
>> the root of this problem?
> 
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sysrq keys to see if
> any is in a weird state preventing the machine from shutting down.

Many thx for looking into it!

Ciao, Thorsten

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>>> Consequently, I've
>>> implemented a (very simple) workaround. All that happens is that in the 
>>> (sysv) init script that starts and stops SDDM,
>>> the nouveau module is removed once SDDM is stopped. With that in place, my 
>>> system no longer freezes on reboot or poweroff.
>>>
>>> Let me know if I can provide any additional diagnostics, although with the
>>> problem seemingly occurring so late in the
>>> shutdown process, I may need help with how to go about capturing them.
>>>
>>> Chris
>>>
>>> On 02/02/2023 20:45, Chris Clayton wrote:
>>>>
>>>>
>>>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>>>
>>>>>
>>>>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>>>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton  
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi again.
>>>>>>>
>>>>>>> On 30/01/2023 20:19, Chris Clayton wrote:
>>>>>>>> Thanks, Ben.
>>>>>>>
>>>>>>> 
>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>>>>>>> *any* of my boards.  Could you try the attached patch please?
>>>>>>>>
>>>>>>>> Unfortunately, the patch made no difference.
>>>>>>>>
>>>>>>>> I've been looking at how the graphics on my laptop is set up, and have
>>>>>>>> a bit of a worry about whether the firmware might
>>>>>>>> be playing a part in this problem. In order to offload video decoding
>>>>>>>> to the NVidia TU117 GPU, it seems the scrubber
>>>>>>>> firmware must be available, but as far as I know, that has not been
>>>>>>>> released by NVidia. To get it to work, I followed
>>>>>>>> what Ubuntu has done, and the scrubber in
>>>>>>>> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
>>>>>>>> ../../tu116/nvdec/scrubber.bin. That, of course, means that some of
>>>>>>>> the firmware being loaded is for a different card. I note
>>>>>>>> that processing related to firmware is being changed in
>>>>>>>> the patch. Might my setup be at the root of my
>>>>>>>> problem?
>>>>>>>>
>>>>>>>> I'll have a fiddle and see what I can work out.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ben.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>> Well, my fiddling has got my system rebooting and shutting down 
>>>>>>> successfully again. I found that if I delete the symlink
>>>>>>> to the scrubber firmware, reboot and shutdown work again. There are,
>>>>>>> however, a number of other files in the tu117
>>>>>>> firmware directory tree that are symlinks to actual files in its

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.02.23 09:48, Chris Clayton wrote:
> 
> I'm assuming  that we are not going to see a fix for this regression before 
> 6.2 is released.

Yeah, looks like it. That's unfortunate, but happens. But there is still
time to fix it and there is one thing I wonder:

Did any of the nouveau developers look at the netconsole captures Chris
posted more than a week ago to check if they somehow help to track down
the root of this problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

> Consequently, I've
> implemented a (very simple) workaround. All that happens is that in the 
> (sysv) init script that starts and stops SDDM,
> the nouveau module is removed once SDDM is stopped. With that in place, my 
> system no longer freezes on reboot or poweroff.
> 
> Let me know if I can provide any additional diagnostics, although with the
> problem seemingly occurring so late in the
> shutdown process, I may need help with how to go about capturing them.
> 
> Chris
> 
> On 02/02/2023 20:45, Chris Clayton wrote:
>>
>>
>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>
>>>
>>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton
>>>> wrote:
>>>>>
>>>>> Hi again.
>>>>>
>>>>> On 30/01/2023 20:19, Chris Clayton wrote:
>>>>>> Thanks, Ben.
>>>>>
>>>>>
>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>>>>> *any* of my boards. Could you try the attached patch please?
>>>>>>
>>>>>> Unfortunately, the patch made no difference.
>>>>>>
>>>>>> I've been looking at how the graphics on my laptop is set up, and have a
>>>>>> bit of a worry about whether the firmware might
>>>>>> be playing a part in this problem. In order to offload video decoding to
>>>>>> the NVidia TU117 GPU, it seems the scrubber
>>>>>> firmware must be available, but as far as I know, that has not been
>>>>>> released by NVidia. To get it to work, I followed
>>>>>> what Ubuntu has done, and the scrubber in
>>>>>> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
>>>>>> ../../tu116/nvdec/scrubber.bin. That, of course, means that some of the
>>>>>> firmware being loaded is for a different card. I note
>>>>>> that processing related to firmware is being changed in
>>>>>> the patch. Might my setup be at the root of my
>>>>>> problem?
>>>>>>
>>>>>> I'll have a fiddle and see what I can work out.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ben.
>>>>>>>
>>>>>>>>
>>>>>
>>>>> Well, my fiddling has got my system rebooting and shutting down
>>>>> successfully again. I found that if I delete the symlink
>>>>> to the scrubber firmware, reboot and shutdown work again. There are,
>>>>> however, a number of other files in the tu117
>>>>> firmware directory tree that are symlinks to actual files in its
>>>>> tu116 counterpart. So I deleted all of those too.
>>>>> Unfortunately, the absence of one or more of those symlinks causes Xorg
>>>>> to fail to start. I've reinstated all the links
>>>>> except scrubber, and I now have a system that works as it did until I
>>>>> tried to run a kernel that includes the bad commit
>>>>> I identified in my bisection. That includes offloading video decoding to
>>>>> the NVidia card, so whatever I read that said
>>>>> the scrubber firmware was needed seems to have been wrong. I get a new
>>>>> message (nouveau :01:00.0: fb: VPR
>>>>> locked, but no scrubber binary!), but, hey, we can't have everything.
>>>>>
>>>>> If you still want to get to the bottom of this, let me know what you need
>>>>> me to provide and I'll do my best. I suspect
>>>>> you might want to, because there will be an awful lot of Ubuntu-based
>>>>> systems out there with that scrubber.bin symlink in
>>>>> place. On the other hand, it could be quite a while before Ubuntu
>>>>> deploys 6.2 or later kernels.
>>>> The symlinks are correct - whole groups of GPUs share the same FW, and
>>>> we use symlinks in linux-firmware to represent this.
>>>>
>>>> I don't really have any ideas how/why this patch causes issues with
>>>> shutdown - it's a path that only gets executed during initialisation.
>>>> Can you try to capture the kernel log during shutdown ("dmesg -w"
>>>> over ssh? netconsole?), and see if there are any relevant messages
>>>> providing a hint at what's going on? Alternatively, you could try
>>>> unloading the module (you will have to stop X/wayland/gdm/etc/etc
>>>> first) and see if that hangs too.
>>>>
>>>> Ben.
>>>
>>> Sorry for the delay - I've been learning about netconsole and netcat. 
>>> However, I had no success with ssh, and netconsole
>>> produced a log with nothing unusual in it.
>>>
>>> Simply stopping Xorg and removing the nouveau module succeeds.
>>>
>>> So, I rebuilt rc6+ after a pull from Linus'

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
On 27.01.23 20:46, Chris Clayton wrote:
> [Resend because the mail client on my phone decided to turn HTML on behind my 
> back, so my reply got bounced.]
> 
> Thanks Thorsten.
> 
> I did try to revert it, but it didn't revert cleanly and I don't have the
> knowledge to fix it up.
> 
> The patch was part of a merge that included a number of related patches. 
> Tomorrow, I'll try to revert the lot and report
> back.

You are free to do so, but there is no need for that from my side. I
only wanted to know if a simple revert would do the trick; if it
doesn't, in my experience it is often best to leave things to the
developers of the code in question, as they know it best and thus have a
better idea of which hidden side effects a more complex revert might have.

Ciao, Thorsten

> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
> wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> @nouveau-maintainers, did anyone take a look at this? The report is
>> already 8 days old and I don't see a single reply. Sure, we'll likely
>> get a -rc8, but it would still be good not to fix this at the finish line.
>>
>> Chris, btw, did you try whether you can revert the commit on top of the latest
>> mainline? And if so, does it fix the problem?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> #regzbot poke
>>
>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> [adding various lists and the two other nouveau maintainers to the list
>>> of recipients]
>>
>>> On 18.01.23 21:59, Chris Clayton wrote:
>>>> Hi.
>>>>
>>>> I built and installed the latest development kernel earlier this week.
>>>> I've found that when I try to shut the laptop down (or
>>>> reboot it), it hangs right at the end of closing the current session. The
>>>> last line I see on the screen when rebooting is:
>>>>
>>>>sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>>
>>>> When closing down, I see one additional line:
>>>>
>>>>sd 4:0:0:0 [sda]Stopping disk
>>>>
>>>> In both cases the machine then hangs and I have to hold down the power
>>>> button for a few seconds to switch it off.
>>>>
>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>>>> landed on:
>>>>
>>>># first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>>>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>>>> (VPR scrubber)
>>>>
>>>> I built and installed a kernel with 
>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit) 
>>>> checked out
>>>> and that shuts down and reboots fine. I then did the same with the bad
>>>> commit checked out and that does indeed hang, so
>>>> I'm confident the bisect outcome is OK.
>>>>
>>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>>
>>>> My system has dual GPUs - one Intel and one NVIDIA. The relevant extracts from
>>>> 'lspci -v' are:
>>>>
>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>>>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>>
>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>>
>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>>
>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>>
>>>> I/O ports at 5000 [size=64]
>>>>
>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>>
>>>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>>>
>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>>
>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>>
>>>> Capabilities: [d0] Power Management version 2
>>>>
>>>> Kernel driver in use: i915
>>>>
>>>> Kernel modul

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

@nouveau-maintainers, did anyone take a look at this? The report is
already 8 days old and I don't see a single reply. Sure, we'll likely
get a -rc8, but it would still be good not to fix this at the finish line.

Chris, btw, did you try whether you can revert the commit on top of the latest
mainline? And if so, does it fix the problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
> [adding various lists and the two other nouveau maintainers to the list
> of recipients]

> On 18.01.23 21:59, Chris Clayton wrote:
>> Hi.
>>
>> I built and installed the latest development kernel earlier this week. I've
>> found that when I try to shut the laptop down (or
>> reboot it), it hangs right at the end of closing the current session. The
>> last line I see on the screen when rebooting is:
>>
>>  sd 4:0:0:0: [sda] Synchronising SCSI cache
>>
>> When closing down, I see one additional line:
>>
>>  sd 4:0:0:0 [sda]Stopping disk
>>
>> In both cases the machine then hangs and I have to hold down the power
>> button for a few seconds to switch it off.
>>
>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and landed
>> on:
>>
>>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>> (VPR scrubber)
>>
>> I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e 
>> (the parent of the bad commit) checked out
>> and that shuts down and reboots fine. I then did the same with the bad
>> commit checked out and that does indeed hang, so
>> I'm confident the bisect outcome is OK.
>>
>> Kernels 6.1.6 and 5.15.88 are also OK.
>>
>> My system has dual GPUs - one Intel and one NVIDIA. The relevant extracts from
>> 'lspci -v' are:
>>
>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>
>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>
>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>
>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>
>> I/O ports at 5000 [size=64]
>>
>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>
>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>
>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>
>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>
>> Capabilities: [d0] Power Management version 2
>>
>> Kernel driver in use: i915
>>
>> Kernel modules: i915
>>
>>
>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
>> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
>> controller])
>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>> Flags: bus master, fast devsel, latency 0, IRQ 141
>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>> Memory at b000 (64-bit, prefetchable) [size=256M]
>> Memory at c000 (64-bit, prefetchable) [size=32M]
>> I/O ports at 4000 [size=128]
>> Expansion ROM at c300 [disabled] [size=512K]
>> Capabilities: [60] Power Management version 3
>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Capabilities: [78] Express Legacy Endpoint, MSI 00
>> Kernel driver in use: nouveau
>> Kernel modules: nouveau
>>
>> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
>> sysvinit).
>>
>> I've attached the bisect.log, but please let me know if I can provide any 
>> other diagnostics. Please cc me as I'm not
>> subscribed.
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced e44c2170876197
> #regzbot title drm: nouveau: hangs on poweroff/reboot
> #regzbot ignore-activity
> 
> This isn't a regression? This is

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-19 Thread Linux kernel regression tracking (Thorsten Leemhuis)
[adding various lists and the two other nouveau maintainers to the list
of recipients]

For the rest of this mail:

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 18.01.23 21:59, Chris Clayton wrote:
> Hi.
> 
> I built and installed the latest development kernel earlier this week. I've
> found that when I try to shut the laptop down (or
> reboot it), it hangs right at the end of closing the current session. The
> last line I see on the screen when rebooting is:
> 
>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> 
> When closing down, I see one additional line:
> 
>   sd 4:0:0:0 [sda]Stopping disk
> 
> In both cases the machine then hangs and I have to hold down the power button
> for a few seconds to switch it off.
>
> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and landed
> on:
> 
>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
> drm/nouveau/flcn: new code to load+boot simple HS FWs
> (VPR scrubber)
> 
> I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e 
> (the parent of the bad commit) checked out
> and that shuts down and reboots fine. I then did the same with the bad commit
> checked out and that does indeed hang, so
> I'm confident the bisect outcome is OK.
> 
> Kernels 6.1.6 and 5.15.88 are also OK.
> 
> My system has dual GPUs - one Intel and one NVIDIA. The relevant extracts from
> 'lspci -v' are:
> 
> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
> Graphics] (rev 05) (prog-if 00 [VGA controller])
> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> 
> Flags: bus master, fast devsel, latency 0, IRQ 142
> 
> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> 
> Memory at a000 (64-bit, prefetchable) [size=256M]
> 
> I/O ports at 5000 [size=64]
> 
> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> 
> Capabilities: [40] Vendor Specific Information: Len=0c 
> 
> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> 
> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> 
> Capabilities: [d0] Power Management version 2
> 
> Kernel driver in use: i915
> 
> Kernel modules: i915
> 
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> controller])
> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
> Flags: bus master, fast devsel, latency 0, IRQ 141
> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> Memory at b000 (64-bit, prefetchable) [size=256M]
> Memory at c000 (64-bit, prefetchable) [size=32M]
> I/O ports at 4000 [size=128]
> Expansion ROM at c300 [disabled] [size=512K]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Legacy Endpoint, MSI 00
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
> sysvinit).
> 
> I've attached the bisect.log, but please let me know if I can provide any 
> other diagnostics. Please cc me as I'm not
> subscribed.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e44c2170876197
#regzbot title drm: nouveau: hangs on poweroff/reboot
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-01-13 Thread Linux kernel regression tracking (Thorsten Leemhuis)
[CCing Daniel]

On 05.01.23 13:28, Thorsten Leemhuis wrote:
> [adding Karol and Lyude to the list of recipients]
> 
> On 28.12.22 15:49, Diogo Ivo wrote:
>> Hello,
>>
>> Commit 2541626cfb79 breaks GM20B probe with
>> the following kernel log:
> Just wondering: is anyone looking into this? The report was posted more
> than a week ago and hasn't received a single reply yet afaics. This of
> course can happen at this time of the year, but I nevertheless thought a
> quick status inquiry might be a good idea at this point.

Hmmm, the report is now more than two weeks old and still hasn't received
a single reply. My prodding about a week ago also didn't help. So I guess
I have to bring this to Linus' attention, unless something happens in the
next two days.

Diogo, for that it would be really helpful to know: is the issue still
happening with the latest mainline? Is it possible to revert 2541626cfb79
easily? And if so: do things work again afterwards?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

>> [2.153892] [ cut here ]
>> [2.153897] WARNING: CPU: 1 PID: 36 at 
>> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 
>> gf100_vmm_valid+0x2c4/0x390
>> [2.153916] Modules linked in:
>> [2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
>> [2.153929] Hardware name: Google Pixel C (DT)
>> [2.153933] Workqueue: events_unbound deferred_probe_work_func
>> [2.153943] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS 
>> BTYPE=--)
>> [2.153950] pc : gf100_vmm_valid+0x2c4/0x390
>> [2.153959] lr : gf100_vmm_valid+0xb4/0x390
>> [2.153966] sp : ffc009e134b0
>> [2.153969] x29: ffc009e134b0 x28:  x27: 
>> ffc008fd44c8
>> [2.153979] x26: ffea x25: ffc0087b98d0 x24: 
>> ff8080f89038
>> [2.153987] x23: ff8081fadc08 x22:  x21: 
>> 
>> [2.153995] x20: ff8080f8a000 x19: ffc009e13678 x18: 
>> 
>> [2.154003] x17: f37a8b93418958e6 x16: ffc009f0d000 x15: 
>> 
>> [2.154011] x14: 0002 x13: 0003a020 x12: 
>> ffc00800
>> [2.154019] x11: 000102913000 x10:  x9 : 
>> 
>> [2.154026] x8 : ffc009e136d8 x7 : ffc008fd44c8 x6 : 
>> ff80803d0f00
>> [2.154034] x5 :  x4 : ff8080f88c00 x3 : 
>> 0010
>> [2.154041] x2 : 000c x1 : ffea x0 : 
>> ffea
>> [2.154050] Call trace:
>> [2.154053]  gf100_vmm_valid+0x2c4/0x390
>> [2.154061]  nvkm_vmm_map_valid+0xd4/0x204
>> [2.154069]  nvkm_vmm_map_locked+0xa4/0x344
>> [2.154076]  nvkm_vmm_map+0x50/0x84
>> [2.154083]  nvkm_firmware_mem_map+0x84/0xc4
>> [2.154094]  nvkm_falcon_fw_oneinit+0xc8/0x320
>> [2.154101]  nvkm_acr_oneinit+0x428/0x5b0
>> [2.154109]  nvkm_subdev_oneinit_+0x50/0x104
>> [2.154114]  nvkm_subdev_init_+0x3c/0x12c
>> [2.154119]  nvkm_subdev_init+0x60/0xa0
>> [2.154125]  nvkm_device_init+0x14c/0x2a0
>> [2.154133]  nvkm_udevice_init+0x60/0x9c
>> [2.154140]  nvkm_object_init+0x48/0x1b0
>> [2.154144]  nvkm_ioctl_new+0x168/0x254
>> [2.154149]  nvkm_ioctl+0xd0/0x220
>> [2.154153]  nvkm_client_ioctl+0x10/0x1c
>> [2.154162]  nvif_object_ctor+0xf4/0x22c
>> [2.154168]  nvif_device_ctor+0x28/0x70
>> [2.154174]  nouveau_cli_init+0x150/0x590
>> [2.154180]  nouveau_drm_device_init+0x60/0x2a0
>> [2.154187]  nouveau_platform_device_create+0x90/0xd0
>> [2.154193]  nouveau_platform_probe+0x3c/0x9c
>> [2.154200]  platform_probe+0x68/0xc0
>> [2.154207]  really_probe+0xbc/0x2dc
>> [2.154211]  __driver_probe_device+0x78/0xe0
>> [2.154216]  driver_probe_device+0xd8/0x160
>> [2.154221]  __device_attach_driver+0xb8/0x134
>> [2.154226]  bus_for_each_drv+0x78/0xd0
>> [2.154230]  __device_attach+0x9c/0x1a0
>> [2.154234]  device_initial_probe+0x14/0x20
>> [2.154239]  bus_probe_device+0x98/0xa0
>> [2.154243]  deferred_probe_work_func+0x88/0xc0
>> [2.154247]  process_one_work+0x204/0x40c
>> [2.154256]  worker_thread+0x230/0x450
>> [2.154261]  kthread+0xc8/0xcc
>> [2.154266]  ret_from_fork+0x10/0x20
>> [2.154273] 

Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-01-05 Thread Thorsten Leemhuis
[adding Karol and Lyude to the list of recipients]

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

On 28.12.22 15:49, Diogo Ivo wrote:
> Hello,
> 
> Commit 2541626cfb79 breaks GM20B probe with
> the following kernel log:
Just wondering: is anyone looking into this? The report was posted more
than a week ago and hasn't received a single reply yet afaics. This of
course can happen at this time of the year, but I nevertheless thought a
quick status inquiry might be a good idea at this point.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

> [2.153892] [ cut here ]
> [2.153897] WARNING: CPU: 1 PID: 36 at 
> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 
> gf100_vmm_valid+0x2c4/0x390
> [2.153916] Modules linked in:
> [2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
> [2.153929] Hardware name: Google Pixel C (DT)
> [2.153933] Workqueue: events_unbound deferred_probe_work_func
> [2.153943] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [2.153950] pc : gf100_vmm_valid+0x2c4/0x390
> [2.153959] lr : gf100_vmm_valid+0xb4/0x390
> [2.153966] sp : ffc009e134b0
> [2.153969] x29: ffc009e134b0 x28:  x27: 
> ffc008fd44c8
> [2.153979] x26: ffea x25: ffc0087b98d0 x24: 
> ff8080f89038
> [2.153987] x23: ff8081fadc08 x22:  x21: 
> 
> [2.153995] x20: ff8080f8a000 x19: ffc009e13678 x18: 
> 
> [2.154003] x17: f37a8b93418958e6 x16: ffc009f0d000 x15: 
> 
> [2.154011] x14: 0002 x13: 0003a020 x12: 
> ffc00800
> [2.154019] x11: 000102913000 x10:  x9 : 
> 
> [2.154026] x8 : ffc009e136d8 x7 : ffc008fd44c8 x6 : 
> ff80803d0f00
> [2.154034] x5 :  x4 : ff8080f88c00 x3 : 
> 0010
> [2.154041] x2 : 000c x1 : ffea x0 : 
> ffea
> [2.154050] Call trace:
> [2.154053]  gf100_vmm_valid+0x2c4/0x390
> [2.154061]  nvkm_vmm_map_valid+0xd4/0x204
> [2.154069]  nvkm_vmm_map_locked+0xa4/0x344
> [2.154076]  nvkm_vmm_map+0x50/0x84
> [2.154083]  nvkm_firmware_mem_map+0x84/0xc4
> [2.154094]  nvkm_falcon_fw_oneinit+0xc8/0x320
> [2.154101]  nvkm_acr_oneinit+0x428/0x5b0
> [2.154109]  nvkm_subdev_oneinit_+0x50/0x104
> [2.154114]  nvkm_subdev_init_+0x3c/0x12c
> [2.154119]  nvkm_subdev_init+0x60/0xa0
> [2.154125]  nvkm_device_init+0x14c/0x2a0
> [2.154133]  nvkm_udevice_init+0x60/0x9c
> [2.154140]  nvkm_object_init+0x48/0x1b0
> [2.154144]  nvkm_ioctl_new+0x168/0x254
> [2.154149]  nvkm_ioctl+0xd0/0x220
> [2.154153]  nvkm_client_ioctl+0x10/0x1c
> [2.154162]  nvif_object_ctor+0xf4/0x22c
> [2.154168]  nvif_device_ctor+0x28/0x70
> [2.154174]  nouveau_cli_init+0x150/0x590
> [2.154180]  nouveau_drm_device_init+0x60/0x2a0
> [2.154187]  nouveau_platform_device_create+0x90/0xd0
> [2.154193]  nouveau_platform_probe+0x3c/0x9c
> [2.154200]  platform_probe+0x68/0xc0
> [2.154207]  really_probe+0xbc/0x2dc
> [2.154211]  __driver_probe_device+0x78/0xe0
> [2.154216]  driver_probe_device+0xd8/0x160
> [2.154221]  __device_attach_driver+0xb8/0x134
> [2.154226]  bus_for_each_drv+0x78/0xd0
> [2.154230]  __device_attach+0x9c/0x1a0
> [2.154234]  device_initial_probe+0x14/0x20
> [2.154239]  bus_probe_device+0x98/0xa0
> [2.154243]  deferred_probe_work_func+0x88/0xc0
> [2.154247]  process_one_work+0x204/0x40c
> [2.154256]  worker_thread+0x230/0x450
> [2.154261]  kthread+0xc8/0xcc
> [2.154266]  ret_from_fork+0x10/0x20
> [2.154273] ---[ end trace  ]---
> [2.154278] nouveau 5700.gpu: pmu: map -22
> [2.154285] nouveau 5700.gpu: acr: one-time init failed, -22
> [2.154559] nouveau 5700.gpu: init failed with -22
> [2.154564] nouveau: DRM-master::0080: init failed with -22
> [2.154574] nouveau 5700.gpu: DRM-master: Device allocation failed: -22
> [2.162905] nouveau: probe of 5700.gpu failed with error -22
> 
> #regzbot introduced: 2541626cfb79
> 
> Thanks,
> 
> Diogo Ivo
> 
> 

#regzbot poke


Re: [Nouveau] [PATCH] drm/nouveau: wait for the exclusive fence after the shared ones v2

2021-12-21 Thread Thorsten Leemhuis
Hi, this is your Linux kernel regression tracker speaking.

CCing Dave and Daniel.

On 15.12.21 23:32, Ben Skeggs wrote:
> On Tue, 14 Dec 2021 at 19:19, Christian König
>  wrote:
>>
>> Am 11.12.21 um 10:59 schrieb Stefan Fritsch:
>>> On 09.12.21 11:23, Christian König wrote:
>>>> Always waiting for the exclusive fence resulted on some performance
>>>> regressions. So try to wait for the shared fences first, then the
>>>> exclusive fence should always be signaled already.
>>>>
>>>> v2: fix incorrectly placed "(", add some comment why we do this.
>>>>
>>>> Signed-off-by: Christian König 
>>>
>>> Tested-by: Stefan Fritsch 
>>
>> Thanks.
>>
>>>
>>> Please also add a cc for linux-stable, so that this is fixed in 5.15.x
>>
>> Sure, but I still need some acked-by or rb from one of the Nouveau guys.
>> So gentle ping on that.
> Acked-by: Ben Skeggs 

What's the status of this patch? I checked a few git trees, but either
it's not there yet or I missed it.

Reminder: it's a regression introduced in v5.15, hence all users
of the current stable kernel are affected by it, so it would be nice to
get the fix on its way now that Ben acked it and Stefan tested it.

Ciao, Thorsten

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave thus might send someone reading this down the wrong
rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
further activity regarding this regression.

#regzbot poke

>>>> ---
>>>>  drivers/gpu/drm/nouveau/nouveau_fence.c | 28 +
>>>>  1 file changed, 15 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>>> index 05d0b3eb3690..0ae416aa76dc 100644
>>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>>> @@ -353,15 +353,22 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>>>  
>>>>  		if (ret)
>>>>  			return ret;
>>>> -	}
>>>>  
>>>> -	fobj = dma_resv_shared_list(resv);
>>>> -	fence = dma_resv_excl_fence(resv);
>>>> +		fobj = NULL;
>>>> +	} else {
>>>> +		fobj = dma_resv_shared_list(resv);
>>>> +	}
>>>>  
>>>> -	if (fence) {
>>>> +	/* Waiting for the exclusive fence first causes performance regressions
>>>> +	 * under some circumstances. So manually wait for the shared ones first.
>>>> +	 */
>>>> +	for (i = 0; i < (fobj ? fobj->shared_count : 0) && !ret; ++i) {
>>>>  		struct nouveau_channel *prev = NULL;
>>>>  		bool must_wait = true;
>>>>  
>>>> +		fence = rcu_dereference_protected(fobj->shared[i],
>>>> +						  dma_resv_held(resv));
>>>> +
>>>>  		f = nouveau_local_fence(fence, chan->drm);
>>>>  		if (f) {
>>>>  			rcu_read_lock();
>>>> @@ -373,20 +380,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>>>  
>>>>  		if (must_wait)
>>>>  			ret = dma_fence_wait(fence, intr);
>>>> -
>>>> -		return ret;
>>>>  	}
>>>>  
>>>> -	if (!exclusive || !fobj)
>>>> -		return ret;
>>>> -
>>>> -	for (i = 0; i < fobj->shared_count && !ret; ++i) {
>>>> +	fence = dma_resv_excl_fence(resv);
>>>> +	if (fence) {
>>>>  		struct nouveau_channel *prev = NULL;
>>>>  		bool must_wait = true;
>>>>  
>>>> -		fence = rcu_dereference_protected(fobj->shared[i],
>>>> -						  dma_resv_held(resv));
>>>> -
>>>>  		f = nouveau_local_fence(fence, chan->drm);
>>>>  		if (f) {
>>>>  			rcu_read_lock();
>>>> @@ -398,6 +398,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>>>  
>>>>  		if (must_wait)
>>>>  			ret = dma_fence_wait(fence, intr);
>>>> +
>>>> +		return ret;
>>>>  	}
>>>>  
>>>>  	return ret;
>>
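The ordering the v2 patch describes (shared fences first, then the exclusive fence, and only the exclusive fence when the caller did not ask for exclusive access) can be modelled in a few lines of plain C. This is a toy userspace sketch, not nouveau or dma_resv code: `struct toy_fence`, `toy_wait()` and `toy_sync()` are hypothetical names, a fence is reduced to a simulated signal time, and the sketch assumes, as the commit message implies, that shared fences never signal before the exclusive fence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dma_fence: it signals at a fixed time. */
struct toy_fence {
	int signal_time;
};

/* Hypothetical CPU wait: returns true only if the caller had to block,
 * and advances the simulated clock to the moment the fence signaled. */
static bool toy_wait(const struct toy_fence *f, int *now)
{
	if (f->signal_time <= *now)
		return false; /* already signaled: no blocking */
	*now = f->signal_time;
	return true;
}

/* Order used by the v2 patch: shared fences first (only when the caller
 * asked for exclusive access, mirroring "fobj = NULL" otherwise), then
 * the exclusive fence. Returns how many waits actually blocked. */
static int toy_sync(const struct toy_fence *shared, size_t n_shared,
		    const struct toy_fence *excl, bool exclusive, int *now)
{
	int blocked = 0;
	size_t n = exclusive ? n_shared : 0;

	for (size_t i = 0; i < n; ++i)
		blocked += toy_wait(&shared[i], now);
	if (excl)
		blocked += toy_wait(excl, now);
	return blocked;
}
```

With shared fences signalling at t=8 and t=10 and the exclusive fence at t=5, an exclusive-access `toy_sync()` blocks twice and the final exclusive-fence check returns without blocking, which is the effect the commit message is after.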



Re: [Nouveau] [PATCH] drm/nouveau: wait for the exclusive fence after the shared ones v2

2021-12-10 Thread Thorsten Leemhuis
Hi, this is your Linux kernel regression tracker speaking.

On 09.12.21 11:23, Christian König wrote:
> Always waiting for the exclusive fence resulted on some performance
> regressions. So try to wait for the shared fences first, then the
> exclusive fence should always be signaled already.
> 
> v2: fix incorrectly placed "(", add some comment why we do this.
> 
> Signed-off-by: Christian König 

FWIW: In case you need to send an improved patch, could you please add
this (see (¹) below for the reasoning):

Link: https://lore.kernel.org/dri-devel/da142fb9-07d7-24fe-4533-0247b8d16...@sfritsch.de/

And if the patch is already good to go: could the subsystem maintainer
please add it when applying? See (¹) for the reasoning.

BTW, these two lines afaics are missing as well:

Fixes: 3e1ad79bf661 ("drm/nouveau: always wait for the exclusive fence")
Reported-by: Stefan Fritsch 
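The Fixes: line above follows the format described in Documentation/process/submitting-patches.rst: at least the first 12 characters of the commit ID, then the commit's one-line summary in parentheses and double quotes. A minimal C sketch of building such a line; the helper name `format_fixes_tag()` is hypothetical and only illustrates the format (the same document also shows a git `pretty.fixes` setting that produces it directly).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds a "Fixes:" trailer from a full commit ID
 * (truncated to its first 12 characters) and the one-line summary.
 * Returns snprintf()'s result: negative on error, or a value >= size
 * if the output did not fit into buf. */
static int format_fixes_tag(char *buf, size_t size, const char *commit_id,
			    const char *subject)
{
	return snprintf(buf, size, "Fixes: %.12s (\"%s\")", commit_id, subject);
}
```

Fed the full ID 3e1ad79bf66165bdb2baca3989f9227939241f11 and the summary "drm/nouveau: always wait for the exclusive fence", it yields the Fixes: line quoted above.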

Ciao, Thorsten

(¹) Long story: The commit message would benefit from a link to the
regression report, for reasons explained in
Documentation/process/submitting-patches.rst. To quote:

```
If related discussions or any other background information behind the
change can be found on the web, add 'Link:' tags pointing to it. In case
your patch fixes a bug, for example, add a tag with a URL referencing
the report in the mailing list archives or a bug tracker;
```

This concept is old, but the text was reworked recently to make this use
case for the Link: tag clearer. For details see:
https://git.kernel.org/linus/1f57bd42b77c

Yes, that "Link:" is not really crucial; but it's good to have if
someone needs to look into the backstory of this change sometime in the
future. But I care for a different reason. I'm tracking this regression
(and others) with regzbot, my Linux kernel regression tracking bot. This
bot will notice if a patch with a Link: tag to a tracked regression gets
posted and record that, which allows anyone looking into the regression
to quickly grasp the current status from regzbot's web UI
(https://linux-regtracking.leemhuis.info/regzbot) or its reports. The
bot will also notice if a commit with a Link: tag to a regression report
is applied by Linus and then automatically mark the regression as
resolved.

IOW: this tag makes my life as a regression tracker a lot easier, as I
otherwise have to tell regzbot manually when the fix lands. :-/
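The mechanism described above, a bot noticing a commit whose Link: tag points at a tracked report, comes down to matching trailer lines against a known URL. The following C sketch is purely illustrative: `has_link_to()` is a hypothetical name, it is not regzbot's implementation, and it only does an exact match of a whole `Link: <url>` line (so a tag wrapped across two lines, as mail clients sometimes do, would not match).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does any line of a commit message consist exactly
 * of "Link: " followed by report_url? Not regzbot's actual code. */
static bool has_link_to(const char *msg, const char *report_url)
{
	const char *line = msg;
	size_t url_len = strlen(report_url);

	while (line && *line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		/* Whole-line match: prefix, URL, and nothing after it. */
		if (len == 6 + url_len && strncmp(line, "Link: ", 6) == 0 &&
		    strncmp(line + 6, report_url, url_len) == 0)
			return true;
		line = end ? end + 1 : NULL;
	}
	return false;
}
```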

#regzbot ^backmonitor: https://lore.kernel.org/dri-devel/da142fb9-07d7-24fe-4533-0247b8d16...@sfritsch.de/


Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-06 Thread Thorsten Leemhuis
[TLDR: adding this regression to regzbot; most of this mail is compiled
from a few template paragraphs some of you will have seen already.]

Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easily accessible to everyone.

Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 3e1ad79bf66165bdb2baca3989f9227939241f11
#regzbot title drm: nouveau: annoying black flickering in some applications with KDE Plasma with Xorg
#regzbot ignore-activity

Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail), then regzbot will automatically
mark the regression as resolved once the fix lands in the appropriate
tree. For more details about regzbot see footer.

Sending this to everyone that got the initial report, to make all aware
of the tracking. I also hope that messages like this motivate people to
directly get at least the regression mailing list and ideally even
regzbot involved when dealing with regressions, as messages like this
wouldn't be needed then.

Don't worry, I'll send further messages regarding this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave thus might send someone reading this down the
wrong rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
further activity regarding this regression.


On 04.12.21 17:40, Stefan Fritsch wrote:
> Hi,
> 
> when updating from 5.14 to 5.15 on a system with NVIDIA GP108 [GeForce
> GT 1030] (NV138) and Ryzen 9 3900XT using kde/plasma on X (not wayland),
> there is a regression: There is now some annoying black flickering in
> some applications, for example thunderbird, firefox, or mpv. It mostly
> happens when scrolling or when playing video. Only the window of the
> application flickers, not the whole screen. But the flickering is not
> limited to the scrolled area: for example in firefox the url and
> bookmark bars flicker, too, not only the web site. I have bisected the
> issue to this commit:
> 
> commit 3e1ad79bf66165bdb2baca3989f9227939241f11 (HEAD)
> Author: Christian König 
> Date:   Sun Jun 6 11:50:15 2021 +0200
> 
>     drm/nouveau: always wait for the exclusive fence
> 
>     Drivers also need to to sync to the exclusive fence when
>     a shared one is present.
> 
>     Signed-off-by: Christian König 
>     Reviewed-by: Daniel Vetter 
>     Link: https://patchwork.freedesktop.org/patch/msgid/20210702111642.17259-4-christian.koe...@amd.com
> 
> 
> 
> This sounds like performance is impacted severely by that commit. Can
> this be fixed somehow? A partial dmesg is below.
> 
> Cheers,
> Stefan
> 
> 
> dmesg |grep -i -e drm -e dri -e nvidia -e nouveau -e fb
> [    0.00] BIOS-e820: [mem 0xbc552000-0xbc8fbfff] reserved
> [    0.004971] ACPI: XSDT 0xBCFB0728 CC (v01 ALASKA A M I 01072009 AMI  0113)
> [    0.010838] PM: hibernation: Registered nosave memory: [mem 0xbc552000-0xbc8fbfff]
> [    0.204873] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> [    0.292761] Registering PCC driver as Mailbox controller
> [    0.292761] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [    0.518295] pci :06:00.0: reg 0x10: [mem 0xfb00-0xfbff]
> [    0.519132] pci :06:00.1: [10de:0fb8] type 00 class 0x040300
> [    0.519653] pci :00:03.1:   bridge window [mem 0xfb00-0xfc0f]
> [    0.549101] pci :00:03.1:   bridge window [mem 0xfb00-0xfc0f]
> [    0.550994] pci_bus :06: resource 1 [mem 0xfb00-0xfc0f]
> [    0.561285] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> [    0.564152] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [    0.570870] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    0.571531] AMD-Vi: AMD IOMMUv2 driver by 

Re: [Nouveau] Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-06-26 Thread Thorsten Leemhuis
On 24.06.2016 16:19, George Spelvin wrote:
> Here's a regression you might add.  

Thx, added.

> I only reported it to dri-devel,
> since it's DRI-specific, but since there's been thunderous silence
> for a few weeks, I'm trying to be a squeakier wheel.

Added the nouveau developers to CC; maybe it's a bug in the DRM driver
that triggers this problem. Also, airlied is "Internet challenged" right
now and Daniel is on holidays, so it might be good to get more people
into the loop anyway.

> Given that I bisected it to a single, small, revertable commit, I'd
> hoped it would be easy to deal with.
> 
> [BISECTED: 0955c1250e] 4.7-rc1 oops at drm_connector_cleanup+0x5c/0x1d0 
> 
> E-mail report at
> https://marc.info/?l=dri-devel=146577898611849
> 
> Bugzilla report at
> https://bugs.freedesktop.org/show_bug.cgi?id=96532

FWIW the important detail: Reverting
https://git.kernel.org/linus/0955c1250e (drm/crtc: take references to
connectors used in a modeset. (v2)) fixes this.

Sincerely, your regression tracker for Linux 4.7 (http://bit.ly/28JRmJo)
 Thorsten
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau