Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
On 27.01.23 20:46, Chris Clayton wrote:
> [Resend because the mail client on my phone decided to turn HTML on behind my 
> back, so my reply got bounced.]
> 
> Thanks Thorsten.
> 
> I did try to revert but it didn't revert cleanly and I don't have the 
> knowledge to fix it up.
> 
> The patch was part of a merge that included a number of related patches. 
> Tomorrow, I'll try to revert the lot and report
> back.

You are free to do so, but there is no need for that from my side. I
only wanted to know if a simple revert would do the trick; if it
doesn't, in my experience it is often best to leave things to the
developers of the code in question, as they know it best and thus have a
better idea of which hidden side effects a more complex revert might have.

Ciao, Thorsten

> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
> wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> @nouveau-maintainers, did anyone take a look at this? The report is
>> already 8 days old and I don't see a single reply. Sure, we'll likely
>> get a -rc8, but still it would be good to not fix this on the finish line.
>>
>> Chris, btw, did you try if you can revert the commit on top of latest
>> mainline? And if so, does it fix the problem?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> [adding various lists and the two other nouveau maintainers to the list
>>> of recipients]
>>
>>> On 18.01.23 21:59, Chris Clayton wrote:
>>>> Hi.
>>>>
>>>> I built and installed the latest development kernel earlier this week. 
>>>> I've found that when I try to shut the laptop down (or
>>>> reboot it), it hangs right at the end of closing the current session. The 
>>>> last line I see on the screen when rebooting is:
>>>>
>>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>>
>>>> when closing down I see one additional line:
>>>>
>>>>   sd 4:0:0:0 [sda]Stopping disk
>>>>
>>>> In both cases the machine then hangs and I have to hold down the power 
>>>> button for a few seconds to switch it off.
>>>>
>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and 
>>>> landed on:
>>>>
>>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>>>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>>>> (VPR scrubber)
>>>>
>>>> I built and installed a kernel with 
>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit) 
>>>> checked out
>>>> and that shuts down and reboots fine. I then did the same with the bad 
>>>> commit checked out and that does indeed hang, so
>>>> I'm confident the bisect outcome is OK.
>>>>
>>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>>
>>>> My system has dual GPUs - one Intel and one NVIDIA. Related extracts from 
>>>> 'lspci -v' are:
>>>>
>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>>>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>>
>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>>
>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>>
>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>>
>>>> I/O ports at 5000 [size=64]
>>>>
>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>>
>>>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>>>
>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>>
>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>>
>>>> Capabilities: [d0] Power Management version 2
>>>>
>>>> Kernel driver in use: i915
>>>>
>>>> Kernel modul

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

@nouveau-maintainers, did anyone take a look at this? The report is
already 8 days old and I don't see a single reply. Sure, we'll likely
get a -rc8, but still it would be good to not fix this on the finish line.

Chris, btw, did you try if you can revert the commit on top of latest
mainline? And if so, does it fix the problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
> [adding various lists and the two other nouveau maintainers to the list
> of recipients]

> On 18.01.23 21:59, Chris Clayton wrote:
>> Hi.
>>
>> I built and installed the latest development kernel earlier this week. I've 
>> found that when I try to shut the laptop down (or
>> reboot it), it hangs right at the end of closing the current session. The 
>> last line I see on the screen when rebooting is:
>>
>>  sd 4:0:0:0: [sda] Synchronising SCSI cache
>>
>> when closing down I see one additional line:
>>
>>  sd 4:0:0:0 [sda]Stopping disk
>>
>> In both cases the machine then hangs and I have to hold down the power 
>> button for a few seconds to switch it off.
>>
>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and landed 
>> on:
>>
>>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>> (VPR scrubber)
>>
>> I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e 
>> (the parent of the bad commit) checked out
>> and that shuts down and reboots fine. I then did the same with the bad 
>> commit checked out and that does indeed hang, so
>> I'm confident the bisect outcome is OK.
>>
>> Kernels 6.1.6 and 5.15.88 are also OK.
>>
>> My system has dual GPUs - one Intel and one NVIDIA. Related extracts from 
>> 'lspci -v' are:
>>
>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>
>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>
>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>
>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>
>> I/O ports at 5000 [size=64]
>>
>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>
>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>
>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>
>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>
>> Capabilities: [d0] Power Management version 2
>>
>> Kernel driver in use: i915
>>
>> Kernel modules: i915
>>
>>
>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
>> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
>> controller])
>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>> Flags: bus master, fast devsel, latency 0, IRQ 141
>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>> Memory at b000 (64-bit, prefetchable) [size=256M]
>> Memory at c000 (64-bit, prefetchable) [size=32M]
>> I/O ports at 4000 [size=128]
>> Expansion ROM at c300 [disabled] [size=512K]
>> Capabilities: [60] Power Management version 3
>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Capabilities: [78] Express Legacy Endpoint, MSI 00
>> Kernel driver in use: nouveau
>> Kernel modules: nouveau
>>
>> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
>> sysvinit).
>>
>> I've attached the bisect.log, but please let me know if I can provide any 
>> other diagnostics. Please cc me as I'm not
>> subscribed.
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced 0e44c2170876197
> #regzbot title drm: nouveau: hangs on poweroff/reboot
> #regzbot ignore-activity
> 
> This isn't a regression? This is

Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
On 27.01.23 08:39, Greg KH wrote:
> On Fri, Jan 20, 2023 at 11:51:04AM -0600, Limonciello, Mario wrote:
>> On 1/20/2023 11:46, Guenter Roeck wrote:
>>> On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:
>>>> This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.
>>>>
>>>> [Why]
>>>> Changes cause regression on amdgpu mst.
>>>> E.g.
>>>> In fill_dc_mst_payload_table_from_drm(), amdgpu expects to add/remove
>>>> payload one by one and call fill_dc_mst_payload_table_from_drm() to
>>>> update the HW maintained payload table. But previous change tries to go
>>>> through all the payloads in mst_state and update amdgpu hw maintained
>>>> table in once every time driver only tries to add/remove a specific
>>>> payload stream only. The newly design idea conflicts with the
>>>> implementation in amdgpu nowadays.
>>>>
>>>> [How]
>>>> Revert this patch first. After addressing all regression problems caused by
>>>> this previous patch, will add it back and adjust it.
>>>
>>> Has there been any progress on this revert, or on fixing the underlying
>>> problem ?
>>>
>>> Thanks,
>>> Guenter
>>
>> Hi Guenter,
>>
>> Wayne is OOO for CNY, but let me update you.
>>
>> Harry has sent out this series which is a collection of proper fixes.
>> https://patchwork.freedesktop.org/series/113125/
>>
>> Once that's reviewed and accepted, 4 of them are applicable for 6.1.
> 
> Any hint on when those will be reviewed and accepted?  patchwork doesn't
> show any activity on them, or at least I can't figure it out...

I didn't look closer (hence please correct me if I'm wrong), but afaics
the core changes are in the DRM pull airlied sent to Linus a few hours
ago (note the "amdgpu […] DP MST fixes" line):

https://lore.kernel.org/all/capm%3d9tzuu4xnx6t5v7sksk%2ba5heapoc1iemyznsyqzgztj%3d...@mail.gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-19 Thread Linux kernel regression tracking (Thorsten Leemhuis)
[adding various lists and the two other nouveau maintainers to the list
of recipients]

For the rest of this mail:

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 18.01.23 21:59, Chris Clayton wrote:
> Hi.
> 
> I built and installed the latest development kernel earlier this week. I've 
> found that when I try to shut the laptop down (or
> reboot it), it hangs right at the end of closing the current session. The 
> last line I see on the screen when rebooting is:
> 
>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> 
> when closing down I see one additional line:
> 
>   sd 4:0:0:0 [sda]Stopping disk
> 
> In both cases the machine then hangs and I have to hold down the power button 
> for a few seconds to switch it off.
> 
> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and landed 
> on:
> 
>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
> drm/nouveau/flcn: new code to load+boot simple HS FWs
> (VPR scrubber)
> 
> I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e 
> (the parent of the bad commit) checked out
> and that shuts down and reboots fine. I then did the same with the bad commit 
> checked out and that does indeed hang, so
> I'm confident the bisect outcome is OK.
> 
> Kernels 6.1.6 and 5.15.88 are also OK.
> 
> My system has dual GPUs - one Intel and one NVIDIA. Related extracts from 
> 'lspci -v' are:
> 
> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
> Graphics] (rev 05) (prog-if 00 [VGA controller])
> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> 
> Flags: bus master, fast devsel, latency 0, IRQ 142
> 
> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> 
> Memory at a000 (64-bit, prefetchable) [size=256M]
> 
> I/O ports at 5000 [size=64]
> 
> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> 
> Capabilities: [40] Vendor Specific Information: Len=0c 
> 
> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> 
> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> 
> Capabilities: [d0] Power Management version 2
> 
> Kernel driver in use: i915
> 
> Kernel modules: i915
> 
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> controller])
> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
> Flags: bus master, fast devsel, latency 0, IRQ 141
> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> Memory at b000 (64-bit, prefetchable) [size=256M]
> Memory at c000 (64-bit, prefetchable) [size=32M]
> I/O ports at 4000 [size=128]
> Expansion ROM at c300 [disabled] [size=512K]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Legacy Endpoint, MSI 00
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
> sysvinit).
> 
> I've attached the bisect.log, but please let me know if I can provide any 
> other diagnostics. Please cc me as I'm not
> subscribed.
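
The bisection described in the report above narrows the range between a
known-good and a known-bad kernel down to the first bad commit. As a rough
illustration only, here is a minimal Python sketch of that search. It assumes
a linear history and a hypothetical `is_bad` callback standing in for "build,
boot, and try to power off"; the commit names are made up, and real
`git bisect` additionally copes with merges and skipped commits.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return commits[bad]


# Toy history with made-up commit names; "c5" plays the role of the culprit.
history = ["v6.1", "c1", "c2", "c3", "c4", "c5", "c6", "v6.2-rc1"]
culprit = first_bad_commit(history, lambda c: history.index(c) >= history.index("c5"))
print(culprit)
```

The reporter's extra check (parent commit works, first bad commit hangs)
corresponds to confirming the final `good` and `bad` positions by hand.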

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 0e44c2170876197
#regzbot title drm: nouveau: hangs on poweroff/reboot
#regzbot ignore-activity
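
As a rough illustration of how command lines like these could be picked out of
a mail body, here is a minimal Python sketch. The helper name
`extract_regzbot_commands` and its parsing rules (one command per line, quoted
lines ignored) are assumptions made for this example; it is not how regzbot
itself is implemented.

```python
import re

# Hypothetical pattern: "#regzbot <command> [argument]" on a line of its own.
REGZBOT_LINE = re.compile(r"^#regzbot\s+(\S+)(?:\s+(.*\S))?\s*$")


def extract_regzbot_commands(mail_body: str) -> list[tuple[str, str]]:
    """Collect (command, argument) pairs from unquoted '#regzbot' lines."""
    commands = []
    for line in mail_body.splitlines():
        # Assumption: quoted lines (starting with ">") belong to earlier
        # mails and should not be treated as new commands.
        if line.lstrip().startswith(">"):
            continue
        match = REGZBOT_LINE.match(line.strip())
        if match:
            commands.append((match.group(1), match.group(2) or ""))
    return commands


# Example input; the commit ID is abbreviated from the bisect result
# quoted in the report.
example = """\
#regzbot ^introduced 0e44c2170876
#regzbot title drm: nouveau: hangs on poweroff/reboot
#regzbot ignore-activity
> #regzbot poke
"""

for command, argument in extract_regzbot_commands(example):
    print(command, argument)
```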

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Re: [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-01-13 Thread Linux kernel regression tracking (Thorsten Leemhuis)
[CCing Daniel]

On 05.01.23 13:28, Thorsten Leemhuis wrote:
> [adding Karol and Lyude to the list of recipients]
> 
> On 28.12.22 15:49, Diogo Ivo wrote:
>> Hello,
>>
>> Commit 2541626cfb79 breaks GM20B probe with
>> the following kernel log:
> Just wondering: is anyone looking into this? The report was posted more
> than a week ago and didn't even get a single reply yet afaics. This of
> course can happen at this time of the year, but I nevertheless thought a
> quick status inquiry might be a good idea at this point.

Hmmm, the report is now more than two weeks old and didn't get a single
reply. My prodding about a week ago also didn't help. Then I guess I
have to bring this to Linus' attention, unless something happens in the
next 2 days.

Diogo, for that it would be really helpful to know: is the issue still
happening with latest mainline? Is it possible to revert 2541626cfb79
easily? And if so: do things work afterwards again?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>> [2.153892] ------------[ cut here ]------------
>> [2.153897] WARNING: CPU: 1 PID: 36 at 
>> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 
>> gf100_vmm_valid+0x2c4/0x390
>> [2.153916] Modules linked in:
>> [2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
>> [2.153929] Hardware name: Google Pixel C (DT)
>> [2.153933] Workqueue: events_unbound deferred_probe_work_func
>> [2.153943] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS 
>> BTYPE=--)
>> [2.153950] pc : gf100_vmm_valid+0x2c4/0x390
>> [2.153959] lr : gf100_vmm_valid+0xb4/0x390
>> [2.153966] sp : ffc009e134b0
>> [2.153969] x29: ffc009e134b0 x28:  x27: 
>> ffc008fd44c8
>> [2.153979] x26: ffea x25: ffc0087b98d0 x24: 
>> ff8080f89038
>> [2.153987] x23: ff8081fadc08 x22:  x21: 
>> 
>> [2.153995] x20: ff8080f8a000 x19: ffc009e13678 x18: 
>> 
>> [2.154003] x17: f37a8b93418958e6 x16: ffc009f0d000 x15: 
>> 
>> [2.154011] x14: 0002 x13: 0003a020 x12: 
>> ffc00800
>> [2.154019] x11: 000102913000 x10:  x9 : 
>> 
>> [2.154026] x8 : ffc009e136d8 x7 : ffc008fd44c8 x6 : 
>> ff80803d0f00
>> [2.154034] x5 :  x4 : ff8080f88c00 x3 : 
>> 0010
>> [2.154041] x2 : 000c x1 : ffea x0 : 
>> ffea
>> [2.154050] Call trace:
>> [2.154053]  gf100_vmm_valid+0x2c4/0x390
>> [2.154061]  nvkm_vmm_map_valid+0xd4/0x204
>> [2.154069]  nvkm_vmm_map_locked+0xa4/0x344
>> [2.154076]  nvkm_vmm_map+0x50/0x84
>> [2.154083]  nvkm_firmware_mem_map+0x84/0xc4
>> [2.154094]  nvkm_falcon_fw_oneinit+0xc8/0x320
>> [2.154101]  nvkm_acr_oneinit+0x428/0x5b0
>> [2.154109]  nvkm_subdev_oneinit_+0x50/0x104
>> [2.154114]  nvkm_subdev_init_+0x3c/0x12c
>> [2.154119]  nvkm_subdev_init+0x60/0xa0
>> [2.154125]  nvkm_device_init+0x14c/0x2a0
>> [2.154133]  nvkm_udevice_init+0x60/0x9c
>> [2.154140]  nvkm_object_init+0x48/0x1b0
>> [2.154144]  nvkm_ioctl_new+0x168/0x254
>> [2.154149]  nvkm_ioctl+0xd0/0x220
>> [2.154153]  nvkm_client_ioctl+0x10/0x1c
>> [2.154162]  nvif_object_ctor+0xf4/0x22c
>> [2.154168]  nvif_device_ctor+0x28/0x70
>> [2.154174]  nouveau_cli_init+0x150/0x590
>> [2.154180]  nouveau_drm_device_init+0x60/0x2a0
>> [2.154187]  nouveau_platform_device_create+0x90/0xd0
>> [2.154193]  nouveau_platform_probe+0x3c/0x9c
>> [2.154200]  platform_probe+0x68/0xc0
>> [2.154207]  really_probe+0xbc/0x2dc
>> [2.154211]  __driver_probe_device+0x78/0xe0
>> [2.154216]  driver_probe_device+0xd8/0x160
>> [2.154221]  __device_attach_driver+0xb8/0x134
>> [2.154226]  bus_for_each_drv+0x78/0xd0
>> [2.154230]  __device_attach+0x9c/0x1a0
>> [2.154234]  device_initial_probe+0x14/0x20
>> [2.154239]  bus_probe_device+0x98/0xa0
>> [2.154243]  deferred_probe_work_func+0x88/0xc0
>> [2.154247]  process_one_work+0x204/0x40c
>> [2.154256]  worker_thread+0x230/0x450
>> [2.154261]  kthread+0xc8/0xcc
>> [2.154266]  ret_from_fork+0x10/0x20
>> [2.154273] ---[ end trace  ]---
>> [2.154278] nouveau 5700.gpu: pmu: map -22
>> [2.154285] nouveau 5700.gpu: acr: one-time init failed, -22
>> [2.154559] nouveau 5700.gpu: init failed with -22
>> [2.154564] nouveau: DRM-master::0080: init failed with -22
>> [2.154574] nouveau 5700.gpu: DRM-master: Device allocation failed: 
>> -22
>> [2.162905] nouveau: probe of 5700.gpu failed with error -22
>>
>> #regzbot