Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-03-12 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.03.23 11:20, Karol Herbst wrote:
> On Fri, Mar 10, 2023 at 10:26 AM Chris Clayton  
> wrote:
>>
>> Is it likely that this fix will be submitted to mainline during the ongoing 
>> 6.3 development cycle?
>>
> 
> yes, it's already pushed to drm-misc-fixes, which will then go into
> the current devel cycle. I just don't know when it will next be
> pushed upwards, but it should get there eventually. 

FWIW, the fix has now landed as 1b9b4f922f96; sadly without a Link: tag to
the report, hence I have to mark this manually as resolved:

#regzbot fix: 1b9b4f922f96108da3bb5d87b2d603f5dfbc5650

> And
> because it also contains a Fixes tag it will be backported to older
> branches as well.

FWIW, nope, that's not enough: you have to tag those explicitly to ensure
backporting, as explained in
Documentation/process/stable-kernel-rules.rst. Greg points that out every
few weeks, recently here for example:

https://lore.kernel.org/all/y6bwpo9s9qbns...@kroah.com/
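
For illustration, the tag block of a fix that should also reach the stable
trees would look roughly like this. The commit id, subject, message id and
name below are placeholders, not taken from the actual nouveau fix:

```text
Fixes: 0123456789ab ("subsystem: subject of the commit that introduced the bug")
Link: https://lore.kernel.org/r/<message-id-of-the-report>/
Cc: stable@vger.kernel.org
Signed-off-by: Jane Developer <jane@example.org>
```

The Fixes: line only records which commit caused the problem; the
Cc: stable line is what explicitly requests the backport, and the Link:
line is what lets the report be associated with the fix automatically.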

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>> Chris
>>
>> On 20/02/2023 22:16, Ben Skeggs wrote:
>>> On Mon, 20 Feb 2023 at 21:27, Karol Herbst  wrote:

 On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  
 wrote:
>
>
>
> On 20/02/2023 05:35, Ben Skeggs wrote:
>> On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
>> wrote:
>>>
>>>
>>>
>>> On 18/02/2023 15:19, Chris Clayton wrote:


 On 18/02/2023 12:25, Karol Herbst wrote:
> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton 
>  wrote:
>>
>>
>>
>> On 15/02/2023 11:09, Karol Herbst wrote:
>>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>>> (Thorsten Leemhuis)  wrote:

 On 13.02.23 10:14, Chris Clayton wrote:
> On 13/02/2023 02:57, Dave Airlie wrote:
>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
>>  wrote:
>>>
>>>
>>>
>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten 
>>> Leemhuis) wrote:
 On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
> (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this 
>>> regression before 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But 
>> there is still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole 
>> captures Chris
>> posted more than a week ago to check if they somehow help to 
>> track down
>> the root of this problem?
>
> I did now and I can't spot anything. I think at this point it 
> would
> make sense to dump the active tasks/threads via sysrq keys to 
> see if
> any is in a weird state preventing the machine from shutting 
> down.

 Many thx for looking into it!
>>>
>>> Yes, thanks Karol.
>>>
>>> Attached is the output from dmesg when this block of code:
>>>
>>> /bin/mount /dev/sda7 /mnt/sda7
>>> /bin/mountpoint /proc || /bin/mount /proc
>>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>>> /bin/echo t > /proc/sysrq-trigger
>>> /bin/sleep 1
>>> /bin/sync
>>> /bin/sleep 1
>>> kill $(pidof dmesg)
>>> /bin/umount /mnt/sda7
>>>
>>> is executed immediately before /sbin/reboot is called as the 
>>> final step of rebooting my system.
>>>
>>> I hope this is what you were looking for, but if not, please 
>>> let me know what you need
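
For anyone landing here from the archives: the capture quoted above can also
be written as a small function. This is only a sketch under a few
assumptions: the log directory is on a filesystem that is already mounted,
and SYSRQ_TRIGGER and DMESG are hypothetical overrides added here so the
function can be tried without touching the real /proc/sysrq-trigger. It
takes one snapshot of the kernel log after the dump instead of running
"dmesg -w" in the background and killing it.

```shell
#!/bin/sh
# Sketch only: dump task states via SysRq and save the kernel log.
# SYSRQ_TRIGGER and DMESG are hypothetical overrides for dry runs;
# they are not part of the script quoted in this thread.

dump_task_states() {
    log_dir=${1:-/mnt/sda7}     # directory on an already mounted filesystem
    key=${2:-t}                 # 't' = all tasks, 'w' = blocked tasks only
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    dmesg_cmd=${DMESG:-dmesg}
    log_file="$log_dir/sysrq.dmesg.log"

    # Ask the kernel to print the task list into its log buffer.
    printf '%s\n' "$key" > "$trigger" || return 1

    # Give the kernel a moment to finish printing, then take one snapshot
    # of the log.
    sleep 1
    $dmesg_cmd > "$log_file" || return 1

    # Flush the file to disk before the caller unmounts or reboots.
    sync
    printf '%s\n' "$log_file"
}
```

Called as "dump_task_states /mnt/sda7" right before /sbin/reboot (after
mounting the target partition), this should capture the same information,
provided the kernel log buffer is large enough to hold the whole task dump.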
>
> Thanks Dave. [...]
 FWIW, in case anyone strands here in the archives: the msg was
 truncated. The full post can be found in a new thread:

 https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/

 Sadly it seems the info "With runpm=0, both reboot and poweroff 
 work on
 my laptop." didn't bring us much further to a 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-03-10 Thread Karol Herbst
On Fri, Mar 10, 2023 at 10:26 AM Chris Clayton  wrote:
>
> Hi.
>
> Is it likely that this fix will be submitted to mainline during the ongoing 
> 6.3 development cycle?
>

yes, it's already pushed to drm-misc-fixes, which will then go into
the current devel cycle. I just don't know when it will next be
pushed upwards, but it should get there eventually. And
because it also contains a Fixes tag it will be backported to older
branches as well.

> Chris
>
> On 20/02/2023 22:16, Ben Skeggs wrote:
> > On Mon, 20 Feb 2023 at 21:27, Karol Herbst  wrote:
> >>
> >> On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 20/02/2023 05:35, Ben Skeggs wrote:
>  On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
>  wrote:
> >
> >
> >
> > On 18/02/2023 15:19, Chris Clayton wrote:
> >>
> >>
> >> On 18/02/2023 12:25, Karol Herbst wrote:
> >>> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton 
> >>>  wrote:
> 
> 
> 
>  On 15/02/2023 11:09, Karol Herbst wrote:
> > On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> > (Thorsten Leemhuis)  wrote:
> >>
> >> On 13.02.23 10:14, Chris Clayton wrote:
> >>> On 13/02/2023 02:57, Dave Airlie wrote:
>  On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
>   wrote:
> >
> >
> >
> > On 10/02/2023 19:33, Linux regression tracking (Thorsten 
> > Leemhuis) wrote:
> >> On 10.02.23 20:01, Karol Herbst wrote:
> >>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
> >>> (Thorsten
> >>> Leemhuis)  wrote:
> 
>  On 08.02.23 09:48, Chris Clayton wrote:
> >
> > I'm assuming  that we are not going to see a fix for this 
> > regression before 6.2 is released.
> 
>  Yeah, looks like it. That's unfortunate, but happens. But 
>  there is still
>  time to fix it and there is one thing I wonder:
> 
>  Did any of the nouveau developers look at the netconsole 
>  captures Chris
>  posted more than a week ago to check if they somehow help to 
>  track down
>  the root of this problem?
> >>>
> >>> I did now and I can't spot anything. I think at this point it 
> >>> would
> >>> make sense to dump the active tasks/threads via sysrq keys to 
> >>> see if
> >>> any is in a weird state preventing the machine from shutting 
> >>> down.
> >>
> >> Many thx for looking into it!
> >
> > Yes, thanks Karol.
> >
> > Attached is the output from dmesg when this block of code:
> >
> > /bin/mount /dev/sda7 /mnt/sda7
> > /bin/mountpoint /proc || /bin/mount /proc
> > /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> > /bin/echo t > /proc/sysrq-trigger
> > /bin/sleep 1
> > /bin/sync
> > /bin/sleep 1
> > kill $(pidof dmesg)
> > /bin/umount /mnt/sda7
> >
> > is executed immediately before /sbin/reboot is called as the 
> > final step of rebooting my system.
> >
> > I hope this is what you were looking for, but if not, please 
> > let me know what you need
> >>>
> >>> Thanks Dave. [...]
> >> FWIW, in case anyone strands here in the archives: the msg was
> >> truncated. The full post can be found in a new thread:
> >>
> >> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >>
> >> Sadly it seems the info "With runpm=0, both reboot and poweroff 
> >> work on
> >> my laptop." didn't bring us much further to a solution. :-/ I don't
> >> really like it, but for regression tracking I'm now putting this 
> >> on the
> >> back-burner, as a fix is not in sight.
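
For reference, runpm is a nouveau module parameter, and 0 disables runtime
power management. A minimal sketch for checking which value is currently in
effect follows; the function name and the "unavailable" fallback are made up
for illustration, and reading the parameter file may require root:

```shell
#!/bin/sh
# Sketch only: print the value of nouveau's runpm parameter, or
# "unavailable" if the file cannot be read (module not loaded, or the
# caller lacks permission). The path argument exists only for testing.

nouveau_runpm() {
    param=${1:-/sys/module/nouveau/parameters/runpm}
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "unavailable"
    fi
}
```

To test with runtime PM disabled, the usual mechanisms apply: either
nouveau.runpm=0 on the kernel command line, or a line "options nouveau
runpm=0" in a file under /etc/modprobe.d/.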
> >>
> >> #regzbot monitor:
> >> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >> #regzbot backburner: hard to debug and apparently rare
> >> #regzbot ignore-activity
> >>
> >
> > yeah.. this bug looks a little annoying. Sadly the only Turing based
> > laptop I got doesn't work on Nouveau because of firmware related
> > issues and we probably need to get updated ones from Nvidia here :(
> >
> > But it's a bit weird that the kernel doesn't shut down, because I 
> > 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-03-10 Thread Chris Clayton
Hi.

Is it likely that this fix will be submitted to mainline during the ongoing 
6.3 development cycle?

Chris

On 20/02/2023 22:16, Ben Skeggs wrote:
> On Mon, 20 Feb 2023 at 21:27, Karol Herbst  wrote:
>>
>> On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  
>> wrote:
>>>
>>>
>>>
>>> On 20/02/2023 05:35, Ben Skeggs wrote:
 On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
 wrote:
>
>
>
> On 18/02/2023 15:19, Chris Clayton wrote:
>>
>>
>> On 18/02/2023 12:25, Karol Herbst wrote:
>>> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton 
>>>  wrote:



 On 15/02/2023 11:09, Karol Herbst wrote:
> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> (Thorsten Leemhuis)  wrote:
>>
>> On 13.02.23 10:14, Chris Clayton wrote:
>>> On 13/02/2023 02:57, Dave Airlie wrote:
 On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
  wrote:
>
>
>
> On 10/02/2023 19:33, Linux regression tracking (Thorsten 
> Leemhuis) wrote:
>> On 10.02.23 20:01, Karol Herbst wrote:
>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
>>> (Thorsten
>>> Leemhuis)  wrote:

 On 08.02.23 09:48, Chris Clayton wrote:
>
> I'm assuming  that we are not going to see a fix for this 
> regression before 6.2 is released.

 Yeah, looks like it. That's unfortunate, but happens. But 
 there is still
 time to fix it and there is one thing I wonder:

 Did any of the nouveau developers look at the netconsole 
 captures Chris
 posted more than a week ago to check if they somehow help to 
 track down
 the root of this problem?
>>>
>>> I did now and I can't spot anything. I think at this point it 
>>> would
>>> make sense to dump the active tasks/threads via sysrq keys to 
>>> see if
>>> any is in a weird state preventing the machine from shutting 
>>> down.
>>
>> Many thx for looking into it!
>
> Yes, thanks Karol.
>
> Attached is the output from dmesg when this block of code:
>
> /bin/mount /dev/sda7 /mnt/sda7
> /bin/mountpoint /proc || /bin/mount /proc
> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> /bin/echo t > /proc/sysrq-trigger
> /bin/sleep 1
> /bin/sync
> /bin/sleep 1
> kill $(pidof dmesg)
> /bin/umount /mnt/sda7
>
> is executed immediately before /sbin/reboot is called as the 
> final step of rebooting my system.
>
> I hope this is what you were looking for, but if not, please let 
> me know what you need
>>>
>>> Thanks Dave. [...]
>> FWIW, in case anyone strands here in the archives: the msg was
>> truncated. The full post can be found in a new thread:
>>
>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>
>> Sadly it seems the info "With runpm=0, both reboot and poweroff work 
>> on
>> my laptop." didn't bring us much further to a solution. :-/ I don't
>> really like it, but for regression tracking I'm now putting this on 
>> the
>> back-burner, as a fix is not in sight.
>>
>> #regzbot monitor:
>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>> #regzbot backburner: hard to debug and apparently rare
>> #regzbot ignore-activity
>>
>
> yeah.. this bug looks a little annoying. Sadly the only Turing based
> laptop I got doesn't work on Nouveau because of firmware related
> issues and we probably need to get updated ones from Nvidia here :(
>
> But it's a bit weird that the kernel doesn't shut down, because I don't
> see anything in the logs which would prevent that from happening.
> Unless it's waiting on one of the tasks to complete, but none of them
> looked in any way nouveau related.
>
> If somebody else has any fancy kernel debugging tips here to figure
> out why it hangs, that would be very helpful...
>

 I think I've figured this out. It's to do with how my system is 
 configured. I do have an initrd, but the only thing on
 it is the cpu microcode which, it is recommended, should be loaded 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-20 Thread Ben Skeggs
On Mon, 20 Feb 2023 at 21:27, Karol Herbst  wrote:
>
> On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  
> wrote:
> >
> >
> >
> > On 20/02/2023 05:35, Ben Skeggs wrote:
> > > On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
> > > wrote:
> > >>
> > >>
> > >>
> > >> On 18/02/2023 15:19, Chris Clayton wrote:
> > >>>
> > >>>
> > >>> On 18/02/2023 12:25, Karol Herbst wrote:
> >  On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton 
> >   wrote:
> > >
> > >
> > >
> > > On 15/02/2023 11:09, Karol Herbst wrote:
> > >> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> > >> (Thorsten Leemhuis)  wrote:
> > >>>
> > >>> On 13.02.23 10:14, Chris Clayton wrote:
> >  On 13/02/2023 02:57, Dave Airlie wrote:
> > > On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
> > >  wrote:
> > >>
> > >>
> > >>
> > >> On 10/02/2023 19:33, Linux regression tracking (Thorsten 
> > >> Leemhuis) wrote:
> > >>> On 10.02.23 20:01, Karol Herbst wrote:
> >  On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
> >  (Thorsten
> >  Leemhuis)  wrote:
> > >
> > > On 08.02.23 09:48, Chris Clayton wrote:
> > >>
> > >> I'm assuming  that we are not going to see a fix for this 
> > >> regression before 6.2 is released.
> > >
> > > Yeah, looks like it. That's unfortunate, but happens. But 
> > > there is still
> > > time to fix it and there is one thing I wonder:
> > >
> > > Did any of the nouveau developers look at the netconsole 
> > > captures Chris
> > > posted more than a week ago to check if they somehow help to 
> > > track down
> > > the root of this problem?
> > 
> >  I did now and I can't spot anything. I think at this point it 
> >  would
> >  make sense to dump the active tasks/threads via sysrq keys to 
> >  see if
> >  any is in a weird state preventing the machine from shutting 
> >  down.
> > >>>
> > >>> Many thx for looking into it!
> > >>
> > >> Yes, thanks Karol.
> > >>
> > >> Attached is the output from dmesg when this block of code:
> > >>
> > >> /bin/mount /dev/sda7 /mnt/sda7
> > >> /bin/mountpoint /proc || /bin/mount /proc
> > >> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> > >> /bin/echo t > /proc/sysrq-trigger
> > >> /bin/sleep 1
> > >> /bin/sync
> > >> /bin/sleep 1
> > >> kill $(pidof dmesg)
> > >> /bin/umount /mnt/sda7
> > >>
> > >> is executed immediately before /sbin/reboot is called as the 
> > >> final step of rebooting my system.
> > >>
> > >> I hope this is what you were looking for, but if not, please let 
> > >> me know what you need
> > 
> >  Thanks Dave. [...]
> > >>> FWIW, in case anyone strands here in the archives: the msg was
> > >>> truncated. The full post can be found in a new thread:
> > >>>
> > >>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> > >>>
> > >>> Sadly it seems the info "With runpm=0, both reboot and poweroff 
> > >>> work on
> > >>> my laptop." didn't bring us much further to a solution. :-/ I don't
> > >>> really like it, but for regression tracking I'm now putting this on 
> > >>> the
> > >>> back-burner, as a fix is not in sight.
> > >>>
> > >>> #regzbot monitor:
> > >>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> > >>> #regzbot backburner: hard to debug and apparently rare
> > >>> #regzbot ignore-activity
> > >>>
> > >>
> > >> yeah.. this bug looks a little annoying. Sadly the only Turing based
> > >> laptop I got doesn't work on Nouveau because of firmware related
> > >> issues and we probably need to get updated ones from Nvidia here :(
> > >>
> > >> But it's a bit weird that the kernel doesn't shut down, because I 
> > >> don't
> > >> see anything in the logs which would prevent that from happening.
> > >> Unless it's waiting on one of the tasks to complete, but none of them
> > >> looked in any way nouveau related.
> > >>
> > >> If somebody else has any fancy kernel debugging tips here to figure
> > >> out why it hangs, that would be very helpful...
> > >>
> > >
> > > I think I've figured this out. It's to do with how my system is 
> > > configured. I do have an initrd, but the only thing on
> > > it is the cpu microcode which, it is recommended, should be loaded 
> > > early. The absence of the 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-20 Thread Karol Herbst
On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  wrote:
>
>
>
> On 20/02/2023 05:35, Ben Skeggs wrote:
> > On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
> > wrote:
> >>
> >>
> >>
> >> On 18/02/2023 15:19, Chris Clayton wrote:
> >>>
> >>>
> >>> On 18/02/2023 12:25, Karol Herbst wrote:
>  On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
>  wrote:
> >
> >
> >
> > On 15/02/2023 11:09, Karol Herbst wrote:
> >> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> >> (Thorsten Leemhuis)  wrote:
> >>>
> >>> On 13.02.23 10:14, Chris Clayton wrote:
>  On 13/02/2023 02:57, Dave Airlie wrote:
> > On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
> >  wrote:
> >>
> >>
> >>
> >> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
> >> wrote:
> >>> On 10.02.23 20:01, Karol Herbst wrote:
>  On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
>  (Thorsten
>  Leemhuis)  wrote:
> >
> > On 08.02.23 09:48, Chris Clayton wrote:
> >>
> >> I'm assuming  that we are not going to see a fix for this 
> >> regression before 6.2 is released.
> >
> > Yeah, looks like it. That's unfortunate, but happens. But there 
> > is still
> > time to fix it and there is one thing I wonder:
> >
> > Did any of the nouveau developers look at the netconsole 
> > captures Chris
> > posted more than a week ago to check if they somehow help to 
> > track down
> > the root of this problem?
> 
>  I did now and I can't spot anything. I think at this point it 
>  would
>  make sense to dump the active tasks/threads via sysrq keys to 
>  see if
>  any is in a weird state preventing the machine from shutting 
>  down.
> >>>
> >>> Many thx for looking into it!
> >>
> >> Yes, thanks Karol.
> >>
> >> Attached is the output from dmesg when this block of code:
> >>
> >> /bin/mount /dev/sda7 /mnt/sda7
> >> /bin/mountpoint /proc || /bin/mount /proc
> >> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> >> /bin/echo t > /proc/sysrq-trigger
> >> /bin/sleep 1
> >> /bin/sync
> >> /bin/sleep 1
> >> kill $(pidof dmesg)
> >> /bin/umount /mnt/sda7
> >>
> >> is executed immediately before /sbin/reboot is called as the final 
> >> step of rebooting my system.
> >>
> >> I hope this is what you were looking for, but if not, please let 
> >> me know what you need
> 
>  Thanks Dave. [...]
> >>> FWIW, in case anyone strands here in the archives: the msg was
> >>> truncated. The full post can be found in a new thread:
> >>>
> >>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >>>
> >>> Sadly it seems the info "With runpm=0, both reboot and poweroff work 
> >>> on
> >>> my laptop." didn't bring us much further to a solution. :-/ I don't
> >>> really like it, but for regression tracking I'm now putting this on 
> >>> the
> >>> back-burner, as a fix is not in sight.
> >>>
> >>> #regzbot monitor:
> >>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >>> #regzbot backburner: hard to debug and apparently rare
> >>> #regzbot ignore-activity
> >>>
> >>
> >> yeah.. this bug looks a little annoying. Sadly the only Turing based
> >> laptop I got doesn't work on Nouveau because of firmware related
> >> issues and we probably need to get updated ones from Nvidia here :(
> >>
> >> But it's a bit weird that the kernel doesn't shut down, because I don't
> >> see anything in the logs which would prevent that from happening.
> >> Unless it's waiting on one of the tasks to complete, but none of them
> >> looked in any way nouveau related.
> >>
> >> If somebody else has any fancy kernel debugging tips here to figure
> >> out why it hangs, that would be very helpful...
> >>
> >
> > I think I've figured this out. It's to do with how my system is 
> > configured. I do have an initrd, but the only thing on
> > it is the cpu microcode which, it is recommended, should be loaded 
> > > early. The absence of the Nvidia firmware from an
> > > initrd doesn't matter because the drivers for the hardware that need to 
> > > load firmware are all built as modules. So, by
> > the time the devices are configured via udev, the root partition is 
> > mounted and the drivers can get at the firmware.
> >
> 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-20 Thread Chris Clayton



On 20/02/2023 05:35, Ben Skeggs wrote:
> On Sun, 19 Feb 2023 at 04:55, Chris Clayton  wrote:
>>
>>
>>
>> On 18/02/2023 15:19, Chris Clayton wrote:
>>>
>>>
>>> On 18/02/2023 12:25, Karol Herbst wrote:
 On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
 wrote:
>
>
>
> On 15/02/2023 11:09, Karol Herbst wrote:
>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>> (Thorsten Leemhuis)  wrote:
>>>
>>> On 13.02.23 10:14, Chris Clayton wrote:
 On 13/02/2023 02:57, Dave Airlie wrote:
> On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
>  wrote:
>>
>>
>>
>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
>> wrote:
>>> On 10.02.23 20:01, Karol Herbst wrote:
 On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
 Leemhuis)  wrote:
>
> On 08.02.23 09:48, Chris Clayton wrote:
>>
>> I'm assuming  that we are not going to see a fix for this 
>> regression before 6.2 is released.
>
> Yeah, looks like it. That's unfortunate, but happens. But there 
> is still
> time to fix it and there is one thing I wonder:
>
> Did any of the nouveau developers look at the netconsole captures 
> Chris
> posted more than a week ago to check if they somehow help to 
> track down
> the root of this problem?

 I did now and I can't spot anything. I think at this point it would
 make sense to dump the active tasks/threads via sysrq keys to see 
 if
 any is in a weird state preventing the machine from shutting down.
>>>
>>> Many thx for looking into it!
>>
>> Yes, thanks Karol.
>>
>> Attached is the output from dmesg when this block of code:
>>
>> /bin/mount /dev/sda7 /mnt/sda7
>> /bin/mountpoint /proc || /bin/mount /proc
>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>> /bin/echo t > /proc/sysrq-trigger
>> /bin/sleep 1
>> /bin/sync
>> /bin/sleep 1
>> kill $(pidof dmesg)
>> /bin/umount /mnt/sda7
>>
>> is executed immediately before /sbin/reboot is called as the final 
>> step of rebooting my system.
>>
>> I hope this is what you were looking for, but if not, please let me 
>> know what you need

 Thanks Dave. [...]
>>> FWIW, in case anyone strands here in the archives: the msg was
>>> truncated. The full post can be found in a new thread:
>>>
>>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>>
>>> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
>>> my laptop." didn't bring us much further to a solution. :-/ I don't
>>> really like it, but for regression tracking I'm now putting this on the
>>> back-burner, as a fix is not in sight.
>>>
>>> #regzbot monitor:
>>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>> #regzbot backburner: hard to debug and apparently rare
>>> #regzbot ignore-activity
>>>
>>
>> yeah.. this bug looks a little annoying. Sadly the only Turing based
>> laptop I got doesn't work on Nouveau because of firmware related
>> issues and we probably need to get updated ones from Nvidia here :(
>>
>> But it's a bit weird that the kernel doesn't shut down, because I don't
>> see anything in the logs which would prevent that from happening.
>> Unless it's waiting on one of the tasks to complete, but none of them
>> looked in any way nouveau related.
>>
>> If somebody else has any fancy kernel debugging tips here to figure
>> out why it hangs, that would be very helpful...
>>
>
> I think I've figured this out. It's to do with how my system is 
> configured. I do have an initrd, but the only thing on
> it is the cpu microcode which, it is recommended, should be loaded early. 
> The absence of the Nvidia firmware from an
> initrd doesn't matter because the drivers for the hardware that need to 
> load firmware are all built as modules. So, by
> the time the devices are configured via udev, the root partition is 
> mounted and the drivers can get at the firmware.
>
> I've found, by turning on nouveau debug and taking a video of the screen 
> as the system shuts down, that nouveau seems to
> be trying to run the scrubber very very late in the shutdown process. The 
> problem is that by this time, I think the root
> partition, and thus the scrubber binary, have become inaccessible.
>
> I seem to 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-19 Thread Ben Skeggs
On Sun, 19 Feb 2023 at 04:55, Chris Clayton  wrote:
>
>
>
> On 18/02/2023 15:19, Chris Clayton wrote:
> >
> >
> > On 18/02/2023 12:25, Karol Herbst wrote:
> >> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 15/02/2023 11:09, Karol Herbst wrote:
>  On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>  (Thorsten Leemhuis)  wrote:
> >
> > On 13.02.23 10:14, Chris Clayton wrote:
> >> On 13/02/2023 02:57, Dave Airlie wrote:
> >>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
> >>>  wrote:
> 
> 
> 
>  On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
>  wrote:
> > On 10.02.23 20:01, Karol Herbst wrote:
> >> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> >> Leemhuis)  wrote:
> >>>
> >>> On 08.02.23 09:48, Chris Clayton wrote:
> 
>  I'm assuming  that we are not going to see a fix for this 
>  regression before 6.2 is released.
> >>>
> >>> Yeah, looks like it. That's unfortunate, but happens. But there 
> >>> is still
> >>> time to fix it and there is one thing I wonder:
> >>>
> >>> Did any of the nouveau developers look at the netconsole captures 
> >>> Chris
> >>> posted more than a week ago to check if they somehow help to 
> >>> track down
> >>> the root of this problem?
> >>
> >> I did now and I can't spot anything. I think at this point it would
> >> make sense to dump the active tasks/threads via sysrq keys to see 
> >> if
> >> any is in a weird state preventing the machine from shutting down.
> >
> > Many thx for looking into it!
> 
>  Yes, thanks Karol.
> 
>  Attached is the output from dmesg when this block of code:
> 
>  /bin/mount /dev/sda7 /mnt/sda7
>  /bin/mountpoint /proc || /bin/mount /proc
>  /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>  /bin/echo t > /proc/sysrq-trigger
>  /bin/sleep 1
>  /bin/sync
>  /bin/sleep 1
>  kill $(pidof dmesg)
>  /bin/umount /mnt/sda7
> 
>  is executed immediately before /sbin/reboot is called as the final 
>  step of rebooting my system.
> 
>  I hope this is what you were looking for, but if not, please let me 
>  know what you need
> >>
> >> Thanks Dave. [...]
> > FWIW, in case anyone strands here in the archives: the msg was
> > truncated. The full post can be found in a new thread:
> >
> > https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >
> > Sadly it seems the info "With runpm=0, both reboot and poweroff work on
> > my laptop." didn't bring us much further to a solution. :-/ I don't
> > really like it, but for regression tracking I'm now putting this on the
> > back-burner, as a fix is not in sight.
> >
> > #regzbot monitor:
> > https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> > #regzbot backburner: hard to debug and apparently rare
> > #regzbot ignore-activity
> >
> 
>  yeah.. this bug looks a little annoying. Sadly the only Turing based
>  laptop I got doesn't work on Nouveau because of firmware related
>  issues and we probably need to get updated ones from Nvidia here :(
> 
>  But it's a bit weird that the kernel doesn't shut down, because I don't
>  see anything in the logs which would prevent that from happening.
>  Unless it's waiting on one of the tasks to complete, but none of them
>  looked in any way nouveau related.
> 
>  If somebody else has any fancy kernel debugging tips here to figure
>  out why it hangs, that would be very helpful...
> 
> >>>
> >>> I think I've figured this out. It's to do with how my system is 
> >>> configured. I do have an initrd, but the only thing on
> >>> it is the cpu microcode which, it is recommended, should be loaded early. 
> >>> The absence of the Nvidia firmware from an
> >>> initrd doesn't matter because the drivers for the hardware that need to 
> >>> load firmware are all built as modules. So, by
> >>> the time the devices are configured via udev, the root partition is 
> >>> mounted and the drivers can get at the firmware.
> >>>
> >>> I've found, by turning on nouveau debug and taking a video of the screen 
> >>> as the system shuts down, that nouveau seems to
> >>> be trying to run the scrubber very very late in the shutdown process. The 
> >>> problem is that by this time, I think the root
> >>> partition, and thus the scrubber binary, have become inaccessible.
> >>>
> >>> I seem to have two choices - either make the firmware 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-18 Thread Chris Clayton



On 18/02/2023 15:19, Chris Clayton wrote:
> 
> 
> On 18/02/2023 12:25, Karol Herbst wrote:
>> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
>> wrote:
>>>
>>>
>>>
>>> On 15/02/2023 11:09, Karol Herbst wrote:
 On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
 (Thorsten Leemhuis)  wrote:
>
> On 13.02.23 10:14, Chris Clayton wrote:
>> On 13/02/2023 02:57, Dave Airlie wrote:
>>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
>>> wrote:



 On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
 wrote:
> On 10.02.23 20:01, Karol Herbst wrote:
>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>> Leemhuis)  wrote:
>>>
>>> On 08.02.23 09:48, Chris Clayton wrote:

 I'm assuming  that we are not going to see a fix for this 
 regression before 6.2 is released.
>>>
>>> Yeah, looks like it. That's unfortunate, but happens. But there is 
>>> still
>>> time to fix it and there is one thing I wonder:
>>>
>>> Did any of the nouveau developers look at the netconsole captures 
>>> Chris
>>> posted more than a week ago to check if they somehow help to track 
>>> down
>>> the root of this problem?
>>
>> I did now and I can't spot anything. I think at this point it would
>> make sense to dump the active tasks/threads via sysrq keys to see if
>> any is in a weird state preventing the machine from shutting down.
>
> Many thx for looking into it!

 Yes, thanks Karol.

 Attached is the output from dmesg when this block of code:

 /bin/mount /dev/sda7 /mnt/sda7
 /bin/mountpoint /proc || /bin/mount /proc
 /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
 /bin/echo t > /proc/sysrq-trigger
 /bin/sleep 1
 /bin/sync
 /bin/sleep 1
 kill $(pidof dmesg)
 /bin/umount /mnt/sda7

 is executed immediately before /sbin/reboot is called as the final 
 step of rebooting my system.

 I hope this is what you were looking for, but if not, please let me 
 know what you need
>>
>> Thanks Dave. [...]
> FWIW, in case anyone strands here in the archives: the msg was
> truncated. The full post can be found in a new thread:
>
> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>
> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
> my laptop." didn't bring us much further to a solution. :-/ I don't
> really like it, but for regression tracking I'm now putting this on the
> back-burner, as a fix is not in sight.
>
> #regzbot monitor:
> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> #regzbot backburner: hard to debug and apparently rare
> #regzbot ignore-activity
>

 yeah.. this bug looks a little annoying. Sadly the only Turing based
 laptop I got doesn't work on Nouveau because of firmware related
 issues and we probably need to get updated ones from Nvidia here :(

 But it's a bit weird that the kernel doesn't shut down, because I don't
 see anything in the logs which would prevent that from happening.
 Unless it's waiting on one of the tasks to complete, but none of them
 looked in any way nouveau related.

 If somebody else has any fancy kernel debugging tips here to figure
 out why it hangs, that would be very helpful...

>>>
>>> I think I've figured this out. It's to do with how my system is configured. 
>>> I do have an initrd, but the only thing on
>>> it is the cpu microcode which, it is recommended, should be loaded early. 
>>> The absence of the NVidia firmware from an
>>> initrd doesn't matter because the drivers for the hardware that need to 
>>> load firmware are all built as modules. So, by
>>> the time the devices are configured via udev, the root partition is mounted 
>>> and the drivers can get at the firmware.
>>>
>>> I've found, by turning on nouveau debug and taking a video of the screen as 
>>> the system shuts down, that nouveau seems to
>>> be trying to run the scrubber very very late in the shutdown process. The 
>>> problem is that by this time, I think the root
>>> partition, and thus the scrubber binary, have become inaccessible.
>>>
>>> I seem to have two choices - either make the firmware accessible on an 
>>> initrd or unload the module in a shutdown script
>>> before the scrubber binary becomes inaccessible. The latter of these is the 
>>> workaround I have implemented whilst the
>>> problem I reported has been under investigation. For simplicity, I think 
>>> I'll promote my 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-18 Thread Chris Clayton



On 18/02/2023 12:25, Karol Herbst wrote:
> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
> wrote:
>>
>>
>>
>> On 15/02/2023 11:09, Karol Herbst wrote:
>>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>>> (Thorsten Leemhuis)  wrote:

 On 13.02.23 10:14, Chris Clayton wrote:
> On 13/02/2023 02:57, Dave Airlie wrote:
>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
>> wrote:
>>>
>>>
>>>
>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
>>> wrote:
 On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this 
>>> regression before 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is 
>> still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures 
>> Chris
>> posted more than a week ago to check if they somehow help to track 
>> down
>> the root of this problem?
>
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sysrq keys to see if
> any is in a weird state preventing the machine from shutting down.

 Many thx for looking into it!
>>>
>>> Yes, thanks Karol.
>>>
>>> Attached is the output from dmesg when this block of code:
>>>
>>> /bin/mount /dev/sda7 /mnt/sda7
>>> /bin/mountpoint /proc || /bin/mount /proc
>>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>>> /bin/echo t > /proc/sysrq-trigger
>>> /bin/sleep 1
>>> /bin/sync
>>> /bin/sleep 1
>>> kill $(pidof dmesg)
>>> /bin/umount /mnt/sda7
>>>
>>> is executed immediately before /sbin/reboot is called as the final step 
>>> of rebooting my system.
>>>
>>> I hope this is what you were looking for, but if not, please let me 
>>> know what you need
>
> Thanks Dave. [...]
 FWIW, in case anyone strands here in the archives: the msg was
 truncated. The full post can be found in a new thread:

 https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/

 Sadly it seems the info "With runpm=0, both reboot and poweroff work on
 my laptop." didn't bring us much further to a solution. :-/ I don't
 really like it, but for regression tracking I'm now putting this on the
 back-burner, as a fix is not in sight.

 #regzbot monitor:
 https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
 #regzbot backburner: hard to debug and apparently rare
 #regzbot ignore-activity

>>>
>>> yeah.. this bug looks a little annoying. Sadly the only Turing based
>>> laptop I got doesn't work on Nouveau because of firmware related
>>> issues and we probably need to get updated ones from Nvidia here :(
>>>
>>> But it's a bit weird that the kernel doesn't shutdown, because I don't
>>> see anything in the logs which would prevent that from happening.
>>> Unless it's waiting on one of the tasks to complete, but none of them
>>> looked in any way nouveau related.
>>>
>>> If somebody else has any fancy kernel debugging tips here to figure
>>> out why it hangs, that would be very helpful...
>>>
>>
>> I think I've figured this out. It's to do with how my system is configured. 
>> I do have an initrd, but the only thing on
>> it is the cpu microcode which, it is recommended, should be loaded early. 
>> The absence of the NVidia firmware from an
>> initrd doesn't matter because the drivers for the hardware that need to load 
>> firmware are all built as modules. So, by
>> the time the devices are configured via udev, the root partition is mounted 
>> and the drivers can get at the firmware.
>>
>> I've found, by turning on nouveau debug and taking a video of the screen as 
>> the system shuts down, that nouveau seems to
>> be trying to run the scrubber very very late in the shutdown process. The 
>> problem is that by this time, I think the root
>> partition, and thus the scrubber binary, have become inaccessible.
>>
>> I seem to have two choices - either make the firmware accessible on an 
>> initrd or unload the module in a shutdown script
>> before the scrubber binary becomes inaccessible. The latter of these is the 
>> workaround I have implemented whilst the
>> problem I reported has been under investigation. For simplicity, I think 
>> I'll promote my workaround to being the
>> permanent solution.
>>
>> So, apologies (and thanks) to everyone whose time I have taken up with this 
>> non-bug.
>>
> 
> Well.. nouveau 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-18 Thread Karol Herbst
On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  wrote:
>
>
>
> On 15/02/2023 11:09, Karol Herbst wrote:
> > On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> > (Thorsten Leemhuis)  wrote:
> >>
> >> On 13.02.23 10:14, Chris Clayton wrote:
> >>> On 13/02/2023 02:57, Dave Airlie wrote:
>  On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
>  wrote:
> >
> >
> >
> > On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
> > wrote:
> >> On 10.02.23 20:01, Karol Herbst wrote:
> >>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> >>> Leemhuis)  wrote:
> 
>  On 08.02.23 09:48, Chris Clayton wrote:
> >
> > I'm assuming  that we are not going to see a fix for this 
> > regression before 6.2 is released.
> 
>  Yeah, looks like it. That's unfortunate, but happens. But there is 
>  still
>  time to fix it and there is one thing I wonder:
> 
>  Did any of the nouveau developers look at the netconsole captures 
>  Chris
>  posted more than a week ago to check if they somehow help to track 
>  down
>  the root of this problem?
> >>>
> >>> I did now and I can't spot anything. I think at this point it would
> >>> make sense to dump the active tasks/threads via sysrq keys to see if
> >>> any is in a weird state preventing the machine from shutting down.
> >>
> >> Many thx for looking into it!
> >
> > Yes, thanks Karol.
> >
> > Attached is the output from dmesg when this block of code:
> >
> > /bin/mount /dev/sda7 /mnt/sda7
> > /bin/mountpoint /proc || /bin/mount /proc
> > /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> > /bin/echo t > /proc/sysrq-trigger
> > /bin/sleep 1
> > /bin/sync
> > /bin/sleep 1
> > kill $(pidof dmesg)
> > /bin/umount /mnt/sda7
> >
> > is executed immediately before /sbin/reboot is called as the final step 
> > of rebooting my system.
> >
> > I hope this is what you were looking for, but if not, please let me 
> > know what you need
> >>>
> >>> Thanks Dave. [...]
> >> FWIW, in case anyone strands here in the archives: the msg was
> >> truncated. The full post can be found in a new thread:
> >>
> >> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >>
> >> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
> >> my laptop." didn't bring us much further to a solution. :-/ I don't
> >> really like it, but for regression tracking I'm now putting this on the
> >> back-burner, as a fix is not in sight.
> >>
> >> #regzbot monitor:
> >> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> >> #regzbot backburner: hard to debug and apparently rare
> >> #regzbot ignore-activity
> >>
> >
> > yeah.. this bug looks a little annoying. Sadly the only Turing based
> > laptop I got doesn't work on Nouveau because of firmware related
> > issues and we probably need to get updated ones from Nvidia here :(
> >
> > But it's a bit weird that the kernel doesn't shutdown, because I don't
> > see anything in the logs which would prevent that from happening.
> > Unless it's waiting on one of the tasks to complete, but none of them
> > looked in any way nouveau related.
> >
> > If somebody else has any fancy kernel debugging tips here to figure
> > out why it hangs, that would be very helpful...
> >
>
> I think I've figured this out. It's to do with how my system is configured. I 
> do have an initrd, but the only thing on
> it is the cpu microcode which, it is recommended, should be loaded early. The 
> absence of the NVidia firmware from an
> initrd doesn't matter because the drivers for the hardware that need to load 
> firmware are all built as modules. So, by
> the time the devices are configured via udev, the root partition is mounted 
> and the drivers can get at the firmware.
>
> I've found, by turning on nouveau debug and taking a video of the screen as 
> the system shuts down, that nouveau seems to
> be trying to run the scrubber very very late in the shutdown process. The 
> problem is that by this time, I think the root
> partition, and thus the scrubber binary, have become inaccessible.
>
> I seem to have two choices - either make the firmware accessible on an initrd 
> or unload the module in a shutdown script
> before the scrubber binary becomes inaccessible. The latter of these is the 
> workaround I have implemented whilst the
> problem I reported has been under investigation. For simplicity, I think I'll 
> promote my workaround to being the
> permanent solution.
>
> So, apologies (and thanks) to everyone whose time I have taken up with this 
> non-bug.
>

Well.. nouveau shouldn't prevent the system from shutting down if the
firmware file isn't 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-18 Thread Chris Clayton



On 15/02/2023 11:09, Karol Herbst wrote:
> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> (Thorsten Leemhuis)  wrote:
>>
>> On 13.02.23 10:14, Chris Clayton wrote:
>>> On 13/02/2023 02:57, Dave Airlie wrote:
 On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
 wrote:
>
>
>
> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 10.02.23 20:01, Karol Herbst wrote:
>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>>> Leemhuis)  wrote:

 On 08.02.23 09:48, Chris Clayton wrote:
>
> I'm assuming  that we are not going to see a fix for this regression 
> before 6.2 is released.

 Yeah, looks like it. That's unfortunate, but happens. But there is 
 still
 time to fix it and there is one thing I wonder:

 Did any of the nouveau developers look at the netconsole captures Chris
 posted more than a week ago to check if they somehow help to track down
 the root of this problem?
>>>
>>> I did now and I can't spot anything. I think at this point it would
>>> make sense to dump the active tasks/threads via sysrq keys to see if
>>> any is in a weird state preventing the machine from shutting down.
>>
>> Many thx for looking into it!
>
> Yes, thanks Karol.
>
> Attached is the output from dmesg when this block of code:
>
> /bin/mount /dev/sda7 /mnt/sda7
> /bin/mountpoint /proc || /bin/mount /proc
> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> /bin/echo t > /proc/sysrq-trigger
> /bin/sleep 1
> /bin/sync
> /bin/sleep 1
> kill $(pidof dmesg)
> /bin/umount /mnt/sda7
>
> is executed immediately before /sbin/reboot is called as the final step 
> of rebooting my system.
>
> I hope this is what you were looking for, but if not, please let me know 
> what you need
>>>
>>> Thanks Dave. [...]
>> FWIW, in case anyone strands here in the archives: the msg was
>> truncated. The full post can be found in a new thread:
>>
>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>
>> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
>> my laptop." didn't bring us much further to a solution. :-/ I don't
>> really like it, but for regression tracking I'm now putting this on the
>> back-burner, as a fix is not in sight.
>>
>> #regzbot monitor:
>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>> #regzbot backburner: hard to debug and apparently rare
>> #regzbot ignore-activity
>>
> 
> yeah.. this bug looks a little annoying. Sadly the only Turing based
> laptop I got doesn't work on Nouveau because of firmware related
> issues and we probably need to get updated ones from Nvidia here :(
> 
> But it's a bit weird that the kernel doesn't shutdown, because I don't
> see anything in the logs which would prevent that from happening.
> Unless it's waiting on one of the tasks to complete, but none of them
> looked in any way nouveau related.
> 
> If somebody else has any fancy kernel debugging tips here to figure
> out why it hangs, that would be very helpful...
> 

I think I've figured this out. It's to do with how my system is
configured. I do have an initrd, but the only thing on it is the CPU
microcode which, it is recommended, should be loaded early. The absence
of the NVidia firmware from the initrd doesn't matter at boot because
the drivers for the hardware that needs to load firmware are all built
as modules. So, by the time the devices are configured via udev, the
root partition is mounted and the drivers can get at the firmware.

I've found, by turning on nouveau debug and taking a video of the screen
as the system shuts down, that nouveau seems to be trying to run the
scrubber very, very late in the shutdown process. The problem is that by
this time, I think, the root partition, and thus the scrubber binary,
has become inaccessible.

I seem to have two choices: either make the firmware accessible on an
initrd, or unload the module in a shutdown script before the scrubber
binary becomes inaccessible. The latter is the workaround I have
implemented whilst the problem I reported has been under investigation.
For simplicity, I think I'll promote my workaround to being the
permanent solution.

So, apologies (and thanks) to everyone whose time I have taken up with
this non-bug.

Chris
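
For anyone landing here from the archives, the module-unload workaround
described above can be sketched roughly as follows. This is only an
illustration, assuming a sysv-style stop script that runs after the
display manager has exited; the function name and the MODULES_FILE and
MODPROBE override variables are hypothetical and exist only so the
function can be dry-run:

```shell
#!/bin/sh
# Sketch: unload nouveau while the root filesystem (and thus the
# firmware under /lib/firmware) is still reachable.
# MODULES_FILE and MODPROBE are hypothetical overrides for dry runs;
# the defaults are the real procfs file and the modprobe utility.
unload_nouveau() {
    modules="${MODULES_FILE:-/proc/modules}"
    # /proc/modules lists one loaded module per line, name first.
    if grep -q '^nouveau ' "$modules"; then
        # -r removes the module; this fails if it is still in use.
        ${MODPROBE:-modprobe} -r nouveau
    fi
}
```

The removal can only succeed once nothing is using the GPU any more,
which is why the call belongs after the display manager (SDDM in this
thread) has been stopped.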

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> That page also explains what to do if mails like this annoy you.
>>
>> #regzbot ignore-activity
>>
> 


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-15 Thread Karol Herbst
On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
(Thorsten Leemhuis)  wrote:
>
> On 13.02.23 10:14, Chris Clayton wrote:
> > On 13/02/2023 02:57, Dave Airlie wrote:
> >> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>  On 10.02.23 20:01, Karol Herbst wrote:
> > On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> > Leemhuis)  wrote:
> >>
> >> On 08.02.23 09:48, Chris Clayton wrote:
> >>>
> >>> I'm assuming  that we are not going to see a fix for this regression 
> >>> before 6.2 is released.
> >>
> >> Yeah, looks like it. That's unfortunate, but happens. But there is 
> >> still
> >> time to fix it and there is one thing I wonder:
> >>
> >> Did any of the nouveau developers look at the netconsole captures Chris
> >> posted more than a week ago to check if they somehow help to track down
> >> the root of this problem?
> >
> > I did now and I can't spot anything. I think at this point it would
> > make sense to dump the active tasks/threads via sysrq keys to see if
> > any is in a weird state preventing the machine from shutting down.
> 
>  Many thx for looking into it!
> >>>
> >>> Yes, thanks Karol.
> >>>
> >>> Attached is the output from dmesg when this block of code:
> >>>
> >>> /bin/mount /dev/sda7 /mnt/sda7
> >>> /bin/mountpoint /proc || /bin/mount /proc
> >>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> >>> /bin/echo t > /proc/sysrq-trigger
> >>> /bin/sleep 1
> >>> /bin/sync
> >>> /bin/sleep 1
> >>> kill $(pidof dmesg)
> >>> /bin/umount /mnt/sda7
> >>>
> >>> is executed immediately before /sbin/reboot is called as the final step 
> >>> of rebooting my system.
> >>>
> >>> I hope this is what you were looking for, but if not, please let me know 
> >>> what you need
> >
> > Thanks Dave. [...]
> FWIW, in case anyone strands here in the archives: the msg was
> truncated. The full post can be found in a new thread:
>
> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>
> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
> my laptop." didn't bring us much further to a solution. :-/ I don't
> really like it, but for regression tracking I'm now putting this on the
> back-burner, as a fix is not in sight.
>
> #regzbot monitor:
> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> #regzbot backburner: hard to debug and apparently rare
> #regzbot ignore-activity
>

Yeah, this bug looks a little annoying. Sadly, the only Turing-based
laptop I have doesn't work on Nouveau because of firmware-related
issues, and we probably need to get updated firmware from Nvidia here :(

But it's a bit weird that the kernel doesn't shut down, because I don't
see anything in the logs which would prevent that from happening.
Unless it's waiting on one of the tasks to complete, but none of them
looked in any way nouveau-related.

If somebody else has any fancy kernel debugging tips here to figure
out why it hangs, that would be very helpful...

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>
> #regzbot ignore-activity
>



Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-15 Thread Linux regression tracking #update (Thorsten Leemhuis)
On 13.02.23 10:14, Chris Clayton wrote:
> On 13/02/2023 02:57, Dave Airlie wrote:
>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>>
>>>
>>>
>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
 On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this regression 
>>> before 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures Chris
>> posted more than a week ago to check if they somehow help to track down
>> the root of this problem?
>
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sysrq keys to see if
> any is in a weird state preventing the machine from shutting down.

 Many thx for looking into it!
>>>
>>> Yes, thanks Karol.
>>>
>>> Attached is the output from dmesg when this block of code:
>>>
>>> /bin/mount /dev/sda7 /mnt/sda7
>>> /bin/mountpoint /proc || /bin/mount /proc
>>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>>> /bin/echo t > /proc/sysrq-trigger
>>> /bin/sleep 1
>>> /bin/sync
>>> /bin/sleep 1
>>> kill $(pidof dmesg)
>>> /bin/umount /mnt/sda7
>>>
>>> is executed immediately before /sbin/reboot is called as the final step of 
>>> rebooting my system.
>>>
>>> I hope this is what you were looking for, but if not, please let me know 
>>> what you need
> 
> Thanks Dave. [...]
FWIW, in case anyone strands here in the archives: the msg was
truncated. The full post can be found in a new thread:

https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/

Sadly it seems the info "With runpm=0, both reboot and poweroff work on
my laptop." didn't bring us much further to a solution. :-/ I don't
really like it, but for regression tracking I'm now putting this on the
back-burner, as a fix is not in sight.

#regzbot monitor:
https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
#regzbot backburner: hard to debug and apparently rare
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

#regzbot ignore-activity


Fwd: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-13 Thread Chris Clayton
Proof, if any were needed, that I should consume more coffee before
dealing with email...

Adding cc recipients that were dropped in my message this morning.


-------- Forwarded Message --------
Subject: Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected
Date: Mon, 13 Feb 2023 09:21:10 +
From: Chris Clayton 
To: Dave Airlie 

[ Apologies for the incomplete message I sent a few minutes ago. I should
have had more coffee before I started dealing with email. ]

On 13/02/2023 02:57, Dave Airlie wrote:
> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>
>>
>>
>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 10.02.23 20:01, Karol Herbst wrote:
>>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>>>> Leemhuis)  wrote:
>>>>>
>>>>> On 08.02.23 09:48, Chris Clayton wrote:
>>>>>>
>>>>>> I'm assuming  that we are not going to see a fix for this regression 
>>>>>> before 6.2 is released.
>>>>>
>>>>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>>>>> time to fix it and there is one thing I wonder:
>>>>>
>>>>> Did any of the nouveau developers look at the netconsole captures Chris
>>>>> posted more than a week ago to check if they somehow help to track down
>>>>> the root of this problem?
>>>>
>>>> I did now and I can't spot anything. I think at this point it would
>>>> make sense to dump the active tasks/threads via sysrq keys to see if
>>>> any is in a weird state preventing the machine from shutting down.
>>>
>>> Many thx for looking into it!
>>
>> Yes, thanks Karol.
>>
>> Attached is the output from dmesg when this block of code:
>>
>> /bin/mount /dev/sda7 /mnt/sda7
>> /bin/mountpoint /proc || /bin/mount /proc
>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>> /bin/echo t > /proc/sysrq-trigger
>> /bin/sleep 1
>> /bin/sync
>> /bin/sleep 1
>> kill $(pidof dmesg)
>> /bin/umount /mnt/sda7
>>
>> is executed immediately before /sbin/reboot is called as the final step of 
>> rebooting my system.
>>
>> I hope this is what you were looking for, but if not, please let me know 
>> what you need
> 

Thanks, Dave.

> Another shot in the dark, but does nouveau.runpm=0 help at all?
> 
Yes, it does. With runpm=0, both reboot and poweroff work on my laptop.
Of course, it also means that the discrete (NVidia) GPU is now powered
on permanently.

Chris


> Dave.


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-13 Thread Chris Clayton



On 13/02/2023 02:57, Dave Airlie wrote:
> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>
>>
>>
>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 10.02.23 20:01, Karol Herbst wrote:
 On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
 Leemhuis)  wrote:
>
> On 08.02.23 09:48, Chris Clayton wrote:
>>
>> I'm assuming  that we are not going to see a fix for this regression 
>> before 6.2 is released.
>
> Yeah, looks like it. That's unfortunate, but happens. But there is still
> time to fix it and there is one thing I wonder:
>
> Did any of the nouveau developers look at the netconsole captures Chris
> posted more than a week ago to check if they somehow help to track down
> the root of this problem?

 I did now and I can't spot anything. I think at this point it would
 make sense to dump the active tasks/threads via sysrq keys to see if
 any is in a weird state preventing the machine from shutting down.
>>>
>>> Many thx for looking into it!
>>
>> Yes, thanks Karol.
>>
>> Attached is the output from dmesg when this block of code:
>>
>> /bin/mount /dev/sda7 /mnt/sda7
>> /bin/mountpoint /proc || /bin/mount /proc
>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>> /bin/echo t > /proc/sysrq-trigger
>> /bin/sleep 1
>> /bin/sync
>> /bin/sleep 1
>> kill $(pidof dmesg)
>> /bin/umount /mnt/sda7
>>
>> is executed immediately before /sbin/reboot is called as the final step of 
>> rebooting my system.
>>
>> I hope this is what you were looking for, but if not, please let me know 
>> what you need
> 

Thanks Dave.
> Another shot in the dark, but does nouveau.runpm=0 help at all?
> 
> Dave.


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-12 Thread Dave Airlie
On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>
>
>
> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 10.02.23 20:01, Karol Herbst wrote:
> >> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> >> Leemhuis)  wrote:
> >>>
> >>> On 08.02.23 09:48, Chris Clayton wrote:
> 
>  I'm assuming  that we are not going to see a fix for this regression 
>  before 6.2 is released.
> >>>
> >>> Yeah, looks like it. That's unfortunate, but happens. But there is still
> >>> time to fix it and there is one thing I wonder:
> >>>
> >>> Did any of the nouveau developers look at the netconsole captures Chris
> >>> posted more than a week ago to check if they somehow help to track down
> >>> the root of this problem?
> >>
> >> I did now and I can't spot anything. I think at this point it would
> >> make sense to dump the active tasks/threads via sysrq keys to see if
> >> any is in a weird state preventing the machine from shutting down.
> >
> > Many thx for looking into it!
>
> Yes, thanks Karol.
>
> Attached is the output from dmesg when this block of code:
>
> /bin/mount /dev/sda7 /mnt/sda7
> /bin/mountpoint /proc || /bin/mount /proc
> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> /bin/echo t > /proc/sysrq-trigger
> /bin/sleep 1
> /bin/sync
> /bin/sleep 1
> kill $(pidof dmesg)
> /bin/umount /mnt/sda7
>
> is executed immediately before /sbin/reboot is called as the final step of 
> rebooting my system.
>
> I hope this is what you were looking for, but if not, please let me know what 
> you need

Another shot in the dark, but does nouveau.runpm=0 help at all?

Dave.
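
For reference, the parameter can be tried once by adding
nouveau.runpm=0 to the kernel command line, or set persistently through
a modprobe.d options line. A minimal sketch of the latter, assuming a
distribution that reads /etc/modprobe.d; the file name
nouveau-runpm.conf and the function name are arbitrary choices made for
illustration:

```shell
#!/bin/sh
# Sketch: write a modprobe.d drop-in that sets nouveau's runpm option to 0.
# The target directory is a parameter so the function can be tried
# against a scratch directory; /etc/modprobe.d is the assumed default.
write_runpm_conf() {
    dir="${1:-/etc/modprobe.d}"
    printf 'options nouveau runpm=0\n' > "$dir/nouveau-runpm.conf"
}
```

The option only takes effect the next time the nouveau module is
loaded, so a reboot (or a module reload) is needed before testing
poweroff again.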


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this regression before 
>>> 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures Chris
>> posted more than a week ago to check if they somehow help to track down
>> the root of this problem?
> 
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sysrq keys to see if
> any is in a weird state preventing the machine from shutting down.

Many thx for looking into it!

Ciao, Thorsten

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>>> Consequently, I've
>>> implemented a (very simple) workaround. All that happens is that in the 
>>> (sysv) init script that starts and stops SDDM,
>>> the nouveau module is removed once SDDM is stopped. With that in place, my 
>>> system no longer freezes on reboot or poweroff.
>>>
>>> Let me know if I can provide any additional diagnostics although, with the 
>>> problem seemingly occurring so late in the
>>> shutdown process, I may need help on how to go about capturing.
>>>
>>> Chris
>>>
>>> On 02/02/2023 20:45, Chris Clayton wrote:


 On 01/02/2023 13:51, Chris Clayton wrote:
>
>
> On 30/01/2023 23:27, Ben Skeggs wrote:
>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton  
>> wrote:
>>>
>>> Hi again.
>>>
>>> On 30/01/2023 20:19, Chris Clayton wrote:
 Thanks, Ben.
>>>
>>> 
>>>
> Hey,
>
> This is a complete shot-in-the-dark, as I don't see this behaviour on
> *any* of my boards.  Could you try the attached patch please?

 Unfortunately, the patch made no difference.

 I've been looking at how the graphics on my laptop is set up, and have
 a bit of a worry about whether the firmware might be playing a part in
 this problem. In order to offload video decoding to the NVidia TU117
 GPU, it seems the scrubber firmware must be available, but as far as I
 know, that has not been released by NVidia. To get it to work, I
 followed what Ubuntu have done, and the scrubber in
 /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
 ../../tu116/nvdec/scrubber.bin. That, of course, means that some of the
 firmware being loaded is for a different card. I note that processing
 related to firmware is being changed in the patch. Might my setup be at
 the root of my problem?

 I'll have a fiddle and see what I can work out.

 Chris

>
> Thanks,
> Ben.
>
>>
>>>
>>> Well, my fiddling has got my system rebooting and shutting down
>>> successfully again. I found that if I delete the symlink to the
>>> scrubber firmware, reboot and shutdown work again. There are, however,
>>> a number of other files in the tu117 firmware directory tree that are
>>> symlinks to actual files in its tu116 counterpart, so I deleted all of
>>> those too. Unfortunately, the absence of one or more of those symlinks
>>> causes Xorg to fail to start. I've reinstated all the links except
>>> scrubber and I now have a system that works as it did until I tried to
>>> run a kernel that includes the bad commit I identified in my
>>> bisection. That includes offloading video decoding to the NVidia card,
>>> so whatever I read that said the scrubber firmware was needed seems to
>>> have been wrong. I get a new message (nouveau :01:00.0: fb: VPR
>>> locked, but no scrubber binary!), but, hey, we can't have everything.
>>>
>>> If you still want to get to the bottom of this, let me know what you
>>> need me to provide and I'll do my best. I suspect you might want to,
>>> because there will be an awful lot of Ubuntu-based systems out there
>>> with that scrubber.bin symlink in place. On the other hand, it could
>>> be quite a while before Ubuntu are deploying 6.2 or later kernels.
>> The symlinks are correct - whole groups of GPUs share the same FW, and
>> we use symlinks in linux-firmware to represent this.
>>
>> I don't really have any ideas how/why this patch causes issues with
>> shutdown - it's a path that only gets executed during initialisation.
>> Can you try and capture the kernel 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Karol Herbst
On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
Leemhuis)  wrote:
>
> On 08.02.23 09:48, Chris Clayton wrote:
> >
> > I'm assuming  that we are not going to see a fix for this regression before 
> > 6.2 is released.
>
> Yeah, looks like it. That's unfortunate, but happens. But there is still
> time to fix it and there is one thing I wonder:
>
> Did any of the nouveau developers look at the netconsole captures Chris
> posted more than a week ago to check if they somehow help to track down
> the root of this problem?
>

I did now and I can't spot anything. I think at this point it would
make sense to dump the active tasks/threads via sysrq keys to see if
any of them is in a weird state preventing the machine from shutting down.
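
For reference, a minimal sketch of triggering such a dump from a running
system, assuming root access and a kernel built with magic SysRq support.
Writing "t" to /proc/sysrq-trigger asks the kernel to log all current tasks,
and "w" only those in uninterruptible (blocked) state. The helper name
trigger_sysrq and its trigger parameter are purely illustrative. On a machine
that is already hanging late in shutdown, the keyboard combination
(Alt+SysRq+t) is probably the only practical route.

```python
from pathlib import Path

# SysRq command letters from the kernel's sysrq documentation:
# "t" dumps all current tasks, "w" dumps tasks in uninterruptible (blocked) state.
SYSRQ_LETTERS = {"tasks": "t", "blocked": "w"}


def trigger_sysrq(what: str, trigger: Path = Path("/proc/sysrq-trigger")) -> str:
    """Write one SysRq command letter to the trigger file (needs root on a real system)."""
    letter = SYSRQ_LETTERS[what]
    trigger.write_text(letter)
    return letter


# Hypothetical usage, as root: trigger_sysrq("tasks")
# The dump itself lands in the kernel log (dmesg, netconsole), not on stdout.
```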

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> > Consequently, I've
> > implemented a (very simple) workaround. All that happens is that in the 
> > (sysv) init script that starts and stops SDDM,
> > the nouveau module is removed once SDDM is stopped. With that in place, my 
> > system no longer freezes on reboot or poweroff.
> >
> > Let me know if I can provide any additional diagnostics although, with the 
> > problem seemingly occurring so late in the
> > shutdown process, I may need help on how to go about capturing.
> >
> > Chris
> >
> > On 02/02/2023 20:45, Chris Clayton wrote:
> >>
> >>
> >> On 01/02/2023 13:51, Chris Clayton wrote:
> >>>
> >>>
> >>> On 30/01/2023 23:27, Ben Skeggs wrote:
>  On Tue, 31 Jan 2023 at 09:09, Chris Clayton  
>  wrote:
> >
> > Hi again.
> >
> > On 30/01/2023 20:19, Chris Clayton wrote:
> >> Thanks, Ben.
> >
> > 
> >
> >>> Hey,
> >>>
> >>> This is a complete shot-in-the-dark, as I don't see this behaviour on
> >>> *any* of my boards.  Could you try the attached patch please?
> >>
> >> Unfortunately, the patch made no difference.
> >>
> >> I've been looking at how the graphics on my laptop is set up, and have 
> >> a bit of a worry about whether the firmware might
> >> be playing a part in this problem. In order to offload video decoding 
> >> to the NVidia TU117 GPU, it seems the scrubber
> >> firmware must be available, but as far as I know,that has not been 
> >> released by NVidia. To get it to work, I followed
> >> what ubuntu have done and the scrubber in 
> >> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
> >> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of 
> >> the firmware loaded is for a different card is being
> >> loaded. I note that processing related to firmware is being changed in 
> >> the patch. Might my set up be at the root of my
> >> problem?
> >>
> >> I'll have a fiddle an see what I can work out.
> >>
> >> Chris
> >>
> >>>
> >>> Thanks,
> >>> Ben.
> >>>
> 
> >
> > Well, my fiddling has got my system rebooting and shutting down 
> > successfully again. I found that if I delete the symlink
> > to the scrubber firmware, reboot and shutdown work again. There are 
> > however, a number of other files in the tu117
> > firmware directory tree that that are symlinks to actual files in its 
> > tu116 counterpart. So I deleted all of those too.
> > Unfortunately, the absence of one or more of those symlinks causes Xorg 
> > to fail to start. I've reinstated all the links
> > except scrubber and I now have a system that works as it did until I 
> > tried to run a kernel that includes the bad commit
> > I identified in my bisection. That includes offloading video decoding 
> > to the NVidia card, so what ever I read that said
> > the scrubber firmware was needed seems to have been wrong. I get a new 
> > message that (nouveau :01:00.0: fb: VPR
> > locked, but no scrubber binary!), but, hey, we can't have everything.
> >
> > If you still want to get to the bottom of this, let me know what you 
> > need me to provide and I'll do my best. I suspect
> > you might want to because there will a n awful lot of Ubuntu-based 
> > systems out there with that scrubber.bin symlink in
> > place. On the other hand,m it could but quite a while before ubuntu are 
> > deploying 6.2 or later kernels.
>  The symlinks are correct - whole groups of GPUs share the same FW, and
>  we use symlinks in linux-firmware to represent this.
> 
>  I don't really have any ideas how/why this patch causes issues with
>  shutdown - it's a path that only gets executed during initialisation.
>  Can you try and capture the kernel log during shutdown ("dmesg -w"
>  over ssh? netconsole?), and see if there's any relevant messages
>  

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.02.23 09:48, Chris Clayton wrote:
> 
> I'm assuming  that we are not going to see a fix for this regression before 
> 6.2 is released.

Yeah, looks like it. That's unfortunate, but it happens. But there is still
time to fix it and there is one thing I wonder:

Did any of the nouveau developers look at the netconsole captures Chris
posted more than a week ago to check if they somehow help to track down
the root of this problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Consequently, I've
> implemented a (very simple) workaround. All that happens is that in the 
> (sysv) init script that starts and stops SDDM,
> the nouveau module is removed once SDDM is stopped. With that in place, my 
> system no longer freezes on reboot or poweroff.
> 
> Let me know if I can provide any additional diagnostics although, with the 
> problem seemingly occurring so late in the
> shutdown process, I may need help on how to go about capturing.
> 
> Chris
> 
> On 02/02/2023 20:45, Chris Clayton wrote:
>>
>>
>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>
>>>
>>> On 30/01/2023 23:27, Ben Skeggs wrote:
 On Tue, 31 Jan 2023 at 09:09, Chris Clayton  
 wrote:
>
> Hi again.
>
> On 30/01/2023 20:19, Chris Clayton wrote:
>> Thanks, Ben.
>
> 
>
>>> Hey,
>>>
>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>> *any* of my boards.  Could you try the attached patch please?
>>
>> Unfortunately, the patch made no difference.
>>
>> I've been looking at how the graphics on my laptop is set up, and have a 
>> bit of a worry about whether the firmware might
>> be playing a part in this problem. In order to offload video decoding to 
>> the NVidia TU117 GPU, it seems the scrubber
>> firmware must be available, but as far as I know,that has not been 
>> released by NVidia. To get it to work, I followed
>> what ubuntu have done and the scrubber in 
>> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
>> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
>> firmware loaded is for a different card is being
>> loaded. I note that processing related to firmware is being changed in 
>> the patch. Might my set up be at the root of my
>> problem?
>>
>> I'll have a fiddle an see what I can work out.
>>
>> Chris
>>
>>>
>>> Thanks,
>>> Ben.
>>>

>
> Well, my fiddling has got my system rebooting and shutting down 
> successfully again. I found that if I delete the symlink
> to the scrubber firmware, reboot and shutdown work again. There are 
> however, a number of other files in the tu117
> firmware directory tree that that are symlinks to actual files in its 
> tu116 counterpart. So I deleted all of those too.
> Unfortunately, the absence of one or more of those symlinks causes Xorg 
> to fail to start. I've reinstated all the links
> except scrubber and I now have a system that works as it did until I 
> tried to run a kernel that includes the bad commit
> I identified in my bisection. That includes offloading video decoding to 
> the NVidia card, so what ever I read that said
> the scrubber firmware was needed seems to have been wrong. I get a new 
> message that (nouveau :01:00.0: fb: VPR
> locked, but no scrubber binary!), but, hey, we can't have everything.
>
> If you still want to get to the bottom of this, let me know what you need 
> me to provide and I'll do my best. I suspect
> you might want to because there will a n awful lot of Ubuntu-based 
> systems out there with that scrubber.bin symlink in
> place. On the other hand,m it could but quite a while before ubuntu are 
> deploying 6.2 or later kernels.
 The symlinks are correct - whole groups of GPUs share the same FW, and
 we use symlinks in linux-firmware to represent this.

 I don't really have any ideas how/why this patch causes issues with
 shutdown - it's a path that only gets executed during initialisation.
 Can you try and capture the kernel log during shutdown ("dmesg -w"
 over ssh? netconsole?), and see if there's any relevant messages
 providing a hint at what's going on?  Alternatively, you could try
 unloading the module (you will have to stop X/wayland/gdm/etc/etc
 first) and seeing if that hangs too.

 Ben.
>>>
>>> Sorry for the delay - I've been learning about netconsole and netcat. 
>>> However, I had no success with ssh and netconsole
>>> produced a log with nothing unusual in it.
>>>
>>> Simply stopping Xorg and removing the nouveau module succeeds.
>>>
>>> So, I rebuilt rc6+ after a pull from linus' 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-08 Thread Chris Clayton
Hi.

I'm assuming that we are not going to see a fix for this regression before 6.2
is released. Consequently, I've implemented a (very simple) workaround. All that
happens is that in the (sysv) init script that starts and stops SDDM, the
nouveau module is removed once SDDM is stopped. With that in place, my system no
longer freezes on reboot or poweroff.
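
Roughly, the stop path amounts to the following. This is only a sketch of the
idea, not the actual init script: the function name, the injectable run
parameter and the example stop command are hypothetical, and the only real tool
assumed is "modprobe -r", which unloads a module.

```python
import subprocess
from typing import Callable, List, Sequence


def stop_then_unload(
    stop_cmd: Sequence[str],
    module: str = "nouveau",
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Stop the display manager, then remove the GPU module; return the commands run."""
    commands = [list(stop_cmd), ["modprobe", "-r", module]]
    for cmd in commands:
        # check=True aborts the sequence if the display manager fails to stop,
        # so the module is never removed while it may still be in use.
        run(cmd, check=True)
    return commands


# Hypothetical usage (the script path is an assumption, run as root):
# stop_then_unload(["/etc/init.d/sddm", "stop"])
```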

Let me know if I can provide any additional diagnostics although, with the
problem seemingly occurring so late in the shutdown process, I may need help on
how to go about capturing them.

Chris

On 02/02/2023 20:45, Chris Clayton wrote:
> 
> 
> On 01/02/2023 13:51, Chris Clayton wrote:
>>
>>
>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton  
>>> wrote:

 Hi again.

 On 30/01/2023 20:19, Chris Clayton wrote:
> Thanks, Ben.

 

>> Hey,
>>
>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>> *any* of my boards.  Could you try the attached patch please?
>
> Unfortunately, the patch made no difference.
>
> I've been looking at how the graphics on my laptop is set up, and have a 
> bit of a worry about whether the firmware might
> be playing a part in this problem. In order to offload video decoding to 
> the NVidia TU117 GPU, it seems the scrubber
> firmware must be available, but as far as I know,that has not been 
> released by NVidia. To get it to work, I followed
> what ubuntu have done and the scrubber in 
> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
> firmware loaded is for a different card is being
> loaded. I note that processing related to firmware is being changed in 
> the patch. Might my set up be at the root of my
> problem?
>
> I'll have a fiddle an see what I can work out.
>
> Chris
>
>>
>> Thanks,
>> Ben.
>>
>>>

 Well, my fiddling has got my system rebooting and shutting down 
 successfully again. I found that if I delete the symlink
 to the scrubber firmware, reboot and shutdown work again. There are 
 however, a number of other files in the tu117
 firmware directory tree that that are symlinks to actual files in its 
 tu116 counterpart. So I deleted all of those too.
 Unfortunately, the absence of one or more of those symlinks causes Xorg to 
 fail to start. I've reinstated all the links
 except scrubber and I now have a system that works as it did until I tried 
 to run a kernel that includes the bad commit
 I identified in my bisection. That includes offloading video decoding to 
 the NVidia card, so what ever I read that said
 the scrubber firmware was needed seems to have been wrong. I get a new 
 message that (nouveau :01:00.0: fb: VPR
 locked, but no scrubber binary!), but, hey, we can't have everything.

 If you still want to get to the bottom of this, let me know what you need 
 me to provide and I'll do my best. I suspect
 you might want to because there will a n awful lot of Ubuntu-based systems 
 out there with that scrubber.bin symlink in
 place. On the other hand,m it could but quite a while before ubuntu are 
 deploying 6.2 or later kernels.
>>> The symlinks are correct - whole groups of GPUs share the same FW, and
>>> we use symlinks in linux-firmware to represent this.
>>>
>>> I don't really have any ideas how/why this patch causes issues with
>>> shutdown - it's a path that only gets executed during initialisation.
>>> Can you try and capture the kernel log during shutdown ("dmesg -w"
>>> over ssh? netconsole?), and see if there's any relevant messages
>>> providing a hint at what's going on?  Alternatively, you could try
>>> unloading the module (you will have to stop X/wayland/gdm/etc/etc
>>> first) and seeing if that hangs too.
>>>
>>> Ben.
>>
>> Sorry for the delay - I've been learning about netconsole and netcat. 
>> However, I had no success with ssh and netconsole
>> produced a log with nothing unusual in it.
>>
>> Simply stopping Xorg and removing the nouveau module succeeds.
>>
>> So, I rebuilt rc6+ after a pull from linus' tree this morning and set the 
>> nouveau debug level to 7. I then booted to a
>> console before doing a reboot (with Ctl+Alt+Del). As expected the machine 
>> locked up just before it would ordinarily
>> restart. The last few lines on the console might be helpful:
>>
>> ...
>> nouveau :01:00:0  fifo: preinit running...
>> nouveau :01:00:0  fifo: preinit completed in 4us
>> nouveau :01:00:0  gr: preinit running...
>> nouveau :01:00:0  gr: preinit completed in 0us
>> nouveau :01:00:0  nvdec0: preinit running...
>> nouveau :01:00:0  nvdec0: preinit completed in 0us
>> nouveau :01:00:0  nvdec0: preinit running...
>> nouveau :01:00:0  nvdec0: preinit completed in 0us
>> nouveau 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-01 Thread Chris Clayton



On 30/01/2023 23:27, Ben Skeggs wrote:
> On Tue, 31 Jan 2023 at 09:09, Chris Clayton  wrote:
>>
>> Hi again.
>>
>> On 30/01/2023 20:19, Chris Clayton wrote:
>>> Thanks, Ben.
>>
>> 
>>
>>>> Hey,
>>>>
>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>> *any* of my boards.  Could you try the attached patch please?
>>>
>>> Unfortunately, the patch made no difference.
>>>
>>> I've been looking at how the graphics on my laptop is set up, and have a 
>>> bit of a worry about whether the firmware might
>>> be playing a part in this problem. In order to offload video decoding to 
>>> the NVidia TU117 GPU, it seems the scrubber
>>> firmware must be available, but as far as I know,that has not been released 
>>> by NVidia. To get it to work, I followed
>>> what ubuntu have done and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ 
>>> is a symlink to
>>> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
>>> firmware loaded is for a different card is being
>>> loaded. I note that processing related to firmware is being changed in the 
>>> patch. Might my set up be at the root of my
>>> problem?
>>>
>>> I'll have a fiddle an see what I can work out.
>>>
>>> Chris
>>>

>>>> Thanks,
>>>> Ben.

>
>>
>> Well, my fiddling has got my system rebooting and shutting down successfully 
>> again. I found that if I delete the symlink
>> to the scrubber firmware, reboot and shutdown work again. There are however, 
>> a number of other files in the tu117
>> firmware directory tree that that are symlinks to actual files in its tu116 
>> counterpart. So I deleted all of those too.
>> Unfortunately, the absence of one or more of those symlinks causes Xorg to 
>> fail to start. I've reinstated all the links
>> except scrubber and I now have a system that works as it did until I tried 
>> to run a kernel that includes the bad commit
>> I identified in my bisection. That includes offloading video decoding to the 
>> NVidia card, so what ever I read that said
>> the scrubber firmware was needed seems to have been wrong. I get a new 
>> message that (nouveau :01:00.0: fb: VPR
>> locked, but no scrubber binary!), but, hey, we can't have everything.
>>
>> If you still want to get to the bottom of this, let me know what you need me 
>> to provide and I'll do my best. I suspect
>> you might want to because there will a n awful lot of Ubuntu-based systems 
>> out there with that scrubber.bin symlink in
>> place. On the other hand,m it could but quite a while before ubuntu are 
>> deploying 6.2 or later kernels.
> The symlinks are correct - whole groups of GPUs share the same FW, and
> we use symlinks in linux-firmware to represent this.
> 
> I don't really have any ideas how/why this patch causes issues with
> shutdown - it's a path that only gets executed during initialisation.
> Can you try and capture the kernel log during shutdown ("dmesg -w"
> over ssh? netconsole?), and see if there's any relevant messages
> providing a hint at what's going on?  Alternatively, you could try
> unloading the module (you will have to stop X/wayland/gdm/etc/etc
> first) and seeing if that hangs too.
> 
> Ben.

Sorry for the delay - I've been learning about netconsole and netcat. However,
I had no success with ssh, and netconsole produced a log with nothing unusual
in it.

Simply stopping Xorg and removing the nouveau module succeeds.

So, I rebuilt rc6+ after a pull from Linus' tree this morning and set the
nouveau debug level to 7. I then booted to a console before doing a reboot
(with Ctrl+Alt+Del). As expected, the machine locked up just before it would
ordinarily restart. The last few lines on the console might be helpful:

...
nouveau :01:00:0  fifo: preinit running...
nouveau :01:00:0  fifo: preinit completed in 4us
nouveau :01:00:0  gr: preinit running...
nouveau :01:00:0  gr: preinit completed in 0us
nouveau :01:00:0  nvdec0: preinit running...
nouveau :01:00:0  nvdec0: preinit completed in 0us
nouveau :01:00:0  nvdec0: preinit running...
nouveau :01:00:0  nvdec0: preinit completed in 0us
nouveau :01:00:0  sec2: preinit running...
nouveau :01:00:0  sec2: preinit completed in 0us
nouveau :01:00:0  fb:.VPR locked, running scrubber binary

These messages appear after the "sd 4:0:0:0 [sda] Stopping disk" I reported in 
my initial email.
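
To pick the last nouveau line out of a longer capture (for example a netconsole
log), something like the sketch below could be used. It assumes the line layout
shown above (the word "nouveau", a device address, a subdev name, a colon, then
the message); the regular expression and the function name are illustrative
guesses, not anything nouveau defines.

```python
import re
from typing import Optional, Tuple

# Assumed layout, taken from the console lines quoted above:
#   nouveau <pci-address>  <subdev>: <message>
NOUVEAU_LINE = re.compile(r"^nouveau\s+(\S+)\s+(\w+):\s*(.*)$")


def last_nouveau_message(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (subdev, message) of the last nouveau line in the log, or None."""
    last = None
    for line in log_text.splitlines():
        match = NOUVEAU_LINE.match(line.strip())
        if match:
            last = (match.group(2), match.group(3))
    return last
```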

After the "running scrubber" line appears, the machine is locked and I have to
hold down the power button to recover. I get the same outcome from running
"halt -dip", "poweroff -di" and "shutdown -h -P now". I guess it's no surprise
that all three result in the same outcome, because invocations of halt, poweroff
and reboot (without the -f argument) from a runlevel other than 0 result in
shutdown being run. Switching to runlevel 0 with "telinit 0" results in the same
messages from nouveau followed by the lockup.

Let me know if you need any additional diagnostics.

Chris

> 
>>
>> Thanks,
>>
>> Chris
>>
>> 


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-30 Thread Ben Skeggs
On Tue, 31 Jan 2023 at 09:09, Chris Clayton  wrote:
>
> Hi again.
>
> On 30/01/2023 20:19, Chris Clayton wrote:
> > Thanks, Ben.
>
> 
>
> >> Hey,
> >>
> >> This is a complete shot-in-the-dark, as I don't see this behaviour on
> >> *any* of my boards.  Could you try the attached patch please?
> >
> > Unfortunately, the patch made no difference.
> >
> > I've been looking at how the graphics on my laptop is set up, and have a 
> > bit of a worry about whether the firmware might
> > be playing a part in this problem. In order to offload video decoding to 
> > the NVidia TU117 GPU, it seems the scrubber
> > firmware must be available, but as far as I know,that has not been released 
> > by NVidia. To get it to work, I followed
> > what ubuntu have done and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ 
> > is a symlink to
> > ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
> > firmware loaded is for a different card is being
> > loaded. I note that processing related to firmware is being changed in the 
> > patch. Might my set up be at the root of my
> > problem?
> >
> > I'll have a fiddle an see what I can work out.
> >
> > Chris
> >
> >>
> >> Thanks,
> >> Ben.
> >>
> >>>
>
> Well, my fiddling has got my system rebooting and shutting down successfully 
> again. I found that if I delete the symlink
> to the scrubber firmware, reboot and shutdown work again. There are however, 
> a number of other files in the tu117
> firmware directory tree that that are symlinks to actual files in its tu116 
> counterpart. So I deleted all of those too.
> Unfortunately, the absence of one or more of those symlinks causes Xorg to 
> fail to start. I've reinstated all the links
> except scrubber and I now have a system that works as it did until I tried to 
> run a kernel that includes the bad commit
> I identified in my bisection. That includes offloading video decoding to the 
> NVidia card, so what ever I read that said
> the scrubber firmware was needed seems to have been wrong. I get a new 
> message that (nouveau :01:00.0: fb: VPR
> locked, but no scrubber binary!), but, hey, we can't have everything.
>
> If you still want to get to the bottom of this, let me know what you need me 
> to provide and I'll do my best. I suspect
> you might want to because there will a n awful lot of Ubuntu-based systems 
> out there with that scrubber.bin symlink in
> place. On the other hand,m it could but quite a while before ubuntu are 
> deploying 6.2 or later kernels.
The symlinks are correct - whole groups of GPUs share the same FW, and
we use symlinks in linux-firmware to represent this.

I don't really have any ideas how/why this patch causes issues with
shutdown - it's a path that only gets executed during initialisation.
Can you try and capture the kernel log during shutdown ("dmesg -w"
over ssh? netconsole?), and see if there's any relevant messages
providing a hint at what's going on?  Alternatively, you could try
unloading the module (you will have to stop X/wayland/gdm/etc/etc
first) and seeing if that hangs too.
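
In case it helps, the receiving side of netconsole is just a UDP listener, so
netcat or a few lines of Python on a second machine will do. The sketch below
assumes netconsole's default target port of 6666 and one log fragment per
datagram; the function names are made up for illustration, and the sender still
has to be configured via the netconsole= parameter described in the kernel
documentation.

```python
import socket
from typing import List


def open_listener(port: int = 6666, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on the port that netconsole targets by default."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock: socket.socket, count: int, timeout: float = 5.0) -> List[str]:
    """Receive up to `count` datagrams, stopping early if nothing arrives in time."""
    sock.settimeout(timeout)
    messages = []
    try:
        for _ in range(count):
            data, _sender = sock.recvfrom(65535)
            messages.append(data.decode("utf-8", errors="replace"))
    except socket.timeout:
        pass  # no more data within the timeout; return what was captured
    return messages
```

Binding first and reading afterwards matters here: UDP has no retransmission,
so anything the kernel sends before the listener is up is simply lost.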

Ben.

>
> Thanks,
>
> Chris
>
> 


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-30 Thread Chris Clayton
Hi again.

On 30/01/2023 20:19, Chris Clayton wrote:
> Thanks, Ben.



>> Hey,
>>
>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>> *any* of my boards.  Could you try the attached patch please?
> 
> Unfortunately, the patch made no difference.
> 
> I've been looking at how the graphics on my laptop is set up, and have a bit 
> of a worry about whether the firmware might
> be playing a part in this problem. In order to offload video decoding to the 
> NVidia TU117 GPU, it seems the scrubber
> firmware must be available, but as far as I know,that has not been released 
> by NVidia. To get it to work, I followed
> what ubuntu have done and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ 
> is a symlink to
> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
> firmware loaded is for a different card is being
> loaded. I note that processing related to firmware is being changed in the 
> patch. Might my set up be at the root of my
> problem?
> 
> I'll have a fiddle an see what I can work out.
> 
> Chris
> 
>>
>> Thanks,
>> Ben.
>>
>>>

Well, my fiddling has got my system rebooting and shutting down successfully
again. I found that if I delete the symlink to the scrubber firmware, reboot and
shutdown work again. There are, however, a number of other files in the tu117
firmware directory tree that are symlinks to actual files in its tu116
counterpart. So I deleted all of those too. Unfortunately, the absence of one or
more of those symlinks causes Xorg to fail to start. I've reinstated all the
links except scrubber and I now have a system that works as it did until I tried
to run a kernel that includes the bad commit I identified in my bisection. That
includes offloading video decoding to the NVidia card, so whatever I read that
said the scrubber firmware was needed seems to have been wrong. I get a new
message (nouveau :01:00.0: fb: VPR locked, but no scrubber binary!), but, hey,
we can't have everything.
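
For anyone wanting to check their own firmware tree, the symlinks can be listed
with a sketch like the one below. It only walks a directory and reports each
symlink with its target; the function name is illustrative and the path in the
usage comment is simply the tu117 directory mentioned earlier in this thread.

```python
import os
from typing import Dict


def list_symlinks(root: str) -> Dict[str, str]:
    """Map each symlink under root (path relative to root) to its link target."""
    links = {}
    # followlinks=False: symlinked directories are reported but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                links[os.path.relpath(full, root)] = os.readlink(full)
    return links


# Example: list_symlinks("/lib/firmware/nvidia/tu117")
```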

If you still want to get to the bottom of this, let me know what you need me to
provide and I'll do my best. I suspect you might want to because there will be
an awful lot of Ubuntu-based systems out there with that scrubber.bin symlink in
place. On the other hand, it could be quite a while before Ubuntu are deploying
6.2 or later kernels.

Thanks,

Chris




Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-30 Thread Chris Clayton
Thanks, Ben.

On 30/01/2023 01:09, Ben Skeggs wrote:
> On Sat, 28 Jan 2023 at 21:29, Chris Clayton  wrote:
>>
>>
>>
>> On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) 
>> wrote:
>>> On 27.01.23 20:46, Chris Clayton wrote:
>>>> [Resend because the mail client on my phone decided to turn HTML on behind 
>>>> my back, so my reply got bounced.]
>>>>
>>>> Thanks Thorsten.
>>>>
>>>> I did try to revert but it didnt revert cleanly and I don't have the 
>>>> knowledge to fix it up.
>>>>
>>>> The patch was part of a merge that included a number of related patches. 
>>>> Tomorrow, I'll try to revert the lot and report
>>>> back.
>>>
>>> You are free to do so, but there is no need for that from my side. I
>>> only wanted to know if a simple revert would do the trick; if it
>>> doesn't, it in my experience often is best to leave things to the
>>> developers of the code in question,
>>
>> Sound advice, Thorsten. Way to many conflicts for me to resolve.
> Hey,
> 
> This is a complete shot-in-the-dark, as I don't see this behaviour on
> *any* of my boards.  Could you try the attached patch please?

Unfortunately, the patch made no difference.

I've been looking at how the graphics on my laptop is set up, and have a bit of
a worry about whether the firmware might be playing a part in this problem. In
order to offload video decoding to the NVidia TU117 GPU, it seems the scrubber
firmware must be available, but as far as I know, that has not been released by
NVidia. To get it to work, I followed what Ubuntu have done and the scrubber in
/lib/firmware/nvidia/tu117/nvdec/ is a symlink to
../../tu116/nvdev/scrubber.bin. That, of course, means that some of the firmware
being loaded is for a different card. I note that processing related to firmware
is being changed in the patch. Might my setup be at the root of my problem?

I'll have a fiddle and see what I can work out.

Chris

> 
> Thanks,
> Ben.
> 
>>
>> as they know it best and thus have a
>>> better idea which hidden side effect a more complex revert might have.
>>>
>>> Ciao, Thorsten
>>>
 On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
 wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> @nouveau-maintainers, did anyone take a look at this? The report is
> already 8 days old and I don't see a single reply. Sure, we'll likely
> get a -rc8, but still it would be good to not fix this on the finish line.
>
> Chris, btw, did you try if you can revert the commit on top of latest
> mainline? And if so, does it fix the problem?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
>> [adding various lists and the two other nouveau maintainers to the list
>> of recipients]
>
>> On 18.01.23 21:59, Chris Clayton wrote:
>>> Hi.
>>>
>>> I build and installed the lastest development kernel earlier this week. 
>>> I've found that when I try the laptop down (or
>>> reboot it), it hangs right at the end of closing the current session. 
>>> The last line I see on  the screen when rebooting is:
>>>
>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>
>>> when closing down I see one additional line:
>>>
>>>   sd 4:0:0:0 [sda]Stopping disk
>>>
>>> In both cases the machine then hangs and I have to hold down the power 
>>> button fot a few seconds to switch it off.
>>>
>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between this two and 
>>> landed on:
>>>
>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>>> (VPR scrubber)
>>>
>>> I built and installed a kernel with 
>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit) 
>>> checked out
>>> and that shuts down and reboots fine. It the did the same with the bad 
>>> commit checked out and that does indeed hang, so
>>> I'm confident the bisect outcome is OK.
>>>
>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>
>>> My system had dual GPUs - one intel and one NVidia. Related extracts 
>>> from 'lscpi -v' is:
>>>
>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 
>>> [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>
>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>
>>> Memory at c200 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-29 Thread Ben Skeggs
On Sat, 28 Jan 2023 at 21:29, Chris Clayton  wrote:
>
>
>
> On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) 
> wrote:
> > On 27.01.23 20:46, Chris Clayton wrote:
> >> [Resend because the mail client on my phone decided to turn HTML on behind 
> >> my back, so my reply got bounced.]
> >>
> >> Thanks Thorsten.
> >>
> >> I did try to revert but it didnt revert cleanly and I don't have the 
> >> knowledge to fix it up.
> >>
> >> The patch was part of a merge that included a number of related patches. 
> >> Tomorrow, I'll try to revert the lot and report
> >> back.
> >
> > You are free to do so, but there is no need for that from my side. I
> > only wanted to know if a simple revert would do the trick; if it
> > doesn't, it in my experience often is best to leave things to the
> > developers of the code in question,
>
> Sound advice, Thorsten. Way to many conflicts for me to resolve.
Hey,

This is a complete shot-in-the-dark, as I don't see this behaviour on
*any* of my boards.  Could you try the attached patch please?

Thanks,
Ben.

>
> as they know it best and thus have a
> > better idea which hidden side effect a more complex revert might have.
> >
> > Ciao, Thorsten
> >
> >> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
> >> wrote:
> >>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> >>> to make this easily accessible to everyone.
> >>>
> >>> @nouveau-maintainers, did anyone take a look at this? The report is
> >>> already 8 days old and I don't see a single reply. Sure, we'll likely
> >>> get a -rc8, but still it would be good to not fix this on the finish line.
> >>>
> >>> Chris, btw, did you try if you can revert the commit on top of latest
> >>> mainline? And if so, does it fix the problem?
> >>>
> >>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>> --
> >>> Everything you wanna know about Linux kernel regression tracking:
> >>> https://linux-regtracking.leemhuis.info/about/#tldr
> >>> If I did something stupid, please tell me, as explained on that page.
> >>>
> >>> #regzbot poke
> >>>
> >>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> >>> wrote:
> >>>> [adding various lists and the two other nouveau maintainers to the list
> >>>> of recipients]
> >>>
> >>>> On 18.01.23 21:59, Chris Clayton wrote:
> >>>>> Hi.
> >>>>>
> >>>>> I built and installed the latest development kernel earlier this week.
> >>>>> I've found that when I try to shut the laptop down (or
> >>>>> reboot it), it hangs right at the end of closing the current session.
> >>>>> The last line I see on the screen when rebooting is:
> >>>>>
> >>>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> >>>>>
> >>>>> when closing down I see one additional line:
> >>>>>
> >>>>>   sd 4:0:0:0 [sda]Stopping disk
> >>>>>
> >>>>> In both cases the machine then hangs and I have to hold down the power
> >>>>> button for a few seconds to switch it off.
> >>>>>
> >>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> >>>>> landed on:
> >>>>>
> >>>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
> >>>>>
> >>>>> I built and installed a kernel with
> >>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> >>>>> checked out and that shuts down and reboots fine. I then did the same
> >>>>> with the bad commit checked out and that does indeed hang, so
> >>>>> I'm confident the bisect outcome is OK.
> >>>>>
> >>>>> Kernels 6.1.6 and 5.15.88 are also OK.
> >>>>>
> >>>>> My system had dual GPUs - one Intel and one NVidia. Related extracts
> >>>>> from 'lspci -v' are:
> >>>>>
> >>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> >>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> >>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
> >>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> >>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
> >>>>> I/O ports at 5000 [size=64]
> >>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> >>>>> Capabilities: [40] Vendor Specific Information: Len=0c
> >>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> >>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >>>>> Capabilities: [d0] Power Management version 2
> >>>>> Kernel driver in use: i915
> >>>>> Kernel modules: i915
> >>>>>
> >>>>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
> >>>>> Subsystem:

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-28 Thread Chris Clayton



On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> On 27.01.23 20:46, Chris Clayton wrote:
>> [Resend because the mail client on my phone decided to turn HTML on behind 
>> my back, so my reply got bounced.]
>>
>> Thanks Thorsten.
>>
>> I did try to revert but it didn't revert cleanly and I don't have the
>> knowledge to fix it up.
>>
>> The patch was part of a merge that included a number of related patches. 
>> Tomorrow, I'll try to revert the lot and report
>> back.
> 
> You are free to do so, but there is no need for that from my side. I
> only wanted to know if a simple revert would do the trick; if it
> doesn't, it in my experience often is best to leave things to the
> developers of the code in question, 

Sound advice, Thorsten. Way too many conflicts for me to resolve.

> as they know it best and thus have a
> better idea which hidden side effect a more complex revert might have.
> 
> Ciao, Thorsten
> 
>> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
>> wrote:
>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>> to make this easily accessible to everyone.
>>>
>>> @nouveau-maintainers, did anyone take a look at this? The report is
>>> already 8 days old and I don't see a single reply. Sure, we'll likely
>>> get a -rc8, but still it would be good to not fix this on the finish line.
>>>
>>> Chris, btw, did you try if you can revert the commit on top of latest
>>> mainline? And if so, does it fix the problem?
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>> --
>>> Everything you wanna know about Linux kernel regression tracking:
>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>> If I did something stupid, please tell me, as explained on that page.
>>>
>>> #regzbot poke
>>>
>>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>>> wrote:
>>>> [adding various lists and the two other nouveau maintainers to the list
>>>> of recipients]
>>>
>>>> On 18.01.23 21:59, Chris Clayton wrote:
>>>>> Hi.
>>>>>
>>>>> I built and installed the latest development kernel earlier this week.
>>>>> I've found that when I try to shut the laptop down (or
>>>>> reboot it), it hangs right at the end of closing the current session. The
>>>>> last line I see on the screen when rebooting is:
>>>>>
>>>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>>>
>>>>> when closing down I see one additional line:
>>>>>
>>>>>   sd 4:0:0:0 [sda]Stopping disk
>>>>>
>>>>> In both cases the machine then hangs and I have to hold down the power
>>>>> button for a few seconds to switch it off.
>>>>>
>>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>>>>> landed on:
>>>>>
>>>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
>>>>>
>>>>> I built and installed a kernel with
>>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
>>>>> checked out and that shuts down and reboots fine. I then did the same
>>>>> with the bad commit checked out and that does indeed hang, so
>>>>> I'm confident the bisect outcome is OK.
>>>>>
>>>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>>>
>>>>> My system had dual GPUs - one Intel and one NVidia. Related extracts from
>>>>> 'lspci -v' are:
>>>>>
>>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
>>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>>> I/O ports at 5000 [size=64]
>>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>>> Capabilities: [40] Vendor Specific Information: Len=0c
>>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>>> Capabilities: [d0] Power Management version 2
>>>>> Kernel driver in use: i915
>>>>> Kernel modules: i915
>>>>>
>>>>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
>>>>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>>>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>>>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>>>>> Memory at b000 (64-bit, prefetchable) [size=256M]
>>>>> Memory at c000 (64-bit, prefetchable) [size=32M]
>>>>> I/O ports at 4000 [size=128]
>>>>> Expansion ROM at c300 [disabled] [size=512K]
>>>>>

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
On 27.01.23 20:46, Chris Clayton wrote:
> [Resend because the mail client on my phone decided to turn HTML on behind my 
> back, so my reply got bounced.]
> 
> Thanks Thorsten.
> 
> I did try to revert but it didn't revert cleanly and I don't have the
> knowledge to fix it up.
> 
> The patch was part of a merge that included a number of related patches. 
> Tomorrow, I'll try to revert the lot and report
> back.

You are free to do so, but there is no need for that from my side. I
only wanted to know if a simple revert would do the trick; if it
doesn't, it in my experience often is best to leave things to the
developers of the code in question, as they know it best and thus have a
better idea which hidden side effect a more complex revert might have.

Ciao, Thorsten

> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
> wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> @nouveau-maintainers, did anyone take a look at this? The report is
>> already 8 days old and I don't see a single reply. Sure, we'll likely
>> get a -rc8, but still it would be good to not fix this on the finish line.
>>
>> Chris, btw, did you try if you can revert the commit on top of latest
>> mainline? And if so, does it fix the problem?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> [adding various lists and the two other nouveau maintainers to the list
>>> of recipients]
>>
>>> On 18.01.23 21:59, Chris Clayton wrote:
>>>> Hi.
>>>>
>>>> I built and installed the latest development kernel earlier this week.
>>>> I've found that when I try to shut the laptop down (or
>>>> reboot it), it hangs right at the end of closing the current session. The
>>>> last line I see on the screen when rebooting is:
>>>>
>>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>>
>>>> when closing down I see one additional line:
>>>>
>>>>   sd 4:0:0:0 [sda]Stopping disk
>>>>
>>>> In both cases the machine then hangs and I have to hold down the power
>>>> button for a few seconds to switch it off.
>>>>
>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>>>> landed on:
>>>>
>>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
>>>>
>>>> I built and installed a kernel with
>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
>>>> checked out and that shuts down and reboots fine. I then did the same
>>>> with the bad commit checked out and that does indeed hang, so
>>>> I'm confident the bisect outcome is OK.
>>>>
>>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>>
>>>> My system had dual GPUs - one Intel and one NVidia. Related extracts from
>>>> 'lspci -v' are:
>>>>
>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>> I/O ports at 5000 [size=64]
>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>> Capabilities: [40] Vendor Specific Information: Len=0c
>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>> Capabilities: [d0] Power Management version 2
>>>> Kernel driver in use: i915
>>>> Kernel modules: i915
>>>>
>>>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>>>> Memory at b000 (64-bit, prefetchable) [size=256M]
>>>> Memory at c000 (64-bit, prefetchable) [size=32M]
>>>> I/O ports at 4000 [size=128]
>>>> Expansion ROM at c300 [disabled] [size=512K]
>>>> Capabilities: [60] Power Management version 3
>>>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>> Capabilities: [78] Express Legacy Endpoint, MSI 00
>>>> Kernel driver in use: nouveau
>>>> Kernel modules: nouveau
>>>>
>>>> DRI_PRIME=1

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Chris Clayton
[Resend because the mail client on my phone decided to turn HTML on behind my 
back, so my reply got bounced.]

Thanks Thorsten.

I did try to revert but it didn't revert cleanly and I don't have the knowledge
to fix it up.

The patch was part of a merge that included a number of related patches. 
Tomorrow, I'll try to revert the lot and report
back.

Chris



On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
> 
> @nouveau-maintainers, did anyone take a look at this? The report is
> already 8 days old and I don't see a single reply. Sure, we'll likely
> get a -rc8, but still it would be good to not fix this on the finish line.
> 
> Chris, btw, did you try if you can revert the commit on top of latest
> mainline? And if so, does it fix the problem?
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
> #regzbot poke
> 
> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
>> [adding various lists and the two other nouveau maintainers to the list
>> of recipients]
> 
>> On 18.01.23 21:59, Chris Clayton wrote:
>>> Hi.
>>>
>>> I built and installed the latest development kernel earlier this week.
>>> I've found that when I try to shut the laptop down (or
>>> reboot it), it hangs right at the end of closing the current session. The
>>> last line I see on the screen when rebooting is:
>>>
>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>
>>> when closing down I see one additional line:
>>>
>>>   sd 4:0:0:0 [sda]Stopping disk
>>>
>>> In both cases the machine then hangs and I have to hold down the power
>>> button for a few seconds to switch it off.
>>>
>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>>> landed on:
>>>
>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
>>>
>>> I built and installed a kernel with
>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
>>> checked out and that shuts down and reboots fine. I then did the same
>>> with the bad commit checked out and that does indeed hang, so
>>> I'm confident the bisect outcome is OK.
>>>
>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>
>>> My system had dual GPUs - one Intel and one NVidia. Related extracts from
>>> 'lspci -v' are:
>>>
>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>
>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>
>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>
>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>
>>> I/O ports at 5000 [size=64]
>>>
>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>
>>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>>
>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>
>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>
>>> Capabilities: [d0] Power Management version 2
>>>
>>> Kernel driver in use: i915
>>>
>>> Kernel modules: i915
>>>
>>>
>>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
>>> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
>>> controller])
>>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>>> Memory at b000 (64-bit, prefetchable) [size=256M]
>>> Memory at c000 (64-bit, prefetchable) [size=32M]
>>> I/O ports at 4000 [size=128]
>>> Expansion ROM at c300 [disabled] [size=512K]
>>> Capabilities: [60] Power Management version 3
>>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> Capabilities: [78] Express Legacy Endpoint, MSI 00
>>> Kernel driver in use: nouveau
>>> Kernel modules: nouveau
>>>
>>> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
>>> sysvinit).
>>>
>>> I've attached the bisect.log, but please let me know if I can provide any 
>>> other diagnostics. Please cc me as I'm not
>>> subscribed.
>>
>> Thanks for the report. To be sure the issue doesn't fall through the
>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>> tracking bot:
>>
>> #regzbot ^introduced e44c2170876197
>> #regzbot title drm: nouveau: hangs on poweroff/reboot
>> #regzbot 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Chris Clayton
[Resend because the mail client on my phone decided to turn HTML on behind my
back, so my reply got bounced.]

Thanks Karol.

I sent the original report to Ben and LKML. Thorsten then added you, Lyude Paul
and the dri-devel and nouveau mailing lists, so you should have received this
report on or about January 19.

Chris

On 27/01/2023 11:35, Karol Herbst wrote:
> Where was the original email sent to anyway, because I don't have it at all.
> 
> Anyhow, I suspect we want to fetch logs to see what's happening, but
> due to the nature of this bug it might get difficult.
> 
> I'm checking out the laptops I have here if I can reproduce this
> issue, but I think all mine with Turing GPUs are fine.
> 
> Maybe Ben has any idea what might be wrong with
> 0e44c21708761977dcbea9b846b51a6fb684907a or if that's an issue which
> is already fixed by not upstreamed patches as I think I remember Ben
> to talk about something like that recently.
> 
> Karol
> 
> On Fri, Jan 27, 2023 at 12:20 PM Linux kernel regression tracking
> (Thorsten Leemhuis)  wrote:
>>
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> @nouveau-maintainers, did anyone take a look at this? The report is
>> already 8 days old and I don't see a single reply. Sure, we'll likely
>> get a -rc8, but still it would be good to not fix this on the finish line.
>>
>> Chris, btw, did you try if you can revert the commit on top of latest
>> mainline? And if so, does it fix the problem?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> [adding various lists and the two other nouveau maintainers to the list
>>> of recipients]
>>
>>> On 18.01.23 21:59, Chris Clayton wrote:
>>>> Hi.
>>>>
>>>> I built and installed the latest development kernel earlier this week.
>>>> I've found that when I try to shut the laptop down (or
>>>> reboot it), it hangs right at the end of closing the current session. The
>>>> last line I see on the screen when rebooting is:
>>>>
>>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>>
>>>> when closing down I see one additional line:
>>>>
>>>>   sd 4:0:0:0 [sda]Stopping disk
>>>>
>>>> In both cases the machine then hangs and I have to hold down the power
>>>> button for a few seconds to switch it off.
>>>>
>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>>>> landed on:
>>>>
>>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
>>>>
>>>> I built and installed a kernel with
>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
>>>> checked out and that shuts down and reboots fine. I then did the same
>>>> with the bad commit checked out and that does indeed hang, so
>>>> I'm confident the bisect outcome is OK.
>>>>
>>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>>
>>>> My system had dual GPUs - one Intel and one NVidia. Related extracts from
>>>> 'lspci -v' are:
>>>>
>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>> I/O ports at 5000 [size=64]
>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>> Capabilities: [40] Vendor Specific Information: Len=0c
>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>> Capabilities: [d0] Power Management version 2
>>>> Kernel driver in use: i915
>>>> Kernel modules: i915
>>>>
>>>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>>>> Memory at b000 (64-bit, prefetchable) [size=256M]
>>>> Memory at c000 (64-bit, prefetchable) [size=32M]
>>>> I/O ports at 4000 [size=128]
>>>> Expansion ROM at c300 [disabled] [size=512K]
>>>> Capabilities: [60] Power Management version 3
>>>> Capabilities: [68] MSI:

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Chris Clayton
Thanks Thorsten.

I did try to revert but it didn't revert cleanly and I don't have the
knowledge to fix it up.

The patch was part of a merge that included a number of related patches.
I'll try to revert the lot and report back.

Chris


On Fri, 27 Jan 2023, 11:20 Linux kernel regression tracking (Thorsten
Leemhuis),  wrote:

> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> @nouveau-maintainers, did anyone take a look at this? The report is
> already 8 days old and I don't see a single reply. Sure, we'll likely
> get a -rc8, but still it would be good to not fix this on the finish line.
>
> Chris, btw, did you try if you can revert the commit on top of latest
> mainline? And if so, does it fix the problem?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
> > [adding various lists and the two other nouveau maintainers to the list
> > of recipients]
>
> > On 18.01.23 21:59, Chris Clayton wrote:
> >> Hi.
> >>
> >> I built and installed the latest development kernel earlier this week.
> >> I've found that when I try to shut the laptop down (or
> >> reboot it), it hangs right at the end of closing the current session.
> >> The last line I see on the screen when rebooting is:
> >>
> >>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> >>
> >> when closing down I see one additional line:
> >>
> >>   sd 4:0:0:0 [sda]Stopping disk
> >>
> >> In both cases the machine then hangs and I have to hold down the power
> >> button for a few seconds to switch it off.
> >>
> >> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> >> landed on:
> >>
> >>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
> >>
> >> I built and installed a kernel with
> >> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> >> checked out and that shuts down and reboots fine. I then did the same
> >> with the bad commit checked out and that does indeed hang, so
> >> I'm confident the bisect outcome is OK.
> >>
> >> Kernels 6.1.6 and 5.15.88 are also OK.
> >>
> >> My system had dual GPUs - one Intel and one NVidia. Related extracts
> >> from 'lspci -v' are:
> >>
> >> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2
> [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> >> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> >>
> >> Flags: bus master, fast devsel, latency 0, IRQ 142
> >>
> >> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> >>
> >> Memory at a000 (64-bit, prefetchable) [size=256M]
> >>
> >> I/O ports at 5000 [size=64]
> >>
> >> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> >>
> >> Capabilities: [40] Vendor Specific Information: Len=0c 
> >>
> >> Capabilities: [70] Express Root Complex Integrated Endpoint,
> MSI 00
> >>
> >> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >>
> >> Capabilities: [d0] Power Management version 2
> >>
> >> Kernel driver in use: i915
> >>
> >> Kernel modules: i915
> >>
> >>
> >> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce
> GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> >> controller])
> >> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti
> Mobile]
> >> Flags: bus master, fast devsel, latency 0, IRQ 141
> >> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> >> Memory at b000 (64-bit, prefetchable) [size=256M]
> >> Memory at c000 (64-bit, prefetchable) [size=32M]
> >> I/O ports at 4000 [size=128]
> >> Expansion ROM at c300 [disabled] [size=512K]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Legacy Endpoint, MSI 00
> >> Kernel driver in use: nouveau
> >> Kernel modules: nouveau
> >>
> >> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still
> using sysvinit).
> >>
> >> I've attached the bisect.log, but please let me know if I can provide
> any other diagnostics. Please cc me as I'm not
> >> subscribed.
> >
> > Thanks for the report. To be sure the issue doesn't fall through the
> > cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> > tracking bot:
> >
> > #regzbot ^introduced e44c2170876197
> > #regzbot title drm: nouveau: hangs on poweroff/reboot
> > #regzbot ignore-activity
> >
> > This isn't a regression? This issue or a fix for it are 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Chris Clayton
Hi Karol.

I sent the original report to Ben and LKML. Thorsten then added you, Lyude
Paul and the dri-devel and nouveau lists. So you should have received this
report on or about January 19.

Chris

On Fri, 27 Jan 2023, 11:35 Karol Herbst,  wrote:

> Where was the original email sent to anyway, because I don't have it at
> all.
>
> Anyhow, I suspect we want to fetch logs to see what's happening, but
> due to the nature of this bug it might get difficult.
>
> I'm checking out the laptops I have here if I can reproduce this
> issue, but I think all mine with Turing GPUs are fine.
>
> Maybe Ben has any idea what might be wrong with
> 0e44c21708761977dcbea9b846b51a6fb684907a or if that's an issue which
> is already fixed by not upstreamed patches as I think I remember Ben
> to talk about something like that recently.
>
> Karol
>
> On Fri, Jan 27, 2023 at 12:20 PM Linux kernel regression tracking
> (Thorsten Leemhuis)  wrote:
> >
> > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > to make this easily accessible to everyone.
> >
> > @nouveau-maintainers, did anyone take a look at this? The report is
> > already 8 days old and I don't see a single reply. Sure, we'll likely
> > get a -rc8, but still it would be good to not fix this on the finish
> line.
> >
> > Chris, btw, did you try if you can revert the commit on top of latest
> > mainline? And if so, does it fix the problem?
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > #regzbot poke
> >
> > On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> > wrote:
> > > [adding various lists and the two other nouveau maintainers to the list
> > > of recipients]
> >
> > > On 18.01.23 21:59, Chris Clayton wrote:
> > >> Hi.
> > >>
> > >> I built and installed the latest development kernel earlier this week.
> > >> I've found that when I try to shut the laptop down (or
> > >> reboot it), it hangs right at the end of closing the current session.
> > >> The last line I see on the screen when rebooting is:
> > >>
> > >>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> > >>
> > >> when closing down I see one additional line:
> > >>
> > >>   sd 4:0:0:0 [sda]Stopping disk
> > >>
> > >> In both cases the machine then hangs and I have to hold down the
> > >> power button for a few seconds to switch it off.
> > >>
> > >> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> > >> landed on:
> > >>
> > >>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
> > >>
> > >> I built and installed a kernel with
> > >> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> > >> checked out and that shuts down and reboots fine. I then did the same
> > >> with the bad commit checked out and that does indeed hang, so
> > >> I'm confident the bisect outcome is OK.
> > >>
> > >> Kernels 6.1.6 and 5.15.88 are also OK.
> > >>
> > >> My system had dual GPUs - one Intel and one NVidia. Related extracts
> > >> from 'lspci -v' are:
> > >>
> > >> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2
> [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> > >> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> > >>
> > >> Flags: bus master, fast devsel, latency 0, IRQ 142
> > >>
> > >> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> > >>
> > >> Memory at a000 (64-bit, prefetchable) [size=256M]
> > >>
> > >> I/O ports at 5000 [size=64]
> > >>
> > >> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> > >>
> > >> Capabilities: [40] Vendor Specific Information: Len=0c 
> > >>
> > >> Capabilities: [70] Express Root Complex Integrated Endpoint,
> MSI 00
> > >>
> > >> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> > >>
> > >> Capabilities: [d0] Power Management version 2
> > >>
> > >> Kernel driver in use: i915
> > >>
> > >> Kernel modules: i915
> > >>
> > >>
> > >> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce
> GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> > >> controller])
> > >> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti
> Mobile]
> > >> Flags: bus master, fast devsel, latency 0, IRQ 141
> > >> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> > >> Memory at b000 (64-bit, prefetchable) [size=256M]
> > >> Memory at c000 (64-bit, prefetchable) [size=32M]
> > >> I/O ports at 4000 [size=128]
> > >> Expansion ROM at c300 [disabled] [size=512K]
> > >> Capabilities: [60] Power Management version 3
> > >> Capabilities: [68] MSI: 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Karol Herbst
Where was the original email sent to anyway, because I don't have it at all.

Anyhow, I suspect we want to fetch logs to see what's happening, but
due to the nature of this bug it might get difficult.

I'm checking out the laptops I have here if I can reproduce this
issue, but I think all mine with Turing GPUs are fine.

Maybe Ben has an idea what might be wrong with
0e44c21708761977dcbea9b846b51a6fb684907a, or whether that's an issue which
is already fixed by patches that are not upstream yet; I think I remember Ben
talking about something like that recently.

Karol

On Fri, Jan 27, 2023 at 12:20 PM Linux kernel regression tracking
(Thorsten Leemhuis)  wrote:
>
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> @nouveau-maintainers, did anyone take a look at this? The report is
> already 8 days old and I don't see a single reply. Sure, we'll likely
> get a -rc8, but still it would be good to not fix this on the finish line.
>
> Chris, btw, did you try reverting the commit on top of latest
> mainline? And if so, does it fix the problem?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
> > [adding various lists and the two other nouveau maintainers to the list
> > of recipients]
>
> > On 18.01.23 21:59, Chris Clayton wrote:
> >> Hi.
> >>
> >> I built and installed the latest development kernel earlier this week.
> >> I've found that when I try to shut the laptop down (or reboot it), it
> >> hangs right at the end of closing the current session. The last line I
> >> see on the screen when rebooting is:
> >>
> >>  sd 4:0:0:0: [sda] Synchronising SCSI cache
> >>
> >> when closing down I see one additional line:
> >>
> >>  sd 4:0:0:0: [sda] Stopping disk
> >>
> >> In both cases the machine then hangs and I have to hold down the power
> >> button for a few seconds to switch it off.
> >>
> >> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> >> landed on:
> >>
> >>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
> >>
> >> I built and installed a kernel with
> >> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> >> checked out and that shuts down and reboots fine. I then did the same
> >> with the bad commit checked out and that does indeed hang, so I'm
> >> confident the bisect outcome is OK.
> >>
> >> Kernels 6.1.6 and 5.15.88 are also OK.
> >>
> >> My system has dual GPUs - one Intel and one NVIDIA. Related extracts from
> >> 'lspci -v' are:
> >>
> >> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> >> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> >> Flags: bus master, fast devsel, latency 0, IRQ 142
> >> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> >> Memory at a000 (64-bit, prefetchable) [size=256M]
> >> I/O ports at 5000 [size=64]
> >> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> >> Capabilities: [40] Vendor Specific Information: Len=0c
> >> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> >> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >> Capabilities: [d0] Power Management version 2
> >> Kernel driver in use: i915
> >> Kernel modules: i915
> >>
> >> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
> >> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
> >> Flags: bus master, fast devsel, latency 0, IRQ 141
> >> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> >> Memory at b000 (64-bit, prefetchable) [size=256M]
> >> Memory at c000 (64-bit, prefetchable) [size=32M]
> >> I/O ports at 4000 [size=128]
> >> Expansion ROM at c300 [disabled] [size=512K]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Legacy Endpoint, MSI 00
> >> Kernel driver in use: nouveau
> >> Kernel modules: nouveau
> >>
> >> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
> >> sysvinit).
> >>
> >> I've attached the bisect.log, but please let me know if I can provide any 
> >> other diagnostics. Please cc me as I'm not
> >> 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

@nouveau-maintainers, did anyone take a look at this? The report is
already 8 days old and I don't see a single reply. Sure, we'll likely
get a -rc8, but still it would be good to not fix this on the finish line.

Chris, btw, did you try reverting the commit on top of latest
mainline? And if so, does it fix the problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
> [adding various lists and the two other nouveau maintainers to the list
> of recipients]

> On 18.01.23 21:59, Chris Clayton wrote:
>> Hi.
>>
>> I built and installed the latest development kernel earlier this week.
>> I've found that when I try to shut the laptop down (or reboot it), it
>> hangs right at the end of closing the current session. The last line I
>> see on the screen when rebooting is:
>>
>>  sd 4:0:0:0: [sda] Synchronising SCSI cache
>>
>> when closing down I see one additional line:
>>
>>  sd 4:0:0:0: [sda] Stopping disk
>>
>> In both cases the machine then hangs and I have to hold down the power
>> button for a few seconds to switch it off.
>>
>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>> landed on:
>>
>>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
>>
>> I built and installed a kernel with
>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
>> checked out and that shuts down and reboots fine. I then did the same
>> with the bad commit checked out and that does indeed hang, so I'm
>> confident the bisect outcome is OK.
>>
>> Kernels 6.1.6 and 5.15.88 are also OK.
>>
>> My system has dual GPUs - one Intel and one NVIDIA. Related extracts from
>> 'lspci -v' are:
>>
>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>> Flags: bus master, fast devsel, latency 0, IRQ 142
>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>> Memory at a000 (64-bit, prefetchable) [size=256M]
>> I/O ports at 5000 [size=64]
>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>> Capabilities: [40] Vendor Specific Information: Len=0c
>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> Capabilities: [d0] Power Management version 2
>> Kernel driver in use: i915
>> Kernel modules: i915
>>
>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>> Flags: bus master, fast devsel, latency 0, IRQ 141
>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>> Memory at b000 (64-bit, prefetchable) [size=256M]
>> Memory at c000 (64-bit, prefetchable) [size=32M]
>> I/O ports at 4000 [size=128]
>> Expansion ROM at c300 [disabled] [size=512K]
>> Capabilities: [60] Power Management version 3
>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Capabilities: [78] Express Legacy Endpoint, MSI 00
>> Kernel driver in use: nouveau
>> Kernel modules: nouveau
>>
>> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
>> sysvinit).
>>
>> I've attached the bisect.log, but please let me know if I can provide any 
>> other diagnostics. Please cc me as I'm not
>> subscribed.
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced e44c2170876197
> #regzbot title drm: nouveau: hangs on poweroff/reboot
> #regzbot ignore-activity
> 
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
> 
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
> 
> Ciao, Thorsten (wearing 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-19 Thread Linux kernel regression tracking (#update)
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
> On 18.01.23 21:59, Chris Clayton wrote:
>>
>>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>> (VPR scrubber)
>
> #regzbot ^introduced e44c2170876197

/me wonders if he failed to spot or cut'n'paste the leading 0
/me wonders if he needs glasses
#sigh

Sorry for the noise!

#regzbot 0e44c21708761977dc
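
The slip corrected above (a dropped leading 0) is the kind of thing a quick
prefix check can catch before a commit id is handed to a bot. A minimal
sketch in Python, assuming only that an abbreviated id must be a leading
substring of the full 40-character hash; the helper name
is_abbreviation_of and the 4-character minimum are illustrative choices,
not part of regzbot or git:

```python
# Full hash of the first bad commit, as reported by the bisect.
FULL_HASH = "0e44c21708761977dcbea9b846b51a6fb684907a"


def is_abbreviation_of(short_id: str, full_hash: str) -> bool:
    """Return True if short_id is a leading abbreviation of full_hash.

    Hypothetical helper: it only compares strings and does not consult
    any repository, so it cannot tell whether the abbreviation is unique.
    """
    short_id = short_id.strip().lower()
    # Require a few characters so that trivially short ids are rejected.
    return len(short_id) >= 4 and full_hash.lower().startswith(short_id)


# The id from the first attempt, with the leading 0 missing.
print(is_abbreviation_of("e44c2170876197", FULL_HASH))
# The corrected id.
print(is_abbreviation_of("0e44c21708761977dc", FULL_HASH))
```

Against a real checkout, "git rev-parse --verify" on the abbreviated id
would be the more thorough check, since it also detects ambiguous prefixes.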

> #regzbot title drm: nouveau: hangs on poweroff/reboot
> #regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

#regzbot ignore-activity


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-19 Thread Linux kernel regression tracking (Thorsten Leemhuis)
[adding various lists and the two other nouveau maintainers to the list
of recipients]

For the rest of this mail:

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 18.01.23 21:59, Chris Clayton wrote:
> Hi.
> 
> I built and installed the latest development kernel earlier this week.
> I've found that when I try to shut the laptop down (or reboot it), it
> hangs right at the end of closing the current session. The last line I
> see on the screen when rebooting is:
> 
>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> 
> when closing down I see one additional line:
> 
>   sd 4:0:0:0: [sda] Stopping disk
> 
> In both cases the machine then hangs and I have to hold down the power
> button for a few seconds to switch it off.
> 
> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> landed on:
> 
>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
> 
> I built and installed a kernel with
> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> checked out and that shuts down and reboots fine. I then did the same
> with the bad commit checked out and that does indeed hang, so I'm
> confident the bisect outcome is OK.
> 
> Kernels 6.1.6 and 5.15.88 are also OK.
> 
> My system has dual GPUs - one Intel and one NVIDIA. Related extracts from
> 'lspci -v' are:
> 
> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> Flags: bus master, fast devsel, latency 0, IRQ 142
> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> Memory at a000 (64-bit, prefetchable) [size=256M]
> I/O ports at 5000 [size=64]
> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> Capabilities: [40] Vendor Specific Information: Len=0c
> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [d0] Power Management version 2
> Kernel driver in use: i915
> Kernel modules: i915
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
> Flags: bus master, fast devsel, latency 0, IRQ 141
> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> Memory at b000 (64-bit, prefetchable) [size=256M]
> Memory at c000 (64-bit, prefetchable) [size=32M]
> I/O ports at 4000 [size=128]
> Expansion ROM at c300 [disabled] [size=512K]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Legacy Endpoint, MSI 00
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
> sysvinit).
> 
> I've attached the bisect.log, but please let me know if I can provide any 
> other diagnostics. Please cc me as I'm not
> subscribed.
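
For readers unfamiliar with the bisection described in the report above,
here is a rough sketch of the idea in Python. It assumes a strictly linear
history in which every commit after the culprit is also bad; real kernel
history is a graph with merges, which "git bisect" handles and this sketch
does not. The entries c1, c2 and c5 are hypothetical placeholders, and the
is_bad callback stands in for "build, boot, try to power off".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history for the first bad commit.

    commits is ordered oldest to newest; commits[0] is assumed to be
    known good and commits[-1] known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the culprit is at mid or earlier
        else:
            good = mid   # the culprit is after mid
    return commits[bad]


# Simplified, partly hypothetical history between the good and bad kernels.
history = ["v6.1", "c1", "c2", "f15cde64b661", "0e44c2170876", "c5",
           "v6.2-rc1"]
culprit_index = history.index("0e44c2170876")

# Stand-in for a real test: a commit hangs if it is at or after the culprit.
result = first_bad_commit(
    history, lambda commit: history.index(commit) >= culprit_index)
print(result)
```

The final cross-check in the report (parent commit good, bisected commit
bad) corresponds to the loop ending with good and bad on adjacent entries.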

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e44c2170876197
#regzbot title drm: nouveau: hangs on poweroff/reboot
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.