Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-12-05 Thread Owen T. Heisler

Hi Thorsten and others,

On 12/5/23 06:33, Thorsten Leemhuis wrote:

On 29.11.23 01:37, Owen T. Heisler wrote:

On 11/21/23 14:23, Owen T. Heisler wrote:

On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:

On 15.11.23 07:19, Owen T. Heisler wrote:

On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:

On 28.10.23 04:46, Owen T. Heisler wrote:

#regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
#regzbot link:
https://gitlab.freedesktop.org/drm/nouveau/-/issues/180

3. Suddenly the secondary Nvidia-connected display turns off and X
stops responding to keyboard/mouse input.



I am currently testing v6.6 with the culprit commit reverted.


- v6.6: fails
- v6.6 with the culprit commit reverted: works

See  for full
details including a decoded kernel log.


Not sure about the others, but it's kind of confusing that you update
the issue descriptions all the time and never add a comment to that ticket.


Thank you for the feedback; I will use comments more for future updates 
there. I didn't know anyone was following that issue (I haven't received 
any reply from nouveau developers on the nouveau list [1] or on gitlab 
[2]) so I have tried to keep that issue description succinct and 
up-to-date for anyone reading it for the first time.


[1]: 


[2]: But Karol Herbst did add the "regression" label.


Anyway: Nouveau maintainers, could any of you at least comment on this?
Sure, the regression is caused by an old commit (6eaa1f3c59a707 was
merged for v5.14-rc7) and reverting it likely is not an option, but it
nevertheless would be great if this could be solved somehow.


Also if anyone has any ideas about any stress-tests or anything else 
that I might be able to trigger the crash with, please share.


Thanks,
Owen

--
Owen T. Heisler



Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-12-05 Thread Thorsten Leemhuis
Karol, Lyude, and Daniel:

On 29.11.23 01:37, Owen T. Heisler wrote:
> On 11/21/23 14:23, Owen T. Heisler wrote:
>> On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 15.11.23 07:19, Owen T. Heisler wrote:
>>>> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>>>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>>>>> #regzbot link:
>>>>>> https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>>>>
>>>>>> 3. Suddenly the secondary Nvidia-connected display turns off and X
>>>>>> stops responding to keyboard/mouse input.
> 
>> I am currently testing v6.6 with the culprit commit reverted.
> 
> - v6.6: fails
> - v6.6 with the culprit commit reverted: works
> 
> See  for full
> details including a decoded kernel log.

Not sure about the others, but it's kind of confusing that you update
the issue descriptions all the time and never add a comment to that ticket.

Anyway: Nouveau maintainers, could any of you at least comment on this?
Sure, the regression is caused by an old commit (6eaa1f3c59a707 was
merged for v5.14-rc7) and reverting it likely is not an option, but it
nevertheless would be great if this could be solved somehow.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke




Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-28 Thread Owen T. Heisler

On 11/21/23 14:23, Owen T. Heisler wrote:

On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:

On 15.11.23 07:19, Owen T. Heisler wrote:

On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:

On 28.10.23 04:46, Owen T. Heisler wrote:

#regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
#regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180

3. Suddenly the secondary Nvidia-connected display turns off and X 
stops responding to keyboard/mouse input.



I am currently testing v6.6 with the culprit commit reverted.


- v6.6: fails
- v6.6 with the culprit commit reverted: works

See  for full 
details including a decoded kernel log.


Thanks,
Owen

--
Owen T. Heisler



Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-21 Thread Owen T. Heisler

On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:

On 15.11.23 07:19, Owen T. Heisler wrote:

On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:

On 28.10.23 04:46, Owen T. Heisler wrote:

#regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
#regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180

## Problem

1. Connect external display to DVI port on dock and run X with both
     displays in use.
2. Wait hours or days.
3. Suddenly the secondary Nvidia-connected display turns off and X stops
     responding to keyboard/mouse input. In *some* cases it is possible to
     switch to a virtual TTY with Ctrl+Alt+Fn and log in there.



Here is a decoded kernel log from an
untainted kernel:

https://gitlab.freedesktop.org/drm/nouveau/uploads/c120faf09da46f9c74006df9f1d14442/async-wait-on-fence-180.log



Maybe one of the nouveau developers can take a quick look at
d386a4b54607cf and suggest a simple way to revert it in latest mainline.
Maybe just removing the main chunk of code that is added is all that it
takes.


I was able to resolve the revert conflict; it was indeed trivial, though
I did not realize it initially. I am currently testing v6.6 with the
culprit commit reverted. I need to test for at least a full week (ending
11-23) before I can assume it fixes the problem.


After that I can try the latest v6.7-rc as you suggested.

I have updated the bug description at
.

Thanks again,
Owen

--
Owen T. Heisler



Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 15.11.23 07:19, Owen T. Heisler wrote:
> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>> #regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>
>>> ## Problem
>>>
>>> 1. Connect external display to DVI port on dock and run X with both
>>>     displays in use.
>>> 2. Wait hours or days.
>>> 3. Suddenly the secondary Nvidia-connected display turns off and X stops
>>>     responding to keyboard/mouse input. In *some* cases it is possible to
>>>     switch to a virtual TTY with Ctrl+Alt+Fn and log in there.
> 
>> You thus might want to check if the problem occurs with 6.6 -- and
>> ideally also check if reverting the culprit there fixes things for you.
> 
> The problem also occurs with v6.6.

Meanwhile, you might want to give 6.7-rc a try as well, on the off chance
that it improves things, even if that is unlikely.

> Here is a decoded kernel log from an
> untainted kernel:
> 
> https://gitlab.freedesktop.org/drm/nouveau/uploads/c120faf09da46f9c74006df9f1d14442/async-wait-on-fence-180.log
> 
> The culprit commit does not revert cleanly on v6.6. I have not yet
> attempted to resolve the conflicts.
> 
> I have also updated the bug description at
> .

Maybe one of the nouveau developers can take a quick look at
d386a4b54607cf and suggest a simple way to revert it in latest mainline.
Maybe just removing the main chunk of code that is added is all that it
takes.

Ciao, Thorsten


Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-14 Thread Owen T. Heisler

On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:

On 28.10.23 04:46, Owen T. Heisler wrote:

#regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
#regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180

## Problem

1. Connect external display to DVI port on dock and run X with both
    displays in use.
2. Wait hours or days.
3. Suddenly the secondary Nvidia-connected display turns off and X stops
    responding to keyboard/mouse input. In *some* cases it is possible to
    switch to a virtual TTY with Ctrl+Alt+Fn and log in there.



You thus might want to check if the problem occurs with 6.6 -- and
ideally also check if reverting the culprit there fixes things for you.


Hi Thorsten and others,

The problem also occurs with v6.6. Here is a decoded kernel log from an 
untainted kernel:


https://gitlab.freedesktop.org/drm/nouveau/uploads/c120faf09da46f9c74006df9f1d14442/async-wait-on-fence-180.log

The culprit commit does not revert cleanly on v6.6. I have not yet 
attempted to resolve the conflicts.


I have also updated the bug description at
.

Thanks,
Owen
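
For readers following the thread: the `#regzbot` lines quoted in these messages are plain-text tracking directives embedded in the mail body. Below is a minimal sketch, in Python, of how such lines could be pulled out of a message, including a `link:` directive whose URL was wrapped onto the next line. The function name `extract_regzbot_directives` and the regular expression are illustrative assumptions based only on the lines visible in this thread; this is not regzbot's actual parser or grammar.

```python
import re

# Assumed line shape: optional quote markers, "#regzbot", a command word,
# then an optional ":" followed by an argument. This mirrors the lines seen
# in this thread only; it is not regzbot's own grammar.
_DIRECTIVE = re.compile(
    r"^[>\s]*#regzbot\s+(?P<command>[\w-]+)(?P<colon>\s*:)?\s*(?P<argument>.*)$"
)


def extract_regzbot_directives(body):
    """Return (command, argument) pairs found in an email body.

    Leading quote markers ('>') are ignored. If a directive ends in ':'
    with nothing after it, the argument is read from the following line,
    which covers a URL that the mail client wrapped.
    """
    directives = []
    lines = body.splitlines()
    for index, line in enumerate(lines):
        match = _DIRECTIVE.match(line)
        if not match:
            continue
        argument = match.group("argument").strip()
        wrapped = match.group("colon") and not argument
        if wrapped and index + 1 < len(lines):
            # Strip quote markers and whitespace from the continuation line.
            argument = lines[index + 1].lstrip("> \t").strip()
        directives.append((match.group("command"), argument))
    return directives


if __name__ == "__main__":
    sample = "\n".join([
        ">> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882",
        ">> #regzbot link:",
        ">> https://gitlab.freedesktop.org/drm/nouveau/-/issues/180",
        "",
        "#regzbot poke",
    ])
    for command, argument in extract_regzbot_directives(sample):
        print(command, argument)
```

Reading the continuation line only when the directive ends in a colon keeps a bare command such as `#regzbot poke` from swallowing whatever text happens to follow it.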