Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-07-15 Thread Michel Dänzer
On 2021-07-08 6:13 p.m., Michel Dänzer wrote:
> On 2021-06-29 12:36 p.m., Michel Dänzer wrote:
>> On 2021-06-28 7:16 p.m., Deucher, Alexander wrote:
>>>
>>> Thanks for narrowing this down.  There is new PCO SDMA firmware available 
>>> (attached).  Can you try it?
>>
>> Sure, I'll try it, thanks.
> 
> So far, so good; no hang in a week. I'll try the rest of the new firmware as 
> well now, will follow up if there's a hang again.

Unfortunately, I hit a hang again[0] with the new firmware. I'm now back to 
testing with the old SDMA firmware.

BTW, since the symptoms include GPU page faults, something might be going wrong 
with GPU page table updates via SDMA.


[0] Triggered by furiously resizing the gtk4-demo "OpenGL Transitions and 
Effects" window, which in a Wayland session I can otherwise recommend as a 
jaw-dropping experience. :)

-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-07-14 Thread Ketsui
On Tue, Jul 13, 2021 at 10:40:30AM -0400, Alex Deucher wrote:
> On Mon, Jul 12, 2021 at 3:18 AM Ketsui  wrote:
> >
> > > So far, so good; no hang in a week. I'll try the rest of the new firmware 
> > > as well now, will follow up if there's a hang again.
> >
> > I've noticed that the VM_L2_PROTECTION_FAULT_STATUS error doesn't always
> > result in a hang. Looking through my journal, I can see maybe a dozen of
> > them spread across multiple boots, but my system has only become
> > non-functional about twice so far (I know because I saved the dmesg from
> > when the hangs happened; you can find them attached to this mail).
> >
> > To be clear, I haven't actually had a hang with the new firmware either,
> > even though these messages still appear in my dmesg. Sorry if my feedback
> > gave the wrong impression.
> 
> Were these issues mitigated with older firmware for you previously?

Yes. I'm currently staying with the linux-firmware release tagged
2021-03-15 [0]. I just checked my journal, and the last time I got the error
was 12 days ago, when I was testing newer firmware.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tag/?h=20210315
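
For anyone who wants to repeat that kind of journal check, here is a minimal
Python sketch that counts fault-status lines in kernel log text. The helper
name count_fault_statuses and the regular expression are illustrative
assumptions, not part of any amdgpu tooling; the only thing taken from this
thread is the VM_L2_PROTECTION_FAULT_STATUS marker itself.

```python
import re
from collections import Counter

# Matches the marker seen in this thread, followed by a hex status word.
FAULT_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:\s*(0x[0-9A-Fa-f]+)")


def count_fault_statuses(lines):
    """Count how often each fault status value appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            # Normalise case so 0x00140C51 and 0x00140c51 are counted together.
            counts[match.group(1).lower()] += 1
    return counts
```

Passing it the lines of a saved kernel log (for example text captured from
dmesg or journalctl -k) returns a Counter keyed by status value. It says
nothing about whether a given fault led to a hang; that still needs the
surrounding log context.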

> 
> Alex


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-07-13 Thread Alex Deucher
On Mon, Jul 12, 2021 at 3:18 AM Ketsui  wrote:
>
> > So far, so good; no hang in a week. I'll try the rest of the new firmware 
> > as well now, will follow up if there's a hang again.
>
> I've noticed that the VM_L2_PROTECTION_FAULT_STATUS error doesn't always
> result in a hang. Looking through my journal, I can see maybe a dozen of
> them spread across multiple boots, but my system has only become
> non-functional about twice so far (I know because I saved the dmesg from
> when the hangs happened; you can find them attached to this mail).
>
> To be clear, I haven't actually had a hang with the new firmware either,
> even though these messages still appear in my dmesg. Sorry if my feedback
> gave the wrong impression.

Were these issues mitigated with older firmware for you previously?

Alex


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-07-12 Thread Michel Dänzer
On 2021-07-11 9:48 a.m., Ketsui wrote:
>> So far, so good; no hang in a week. I'll try the rest of the new firmware as 
>> well now, will follow up if there's a hang again.
> 
> I've noticed that the VM_L2_PROTECTION_FAULT_STATUS error doesn't always
> result in a hang. Looking through my journal, I can see maybe a dozen of
> them spread across multiple boots, but my system has only become
> non-functional about twice so far (I know because I saved the dmesg from
> when the hangs happened; you can find them attached to this mail).
>
> To be clear, I haven't actually had a hang with the new firmware either,
> even though these messages still appear in my dmesg. Sorry if my feedback
> gave the wrong impression.

I'm counting soft recovered hangs as hangs for the purpose of this issue. I.e. 
when I write "no hang" I mean no soft recovered ones either. If I hit a soft 
recovered hang, I consider that setup bad.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-07-08 Thread Michel Dänzer
On 2021-06-29 12:36 p.m., Michel Dänzer wrote:
> On 2021-06-28 7:16 p.m., Deucher, Alexander wrote:
>>
>> Thanks for narrowing this down.  There is new PCO SDMA firmware available 
>> (attached).  Can you try it?
> 
> Sure, I'll try it, thanks.

So far, so good; no hang in a week. I'll try the rest of the new firmware as 
well now, will follow up if there's a hang again.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-07-01 Thread Ketsui
I copied the raven* and picasso* firmware files from upstream version 21.20
over the ones from my distro, and I just got another one of these errors.

[Jul 1 17:08] amdgpu :08:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process mpv pid 7400 thread mpv:cs0 pid 7432)
[  +0.14] amdgpu :08:00.0: amdgpu:   in page starting at address 0x80010008d000 from client 27
[  +0.10] amdgpu :08:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140C51
[  +0.02] amdgpu :08:00.0: amdgpu:   Faulty UTCL2 client ID: CPG (0x6)
[  +0.05] amdgpu :08:00.0: amdgpu:   MORE_FAULTS: 0x1
[  +0.01] amdgpu :08:00.0: amdgpu:   WALKER_ERROR: 0x0
[  +0.02] amdgpu :08:00.0: amdgpu:   PERMISSION_FAULTS: 0x5
[  +0.01] amdgpu :08:00.0: amdgpu:   MAPPING_ERROR: 0x0
[  +0.01] amdgpu :08:00.0: amdgpu:   RW: 0x1
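
The decoded fields in this log can be reproduced from the raw status word
0x00140C51. The sketch below is only an illustration: the bit positions are an
assumption, chosen because they reproduce the values the driver printed above
(including vmid:1 from the first line), and the names FIELDS and
decode_fault_status are made up for this example.

```python
# ASSUMED bit layout (shift, width) for VM_L2_PROTECTION_FAULT_STATUS.
# It is not taken from this thread; it is picked so that it matches the
# decoded values the driver printed in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # the "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw status word into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00140C51).items():
        print(f"{name}: {value:#x}")
```

Under the same assumed layout, the earlier report's 0x00240C51 differs only in
the VMID field (2 instead of 1), which agrees with the vmid:2 in that log.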


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-30 Thread Alex Deucher
On Wed, Jun 30, 2021 at 8:35 AM Ketsui  wrote:
>
> >I could be wrong. I can't remember what marketing names map to what
> >asics. I can tell if you can get your dmesg output.
> Here it is.

Thanks.  It is a picasso system.

Alex



Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-30 Thread Ketsui
>I could be wrong. I can't remember what marketing names map to what
>asics. I can tell if you can get your dmesg output.
Here it is.


dmesg
Description: Binary data


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-29 Thread Alex Deucher
On Wed, Jun 30, 2021 at 12:45 AM Ketsui  wrote:
>
> >I think the 3200G may be a raven or raven2 variant rather than
> >picasso.
>
> Are you sure? Examining vbios_version yields this on my system:
>
> $ cat /sys/class/drm/card0/device/vbios_version
> 113-PICASSO-114
>

I could be wrong.  I can't remember what marketing names map to what
asics.  I can tell if you can get your dmesg output.

Alex

>
> >Can you try the latest firmware from upstream:
> >https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu
>
> Sure.
>


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-29 Thread Ketsui
>I think the 3200G may be a raven or raven2 variant rather than
>picasso.

Are you sure? Examining vbios_version yields this on my system:

$ cat /sys/class/drm/card0/device/vbios_version
113-PICASSO-114
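
The same check can be scripted. This is a minimal sketch, assuming the sysfs
layout shown in the command above. read_vbios_version and vbios_name_token are
hypothetical helper names, and treating the middle token of the string as an
ASIC hint is a guess based only on this one sample, not a documented format.

```python
from pathlib import Path


def read_vbios_version(card="card0", sysfs_root="/sys/class/drm"):
    """Return the contents of <sysfs_root>/<card>/device/vbios_version, stripped."""
    path = Path(sysfs_root) / card / "device" / "vbios_version"
    return path.read_text().strip()


def vbios_name_token(vbios_version):
    """Return the middle token of an 'NNN-NAME-NNN' style string, or None.

    ASSUMPTION: the token between the dashes hints at the ASIC family.
    """
    parts = vbios_version.split("-")
    return parts[1] if len(parts) >= 3 else None
```

On the system above, vbios_name_token("113-PICASSO-114") returns "PICASSO".
The dmesg output remains the more reliable way to identify the ASIC.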


>Can you try the latest firmware from upstream:
>https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu

Sure.


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-29 Thread Alex Deucher
On Tue, Jun 29, 2021 at 3:57 PM Ketsui  wrote:
>
> I have the 3200G, and I'm still getting this error with that version.

I think the 3200G may be a raven or raven2 variant rather than
picasso.  Can you try the latest firmware from upstream:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu

Alex

>
> [ +23.754701] amdgpu :08:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32773, for process mpv pid 5016 thread mpv:cs0 pid 5064)
> [  +0.17] amdgpu :08:00.0: amdgpu:   in page starting at address 0x80010009e000 from client 27
> [  +0.07] amdgpu :08:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00240C51
> [  +0.03] amdgpu :08:00.0: amdgpu:   Faulty UTCL2 client ID: CPG (0x6)
> [  +0.03] amdgpu :08:00.0: amdgpu:   MORE_FAULTS: 0x1
> [  +0.02] amdgpu :08:00.0: amdgpu:   WALKER_ERROR: 0x0
> [  +0.02] amdgpu :08:00.0: amdgpu:   PERMISSION_FAULTS: 0x5
> [  +0.02] amdgpu :08:00.0: amdgpu:   MAPPING_ERROR: 0x0
> [  +0.01] amdgpu :08:00.0: amdgpu:   RW: 0x1


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-29 Thread Ketsui
I have the 3200G, and I'm still getting this error with that version.

[ +23.754701] amdgpu :08:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32773, for process mpv pid 5016 thread mpv:cs0 pid 5064)
[  +0.17] amdgpu :08:00.0: amdgpu:   in page starting at address 0x80010009e000 from client 27
[  +0.07] amdgpu :08:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00240C51
[  +0.03] amdgpu :08:00.0: amdgpu:   Faulty UTCL2 client ID: CPG (0x6)
[  +0.03] amdgpu :08:00.0: amdgpu:   MORE_FAULTS: 0x1
[  +0.02] amdgpu :08:00.0: amdgpu:   WALKER_ERROR: 0x0
[  +0.02] amdgpu :08:00.0: amdgpu:   PERMISSION_FAULTS: 0x5
[  +0.02] amdgpu :08:00.0: amdgpu:   MAPPING_ERROR: 0x0
[  +0.01] amdgpu :08:00.0: amdgpu:   RW: 0x1


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-29 Thread Michel Dänzer
On 2021-06-28 7:16 p.m., Deucher, Alexander wrote:
> 
> Thanks for narrowing this down.  There is new PCO SDMA firmware available 
> (attached).  Can you try it?

Sure, I'll try it, thanks.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-28 Thread Deucher, Alexander
[Public]

Thanks for narrowing this down.  There is new PCO SDMA firmware available 
(attached).  Can you try it?

Thanks,

Alex

From: amd-gfx on behalf of Michel Dänzer
Sent: Thursday, June 24, 2021 6:51 AM
To: Alex Deucher
Cc: xgqt; amd-gfx list
Subject: Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer  wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer  wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer  
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5
>>>>>>>> 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. In about 1h after 
>>>>>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes 
>>>>>>>> it would be possible to move the mouse but the rest will be frozen. 
>>>>>>>> Screen may start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understand dmesg, which is why I'm guessing, but I think it is
>>>>>>>> the firmware since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at 
>>>>>>>> 18:16:06.
>>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4.
>>>>>>>> I didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
>>>>>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla: 
>>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log" attached here is from when the machine ran fine (at the
>>>>>>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log 
>>>>>>>> from the time system broke
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take? 
>>>>>>>> Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / 
>>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting 
>>>>>>> them to be firmware related. The hangs occurred with firmware from the
>>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40 
>>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days 
>>>>>>> after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to 
>>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks.  Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/

Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-24 Thread Michel Dänzer
On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer  wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5
>>>>>>>> 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. In about 1h after
>>>>>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes
>>>>>>>> it would be possible to move the mouse but the rest will be frozen.
>>>>>>>> Screen may start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understand dmesg, which is why I'm guessing, but I think it is
>>>>>>>> the firmware since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
>>>>>>>> 18:16:06.
>>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4.
>>>>>>>> I didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
>>>>>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla:
>>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log" attached here is from when the machine ran fine (at the
>>>>>>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log
>>>>>>>> from the time system broke
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take?
>>>>>>>> Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U /
>>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting
>>>>>>> them to be firmware related. The hangs occurred with firmware from the
>>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40
>>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days
>>>>>>> after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to
>>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks.  Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/
>>>
>>> Unfortunately not. After no hangs for two weeks with older firmware, I just 
>>> got a hang again within a day with newer firmware and a kernel with this 
>>> fix.
>>>
>>>
>>> I'll try and narrow down which firmware triggers it now. Does Picasso use 
>>> the picasso_*.bin ones only, or others as well?
>>
>> The picasso ones and raven_dmcu.bin.
> 
> Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso 
> ones which have changed:
> 
> picasso_asd.bin
> picasso_ce.bin
> picasso_me.bin
> picasso_mec2.bin
> picasso_mec.bin
> picasso_pfp.bin
> picasso_sdma.bin
> picasso_vcn.bin

Things are pointing to picasso_sdma.bin. I'm currently running with only that 
one reverted to linux-firmware 20210315, and haven't got any hangs for a week.

Note that I've previously gone for a week without a hang even with firmware 
which had hung before. So there's still a small chance that I'm just on another 
lucky run.

That said, Pierre-Eric has also homed in on raven_sdma.bin for similar hangs, 
and reverting to older firmware seems to have helped multiple people on bug 
reports.

So, I think it makes sense for you guys to start looking for what could be 
going wrong with the Picasso/Raven SDMA firmware from 20.50. One thing I 
noticed is that the SDMA firmware from 20.50 advertises the same feature 
version, but a *lower* firmware version than the one from 18.50. So it might be 
worth double-checking that there wasn't an accidental downgrade to some older 
version.
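
One way to double-check for such a downgrade is to compare firmware version
dumps taken under both releases. The sketch below is a rough illustration
only. It assumes text lines of the form "<block> feature version: N, firmware
version: 0x...", which is the format the amdgpu_firmware_info debugfs file is
assumed to use here; parse_firmware_info and find_downgrades are hypothetical
helper names.

```python
import re

# ASSUMED line format: "<block> feature version: <int>, firmware version: <hex>".
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<version>0x[0-9a-fA-F]+)"
)


def parse_firmware_info(text):
    """Map block name -> (feature version, firmware version) for matching lines."""
    info = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            info[match["name"]] = (int(match["feature"]), int(match["version"], 16))
    return info


def find_downgrades(old, new):
    """Blocks whose feature version is unchanged but whose firmware version went down."""
    return sorted(
        name
        for name in old.keys() & new.keys()
        if new[name][0] == old[name][0] and new[name][1] < old[name][1]
    )
```

Given two saved dumps, find_downgrades(parse_firmware_info(old_text),
parse_firmware_info(new_text)) lists the blocks worth a closer look. A lower
number is only a hint; whether it reflects a real regression is for the
firmware owners to confirm.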


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-04 Thread Michel Dänzer
On 2021-06-04 2:33 p.m., Alex Deucher wrote:
> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer  wrote:
>>
>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer wrote:
>>>>
>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer wrote:
>>>>>>
>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U
>>>>>>> with Radeon Vega 8 Graphics.
>>>>>>> Recently some breakages started happening for me. In about 1h after
>>>>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes
>>>>>>> it would be possible to move the mouse but the rest will be frozen.
>>>>>>> Screen may start blinking or go black.
>>>>>>>
>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>> I don't understand dmesg, which is why I'm guessing, but I think it is
>>>>>>> the firmware since this behavior started around 2021-05-15.
>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
>>>>>>> 18:16:06.
>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4. I
>>>>>>> didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
>>>>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>
>>>>>>> I also described my situation on the Gentoo bugzilla:
>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>
>>>>>>> "dmesg.log" attached here is from when the machine ran fine (at the
>>>>>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log
>>>>>>> from the time system broke
>>>>>>>
>>>>>>> Can I get any help with this? What are the next steps I should take?
>>>>>>> Any other files I should provide?
>>>>>>
>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U /
>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting
>>>>>> them to be firmware related. The hangs occurred with firmware from the
>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40
>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days
>>>>>> after boot).
>>>>>
>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>
>>>> I'll try, but note I'm not really sure yet my hangs were related to
>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>
>>> Thanks.  Does this patch help?
>>> https://patchwork.freedesktop.org/patch/433701/
>>
>> Unfortunately not. After no hangs for two weeks with older firmware, I just 
>> got a hang again within a day with newer firmware and a kernel with this fix.
>>
>>
>> I'll try and narrow down which firmware triggers it now. Does Picasso use 
>> the picasso_*.bin ones only, or others as well?
> 
> The picasso ones and raven_dmcu.bin.

Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso 
ones which have changed:

picasso_asd.bin
picasso_ce.bin
picasso_me.bin
picasso_mec2.bin
picasso_mec.bin
picasso_pfp.bin
picasso_sdma.bin
picasso_vcn.bin
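
With 8 candidate files, a bisection needs at most 3 test rounds if exactly one
file is responsible. The sketch below shows that idea only; it is not the
procedure actually used here. bisect_culprit and the hangs_with_new callback
are hypothetical names, and the single-culprit and reliable-test assumptions
are exactly that: assumptions.

```python
def bisect_culprit(candidates, hangs_with_new):
    """Find the single file whose new version triggers the hang.

    hangs_with_new(files) must return True if the system hangs when only
    `files` use the new firmware and every other file stays on the old one.
    ASSUMES exactly one culprit and a reliable yes/no answer per round.
    """
    suspects = list(candidates)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if hangs_with_new(half):
            suspects = half                   # culprit is in the tested half
        else:
            suspects = suspects[len(half):]   # culprit is in the other half
    return suspects[0]


FILES = [
    "picasso_asd.bin", "picasso_ce.bin", "picasso_me.bin", "picasso_mec2.bin",
    "picasso_mec.bin", "picasso_pfp.bin", "picasso_sdma.bin", "picasso_vcn.bin",
]
```

In practice each round is a long soak test rather than a function call, and
because the hangs are intermittent, a "no hang" result is weaker evidence than
a "hang" result.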


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-04 Thread Alex Deucher
On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer  wrote:
>
> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
> > On Wed, May 19, 2021 at 4:48 AM Michel Dänzer  wrote:
> >>
> >> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
> >>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer wrote:
> >>>>
> >>>> On 2021-05-17 11:33 a.m., xgqt wrote:
> >>>>> Hello!
> >>>>>
> >>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U
> >>>>> with Radeon Vega 8 Graphics.
> >>>>> Recently some breakages started happening for me. In about 1h after
> >>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes
> >>>>> it would be possible to move the mouse but the rest will be frozen.
> >>>>> Screen may start blinking or go black.
> >>>>>
> >>>>> I'm not sure if this is my kernel, firmware or the hardware.
> >>>>> I don't understand dmesg, which is why I'm guessing, but I think it is
> >>>>> the firmware since this behavior started around 2021-05-15.
> >>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
> >>>>> 18:16:06.
> >>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
> >>>>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4. I
> >>>>> didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
> >>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
> >>>>> After that I booted to 5.4.97 again and downgraded my FW.
> >>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
> >>>>>
> >>>>> I also described my situation on the Gentoo bugzilla:
> >>>>> https://bugs.gentoo.org/790566
> >>>>>
> >>>>> "dmesg.log" attached here is from when the machine ran fine (at the
> >>>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log
> >>>>> from the time system broke
> >>>>>
> >>>>> Can I get any help with this? What are the next steps I should take?
> >>>>> Any other files I should provide?
> >>>>
> >>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U /
> >>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting
> >>>> them to be firmware related. The hangs occurred with firmware from the
> >>>> AMD 20.50 release. I'm currently running with firmware from the 20.40
> >>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days
> >>>> after boot).
> >>>
> >>> Can you narrow down which firmware(s) cause the problem?
> >>
> >> I'll try, but note I'm not really sure yet my hangs were related to 
> >> firmware (only). Anyway, I'll try narrowing it down.
> >
> > Thanks.  Does this patch help?
> > https://patchwork.freedesktop.org/patch/433701/
>
> Unfortunately not. After no hangs for two weeks with older firmware, I just 
> got a hang again within a day with newer firmware and a kernel with this fix.
>
>
> I'll try and narrow down which firmware triggers it now. Does Picasso use the 
> picasso_*.bin ones only, or others as well?

The picasso ones and raven_dmcu.bin.

Thanks,

Alex


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-04 Thread Michel Dänzer
On 2021-05-19 3:57 p.m., Alex Deucher wrote:
> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer  wrote:
>>
>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer wrote:
>>>>
>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>> Hello!
>>>>>
>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U
>>>>> with Radeon Vega 8 Graphics.
>>>>> Recently some breakages started happening for me. In about 1h after
>>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes it
>>>>> would be possible to move the mouse but the rest will be frozen. Screen
>>>>> may start blinking or go black.
>>>>>
>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>> I don't understand dmesg, which is why I'm guessing, but I think it is the
>>>>> firmware since this behavior started around 2021-05-15.
>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
>>>>> 18:16:06.
>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4. I
>>>>> didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
>>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>
>>>>> I also described my situation on the Gentoo bugzilla:
>>>>> https://bugs.gentoo.org/790566
>>>>>
>>>>> "dmesg.log" attached here is from when the machine ran fine (at the
>>>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log from
>>>>> the time system broke
>>>>>
>>>>> Can I get any help with this? What are the next steps I should take? Any
>>>>> other files I should provide?
>>>>
>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U /
>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting
>>>> them to be firmware related. The hangs occurred with firmware from the AMD
>>>> 20.50 release. I'm currently running with firmware from the 20.40 release,
>>>> no hang in almost 2 weeks (the hangs happened within 1-2 days after boot).
>>>
>>> Can you narrow down which firmware(s) cause the problem?
>>
>> I'll try, but note I'm not really sure yet my hangs were related to firmware 
>> (only). Anyway, I'll try narrowing it down.
> 
> Thanks.  Does this patch help?
> https://patchwork.freedesktop.org/patch/433701/

Unfortunately not. After no hangs for two weeks with older firmware, I just got 
a hang again within a day with newer firmware and a kernel with this fix.


I'll try and narrow down which firmware triggers it now. Does Picasso use the 
picasso_*.bin ones only, or others as well?


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-06-01 Thread Ketsui
Hi,

I'm using the Ryzen 3200G and its iGPU, on a kernel with that patch
applied[0] and the latest linux-firmware (20210511.7685cf4-1) I'm still
getting this hang.

[0] https://git.archlinux.org/linux.git/log/?h=v5.12.8-arch1


hang5
Description: Binary data


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-05-19 Thread Alex Deucher
On Wed, May 19, 2021 at 4:48 AM Michel Dänzer  wrote:
>
> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
> > On Tue, May 18, 2021 at 10:11 AM Michel Dänzer  wrote:
> >>
> >> On 2021-05-17 11:33 a.m., xgqt wrote:
> >>> Hello!
> >>>
> >>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U
> >>> with Radeon Vega 8 Graphics.
> >>> Recently some breakages started happening for me. In about 1h after 
> >>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes it 
> >>> would be possible to move the mouse but the rest will be frozen. Screen 
> >>> may start blinking or go black.
> >>>
> >>> I'm not sure if this is my kernel, firmware or the hardware.
> >>> I don't understand dmesg, which is why I'm guessing, but I think it is the
> >>> firmware since this behavior started around 2021-05-15.
> >>> From my Portage logs I see that I updated my firmware on 2021-05-14 at 
> >>> 18:16:06.
> >>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
> >>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4. I
> >>> didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
> >>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
> >>> After that I booted to 5.4.97 again and downgraded my FW.
> >>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
> >>>
> >>> I also described my situation on the Gentoo bugzilla: 
> >>> https://bugs.gentoo.org/790566
> >>>
> >>> "dmesg.log" attached here is from the time machine run fine (at the 
> >>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log from 
> >>> the time system broke
> >>>
> >>> Can I get any help with this? What are the next steps I should take? Any 
> >>> other files I should provide?
> >>
> >> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / 
> >> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting 
> >> them to be firmware related. The hangs occurred with firmware from the AMD
> >> 20.50 release. I'm currently running with firmware from the 20.40 release, 
> >> no hang in almost 2 weeks (the hangs happened within 1-2 days after boot).
> >
> > Can you narrow down which firmware(s) cause the problem?
>
> I'll try, but note I'm not really sure yet my hangs were related to firmware 
> (only). Anyway, I'll try narrowing it down.

Thanks.  Does this patch help?
https://patchwork.freedesktop.org/patch/433701/

Alex


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-05-19 Thread Michel Dänzer
On 2021-05-19 12:05 a.m., Alex Deucher wrote:
> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer  wrote:
>>
>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>> Hello!
>>>
>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U
>>> with Radeon Vega 8 Graphics.
>>> Recently some breakages started happening for me. About 1h after boot-up,
>>> while using a KDE desktop, the machine's GUI would freeze. Sometimes it would
>>> be possible to move the mouse but the rest would be frozen. The screen may
>>> start blinking or go black.
>>>
>>> I'm not sure if this is my kernel, firmware or the hardware.
>>> I don't understand dmesg, which is why I'm guessing, but I think it is the
>>> firmware since this behavior started around 2021-05-15.
>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
>>> 18:16:06.
>>> So the breakages started with my kernel: 5.10.27 and FW: 20210511.
>>> After the breakage I jumped to an older kernel, 5.4.97, and compiled 5.12.4.
>>> I didn't notice a breakage on 5.4.97, but the system only ran ~40 minutes.
>>> So I booted to the newly compiled 5.12.4, where it broke after ~1h.
>>> After that I booted to 5.4.97 again and downgraded my FW.
>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>
>>> I also described my situation on the Gentoo bugzilla:
>>> https://bugs.gentoo.org/790566
>>>
>>> "dmesg.log" attached here is from while the machine is running fine (at the
>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log from
>>> the time the system broke.
>>>
>>> Can I get any help with this? What are the next steps I should take? Any 
>>> other files I should provide?
>>
>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / Picasso 
>> / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting them to be 
>> firmware related. The hangs occurred with firmware from the AMD 20.50
>> release. I'm currently running with firmware from the 20.40 release, no hang 
>> in almost 2 weeks (the hangs happened within 1-2 days after boot).
> 
> Can you narrow down which firmware(s) cause the problem?

I'll try, but note I'm not really sure yet my hangs were related to firmware 
(only). Anyway, I'll try narrowing it down.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-05-18 Thread Alex Deucher
On Tue, May 18, 2021 at 10:11 AM Michel Dänzer  wrote:
>
> On 2021-05-17 11:33 a.m., xgqt wrote:
> > Hello!
> >
> > I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U
> > with Radeon Vega 8 Graphics.
> > Recently some breakages started happening for me. About 1h after boot-up,
> > while using a KDE desktop, the machine's GUI would freeze. Sometimes it would
> > be possible to move the mouse but the rest would be frozen. The screen may
> > start blinking or go black.
> >
> > I'm not sure if this is my kernel, firmware or the hardware.
> > I don't understand dmesg, which is why I'm guessing, but I think it is the
> > firmware since this behavior started around 2021-05-15.
> > From my Portage logs I see that I updated my firmware on 2021-05-14 at
> > 18:16:06.
> > So the breakages started with my kernel: 5.10.27 and FW: 20210511.
> > After the breakage I jumped to an older kernel, 5.4.97, and compiled 5.12.4.
> > I didn't notice a breakage on 5.4.97, but the system only ran ~40 minutes.
> > So I booted to the newly compiled 5.12.4, where it broke after ~1h.
> > After that I booted to 5.4.97 again and downgraded my FW.
> > While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
> >
> > I also described my situation on the Gentoo bugzilla:
> > https://bugs.gentoo.org/790566
> >
> > "dmesg.log" attached here is from while the machine is running fine (at the
> > moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log from
> > the time the system broke.
> >
> > Can I get any help with this? What are the next steps I should take? Any 
> > other files I should provide?
>
> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / Picasso / 
> RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting them to be 
> firmware related. The hangs occurred with firmware from the AMD 20.50 release.
> I'm currently running with firmware from the 20.40 release, no hang in almost 
> 2 weeks (the hangs happened within 1-2 days after boot).

Can you narrow down which firmware(s) cause the problem?

Alex


Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

2021-05-18 Thread Michel Dänzer
On 2021-05-17 11:33 a.m., xgqt wrote:
> Hello!
> 
> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 3500U with
> Radeon Vega 8 Graphics.
> Recently some breakages started happening for me. About 1h after boot-up,
> while using a KDE desktop, the machine's GUI would freeze. Sometimes it would
> be possible to move the mouse but the rest would be frozen. The screen may
> start blinking or go black.
>
> I'm not sure if this is my kernel, firmware or the hardware.
> I don't understand dmesg, which is why I'm guessing, but I think it is the
> firmware since this behavior started around 2021-05-15.
> From my Portage logs I see that I updated my firmware on 2021-05-14 at
> 18:16:06.
> So the breakages started with my kernel: 5.10.27 and FW: 20210511.
> After the breakage I jumped to an older kernel, 5.4.97, and compiled 5.12.4. I
> didn't notice a breakage on 5.4.97, but the system only ran ~40 minutes.
> So I booted to the newly compiled 5.12.4, where it broke after ~1h.
> After that I booted to 5.4.97 again and downgraded my FW.
> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>
> I also described my situation on the Gentoo bugzilla:
> https://bugs.gentoo.org/790566
>
> "dmesg.log" attached here is from while the machine is running fine (at the
> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log from the
> time the system broke.
> 
> Can I get any help with this? What are the next steps I should take? Any 
> other files I should provide?

I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / Picasso / 
RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting them to be firmware
related. The hangs occurred with firmware from the AMD 20.50 release. I'm 
currently running with firmware from the 20.40 release, no hang in almost 2 
weeks (the hangs happened within 1-2 days after boot).


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer