[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Just to correct a few of the targets on this issue.

* The reverts mentioned in #30 need to be pulled into linux-firmware for focal.
* They're already included in jammy.

** Changed in: amd
       Status: New => Fix Released

** No longer affects: mesa (Ubuntu)

** Also affects: linux-firmware (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: linux-firmware (Ubuntu Focal)
     Assignee: (unassigned) => Juerg Haefliger (juergh)

** Changed in: linux-firmware (Ubuntu)
       Status: Invalid => Fix Released

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

Status in amd:
  Fix Released
Status in linux-firmware package in Ubuntu:
  Fix Released
Status in linux-firmware source package in Focal:
  New
Status in linux-firmware source package in Hirsute:
  Won't Fix

Bug description:
  After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the
  upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent
  and severe GPU instability. When this happens, I see this error in
  dmesg:

  [20061.061069] amdgpu :03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 pid 1236)
  [20061.061103] amdgpu :03:00.0: amdgpu: in page starting at address 0x80401000 from client 27
  [20061.061135] amdgpu :03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
  [20061.061147] amdgpu :03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
  [20061.061157] amdgpu :03:00.0: amdgpu: MORE_FAULTS: 0x1
  [20061.061167] amdgpu :03:00.0: amdgpu: WALKER_ERROR: 0x0
  [20061.061174] amdgpu :03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
  [20061.061183] amdgpu :03:00.0: amdgpu: MAPPING_ERROR: 0x0
  [20061.061189] amdgpu :03:00.0: amdgpu: RW: 0x0

  I'll attach a couple of full dmesgs that I collected.
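  The register dump above can be cross-checked by hand. The following
  Python sketch splits VM_L2_PROTECTION_FAULT_STATUS into the fields the
  kernel prints on the subsequent lines. The bit layout is an assumption
  based on the amdgpu gfx9 register definitions and is not stated
  anywhere in this report; decode_fault_status is a hypothetical helper
  name used only for illustration.

```python
# Assumed gfx9 layout of VM_L2_PROTECTION_FAULT_STATUS: name -> (shift, width).
# This layout is NOT taken from the bug report; treat it as an assumption.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # UTCL2 client ID
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Extract each bit field from the raw status register value."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Value copied from the dmesg excerpt in the bug description.
    for name, value in decode_fault_status(0x00101031).items():
        print(f"{name}: {value:#x}")
```

  Under that assumed layout, the decoded values agree with what the
  kernel logged for this fault (client ID 0x8, PERMISSION_FAULTS 0x3,
  vmid 1), which suggests the layout is at least consistent with this
  particular dump.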
  Many of the times when this happens, the screen and keyboard freeze
  irreversibly (I tried waiting for more than 30 minutes, but it doesn't
  help). I can still log in via ssh though.

  When there's no freeze, I can continue using the computer normally,
  but the laptop fans are always running and the battery depletes fast.
  There's probably something stuck in a permanent loop either in the
  kernel or in the GPU.

  This bug happens several times a day, rendering the machine so
  unstable as to be almost unusable. It is a severe regression and I'm
  aghast that it passed AMD's Quality Assurance.

  After downgrading back to linux-firmware 1.190.5, the machine is back
  to the previous, mostly-reliable state. Which is to say, this bug is
  gone; I'm just left with the other amdgpu suspend bug I've learned to
  live with since I bought this computer.

  Please revert the amdgpu firmware in this package as soon as possible.
  This is unbearable.

  Relevant information:

  Ubuntu version: 21.04
  Linux kernel: 5.11.0-17-generic x86_64
  CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
  GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c1)
  Laptop model: Lenovo Ideapad S145

To manage notifications about this bug go to:
https://bugs.launchpad.net/amd/+bug/1928393/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Hirsute is EOL so closing this bug. Please open a new one if the problem still persists with one of the supported series.

** Changed in: linux-firmware (Ubuntu Hirsute)
       Status: Incomplete => Won't Fix
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Hello Juerg,

On Thursday, 20 January 2022, at 12:32:48 -03, Juerg Haefliger wrote:
> If you want this fixed in Ubuntu I need to know what series are
> affected. Hirsute goes EOL at the end of the month. Are Impish and/or
> Jammy working or affected as well?

I upgraded to Impish a while ago. I haven’t seen “retry page fault” messages in a long while (I don’t think it’s related to the distro upgrade, but I’m not sure), so I’d say this particular bug is fixed, at least for me (I have a Picasso GPU).

Which is not to say that things are rosy, unfortunately. But the other issues I see don’t cause any message to appear in dmesg, so it’s hard to search for existing bug reports about them or to open a new one.

The following is off-topic for this bug report, but I’ll mention it anyway; I hope you’ll bear with me:

One thing I noticed is that things did get rosy when I did two things:

1. Switched from Xorg to Wayland.
2. Switched Firefox to use Wayland as well.

This led me to the conclusion that the bugs that plague my machine are triggered by something that Firefox does when it uses X (either “natively” or via XWayland). For some reason, when it uses Wayland it doesn’t trigger these GPU bugs.

Another thing that might be relevant is that I have tons of tabs open (probably more than 200) distributed across 27 open windows. Perhaps I’m stressing some kind of resource limit in the driver or firmware?
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
If you want this fixed in Ubuntu I need to know what series are affected. Hirsute goes EOL at the end of the month. Are Impish and/or Jammy working or affected as well?

** Changed in: linux-firmware (Ubuntu Hirsute)
       Status: Confirmed => Incomplete
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
No more crashes with firmware
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20211027.tar.gz
and kernel 5.15.6.
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
** Changed in: linux-firmware (Ubuntu)
     Assignee: Seth Forshee (sforshee) => (unassigned)
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
@antonio-petricca, What series? What kernel?

I can produce a hirsute linux-firmware package with the reverted sdma firmware but need someone to verify it on hirsute with the hirsute kernel. Any takers? Or have you all moved on to impish?
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
With the latest firmware,
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20211027.tar.gz,
it is much more stable.
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
** Also affects: mesa (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Also affects: linux-firmware (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** No longer affects: mesa (Ubuntu Hirsute)

** Changed in: mesa (Ubuntu)
       Status: Confirmed => Invalid

** Changed in: linux-firmware (Ubuntu Hirsute)
       Status: New => Confirmed

** Changed in: linux-firmware (Ubuntu)
       Status: Confirmed => Invalid

** Changed in: linux-firmware (Ubuntu Hirsute)
     Assignee: (unassigned) => Juerg Haefliger (juergh)
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
@antonio-petricca, sorry, but 5.15.2 is not a supported Ubuntu kernel, and especially not on Bionic with (old) Bionic firmware.

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

Status in amd: New
Status in linux-firmware package in Ubuntu: Confirmed
Status in mesa package in Ubuntu: Confirmed

Bug description:
  After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent and severe GPU instability. When this happens, I see this error in dmesg:

  [20061.061069] amdgpu :03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 pid 1236)
  [20061.061103] amdgpu :03:00.0: amdgpu: in page starting at address 0x80401000 from client 27
  [20061.061135] amdgpu :03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
  [20061.061147] amdgpu :03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
  [20061.061157] amdgpu :03:00.0: amdgpu: MORE_FAULTS: 0x1
  [20061.061167] amdgpu :03:00.0: amdgpu: WALKER_ERROR: 0x0
  [20061.061174] amdgpu :03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
  [20061.061183] amdgpu :03:00.0: amdgpu: MAPPING_ERROR: 0x0
  [20061.061189] amdgpu :03:00.0: amdgpu: RW: 0x0

  I'll attach a couple of full dmesgs that I collected.

  Many of the times when this happens, the screen and keyboard freeze irreversibly (I tried waiting for more than 30 minutes, but it doesn't help). I can still log in via ssh though. When there's no freeze, I can continue using the computer normally, but the laptop fans are always running and the battery depletes fast. There's probably something on a permanent loop either in the kernel or in the GPU.

  This bug happens several times a day, rendering the machine so unstable as to be almost unusable. It is a severe regression and I'm aghast that it passed AMD's Quality Assurance.

  After downgrading back to linux-firmware 1.190.5, the machine is back to the previous, mostly-reliable state. Which is to say, this bug is gone, I'm just left with the other amdgpu suspend bug I've learned to live with since I bought this computer.

  Please revert the amdgpu firmware in this package as soon as possible. This is unbearable.

  Relevant information:

  Ubuntu version: 21.04
  Linux kernel: 5.11.0-17-generic x86_64
  CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
  GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c1)
  Laptop model: Lenovo Ideapad S145

To manage notifications about this bug go to:
https://bugs.launchpad.net/amd/+bug/1928393/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
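The decoded fields in the dmesg excerpt (MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, RW and the client ID) are all packed into the single VM_L2_PROTECTION_FAULT_STATUS word. A minimal sketch of that unpacking follows. The bit positions in FIELDS are an assumption, chosen so that the result agrees with the values the kernel printed for this report; they are not taken from AMD register documentation, and decode_fault_status is a hypothetical helper name.

```python
# Hypothetical decoder for the fault status word printed by amdgpu.
# ASSUMPTION: the (shift, width) pairs below are inferred so that the decoded
# values agree with the fields shown in the log; other GPU generations may
# use a different layout.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # the "Faulty UTCL2 client ID" number
    "RW": (18, 1),
}


def decode_fault_status(status):
    """Split a fault status word into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# Status value copied from the bug description.
for name, value in decode_fault_status(0x00101031).items():
    print(f"{name}: {value:#x}")
```

Under the assumed layout, 0x00101031 yields client ID 0x8 and PERMISSION_FAULTS 0x3, which is consistent with what the kernel itself printed.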
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
I have the same issue on:

Dell E5495
AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx
16 GB RAM
Linux Mint 19.3 (Ubuntu 18.04)
Kernel 5.15.2
Linux Firmware 1.173.20
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
The reverts are in the latest firmware tree:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=d7b50e61669dc137924337d03d09b8986eb752a3
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=d843e520a4b0d92b986645548d11ade3b9b239a4
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=99d72504bff7ab40c261b8509c0b9d8abf98b296
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Hi. I'm picking up this ticket from Seth. Reading through the history it seems it's still an open issue? My understanding is that upstream 'fixed' this by reverting fw blobs in version 20210818. I can produce a linux-firmware test package for hirsute 20.04 with these reverts if necessary. Just let me know.

** Changed in: linux-firmware (Ubuntu)
   Status: Incomplete => Confirmed
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Hello,

I’d just like to report that I haven’t seen this problem in a while. The last time I saw the “retry page fault” messages in my log was on August 9. I’ve been using the ‘amdgpu/picasso*’ files from linux-firmware commit c46b8c364b82 (“ice: update package file to 1.3.26.0”), so apparently this particular problem was recently fixed.

Which isn’t to say that I’m having a trouble-free amdgpu experience, unfortunately. Every week or so my laptop comes back from sleep with the screen and keyboard frozen (I can still ssh into it), but now the error is:

kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out

But it seems to be a separate problem from the one reported in this particular launchpad issue. I’ll see if I can find a more appropriate launchpad issue and post the details there.

Thank you all for your help and support with this issue.
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
It is the Bluetooth driver. I got this too on my Acer Aspire F5-573 series laptop. There is a sticker that says "Intel i-5 core", so is AMD a possibility? I do not think my processor is an AMD. It also set my screen resolution to something like 1377x768 (it's a 6k, 17" screen). When I was on Windows (yes, I had to switch off Windows because my Windows 11 OS failed), it was installed when I installed VirtualBox or Kodi. It is like a nightmare. I lost the function of my USB 1.0 driver. It installs a PnP driver that does not exist. The PnP driver is a printer driver that is pointing to your desktop. I think this is a virus that is affecting other systems like mine.

Thanks

-- I-Cat
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
I have similar messages in journalctl:

Package: linux-firmware
Version: 1.197.3

Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process vivaldi-bin pid 1673 thread vivaldi-bi:cs0 pid 1699)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: in page starting at address 0x80010114 from client 0x12 (VMC)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0x2b)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MORE_FAULTS: 0x1
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: RW: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process vivaldi-bin pid 1673 thread vivaldi-bi:cs0 pid 1699)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: in page starting at address 0x800101188000 from client 0x12 (VMC)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0x2b)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MORE_FAULTS: 0x1
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: RW: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process vivaldi-bin pid 1673 thread vivaldi-bi:cs0 pid 1699)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: in page starting at address 0x800101189000 from client 0x12 (VMC)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0x2b)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MORE_FAULTS: 0x1
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: RW: 0x0
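Reports in this thread mix two variants of the message ("[gfxhub0] retry page fault" and "[mmhub] page fault"), so when comparing logs it can help to pull the fault header lines out of a saved dmesg or journalctl dump. The following is only a sketch: the regular expression is written against the line format quoted in this thread and may need adjusting for other kernel versions, and FAULT_RE and find_page_faults are hypothetical names.

```python
import re

# ASSUMPTION: the pattern follows the header lines quoted in this thread;
# other kernel versions may format the message differently.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] (?P<retry>retry )?page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) "
    r"pasid:(?P<pasid>\d+), for process (?P<process>\S+) pid (?P<pid>\d+)"
)


def find_page_faults(log_text):
    """Return one dict per amdgpu page fault header found in log_text."""
    faults = []
    for match in FAULT_RE.finditer(log_text):
        faults.append(
            {
                "hub": match["hub"],
                "retry": match["retry"] is not None,
                "process": match["process"],
                "pid": int(match["pid"]),
                "pasid": int(match["pasid"]),
            }
        )
    return faults
```

Feeding it the text of a saved log (for example, the output of `journalctl -k` redirected to a file) gives a list that can be grouped by hub or process to see whether two reports describe the same kind of fault.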
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
It happened to me too

** Attachment added: "Crash log of amdgpu driver"
   https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1928393/+attachment/5516220/+files/amdgu_crash.txt
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Hello,

For some reason, in the past week or so this bug has been freezing my machine every couple of days or so (I’m surprised that AMD wasn’t able to reproduce the problem yet¹). You can imagine how “pleasant” it makes using this computer.

Today I got an interesting error in dmesg, perhaps it provides some clue:

[38454.299445] [ cut here ]
[38454.299449] refcount_t: underflow; use-after-free.
[38454.299457] WARNING: CPU: 5 PID: 17577 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
[38454.299465] Modules linked in: overlay ccm rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nft_counter nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct bridge stp llc nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine intel_rapl_msr intel_rapl_common joydev snd_pcm edac_mce_amd snd_seq_midi ath10k_pci ath10k_core snd_seq_midi_event kvm_amd snd_rawmidi ath mac80211 kvm uvcvideo snd_seq btusb videobuf2_vmalloc rapl videobuf2_memops videobuf2_v4l2 videobuf2_common btrtl input_leds
[38454.299510] serio_raw btbcm videodev btintel wmi_bmof snd_seq_device efi_pstore bluetooth snd_timer mc cfg80211 k10temp ecdh_generic snd ecc ideapad_laptop ccp libarc4 sparse_keymap soundcore elan_i2c mac_hid sch_fq_codel msr parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt zstd zram z3fold amdgpu crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iommu_v2 gpu_sched aesni_intel i2c_algo_bit drm_ttm_helper ttm crypto_simd cryptd glue_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm i2c_piix4 nvme xhci_pci i2c_hid xhci_pci_renesas nvme_core wmi video hid
[38454.299550] CPU: 5 PID: 17577 Comm: kworker/u32:18 Not tainted 5.11.0-25-generic #27-Ubuntu
[38454.299552] Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN23WW 11/05/2019
[38454.299554] Workqueue: events_unbound async_run_entry_fn
[38454.299559] RIP: 0010:refcount_warn_saturate+0xae/0xf0
[38454.299562] Code: f8 1c 96 01 01 e8 9f f1 62 00 0f 0b 5d c3 80 3d e5 1c 96 01 00 75 91 48 c7 c7 e8 c7 60 b9 c6 05 d5 1c 96 01 01 e8 7f f1 62 00 <0f> 0b 5d c3 80 3d c3 1c 96 01 00 0f 85 6d ff ff ff 48 c7 c7 40 c8
[38454.299564] RSP: 0018:b60383537b58 EFLAGS: 00010282
[38454.299566] RAX: RBX: RCX: 8d4578b58ac8
[38454.299567] RDX: ffd8 RSI: 0027 RDI: 8d4578b58ac0
[38454.299568] RBP: b60383537b58 R08: b9c73540 R09: b60383537af0
[38454.299569] R10: 2d2d2d2d R11: b603835379e8 R12: 8d44cf64d000
[38454.299570] R13: R14: b6038b8cd000 R15: 0004
[38454.299571] FS: () GS:8d4578b4() knlGS:
[38454.299572] CS: 0010 DS: ES: CR0: 80050033
[38454.299574] CR2: CR3: 00016ae1 CR4: 003506e0
[38454.299575] Call Trace:
[38454.299578] dc_stream_release+0x78/0x80 [amdgpu]
[38454.299751] dc_resource_state_destruct+0x58/0x80 [amdgpu]
[38454.299904] dc_release_state+0x2f/0x60 [amdgpu]
[38454.300055] dm_atomic_destroy_state+0x21/0x30 [amdgpu]
[38454.300211] drm_atomic_state_default_clear+0x23d/0x2f0 [drm]
[38454.300236] __drm_atomic_state_free+0x5e/0xa0 [drm]
[38454.300257] drm_atomic_helper_resume+0x12b/0x150 [drm_kms_helper]
[38454.300271] dm_resume+0x2bd/0x540 [amdgpu]
[38454.300427] amdgpu_device_ip_resume_phase2+0x58/0xc0 [amdgpu]
[38454.300531] amdgpu_device_resume+0x8d/0x370 [amdgpu]
[38454.300635] ? native_queued_spin_lock_slowpath+0x2b/0x30
[38454.300638] ? _raw_spin_lock_irq+0x26/0x2a
[38454.300642] ? __wait_for_common+0xfb/0x150
[38454.300644] amdgpu_pmops_resume+0x17/0x20 [amdgpu]
[38454.300748] pci_pm_resume+0x6b/0xf0
[38454.300751] ? pci_pm_poweroff_noirq+0x120/0x120
[38454.300752] dpm_run_callback+0x50/0x110
[38454.300755] device_resume+0xad/0x200
[38454.300757] async_resume+0x1e/0x40
[38454.300759] async_run_entry_fn+0x3c/0x150
[38454.300761] process_one_work+0x220/0x3c0
[38454.300764] worker_thread+0x50/0x370
[38454.300765] kthread+0x12f/0x150
[38454.300767] ? process_one_work+0x3c0/0x3c0
[38454.300768] ? __kthread_bind_mask+0x70/0x70
[38454.300770] ret_from_fork+0x22/0x30
[38454.300775] ---[ end trace 1f54ad57671def2f ]---

Note that immediately before it there’s a page allocation failure during wake up from suspend. So there’s some refcounting bug in an error path somewhere. Much later there’s the familiar “
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
On Monday, 12 July 2021, at 15:12:19 -03, Alex Deucher wrote:
> Does the latest firmware in the firmware git tree help?
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu

I updated the picasso* files from commit:

d79c26779d45 amdgpu: update vcn firmware for green sardine for 21.20

And I still see the issue. It took a while to reproduce: I updated the firmware (and ran `update-initramfs -u -k all` to get it into the initramfs) on July 12 and have had the laptop turned on since then (closing the lid to put it to sleep), and today I saw the problem again.

The full dmesg is attached.

** Attachment added: "dmesg.log"
   https://bugs.launchpad.net/bugs/1928393/+attachment/5511525/+files/dmesg.log
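Since the fault can take days to show up, it may help to scan saved dmesg output for the signature instead of reading it by eye. The following is only a minimal sketch: the regular expression is inferred from the single log line quoted in this bug, and the names `FAULT_RE`, `Fault` and `find_retry_page_faults` are made up for illustration. Other kernel versions may format the message differently.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern inferred from the dmesg line quoted in this bug report (an assumption,
# not taken from the kernel sources):
#   amdgpu: [gfxhub0] retry page fault (src_id:0 ... for process Xorg pid 1141 ...)
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] retry page fault \(.*?"
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)


class Fault(NamedTuple):
    """One 'retry page fault' report extracted from a dmesg line."""
    hub: str
    process: str
    pid: int


def find_retry_page_faults(lines: Iterable[str]) -> Iterator[Fault]:
    """Yield a Fault for every line that carries the fault signature."""
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            yield Fault(match["hub"], match["process"], int(match["pid"]))
```

It could be fed the lines of an attached dmesg.log (for example via `open("dmesg.log")`) to list which processes were hit and how often.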
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
After updating to linux-firmware commit d79c26779d45906 the problems persist on Lenovo Thinkpad E585:

amdgpu :05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1336 thread Xorg:cs0 pid 1862)
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
@alexander-deucher

CPU model: AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx
Kernel: 5.10.49
Firmware:
VCE feature version: 0, firmware version: 0x
UVD feature version: 0, firmware version: 0x
MC feature version: 0, firmware version: 0x
ME feature version: 52, firmware version: 0x00a4
PFP feature version: 52, firmware version: 0x00bc
CE feature version: 52, firmware version: 0x004f
RLC feature version: 1, firmware version: 0x0213
RLC SRLC feature version: 1, firmware version: 0x0001
RLC SRLG feature version: 1, firmware version: 0x0001
RLC SRLS feature version: 1, firmware version: 0x0001
MEC feature version: 52, firmware version: 0x01c2
MEC2 feature version: 52, firmware version: 0x01c2
SOS feature version: 0, firmware version: 0x
ASD feature version: 0, firmware version: 0x2155
TA RAS feature version: 0x, firmware version: 0x212b
TA XGMI feature version: 0x, firmware version: 0x212b
TA HDCP feature version: 0x1711, firmware version: 0x212b
TA DTM feature version: 0x1203, firmware version: 0x212b
SMC feature version: 0, firmware version: 0x1e49
SDMA0 feature version: 41, firmware version: 0x0028
VCN feature version: 0, firmware version: 0x0210c005
DMCU feature version: 0, firmware version: 0x
DMCUB feature version: 0, firmware version: 0x
VBIOS version: 113-RAVEN-107

Also affected
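To compare firmware listings like the one above between an affected and an unaffected machine, the entries can be parsed into a dictionary and diffed. This is a rough sketch that assumes one "NAME feature version: X, firmware version: Y" entry per line, as in the listing in this message. The helper names `parse_firmware_info` and `diff_firmware` are hypothetical, and values are kept as strings because some of them appear truncated in the report.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "ME feature version: 52, firmware version: 0x00a4".
# The format is assumed from the listing posted in this bug.
ENTRY_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>[^,\s]+), "
    r"firmware version: (?P<firmware>\S+)$"
)

FirmwareInfo = Dict[str, Tuple[str, str]]


def parse_firmware_info(text: str) -> FirmwareInfo:
    """Map each block name to its (feature version, firmware version) strings."""
    info: FirmwareInfo = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:  # lines such as "VBIOS version: ..." are skipped
            info[match["name"]] = (match["feature"], match["firmware"])
    return info


def diff_firmware(a: FirmwareInfo, b: FirmwareInfo) -> Dict[str, tuple]:
    """Return the blocks whose versions differ, or that exist on one side only."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Diffing the output from a machine on the older firmware package against one on the newer package would show which blocks actually changed version, which might help narrow down the file responsible.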
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
There is an upstream bug report: https://bugzilla.kernel.org/show_bug.cgi?id=213391

Comment 9 suggests: "downgrade the firmware."
Comment 15 claims: "20210315 seems to work fine here (on an E595)."

** Bug watch added: Linux Kernel Bug Tracker #213391
   https://bugzilla.kernel.org/show_bug.cgi?id=213391
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Does the latest firmware in the firmware git tree help?
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Also affected:

Ubuntu version: 21.04
Linux kernel: 5.11.0-22-generic x86_64
CPU model: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
GPU: 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
Laptop model: Lenovo Thinkpad E585
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
On Thursday, 17 June 2021, at 00:45:30 -03, Thiago Jung Bauermann wrote:
> > > I think it may be related to a change in mesa. Specifically mesa
> > > commit 820dec3f7c7. For more info see
> > > https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866
> >
> > I’ll run with Mario’s build of Mesa with that patch backported.
> > Thanks, Mario!
>
> I’m running with the Mesa build from Mario’s PPA now. If I don’t see any
> issue within two weeks, I think it will be possible to say that the bug
> is gone, or at least much harder to hit.
>
> I can’t use my reproducer in this case, because I can’t change the Mesa
> version inside the flatpak image.

I just had this bug happen again spontaneously, while running with Mesa from Mario’s PPA. So this bug isn’t fixed by the patch mentioned by Alex.
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: mesa (Ubuntu)
   Status: New => Confirmed
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
I was finally able to spend a bit of time on this. Unfortunately, there’s not much to report back.

On Tuesday, 8 June 2021, at 15:13:36 -03, Thiago Jung Bauermann wrote:
> On Tuesday, 8 June 2021, at 10:30:24 -03, Alex Deucher wrote:
> > Can you narrow down which specific firmware file causes the problem?
>
> Ok, I will try.

I don’t think I can narrow down which firmware file causes the problem, because I don’t have a last known good version. All firmware versions that I tested (Ubuntu versions 1.190.5, 1.197 and latest linux-firmware.git) immediately trigger the bug when I try the only reliable reproducer I know (i.e., running flatpak’s com.github.quaternion package).

Since it can take several days for the bug to happen if I just use the machine normally, it would take weeks to narrow down which of the picasso_* files is more stable relative to the others. And even then, I wouldn’t be sure about it.

> > I think it may be related to a change in mesa. Specifically mesa
> > commit 820dec3f7c7. For more info see
> > https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866
>
> I’ll run with Mario’s build of Mesa with that patch backported.
> Thanks, Mario!

I’m running with the Mesa build from Mario’s PPA now. If I don’t see any issue within two weeks, I think it will be possible to say that the bug is gone, or at least much harder to hit.

I can’t use my reproducer in this case, because I can’t change the Mesa version inside the flatpak image.
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
21.04 comes with Mesa 21.0.1 which does not seem to have 820dec3f7c7
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #1598
   https://gitlab.freedesktop.org/drm/amd/-/issues/1598

** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #920
   https://gitlab.freedesktop.org/drm/amd/-/issues/920
Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Thanks for your input.

On Tuesday, 8 June 2021, at 10:30:24 -03, Alex Deucher wrote:
> Can you narrow down which specific firmware file causes the problem?

Ok, I will try. Also, is it possible and/or worthwhile trying to bisect firmware versions from the linux-firmware repo? How coupled is the firmware with the kernel driver? E.g., can I try using firmware files from 1 year ago with current kernel and Mesa?

> We haven't been able to repro this.

One thing that’s a bit “fishy” about my machine is that it doesn’t seem to have a good clock:

  [0.211436] TSC synchronization [CPU#0 -> CPU#1]:
  [0.211436] Measured 3304683447 cycles TSC warp between CPUs, turning off TSC clock.
  [0.211436] tsc: Marking TSC unstable due to check_tsc_sync_source failed
  …
  [0.252117] hpet0: at MMIO 0xfed0, IRQs 2, 8, 0
  [0.252117] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
  [0.253970] clocksource: Switched to clocksource hpet
  …
  [0.580451] Unstable clock detected, switching default tracing clock to "global"
  If you want to keep using the local clock, then add:
  "trace_clock=local"
  on the kernel command line

Could this bug be related to that?

> I think it may be related to a change in mesa. Specifically mesa commit
> 820dec3f7c7. For more info see
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866

I’ll run with Mario’s build of Mesa with that patch backported. Thanks, Mario!

> ** Bug watch added: gitlab.freedesktop.org/mesa/mesa/-/issues #4866
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866

Other upstream issues that look similar:
https://gitlab.freedesktop.org/drm/amd/-/issues/1598
https://gitlab.freedesktop.org/drm/amd/-/issues/920
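For the request to narrow down which firmware file is responsible, one possible first step is to list which files actually differ between the two linux-firmware versions. The following is only a rough sketch under stated assumptions: the directory names are hypothetical (they stand for the amdgpu firmware directories extracted from the 1.190.5 and 1.197 packages), and the "picasso_*.bin" filter is a guess at how the relevant files are named. It compares SHA-256 digests and prints the names that were added, removed, or modified.

```python
import hashlib
from pathlib import Path


def hash_tree(root, pattern="*.bin"):
    """Map each matching file name directly under root to its SHA-256 digest."""
    root = Path(root)
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.glob(pattern))
        if path.is_file()
    }


def changed_files(old_dir, new_dir, pattern="*.bin"):
    """Return sorted names that were added, removed, or modified."""
    old = hash_tree(old_dir, pattern)
    new = hash_tree(new_dir, pattern)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )


if __name__ == "__main__":
    # Hypothetical paths: amdgpu directories extracted from the two packages.
    # The glob is an assumption about the file naming, not taken from the thread.
    for name in changed_files("fw-1.190.5/amdgpu", "fw-1.197/amdgpu",
                              "picasso_*.bin"):
        print(name)
```

The files it reports could then be swapped back one at a time to see which one brings the fault back. Whether mixing firmware files from different versions is safe is exactly the open question asked above, so treat this as a way to enumerate candidates only.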
[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"
Here's a PPA build with the mesa fix Alex mentioned backported: https://launchpad.net/~superm1/+archive/ubuntu/lp1928393

If you can follow the directions to add that PPA and upgrade to that mesa package you can see if that indeed fixes it.

** Also affects: mesa (Ubuntu)
   Importance: Undecided
   Status: New
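To judge whether the PPA build helps, it may be useful to count how often the fault shows up in dmesg before and after the upgrade. Below is a minimal sketch, assuming the fault line keeps the format quoted in the bug description. The regular expression and the function name are illustrative, not part of any amdgpu tooling, and process names containing spaces would not match.

```python
import re
from collections import Counter

# Pattern modeled on the fault line quoted in the bug description (assumption:
# the kernel keeps this wording and field order).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] retry page fault \(.*?"
    r"for process (?P<process>\S+) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)\)"
)


def count_retry_page_faults(dmesg_text):
    """Return a Counter mapping (process, pid) to the number of fault lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            counts[(match["process"], int(match["pid"]))] += 1
    return counts


# Two lines copied from the bug description; only the first is a fault header.
SAMPLE = (
    "[20061.061069] amdgpu :03:00.0: amdgpu: [gfxhub0] retry page fault "
    "(src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 "
    "thread Xorg:cs0 pid 1236)\n"
    "[20061.061103] amdgpu :03:00.0: amdgpu: in page starting at address "
    "0x80401000 from client 27\n"
)

if __name__ == "__main__":
    for (process, pid), n in count_retry_page_faults(SAMPLE).items():
        print(f"{process} (pid {pid}): {n} retry page fault(s)")
```

In practice the text would come from a saved dmesg capture rather than from SAMPLE. An empty result after some days on the PPA packages would suggest, but not prove, that the backported mesa fix is effective.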