[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2022-04-28 Thread Mario Limonciello
Just to correct a few of the targets on this issue.
* The reverts mentioned in #30 need to be pulled into linux-firmware for focal.
* They're already included in jammy.

** Changed in: amd
   Status: New => Fix Released

** No longer affects: mesa (Ubuntu)

** Also affects: linux-firmware (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: linux-firmware (Ubuntu Focal)
 Assignee: (unassigned) => Juerg Haefliger (juergh)

** Changed in: linux-firmware (Ubuntu)
   Status: Invalid => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  Fix Released
Status in linux-firmware package in Ubuntu:
  Fix Released
Status in linux-firmware source package in Focal:
  New
Status in linux-firmware source package in Hirsute:
  Won't Fix

Bug description:
  After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the
  upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent
  and severe GPU instability. When this happens, I see this error in
  dmesg:

  [20061.061069] amdgpu :03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 pid 1236)
  [20061.061103] amdgpu :03:00.0: amdgpu:   in page starting at address 0x80401000 from client 27
  [20061.061135] amdgpu :03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
  [20061.061147] amdgpu :03:00.0: amdgpu:  Faulty UTCL2 client ID: TCP (0x8)
  [20061.061157] amdgpu :03:00.0: amdgpu:  MORE_FAULTS: 0x1
  [20061.061167] amdgpu :03:00.0: amdgpu:  WALKER_ERROR: 0x0
  [20061.061174] amdgpu :03:00.0: amdgpu:  PERMISSION_FAULTS: 0x3
  [20061.061183] amdgpu :03:00.0: amdgpu:  MAPPING_ERROR: 0x0
  [20061.061189] amdgpu :03:00.0: amdgpu:  RW: 0x0

  I'll attach a couple of full dmesgs that I collected.
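
  As a rough aid for reading the log excerpt above, the raw
  VM_L2_PROTECTION_FAULT_STATUS word can be split into the fields that
  the kernel prints on the following lines. The sketch below assumes
  the GFX9 (Vega/Picasso) bit layout of that register; the layout table
  and the helper name decode_fault_status are assumptions made for
  illustration, are not taken from this report, and should be checked
  against the amdgpu register headers before being relied on.

```python
# Assumed GFX9 layout of VM_L2_PROTECTION_FAULT_STATUS: name -> (shift, width).
# This table is an assumption for illustration; verify it against the
# amdgpu register headers for the GPU generation in question.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict:
    """Split a raw status word into named bit fields (hypothetical helper)."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Status value copied from the dmesg excerpt in this report.
    for name, value in decode_fault_status(0x00101031).items():
        print(f"{name}: {hex(value)}")
```

  Under that assumed layout, 0x00101031 decodes to MORE_FAULTS 0x1,
  WALKER_ERROR 0x0, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x0 and RW 0x0,
  which agrees with the lines the kernel printed; the CID field comes
  out as 0x8, the client ID the log reports as TCP.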

  Many of the times when this happens, the screen and keyboard freeze
  irreversibly (I tried waiting for more than 30 minutes, but it doesn't
  help). I can still log in via ssh though. When there's no freeze, I
  can continue using the computer normally, but the laptop fans are
  always running and the battery depletes fast. There's
  probably something on a permanent loop either in the kernel or in the
  GPU.

  This bug happens several times a day, rendering the machine so
  unstable as to be almost unusable. It is a severe regression and I'm
  aghast that it passed AMD's Quality Assurance.

  After downgrading back to linux-firmware 1.190.5, the machine is back
  to the previous, mostly-reliable state. Which is to say, this bug is
  gone, I'm just left with the other amdgpu suspend bug I've learned to
  live with since I bought this computer.

  Please revert the amdgpu firmware in this package as soon as possible.
  This is unbearable.

  Relevant information:
  Ubuntu version: 21.04
  Linux kernel: 5.11.0-17-generic x86_64
  CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
  GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c1)
  Laptop model: Lenovo Ideapad S145

To manage notifications about this bug go to:
https://bugs.launchpad.net/amd/+bug/1928393/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2022-02-15 Thread Juerg Haefliger
Hirsute is EOL so closing this bug. Please open a new one if the problem
still persists with one of the supported series.

** Changed in: linux-firmware (Ubuntu Hirsute)
   Status: Incomplete => Won't Fix


Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2022-01-29 Thread Thiago Jung Bauermann
Hello Juerg,

On Thursday, 20 January 2022, at 12:32:48 -03, Juerg Haefliger wrote:
> If you want this fixed in Ubuntu I need to know what series are
> affected. Hirsute goes EOL at the end of the month. Are Impish and/or
> Jammy working or affected as well?

I upgraded to Impish a while ago.

I haven’t seen “retry page fault” messages in a long while (I don’t think 
it’s related to the distro upgrade, but not sure) so I’d say this 
particular bug is fixed at least for me (I have a Picasso GPU).

Which is not to say that things are rosy, unfortunately. But the other 
issues I see don’t cause any message to appear in dmesg so it’s hard to 
search for existing bug reports about them or open a new one.

The following is off-topic for this bug report, but I’ll mention it anyway; 
I hope you’ll bear with me:

One thing I noticed is that things did get rosy when I did two things:

1. Switched from Xorg to Wayland.
2. Switched Firefox to use Wayland as well.

This led me to the conclusion that the bugs that plague my machine are 
triggered by something that Firefox does when it uses X (either “natively” or 
via XWayland). For some reason, when it uses Wayland it doesn’t trigger 
these GPU bugs.

Another thing that might be relevant is that I have tons of tabs open 
(probably more than 200) distributed in 27 open windows. Perhaps I’m 
stressing some kind of resource limit in the driver or firmware?


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2022-01-20 Thread Juerg Haefliger
If you want this fixed in Ubuntu I need to know what series are
affected. Hirsute goes EOL at the end of the month. Are Impish and/or
Jammy working or affected as well?

** Changed in: linux-firmware (Ubuntu Hirsute)
   Status: Confirmed => Incomplete


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-12-07 Thread Lancillotto
No more crashes with firmware
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20211027.tar.gz
and kernel 5.15.6.


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-11-29 Thread Seth Forshee
** Changed in: linux-firmware (Ubuntu)
 Assignee: Seth Forshee (sforshee) => (unassigned)


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-11-29 Thread Juerg Haefliger
@antonio-petricca, What series? What kernel?

I can produce a hirsute linux-firmware package with the reverted sdma
firmware but need someone to verify it on hirsute with the hirsute
kernel. Any takers? Or have you all moved on to impish?


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-11-29 Thread Lancillotto
With the latest firmware
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20211027.tar.gz
it is much more stable.


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-11-29 Thread Juerg Haefliger
** Also affects: mesa (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

** Also affects: linux-firmware (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

** No longer affects: mesa (Ubuntu Hirsute)

** Changed in: mesa (Ubuntu)
   Status: Confirmed => Invalid

** Changed in: linux-firmware (Ubuntu Hirsute)
   Status: New => Confirmed

** Changed in: linux-firmware (Ubuntu)
   Status: Confirmed => Invalid

** Changed in: linux-firmware (Ubuntu Hirsute)
 Assignee: (unassigned) => Juerg Haefliger (juergh)


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-11-29 Thread Juerg Haefliger
@antonio-petricca, sorry but 5.15.2 is not a supported Ubuntu kernel and
especially not on Bionic with (old) Bionic firmware.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Confirmed
Status in mesa package in Ubuntu:
  Confirmed


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-11-18 Thread Lancillotto
I have the same issue on:

Dell E5495
AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx
16Gb RAM
Linux Mint 19.3 (Ubuntu 18.04)
Kernel 5.15.2
Linux Firmware 1.173.20

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Confirmed
Status in mesa package in Ubuntu:
  Confirmed


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-10-27 Thread Alex Deucher
The reverts are in the latest firmware tree:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=d7b50e61669dc137924337d03d09b8986eb752a3
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=d843e520a4b0d92b986645548d11ade3b9b239a4
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=99d72504bff7ab40c261b8509c0b9d8abf98b296

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Confirmed
Status in mesa package in Ubuntu:
  Confirmed


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-10-27 Thread Juerg Haefliger
Hi. I'm picking up this ticket from Seth. Reading through the history it
seems it's still an open issue? My understanding is that upstream
'fixed' this by reverting fw blobs in version 20210818. I can produce a
linux-firmware test package for hirsute 20.04 with these reverts if
necessary. Just let me know.


** Changed in: linux-firmware (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Confirmed
Status in mesa package in Ubuntu:
  Confirmed


Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-10-06 Thread Thiago Jung Bauermann
Hello,

I’d just like to report that I haven’t seen this problem in a while. The 
last time I saw the “retry page fault” messages in my log was on August 9.

I’ve been using the ‘amdgpu/picasso*‘ files from linux-firmware commit 
c46b8c364b82 (“ice: update package file to 1.3.26.0”) so apparently this 
particular problem was recently fixed.

Which isn’t to say that I’m having a trouble-free amdgpu experience, 
unfortunately. Every week or so my laptop comes back from sleep with the 
screen and keyboard frozen (I can still ssh into it), but now the error is:

kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out

But it seems to be a separate problem from the one reported in this 
particular launchpad issue. I’ll see if I can find a more appropriate 
launchpad issue and post the details there.

Thank you all for your help and support with this issue.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Incomplete
Status in mesa package in Ubuntu:
  Confirmed


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-08-29 Thread I-Cat
It is the Bluetooth driver.
I got this too on my Acer Aspire F5-573 series laptop.
There is a sticker that says "Intel i-5 core", so is AMD a possibility?
I do not think my processor is an AMD.
It also set my screen resolution to something like 1377x768 (it's a 6k screen, 17").
That was when I was on Windows (yes, I had to switch off Windows because my Windows 11 OS failed).
It was installed when I installed VirtualBox or Kodi.
It is like a nightmare.
I lost the function of my USB 1.0 driver.
It installs a PnP driver that does not exist.
The PnP driver is a printer driver that is pointing to your desktop.

I think this is a virus that is affecting other systems like mine.
Thanks -- I-Cat

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Incomplete
Status in mesa package in Ubuntu:
  Confirmed


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-08-29 Thread Michal Przybylowicz
I have similar messages in journalctl:

Package: linux-firmware
Version: 1.197.3

Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process vivaldi-bin pid 1673 thread vivaldi-bi:cs0 pid 1699)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:   in page starting at address 0x80010114 from client 0x12 (VMC)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  Faulty UTCL2 client ID: VCN0 (0x2b)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  MORE_FAULTS: 0x1
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  WALKER_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  PERMISSION_FAULTS: 0x3
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  MAPPING_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  RW: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process vivaldi-bin pid 1673 thread vivaldi-bi:cs0 pid 1699)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:   in page starting at address 0x800101188000 from client 0x12 (VMC)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  Faulty UTCL2 client ID: VCN0 (0x2b)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  MORE_FAULTS: 0x1
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  WALKER_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  PERMISSION_FAULTS: 0x3
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  MAPPING_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  RW: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process vivaldi-bin pid 1673 thread vivaldi-bi:cs0 pid 1699)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:   in page starting at address 0x800101189000 from client 0x12 (VMC)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  Faulty UTCL2 client ID: VCN0 (0x2b)
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  MORE_FAULTS: 0x1
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  WALKER_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  PERMISSION_FAULTS: 0x3
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  MAPPING_ERROR: 0x0
Aug 29 16:58:44 dagon kernel: amdgpu :03:00.0: amdgpu:  RW: 0x0
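
The per-field lines the kernel prints after each fault status word can be recovered from the raw value. Below is a minimal sketch in Python. The bit layout in FIELDS is an assumption, not something stated in this thread: it is the layout that reproduces the values logged here for 0x00101031 (gfxhub0) and 0x00105631 (mmhub), and the names decode_fault_status and FIELDS are hypothetical.

```python
# Assumed bit layout of the *_L2_PROTECTION_FAULT_STATUS word.
# Not taken from this thread; it is only checked against the two
# status values that appear in the logs quoted here.
FIELDS = {
    # name: (shift, width in bits)
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # printed as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Split a raw fault status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for raw in (0x00101031, 0x00105631):
        decoded = decode_fault_status(raw)
        fields = ", ".join(f"{k}={v:#x}" for k, v in decoded.items())
        print(f"{raw:#010x}: {fields}")
```

Under that assumed layout, both values decode to PERMISSION_FAULTS 0x3 and VMID 1, and differ only in the client ID (0x8 for TCP, 0x2b for VCN0), which matches what the kernel printed in the two reports.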

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Incomplete
Status in mesa package in Ubuntu:
  Confirmed


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-08-05 Thread Leandro Scott
It happened to me too

** Attachment added: "Crash log of amdgpu driver"
   
https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1928393/+attachment/5516220/+files/amdgu_crash.txt

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Incomplete
Status in mesa package in Ubuntu:
  Confirmed


Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-08-04 Thread Thiago Jung Bauermann
Hello,

For some reason, in the past week or so this bug has been freezing my 
machine every couple of days or so (I’m surprised that AMD wasn’t able 
to reproduce the problem yet¹). You can imagine how “pleasant” it makes 
using this computer.

Today I got an interesting error in dmesg, perhaps it provides some
clue:

[38454.299445] ------------[ cut here ]------------
[38454.299449] refcount_t: underflow; use-after-free.
[38454.299457] WARNING: CPU: 5 PID: 17577 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
[38454.299465] Modules linked in: overlay ccm rfcomm xt_CHECKSUM xt_MASQUERADE 
xt_conntrack ipt_REJECT xt_tcpudp nft_compat nft_counter nft_objref 
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject 
nft_ct bridge stp llc nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac algif_hash algif_skcipher af_alg 
bnep binfmt_misc nls_iso8859_1 snd_hda_codec_generic ledtrig_audio 
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel 
soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core 
snd_hwdep soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine 
intel_rapl_msr intel_rapl_common joydev snd_pcm edac_mce_amd snd_seq_midi 
ath10k_pci ath10k_core snd_seq_midi_event kvm_amd snd_rawmidi ath mac80211 kvm 
uvcvideo snd_seq btusb videobuf2_vmalloc rapl videobuf2_memops videobuf2_v4l2 
videobuf2_common btrtl input_leds
[38454.299510]  serio_raw btbcm videodev btintel wmi_bmof snd_seq_device 
efi_pstore bluetooth snd_timer mc cfg80211 k10temp ecdh_generic snd ecc 
ideapad_laptop ccp libarc4 sparse_keymap soundcore elan_i2c mac_hid 
sch_fq_codel msr parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs 
blake2b_generic xor raid6_pq libcrc32c dm_crypt zstd zram z3fold amdgpu 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iommu_v2 gpu_sched 
aesni_intel i2c_algo_bit drm_ttm_helper ttm crypto_simd cryptd glue_helper 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm 
i2c_piix4 nvme xhci_pci i2c_hid xhci_pci_renesas nvme_core wmi video hid
[38454.299550] CPU: 5 PID: 17577 Comm: kworker/u32:18 Not tainted 5.11.0-25-generic #27-Ubuntu
[38454.299552] Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN23WW 11/05/2019
[38454.299554] Workqueue: events_unbound async_run_entry_fn
[38454.299559] RIP: 0010:refcount_warn_saturate+0xae/0xf0
[38454.299562] Code: f8 1c 96 01 01 e8 9f f1 62 00 0f 0b 5d c3 80 3d e5 1c 96 
01 00 75 91 48 c7 c7 e8 c7 60 b9 c6 05 d5 1c 96 01 01 e8 7f f1 62 00 <0f> 0b 5d 
c3 80 3d c3 1c 96 01 00 0f 85 6d ff ff ff 48 c7 c7 40 c8
[38454.299564] RSP: 0018:b60383537b58 EFLAGS: 00010282
[38454.299566] RAX:  RBX:  RCX: 8d4578b58ac8
[38454.299567] RDX: ffd8 RSI: 0027 RDI: 8d4578b58ac0
[38454.299568] RBP: b60383537b58 R08: b9c73540 R09: b60383537af0
[38454.299569] R10: 2d2d2d2d R11: b603835379e8 R12: 8d44cf64d000
[38454.299570] R13:  R14: b6038b8cd000 R15: 0004
[38454.299571] FS:  () GS:8d4578b4() 
knlGS:
[38454.299572] CS:  0010 DS:  ES:  CR0: 80050033
[38454.299574] CR2:  CR3: 00016ae1 CR4: 003506e0
[38454.299575] Call Trace:
[38454.299578]  dc_stream_release+0x78/0x80 [amdgpu]
[38454.299751]  dc_resource_state_destruct+0x58/0x80 [amdgpu]
[38454.299904]  dc_release_state+0x2f/0x60 [amdgpu]
[38454.300055]  dm_atomic_destroy_state+0x21/0x30 [amdgpu]
[38454.300211]  drm_atomic_state_default_clear+0x23d/0x2f0 [drm]
[38454.300236]  __drm_atomic_state_free+0x5e/0xa0 [drm]
[38454.300257]  drm_atomic_helper_resume+0x12b/0x150 [drm_kms_helper]
[38454.300271]  dm_resume+0x2bd/0x540 [amdgpu]
[38454.300427]  amdgpu_device_ip_resume_phase2+0x58/0xc0 [amdgpu]
[38454.300531]  amdgpu_device_resume+0x8d/0x370 [amdgpu]
[38454.300635]  ? native_queued_spin_lock_slowpath+0x2b/0x30
[38454.300638]  ? _raw_spin_lock_irq+0x26/0x2a
[38454.300642]  ? __wait_for_common+0xfb/0x150
[38454.300644]  amdgpu_pmops_resume+0x17/0x20 [amdgpu]
[38454.300748]  pci_pm_resume+0x6b/0xf0
[38454.300751]  ? pci_pm_poweroff_noirq+0x120/0x120
[38454.300752]  dpm_run_callback+0x50/0x110
[38454.300755]  device_resume+0xad/0x200
[38454.300757]  async_resume+0x1e/0x40
[38454.300759]  async_run_entry_fn+0x3c/0x150
[38454.300761]  process_one_work+0x220/0x3c0
[38454.300764]  worker_thread+0x50/0x370
[38454.300765]  kthread+0x12f/0x150
[38454.300767]  ? process_one_work+0x3c0/0x3c0
[38454.300768]  ? __kthread_bind_mask+0x70/0x70
[38454.300770]  ret_from_fork+0x22/0x30
[38454.300775] ---[ end trace 1f54ad57671def2f ]---

Note that immediately before it there’s a page allocation failure during
wake up from suspend. So there’s some refcounting bug in an error path
somewhere.

Much later there’s the familiar 

Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-07-16 Thread Thiago Jung Bauermann
On Monday, 12 July 2021, at 15:12:19 -03, Alex Deucher wrote:
> Does the latest firmware in the firmware git tree help?
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu

I updated the picasso* files from commit:

d79c26779d45 amdgpu: update vcn firmware for green sardine for 21.20

And I still see the issue. It took a while to reproduce: I updated the 
firmware (and ran `update-initramfs -u -k all` to get it into the 
initramfs) on July 12 and had the laptop turned on since then (closing the 
lid to put it to sleep), and today I saw the problem again.

The full dmesg is attached.
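
For anyone comparing their own logs against the attached one, the
following is a minimal sketch of how a saved dmesg file could be scanned
for the "retry page fault" lines quoted in this bug. The regular
expression is modelled only on the log lines shown in this thread; the
function names are made up for illustration, and other kernel versions
may format the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the "retry page fault" line quoted in this bug.
# The PCI address is matched loosely because archives sometimes mangle it.
FAULT_RE = re.compile(
    r"amdgpu (?P<dev>\S*): amdgpu: \[(?P<hub>\w+)\] retry page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) "
    r"pasid:(?P<pasid>\d+), for process (?P<process>\S+) pid (?P<pid>\d+)"
)


def find_retry_page_faults(lines):
    """Return one dict of captured fields per matching log line."""
    faults = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


def summarize(faults):
    """Count faults per (hub, process) pair."""
    return Counter((f["hub"], f["process"]) for f in faults)
```

A typical use would be to pass the lines of a saved log, for example
`find_retry_page_faults(open("dmesg.log", errors="replace"))`, and print
the result of `summarize()`. Note that dmesg does not wrap lines, so the
wrapped form seen in this e-mail would have to be rejoined first.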

** Attachment added: "dmesg.log"
   https://bugs.launchpad.net/bugs/1928393/+attachment/5511525/+files/dmesg.log

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Incomplete
Status in mesa package in Ubuntu:
  Confirmed

Bug description:
  After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the
  upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent
  and severe GPU instability. When this happens, I see this error in
  dmesg:

  [20061.061069] amdgpu :03:00.0: amdgpu: [gfxhub0] retry page fault 
(src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 
pid 1236)
  [20061.061103] amdgpu :03:00.0: amdgpu:   in page starting at address 
0x80401000 from client 27
  [20061.061135] amdgpu :03:00.0: amdgpu: 
VM_L2_PROTECTION_FAULT_STATUS:0x00101031
  [20061.061147] amdgpu :03:00.0: amdgpu:  Faulty UTCL2 client ID: TCP 
(0x8)
  [20061.061157] amdgpu :03:00.0: amdgpu:  MORE_FAULTS: 0x1
  [20061.061167] amdgpu :03:00.0: amdgpu:  WALKER_ERROR: 0x0
  [20061.061174] amdgpu :03:00.0: amdgpu:  PERMISSION_FAULTS: 0x3
  [20061.061183] amdgpu :03:00.0: amdgpu:  MAPPING_ERROR: 0x0
  [20061.061189] amdgpu :03:00.0: amdgpu:  RW: 0x0
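
The individual fields printed above are bit fields of the single
VM_L2_PROTECTION_FAULT_STATUS value. As a rough sketch, they can be
recomputed from the raw register value as shown below. The bit layout
used here is an assumption (taken from memory of the gfx9 register
definitions in the amdgpu driver, not from this bug report); it does
reproduce the values in the log above, but it should be checked against
the kernel headers before relying on it for other hardware.

```python
# Assumed layout: field name -> (shift, width in bits).
# This is an illustration, not an authoritative register description.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # printed as "Faulty UTCL2 client ID" in the log
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw status value into the assumed bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


decoded = decode_fault_status(0x00101031)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

For the value 0x00101031 this yields PERMISSION_FAULTS 0x3, client ID
0x8 and VMID 0x1, with WALKER_ERROR, MAPPING_ERROR and RW all zero,
which matches the lines the kernel printed.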

  I'll attach a couple of full dmesgs that I collected.

  Many of the times when this happens, the screen and keyboard freeze
  irreversibly (I tried waiting for more than 30 minutes, but it doesn't
  help). I can still log in via ssh though. When there's no freeze, I
  can continue using the computer normally, but the laptop fans are
  always running and the battery depletes fast. There's
  probably something on a permanent loop either in the kernel or in the
  GPU.

  This bug happens several times a day, rendering the machine so
  unstable as to be almost unusable. It is a severe regression and I'm
  aghast that it passed AMD's Quality Assurance.

  After downgrading back to linux-firmware 1.190.5, the machine is back
  to the previous, mostly-reliable state. Which is to say, this bug is
  gone, I'm just left with the other amdgpu suspend bug I've learned to
  live with since I bought this computer.

  Please revert the amdgpu firmware in this package as soon as possible.
  This is unbearable.

  Relevant information:
  Ubuntu version: 21.04
  Linux kernel: 5.11.0-17-generic x86_64
  CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
  GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. 
[AMD/ATI] Picasso (rev c1)
  Laptop model: Lenovo Ideapad S145

To manage notifications about this bug go to:
https://bugs.launchpad.net/amd/+bug/1928393/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-07-15 Thread Heinrich Schuchardt
After updating to linux-firmware commit d79c26779d45906 the problems
persist on Lenovo Thinkpad E585:

amdgpu :05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0
vmid:1 pasid:32769, for process Xorg pid 1336 thread Xorg:cs0 pid 1862)


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-07-13 Thread Serj
@alexander-deucher

CPU model: AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx
Kernel: 5.10.49
Firmware:
VCE feature version: 0, firmware version: 0x
UVD feature version: 0, firmware version: 0x
MC feature version: 0, firmware version: 0x
ME feature version: 52, firmware version: 0x00a4
PFP feature version: 52, firmware version: 0x00bc
CE feature version: 52, firmware version: 0x004f
RLC feature version: 1, firmware version: 0x0213
RLC SRLC feature version: 1, firmware version: 0x0001
RLC SRLG feature version: 1, firmware version: 0x0001
RLC SRLS feature version: 1, firmware version: 0x0001
MEC feature version: 52, firmware version: 0x01c2
MEC2 feature version: 52, firmware version: 0x01c2
SOS feature version: 0, firmware version: 0x
ASD feature version: 0, firmware version: 0x2155
TA RAS feature version: 0x, firmware version: 0x212b
TA XGMI feature version: 0x, firmware version: 0x212b
TA HDCP feature version: 0x1711, firmware version: 0x212b
TA DTM feature version: 0x1203, firmware version: 0x212b
SMC feature version: 0, firmware version: 0x1e49
SDMA0 feature version: 41, firmware version: 0x0028
VCN feature version: 0, firmware version: 0x0210c005
DMCU feature version: 0, firmware version: 0x
DMCUB feature version: 0, firmware version: 0x
VBIOS version: 113-RAVEN-107

Also affected
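
Since several reporters are posting firmware listings like the one
above, here is a small sketch of how two such listings could be compared
to spot which firmware components differ between an affected and an
unaffected machine. It assumes the "NAME feature version: X, firmware
version: 0x..." line format shown above (which looks like the output of
the amdgpu_firmware_info debugfs file, though that is a guess); the
function names are hypothetical.

```python
import re

# Matches lines such as "ME feature version: 52, firmware version: 0x00a4".
# The hex digits may be empty because some archives strip runs of zeros.
FW_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\S+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]*)$"
)


def parse_firmware_info(text):
    """Map component name -> (feature version, firmware version) strings."""
    table = {}
    for line in text.splitlines():
        match = FW_RE.match(line.strip())
        if match:
            table[match.group("name")] = (match.group("feature"), match.group("fw"))
    return table


def diff_firmware(a, b):
    """Return the components whose versions differ between two listings."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Lines that do not follow the pattern, such as the VBIOS line, are
ignored. The versions are kept as strings so that truncated values are
compared as they appear rather than being reinterpreted.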


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-07-12 Thread Heinrich Schuchardt
There is an upstream bug report 
https://bugzilla.kernel.org/show_bug.cgi?id=213391
Comment 9 suggests: "downgrade the firmware."
Comment 15 claims: "20210315 seems to work fine here (on an E595)."

** Bug watch added: Linux Kernel Bug Tracker #213391
   https://bugzilla.kernel.org/show_bug.cgi?id=213391


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-07-12 Thread Alex Deucher
Does the latest firmware in the firmware git tree help?
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-07-09 Thread Heinrich Schuchardt
Also affected:

Ubuntu version: 21.04
Linux kernel: 5.11.0-22-generic  x86_64
CPU model: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
GPU: 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] 
Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
Laptop model: Lenovo Thinkpad E585


Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-22 Thread Thiago Jung Bauermann
On Thursday, 17 June 2021, at 00:45:30 -03, Thiago Jung Bauermann wrote:
> > > I think it may be related to a change in mesa.  Specifically mesa
> > > commit
> > > 820dec3f7c7.  For more info see
> > > https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866
> > 
> > I’ll run with Mario’s build of Mesa with that patch backported.
> > Thanks, Mario!
> 
> I’m running with the Mesa build from Mario’s PPA now. If I don’t see any
> issue within two weeks, I think it will be possible to say that the bug
> is gone, or at least much harder to hit.
> 
> I can’t use my reproducer in this case, because I can’t change the Mesa
> version inside the flatpak image.

I just had this bug happen again spontaneously, while running with Mesa 
from Mario’s PPA.

So this bug isn’t fixed by the patch mentioned by Alex.


[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-21 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: mesa (Ubuntu)
   Status: New => Confirmed


Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-16 Thread Thiago Jung Bauermann
I was finally able to spend a bit of time on this. Unfortunately, there’s 
not much to report back.

On Tuesday, 8 June 2021, at 15:13:36 -03, Thiago Jung Bauermann wrote:
> On Tuesday, 8 June 2021, at 10:30:24 -03, Alex Deucher wrote:
> > Can you narrow down which specific firmware file causes the problem?
> 
> Ok, I will try.

I don’t think I can narrow down which firmware file causes the problem, 
because I don’t have a last known good version. All firmware files that I 
tested (Ubuntu versions 1.190.5, 1.197 and latest linux-firmware.git) 
immediately trigger the bug when I try the only reliable reproducer I know 
(i.e., running flatpak’s com.github.quaternion package).

Since it can take several days for the bug to happen if I just use the 
machine normally, it would take weeks to narrow down which of the picasso_* 
files is more stable relative to the others. And even then, I wouldn’t be 
sure about it.
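
One way to shrink the search space before testing would be to check which
picasso_* files actually differ between two firmware versions. The
following is only a sketch of that idea in Python; the function names and
the two directory arguments are hypothetical, and it assumes both
firmware versions have been unpacked into separate directories.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_firmware(old_dir: Path, new_dir: Path,
                     pattern: str = "picasso_*") -> list[str]:
    """List file names matching `pattern` whose contents differ between
    the two directories, or which exist in only one of them."""
    old = {p.name: sha256_of(p) for p in old_dir.glob(pattern) if p.is_file()}
    new = {p.name: sha256_of(p) for p in new_dir.glob(pattern) if p.is_file()}
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Calling changed_firmware() on the amdgpu directories of two unpacked
linux-firmware versions returns only the files worth swapping one at a
time; files with identical checksums cannot explain a difference in
behaviour between those two versions.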

> > I think it may be related to a change in mesa.  Specifically mesa
> > commit 820dec3f7c7.  For more info see
> > https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866
> 
> I’ll run with Mario’s build of Mesa with that patch backported.
> Thanks, Mario!

I’m running with the Mesa build from Mario’s PPA now. If I don’t see any 
issue within two weeks, I think it will be possible to say that the bug is 
gone, or at least much harder to hit.

I can’t use my reproducer in this case, because I can’t change the Mesa 
version inside the flatpak image.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title:
  linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0]
  retry page fault"

Status in amd:
  New
Status in linux-firmware package in Ubuntu:
  Incomplete
Status in mesa package in Ubuntu:
  New



[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-09 Thread Timo Aaltonen
21.04 comes with Mesa 21.0.1, which does not seem to have 820dec3f7c7.



[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-08 Thread Thiago Jung Bauermann
** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #1598
   https://gitlab.freedesktop.org/drm/amd/-/issues/1598

** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #920
   https://gitlab.freedesktop.org/drm/amd/-/issues/920



Re: [Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-08 Thread Thiago Jung Bauermann
Thanks for your input.

On Tuesday, 8 June 2021, at 10:30:24 -03, Alex Deucher wrote:
> Can you narrow down which specific firmware file causes the problem?

Ok, I will try.

Also, is it possible and/or worthwhile trying to bisect firmware versions from 
the linux-firmware repo? How coupled is the firmware with the kernel 
driver? E.g., can I try using firmware files from 1 year ago with current 
kernel and Mesa?

> We haven't been able to repro this.

One thing that’s a bit “fishy” about my machine is that it doesn’t seem to 
have a good clock:

[0.211436] TSC synchronization [CPU#0 -> CPU#1]:
[0.211436] Measured 3304683447 cycles TSC warp between CPUs, turning off TSC clock.
[0.211436] tsc: Marking TSC unstable due to check_tsc_sync_source failed
…
[0.252117] hpet0: at MMIO 0xfed0, IRQs 2, 8, 0
[0.252117] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[0.253970] clocksource: Switched to clocksource hpet
…
[0.580451] Unstable clock detected, switching default tracing clock to "global"
   If you want to keep using the local clock, then add:
     "trace_clock=local"
   on the kernel command line

Could this bug be related to that?
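
For anyone wanting to compare their own machine, the clocksource the
kernel ended up with can be read from sysfs. A small Python sketch,
assuming the standard clocksource0 sysfs directory is present (the
function name is made up for illustration):

```python
from pathlib import Path

# Standard sysfs location for the system clocksource on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR) -> tuple[str, list[str]]:
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

On a machine like the one in the log above, the first element would be
expected to read "hpet" rather than "tsc". This only reports the state;
it says nothing about whether the clock is related to the GPU fault.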

> I think it may be related to a change in mesa.  Specifically mesa commit
> 820dec3f7c7.  For more info see
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866

I’ll run with Mario’s build of Mesa with that patch backported.
Thanks, Mario!

> ** Bug watch added: gitlab.freedesktop.org/mesa/mesa/-/issues #4866
>    https://gitlab.freedesktop.org/mesa/mesa/-/issues/4866

Other upstream issues that look similar:

https://gitlab.freedesktop.org/drm/amd/-/issues/1598
https://gitlab.freedesktop.org/drm/amd/-/issues/920



[Touch-packages] [Bug 1928393] Re: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

2021-06-08 Thread Mario Limonciello
Here's a PPA build with the mesa fix Alex mentioned backported:
https://launchpad.net/~superm1/+archive/ubuntu/lp1928393

If you can follow the directions to add that PPA and upgrade to that
mesa package you can see if that indeed fixes it.

** Also affects: mesa (Ubuntu)
   Importance: Undecided
   Status: New
