[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2019-08-23 Thread Paul Graydon
Seeing the same on Ubuntu 18.04.3 with the HWE kernel, 5.0.0-25-generic

AMD A6-9225 RADEON R4

This appears to be tied to the problems resuming the laptop from
suspend (black screen, flashing cursor).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1782716

Title:
  [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Running the 4.17.0-5-generic kernel on a ppc64le machine with a Radeon
  R9 Fury GPU

  
  0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ff)

  [ 2361.958847] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=8777, last emitted seq=8778
  [ 2362.080397] EEH: Frozen PHB#33-PE#0 detected
  [ 2362.080470] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
  [ 2362.080568] CPU: 53 PID: 874 Comm: kworker/53:1 Not tainted 4.17.0-5-generic #6-Ubuntu
  [ 2362.080575] Workqueue: events drm_sched_job_timedout [gpu_sched]
  [ 2362.080577] Call Trace:
  [ 2362.080584] [c000fb7078f0] [c0d275ac] dump_stack+0xb0/0xf4 (unreliable)
  [ 2362.080590] [c000fb707930] [c003ba0c] eeh_dev_check_failure+0x5bc/0x5e0
  [ 2362.080593] [c000fb7079e0] [c003babc] eeh_check_failure+0x8c/0xd0
  [ 2362.080628] [c000fb707a20] [c0080cfa1b88] amdgpu_mm_rreg+0x280/0x2a0 [amdgpu]
  [ 2362.080676] [c000fb707a70] [c0080d04cf68] gmc_v8_0_check_soft_reset+0x30/0xe0 [amdgpu]
  [ 2362.080711] [c000fb707aa0] [c0080cfa1194] amdgpu_device_ip_check_soft_reset.part.1+0x8c/0x140 [amdgpu]
  [ 2362.080745] [c000fb707b30] [c0080cfa649c] amdgpu_device_gpu_recover+0x854/0xa40 [amdgpu]
  [ 2362.080799] [c000fb707c00] [c0080d0b97a4] amdgpu_job_timedout+0x5c/0x80 [amdgpu]
  [ 2362.080805] [c000fb707c70] [c0080c8f0040] drm_sched_job_timedout+0x38/0x60 [gpu_sched]
  [ 2362.080810] [c000fb707c90] [c0137928] process_one_work+0x298/0x580
  [ 2362.080813] [c000fb707d20] [c0137c98] worker_thread+0x88/0x610
  [ 2362.080817] [c000fb707dc0] [c0140958] kthread+0x1a8/0x1b0
  [ 2362.080822] [c000fb707e30] [c000b658] ret_from_kernel_thread+0x5c/0x84
  [ 2362.080827] [drm] IP block:gmc_v8_0 is hung!
  [ 2362.080832] [drm] IP block:tonga_ih is hung!
  [ 2362.080843] [drm] IP block:gfx_v8_0 is hung!
  [ 2362.080845] EEH: Detected PCI bus error on PHB#33-PE#0
  [ 2362.080847] EEH: This PCI device has failed 1 times in the last hour
  [ 2362.080849] EEH: Notify device drivers to shutdown
  [ 2362.080850] [drm] IP block:sdma_v3_0 is hung!
  [ 2362.080856] [drm] IP block:uvd_v6_0 is hung!
  [ 2362.080858] EEH: Collect temporary log
  [ 2362.080866] [drm] IP block:vce_v3_0 is hung!
  [ 2362.080867] [drm] GPU recovery disabled.
  [ 2362.080903] EEH: of node=0033:01:00.1
  [ 2362.080905] EEH: PCI device/vendor: 
  [ 2362.080907] EEH: PCI cmd/status register: 
  [ 2362.080908] EEH: PCI-E capabilities and status follow:
  [ 2362.080915] EEH: PCI-E 00:     
  [ 2362.080920] EEH: PCI-E 10:     
  [ 2362.080921] EEH: PCI-E 20:  
  [ 2362.080922] EEH: PCI-E AER capability register set follows:
  [ 2362.080928] EEH: PCI-E AER 00:     
  [ 2362.080933] EEH: PCI-E AER 10:     
  [ 2362.080938] EEH: PCI-E AER 20:     
  [ 2362.080940] EEH: PCI-E AER 30:   
  [ 2362.080941] EEH: of node=0033:01:00.0
  [ 2362.080943] EEH: PCI device/vendor: 
  [ 2362.080945] EEH: PCI cmd/status register: 
  [ 2362.080945] EEH: PCI-E capabilities and status follow:
  [ 2362.080951] EEH: PCI-E 00:     
  [ 2362.080956] EEH: PCI-E 10:     
  [ 2362.080957] EEH: PCI-E 20:  
  [ 2362.080958] EEH: PCI-E AER capability register set follows:
  [ 2362.080964] EEH: PCI-E AER 00:     
  [ 2362.080969] EEH: PCI-E AER 10:     
  [ 2362.080974] EEH: PCI-E AER 20:     
  [ 2362.080975] EEH: PCI-E AER 30:   
  [ 2362.080977] PHB4 PHB#51 Diag-data (Version: 1)
  [ 2362.080978] brdgCtl:0002
  [ 2362.080979] RootSts:00060020 00402000 c1010008 00100107 
  [ 2362.080980] RootErrSts:  0020 
  [ 2362.080981] PhbSts: 001c 001c
  [ 2362.080982] Lem:0001  0001
  [ 2362.080983] PhbErr: 00c0 0080 214898000240 a0084000
  [ 2362.080984] RegbErr:0090 0010 483c 0200
  [ 2362.080985] PE[000] A/B: 8000 8000
  [ 2362.080987] PE[..1fe] A/B: as above
  [ 2362.080988] PE[1ff] A/B: b740002a0100 8000
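
For anyone triaging a similar dmesg capture, the "IP block:... is hung!" lines can be pulled out with a few lines of Python. This is only a sketch: the function name `hung_ip_blocks` and the variable `sample` are made up for illustration, and the pattern is based solely on the message format seen in the log above.

```python
import re

# Matches lines such as "[drm] IP block:gfx_v8_0 is hung!"
# (format taken from the log in this report; other kernels may differ).
HUNG_BLOCK = re.compile(r"\[drm\] IP block:(\S+) is hung!")


def hung_ip_blocks(dmesg_text):
    """Return the hung IP block names in the order they were logged."""
    return HUNG_BLOCK.findall(dmesg_text)


# A few lines copied from the bug description above.
sample = """\
[ 2362.080827] [drm] IP block:gmc_v8_0 is hung!
[ 2362.080832] [drm] IP block:tonga_ih is hung!
[ 2362.080843] [drm] IP block:gfx_v8_0 is hung!
[ 2362.080845] EEH: Detected PCI bus error on PHB#33-PE#0
[ 2362.080850] [drm] IP block:sdma_v3_0 is hung!
"""

print(hung_ip_blocks(sample))
```

Lines that do not match (the interleaved EEH messages, for example) are simply skipped.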

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2019-08-21 Thread Jakub Paś
I have a similar error on a MateBook D (Ryzen + AMD GPU):

Linux matebook 4.18.0-25-generic #26~18.04.1-Ubuntu SMP Thu Jun 27 07:28:31 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

The same happens on kernel 4.15. It causes the whole system to hang, and
I need to physically restart it.


Aug 21 10:41:44 matebook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=760524, last emitted seq=760524
Aug 21 10:41:44 matebook kernel: [drm] GPU recovery disabled.
Aug 21 10:41:57 matebook kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [indicator-cpufr:4128]

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-11-08 Thread Alexandr Kara
Reproduced on Fedora (with kernel 4.18.16-300.fc29.x86_64).

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-07-29 Thread Joel Stanley
Reproduced with 4.18-rc7

** Tags added: kernel-bug-exists-upstream

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-07-29 Thread Joel Stanley
With 4.18-rc7:

[  333.596521] EEH: PHB#33 failure detected, location: N/A
[  333.596563] CPU: 12 PID: 811 Comm: kworker/u257:1 Not tainted 4.18.0-041800rc7-generic #201807292230
[  333.596576] Workqueue: events_unbound commit_work [drm_kms_helper]
[  333.596578] Call Trace:
[  333.596582] [c00fec1036c0] [c0d4d6fc] dump_stack+0xb0/0xf4 (unreliable)
[  333.596587] [c00fec103700] [c003b114] eeh_dev_check_failure+0x514/0x5e0
[  333.596589] [c00fec1037a0] [c003b26c] eeh_check_failure+0x8c/0xd0
[  333.596616] [c00fec1037e0] [c0080d5119f8] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
[  333.596649] [c00fec103840] [c0080d623250] amdgpu_cgs_read_register+0x28/0x50 [amdgpu]
[  333.596685] [c00fec103860] [c0080d6ce81c] dce110_timing_generator_get_vblank_counter+0x44/0x70 [amdgpu]
[  333.596717] [c00fec103880] [c0080d6f4430] dc_stream_get_vblank_counter+0x88/0xb0 [amdgpu]
[  333.596752] [c00fec1038a0] [c0080d67f5f4] dm_vblank_get_counter+0x4c/0xa8 [amdgpu]
[  333.596774] [c00fec103900] [c0080d518630] amdgpu_get_vblank_counter_kms+0xa8/0x250 [amdgpu]
[  333.596808] [c00fec1039b0] [c0080d67c1b8] amdgpu_dm_do_flip+0xd0/0x3b0 [amdgpu]
[  333.596844] [c00fec103b00] [c0080d6801a0] amdgpu_dm_atomic_commit_tail+0x7f8/0xf20 [amdgpu]
[  333.596850] [c00fec103c60] [c0080cb72da4] commit_tail+0x6c/0xe0 [drm_kms_helper]
[  333.596853] [c00fec103c90] [c01385f0] process_one_work+0x2b0/0x560
[  333.596855] [c00fec103d20] [c0138928] worker_thread+0x88/0x610
[  333.596858] [c00fec103dc0] [c01415dc] kthread+0x1ac/0x1c0
[  333.596861] [c00fec103e30] [c000b65c] ret_from_kernel_thread+0x5c/0x80
[  333.596886] EEH: Detected error on PHB#33
[  333.596890] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[  333.596891] EEH: Notify device drivers to shutdown
[  333.596895] EEH: Beginning: 'error_detected(IO frozen)'
[  333.596898] EEH: PE#1fe (PCI 0033:00:00.0): no driver
[  333.596900] EEH: PE#0 (PCI 0033:01:00.1): driver not EEH aware
[  333.596902] EEH: PE#0 (PCI 0033:01:00.0): driver not EEH aware
[  333.596904] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'none'
[  333.596908] EEH: Collect temporary log
[  333.596910] PHB4 PHB#51 Diag-data (Version: 1)
[  333.596911] brdgCtl:0002
[  333.596913] RootSts:0020 00402000 e9010008 00100107 
[  333.596915] nFir:   8000 0030001c 8000
[  333.596916] PhbSts: 0018 0018
[  333.596918] Lem:000400010100  0001
[  333.596919] PhbErr: 05a0 0080 214898000240 a0084000
[  333.596921] PhbTxeErr:  2000 2000 4000
[  333.596923] RxeMrgErr:  0001 0001
[  333.596925] PblErr: 0800 0800  028de410
[  333.596926] RegbErr:00100010 0010 483c 0200
[  333.596929] EEH: Reset with hotplug activity
[  334.084373] iommu: Removing device 0033:01:00.1 from group 3
[  334.084445] pci 0033:01:00.1: Dropping the link to 0033:01:00.0
[  334.085057] [drm] amdgpu: finishing device.
[  343.769080] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=590, last emitted seq=591
[  343.769186] [drm] GPU recovery disabled.
[  344.281128] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=349, last emitted seq=350
[  344.281189] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0] hw_done or flip_done timed out
[  344.281390] [drm] GPU recovery disabled.
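
To compare reports, it may help to extract the ring name and the two sequence numbers from each "ring ... timeout" line. The sketch below assumes the message format shown in these logs. The names `ring_timeouts` and `sample` are illustrative only. As far as I understand it, the gap between the emitted and signaled numbers is the count of fences still outstanding when the timeout fired, but treat that reading as an assumption.

```python
import re

# Pattern based only on the lines quoted in this bug; \s+ also tolerates
# the mail client's line wrap between "timeout," and "last signaled".
TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+"
    r"last signaled seq=(?P<signaled>\d+),\s+"
    r"last emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(dmesg_text):
    """Return (ring, signaled, emitted) tuples, one per timeout message."""
    return [
        (m["ring"], int(m["signaled"]), int(m["emitted"]))
        for m in TIMEOUT.finditer(dmesg_text)
    ]


# Two messages from the log above; the first keeps the e-mail line wrap.
sample = (
    "[  343.769080] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, \n"
    "last signaled seq=590, last emitted seq=591\n"
    "[  343.769186] [drm] GPU recovery disabled.\n"
    "[  344.281128] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, "
    "last signaled seq=349, last emitted seq=350\n"
)

for ring, signaled, emitted in ring_timeouts(sample):
    print(f"{ring}: signaled={signaled} emitted={emitted} gap={emitted - signaled}")
```

Both rings in this capture show a gap of one between the emitted and signaled sequence numbers.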

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-07-29 Thread Joel Stanley
With upstream kernels I get this (and a frozen desktop):

[ 2604.488694] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 2634.551719] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[ 2634.554170] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[ 3060.974388] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 3510.632708] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 3527.956089] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 4992.501324] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 5015.179529] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 5189.342133] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=4657, last emitted seq=4658
[ 5189.342233] [drm] GPU recovery disabled.
[ 5317.867388] INFO: task kworker/u257:3:54387 blocked for more than 120 seconds.
[ 5317.867471]   Not tainted 4.18.0-041800rc6-generic #201807221830
[ 5317.867548] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5317.867656] kworker/u257:3  D0 54387  2 0x0808
[ 5317.867675] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 5317.867677] Call Trace:
[ 5317.867680] [c00fe3447460] [002a] 0x2a (unreliable)
[ 5317.867688] [c00fe3447630] [c001c430] __switch_to+0x260/0x4c0
[ 5317.867694] [c00fe3447690] [c0d67b44] __schedule+0x304/0xad0
[ 5317.867697] [c00fe3447760] [c0d68358] schedule+0x48/0xc0
[ 5317.867701] [c00fe3447780] [c0d6d1b8] schedule_timeout+0x348/0x510
[ 5317.867707] [c00fe3447880] [c0928b60] dma_fence_default_wait+0x2b0/0x350
[ 5317.867710] [c00fe34478f0] [c092780c] dma_fence_wait_timeout+0x6c/0x1b0
[ 5317.867714] [c00fe3447930] [c092aeb0] reservation_object_wait_timeout_rcu+0x320/0x3d0
[ 5317.867774] [c00fe34479b0] [c0080d5fc220] amdgpu_dm_do_flip+0x138/0x3b0 [amdgpu]
[ 5317.867831] [c00fe3447b00] [c0080d6001a0] amdgpu_dm_atomic_commit_tail+0x7f8/0xf20 [amdgpu]
[ 5317.867840] [c00fe3447c60] [c0080cb72da4] commit_tail+0x6c/0xe0 [drm_kms_helper]
[ 5317.867846] [c00fe3447c90] [c0138720] process_one_work+0x2b0/0x560
[ 5317.867850] [c00fe3447d20] [c0138a58] worker_thread+0x88/0x610
[ 5317.867854] [c00fe3447dc0] [c01416fc] kthread+0x1ac/0x1c0
[ 5317.867859] [c00fe3447e30] [c000b65c] ret_from_kernel_thread+0x5c/0x80
[ 5438.711397] INFO: task kworker/u257:3:54387 blocked for more than 120 seconds.
[ 5438.711473]   Not tainted 4.18.0-041800rc6-generic #201807221830
[ 5438.711552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

If I kill the Wayland session:

[ 7012.419912] EEH: Frozen PHB#33-PE#0 detected
[ 7012.419919] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
[ 7012.419923] CPU: 74 PID: 126541 Comm: pulseaudio Not tainted 4.18.0-041800rc6-generic #201807221830
[ 7012.419924] Call Trace:
[ 7012.419932] [c000200b3600] [c0d4ce3c] dump_stack+0xb0/0xf4 (unreliable)
[ 7012.419936] [c000200b3640] [c003b0ac] eeh_dev_check_failure+0x4ac/0x5e0
[ 7012.419938] [c000200b36e0] [c003b26c] eeh_check_failure+0x8c/0xd0
[ 7012.419945] [c000200b36333420] [c00816342ae8] pci_azx_readw+0x80/0xb0 [snd_hda_intel]
[ 7012.419950] [c000200b36333450] [c008161c5790] snd_hdac_bus_send_cmd+0x78/0x210 [snd_hda_core]
[ 7012.419956] [c000200b363334a0] [c008162a20ec] azx_send_cmd+0x34/0x390 [snd_hda_codec]
[ 7012.419959] [c000200b36333530] [c008161c0274] snd_hdac_bus_exec_verb_unlocked+0x7c/0x280 [snd_hda_core]
[ 7012.419964] [c000200b36333590] [c0081629240c] codec_exec_verb+0xb4/0x1f0 [snd_hda_codec]
[ 7012.419967] [c000200b36333630] [c008161c1a10] snd_hdac_exec_verb+0x38/0x90 [snd_hda_core]
[ 7012.419971] [c000200b36333650] [c008161c4158] hda_reg_write+0x120/0x3b0 [snd_hda_core]
[ 7012.419974] [c000200b363336c0] [c08c87e8] _regmap_write+0x98/0x190
[ 7012.419977] [c000200b36333710] [c08ca5b4] regmap_write+0x74/0xc0
[ 7012.419981] [c000200b36333750] [c008161c47e4] snd_hdac_regmap_write_raw+0x4c/0x130 [snd_hda_core]
[ 7012.419985] [c000200b36333790] [c00816485d80] hdmi_pcm_open+0x168/0x4a0 [snd_hda_codec_hdmi]
[ 7012.419989] [c000200b36333820] [c008162a12e8] azx_pcm_open+0x1b0/0x3d0 [snd_hda_codec]
[ 7012.419995] [c000200b36333890] [c008160ab3dc] snd_pcm_open_substream+0xb4/0x1a0 [snd_pcm]
[ 7012.419998] [c000200b36333920] [c008160ab5d4] snd_pcm_open+0x10c/0x2e0 [snd_pcm]
[ 7012.420002] [c000200b363339b0] [c008160ab8c4] snd_pcm_playback_open+0x6c/0xa8 [snd_pcm]
[ 7012.420008] [c000200b363339f0] [c0080f9c0750] snd_open+0x108/0x240 [snd]
[ 7012.420011] 

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-07-24 Thread Joseph Salisbury
Did this issue start happening after an update/upgrade?  Was there a
prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc6

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Tags added: kernel-da-key

[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-07-20 Thread Joel Stanley
After that, it fails to recover:

 2372.712463] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0] 
hw_done or flip_done timed out
[ 2538.367847] INFO: task kworker/u257:2:8785 blocked for more than 120 seconds.
[ 2538.367917]   Not tainted 4.17.0-5-generic #6-Ubuntu
[ 2538.367968] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2538.368053] kworker/u257:2  D0  8785  2 0x0800
[ 2538.368067] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 2538.368069] Call Trace:
[ 2538.368072] [c00fcfb33460] [0014] 0x14 (unreliable)
[ 2538.368078] [c00fcfb33630] [c001cd3c] __switch_to+0x2ec/0x4c0
[ 2538.368081] [c00fcfb33690] [c0d42550] __schedule+0x330/0xa90
[ 2538.368083] [c00fcfb33760] [c0d42cf0] schedule+0x40/0xc0
[ 2538.368086] [c00fcfb33780] [c0d47b88] schedule_timeout+0x258/0x4f0
[ 2538.368090] [c00fcfb33880] [c0923b90] dma_fence_default_wait+0x2b0/0x370
[ 2538.368093] [c00fcfb338f0] [c0922f64] dma_fence_wait_timeout+0x74/0x190
[ 2538.368096] [c00fcfb33930] [c0925fc0] reservation_object_wait_timeout_rcu+0x2f0/0x3e0
[ 2538.368141] [c00fcfb339b0] [c0080d10d108] amdgpu_dm_do_flip+0x130/0x3b0 [amdgpu]
[ 2538.368184] [c00fcfb33b00] [c0080d1113c8] amdgpu_dm_atomic_commit_tail+0xcb0/0xf90 [amdgpu]
[ 2538.368191] [c00fcfb33c60] [c0080c762b94] commit_tail+0x6c/0xe0 [drm_kms_helper]
[ 2538.368194] [c00fcfb33c90] [c0137928] process_one_work+0x298/0x580
[ 2538.368197] [c00fcfb33d20] [c0137c98] worker_thread+0x88/0x610
[ 2538.368200] [c00fcfb33dc0] [c0140958] kthread+0x1a8/0x1b0
[ 2538.368203] [c00fcfb33e30] [c000b658] ret_from_kernel_thread+0x5c/0x84
[ 2659.214902] INFO: task kworker/u257:2:8785 blocked for more than 120 seconds.
[ 2659.214976]   Not tainted 4.17.0-5-generic #6-Ubuntu
[ 2659.215019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2659.215126] kworker/u257:2  D0  8785  2 0x0800
[ 2659.215141] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 2659.215142] Call Trace:
[ 2659.215145] [c00fcfb33460] [0014] 0x14 (unreliable)
[ 2659.215151] [c00fcfb33630] [c001cd3c] __switch_to+0x2ec/0x4c0
[ 2659.215154] [c00fcfb33690] [c0d42550] __schedule+0x330/0xa90
[ 2659.215157] [c00fcfb33760] [c0d42cf0] schedule+0x40/0xc0
[ 2659.215160] [c00fcfb33780] [c0d47b88] schedule_timeout+0x258/0x4f0
[ 2659.215163] [c00fcfb33880] [c0923b90] dma_fence_default_wait+0x2b0/0x370
[ 2659.215166] [c00fcfb338f0] [c0922f64] dma_fence_wait_timeout+0x74/0x190
[ 2659.215169] [c00fcfb33930] [c0925fc0] reservation_object_wait_timeout_rcu+0x2f0/0x3e0
[ 2659.215217] [c00fcfb339b0] [c0080d10d108] amdgpu_dm_do_flip+0x130/0x3b0 [amdgpu]
[ 2659.215264] [c00fcfb33b00] [c0080d1113c8] amdgpu_dm_atomic_commit_tail+0xcb0/0xf90 [amdgpu]
[ 2659.215272] [c00fcfb33c60] [c0080c762b94] commit_tail+0x6c/0xe0 [drm_kms_helper]
[ 2659.215275] [c00fcfb33c90] [c0137928] process_one_work+0x298/0x580
[ 2659.215278] [c00fcfb33d20] [c0137c98] worker_thread+0x88/0x610
[ 2659.215281] [c00fcfb33dc0] [c0140958] kthread+0x1a8/0x1b0
[ 2659.215284] [c00fcfb33e30] [c000b658] ret_from_kernel_thread+0x5c/0x84
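
The same worker (kworker/u257:2, pid 8785) is reported hung twice in the log above, roughly 120 seconds apart. For anyone triaging a longer dmesg capture, a minimal sketch along these lines could tally hung-task reports per task. The function name hung_tasks and the sample text are illustrative assumptions, not part of any kernel tooling, and the regex only assumes the "INFO: task NAME:PID blocked for more than N seconds." wording seen in this log:

```python
import re
from collections import Counter

# Matches the hung-task line format seen in this log. The task name may
# itself contain ':' (e.g. "kworker/u257:2"), so the greedy name group
# relies on backtracking to the last ':' before the pid.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def hung_tasks(log_text):
    """Return (timestamp, comm, pid, seconds) for each hung-task report.

    Hypothetical helper for illustration only.
    """
    return [
        (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(log_text)
    ]


# Sample lines copied from the log above.
sample = """\
[ 2538.367847] INFO: task kworker/u257:2:8785 blocked for more than 120 seconds.
[ 2538.367917]   Not tainted 4.17.0-5-generic #6-Ubuntu
[ 2659.214902] INFO: task kworker/u257:2:8785 blocked for more than 120 seconds.
"""

reports = hung_tasks(sample)
counts = Counter((comm, pid) for _, comm, pid, _ in reports)
for (comm, pid), n in counts.items():
    print(f"{comm} (pid {pid}): {n} hung-task report(s)")
```

Repeated reports for one pid, as here, suggest the task never left the uninterruptible wait rather than several tasks stalling independently.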


Title:
  [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Running the 4.17.0-5-generic kernel on a ppc64le machine with a Radeon
  R9 Fury GPU

  
  0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ff)

  [ 2361.958847] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=8777, last emitted seq=8778
  [ 2362.080397] EEH: Frozen PHB#33-PE#0 detected
  [ 2362.080470] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
  [ 2362.080568] CPU: 53 PID: 874 Comm: kworker/53:1 Not tainted 4.17.0-5-generic #6-Ubuntu
  [ 2362.080575] Workqueue: events drm_sched_job_timedout [gpu_sched]
  [ 2362.080577] Call Trace:
  [ 2362.080584] [c000fb7078f0] [c0d275ac] dump_stack+0xb0/0xf4 (unreliable)
  [ 2362.080590] [c000fb707930] [c003ba0c] eeh_dev_check_failure+0x5bc/0x5e0
  [ 2362.080593] [c000fb7079e0] [c003babc] eeh_check_failure+0x8c/0xd0
  [ 2362.080628] [c000fb707a20] [c0080cfa1b88] amdgpu_mm_rreg+0x280/0x2a0 [amdgpu]
  [ 2362.080676] [c000fb707a70] [c0080d04cf68] gmc_v8_0_check_soft_reset+0x30/0xe0 [amdgpu]
  [ 2362.080711] [c000fb707aa0] [c0080cfa1194]