[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-25 Michel Dänzer
On 23.03.2015 07:14, Carsten Emde wrote:
> Hi Michel,
> 
>>>>> [..]
>>>>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>>>>> are equipped with Radeon graphics (irrespective whether PCIe cards or
>>>>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>>>>> The block occurs when accelerated graphics load is created by x11perf or
>>>>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>>>>> is possible, sometimes the entire box is no longer accessible at all. In
>>>>> any case, a reboot is needed to recover from this situation.
>>>>>
>>>>> Here is a selection of kernel messages:
>>>> [...]
>>>> The commits from
>>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=f957063fee6392bb9365370db6db74dc0b2dce0a
>>>>
>>>> to
>>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>>>
>>>> and
>>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=b6610101718d4ab90d793c482625e98eb1262cad
>>>>
>>>> might help for this.
>>>
>>> Thanks a lot. I have applied these patches to a number of systems:
>>> # quilt applied | tail -7
>>> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
>>> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
>>>
>>>
>>>
>>>   The graphic boards still crash and freeze the screen, but in contrast
>>> to the earlier situation the systems remain accessible, and the X
>>> Window server can be restarted after the offensive programs are
>>> removed. The crashes were reliably triggered by
>>> - gltestperf
>>>   or
>>> - x11perf -repeat 3 -subs 25 -time 2 -rect10
> This is not entirely correct, since gltestperf does not reliably crash
> the graphics controller. However, "x11perf -repeat 3 -subs 25 -time 2
> -rect10" always does a reliable job to trigger the crash.
> 
>>> but the crashes also occur several times per day during normal work
>>> such as browsing the Internet or writing a text document. If you wish
>>> me to provide additional diagnostic information such as running test
>>> programs while the graphic boards are unresponsive, I certainly can do
>>> that.
>>
>> Does it also happen with a kernel built from a current drm-fixes tree?
>> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
> No. Apparently, you need full preemption to expose the problem.
> 
> The following list shows whether the command "x11perf -repeat 3
> -subs 25 -time 2 -rect10" freezes the Radeon board under test
> (Radeon HD 7970 XFS / R9 280X):
> linux-3.12.33-rt47               no
> linux-3.14.34-rt32               no
> linux-3.14.34-drm-3.16.7-rt32*   no
> linux-3.18.7-rt1                 YES
> linux-3.18.9-rt4                 YES
> linux-3.18.9-rt5                 YES
> linux-3.18.9-drm-3.16.7-rt5**    no
> linux-4.0.0-rc4                  no
> linux-drm-fixes                  no
> *DRM subsystem backported from linux-3.16.7 to linux-3.14.34-rt32.
> **DRM subsystem ported from linux-3.16.7 to linux-3.18.9-rt5. 

Can you test a non-rt 3.18.y kernel? There were some intermittent issues
around 3.18 fixed by the patches I referenced above. Maybe I missed some
other fixes, though. Maarten, do you remember any other fixes offhand
that might help?


> More observations:
> If full function tracing is enabled (which makes the system about five
> times slower), the graphics controller no longer freezes. With partial
> function tracing such as "echo *drm* >set_ftrace_filter", the
> controller still freezes. The trace then contains vblank interrupt
> processing only, ioctls are no longer executed.
> 
> This is the location where the driver hangs:
> [25104.509258] INFO: task Xorg.bin:16591 blocked for more than 120 seconds.
> [25104.516322]   Not tainted 3.18.9-rt5 #2
> [25104.520715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [25104.528853] Xorg.bin D 8171ed90 0 16591  16239 0x10400080
> [25104.536102]  8800ba0bb8d8 0002 8800ba0bbfd8 0006
> [25104.536103]  dc08 880626d0dc08 8800ba0bbfd8 dc08
> [25104.536104]  88061b2cdcd0 880616d3a940 880035c1 880616d3a940
> [25104.559274] Call Trace:
> [25104.561844]  [] schedule+0x34/0xa0
> [25104.561846]  [] schedule_timeout+0x23c/0x2a0
> [25104.561870]  [] ? radeon_fence_process+0x16/0x40 [radeon]
> [25104.561879]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [25104.561887]  [] radeon_fence_wait_seq_timeout.constprop.8+0x327/0x380 [radeon]
> [25104.561889]  [] ? 

[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-23 Carsten Emde
Hi Michel,

>>>> [..]
>>>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>>>> are equipped with Radeon graphics (irrespective whether PCIe cards or
>>>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>>>> The block occurs when accelerated graphics load is created by x11perf or
>>>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>>>> is possible, sometimes the entire box is no longer accessible at all. In
>>>> any case, a reboot is needed to recover from this situation.
>>>>
>>>> Here is a selection of kernel messages:
>>> [...]
>>> The commits from
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=f957063fee6392bb9365370db6db74dc0b2dce0a
>>>
>>> to
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>>
>>> and
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=b6610101718d4ab90d793c482625e98eb1262cad
>>>
>>> might help for this.
>>
>> Thanks a lot. I have applied these patches to a number of systems:
>> # quilt applied | tail -7
>> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
>> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
>>
>>
>>   The graphic boards still crash and freeze the screen, but in contrast
>> to the earlier situation the systems remain accessible, and the X
>> Window server can be restarted after the offensive programs are
>> removed. The crashes were reliably triggered by
>> - gltestperf
>>   or
>> - x11perf -repeat 3 -subs 25 -time 2 -rect10
This is not entirely correct, since gltestperf does not reliably crash
the graphics controller. However, "x11perf -repeat 3 -subs 25 -time 2
-rect10" always does a reliable job to trigger the crash.
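For repeated runs across many kernels, the reproducer can be wrapped in a watchdog so that a run that never returns is recorded instead of stalling the test script. The following is only a sketch: the helper name `run_with_watchdog` and the 120 second limit are illustrative assumptions, not part of the QA farm setup, and it assumes the x11perf client itself remains killable while Xorg is blocked in the driver.

```python
import subprocess
import sys

# Reproducer command quoted above; needs x11perf installed and DISPLAY set.
REPRODUCER = ["x11perf", "-repeat", "3", "-subs", "25", "-time", "2", "-rect10"]


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and return 'completed', 'failed' or 'timeout'.

    A timeout only hints at a frozen graphics controller; the kernel log
    still has to be checked for a hung-task message.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "completed" if result.returncode == 0 else "failed"


# Example call on a test box (not executed here):
#   status = run_with_watchdog(REPRODUCER, timeout_s=120)
```

A "timeout" result on a kernel where the same command normally finishes within seconds would then be a candidate freeze to confirm in dmesg.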

>> but the crashes also occur several times per day during normal work
>> such as browsing the Internet or writing a text document. If you wish
>> me to provide additional diagnostic information such as running test
>> programs while the graphic boards are unresponsive, I certainly can do
>> that.
>
> Does it also happen with a kernel built from a current drm-fixes tree?
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
No. Apparently, you need full preemption to expose the problem.

The following list shows whether the command "x11perf -repeat 3
-subs 25 -time 2 -rect10" freezes the Radeon board under test
(Radeon HD 7970 XFS / R9 280X):
linux-3.12.33-rt47               no
linux-3.14.34-rt32               no
linux-3.14.34-drm-3.16.7-rt32*   no
linux-3.18.7-rt1                 YES
linux-3.18.9-rt4                 YES
linux-3.18.9-rt5                 YES
linux-3.18.9-drm-3.16.7-rt5**    no
linux-4.0.0-rc4                  no
linux-drm-fixes                  no
*DRM subsystem backported from linux-3.16.7 to linux-3.14.34-rt32.
**DRM subsystem ported from linux-3.16.7 to linux-3.18.9-rt5.
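To make the pattern in this list explicit, the results can be grouped by two properties read off the kernel names: whether the -rt patch is applied and which kernel series the DRM code comes from. This is a rough sketch; the `classify` helper and its name-parsing rules are assumptions made for illustration only.

```python
import re

# Freeze results copied from the list above: kernel name -> board froze?
RESULTS = {
    "linux-3.12.33-rt47": False,
    "linux-3.14.34-rt32": False,
    "linux-3.14.34-drm-3.16.7-rt32": False,
    "linux-3.18.7-rt1": True,
    "linux-3.18.9-rt4": True,
    "linux-3.18.9-rt5": True,
    "linux-3.18.9-drm-3.16.7-rt5": False,
    "linux-4.0.0-rc4": False,
    "linux-drm-fixes": False,
}


def classify(name):
    """Return (has_rt_patch, drm_series) derived from the kernel name."""
    has_rt = re.search(r"-rt\d+$", name) is not None
    drm = re.search(r"-drm-(\d+\.\d+)", name)   # backported DRM subsystem
    base = re.match(r"linux-(\d+\.\d+)", name)  # DRM shipped with the kernel
    if drm:
        series = drm.group(1)
    elif base:
        series = base.group(1)
    else:
        series = name[len("linux-"):]           # e.g. "drm-fixes"
    return has_rt, series


def combinations(results, froze):
    """Set of (has_rt_patch, drm_series) pairs with the given outcome."""
    return {classify(k) for k, v in results.items() if v == froze}


print("freezing:", combinations(RESULTS, True))
print("clean:   ", sorted(combinations(RESULTS, False)))
```

Every freezing kernel combines the -rt patch with the DRM code of 3.18, and replacing only the DRM subsystem with the 3.16.7 one makes the freeze disappear. The list contains no non-rt 3.18 kernel, so it cannot tell whether full preemption is strictly required.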

More observations:
If full function tracing is enabled (which makes the system about five
times slower), the graphics controller no longer freezes. With partial
function tracing such as "echo *drm* >set_ftrace_filter", the
controller still freezes. The trace then contains vblank interrupt
processing only, ioctls are no longer executed.
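The two tracing setups described above can be sketched as follows. The helper `enable_function_trace` is hypothetical; it only writes to the standard ftrace control files (`tracing_on`, `set_ftrace_filter`, `current_tracer`). On a real system the directory would be the tracefs mount point, assumed here to be /sys/kernel/debug/tracing, and root privileges are needed. The example below performs a dry run against a scratch directory instead.

```python
import tempfile
from pathlib import Path


def enable_function_trace(tracing_dir, func_filter=None):
    """Switch on the function tracer, optionally limited by a glob filter.

    func_filter=None writes an empty line to set_ftrace_filter, which
    clears the filter so that all functions are traced (full tracing).
    """
    d = Path(tracing_dir)
    (d / "tracing_on").write_text("0\n")            # pause while reconfiguring
    (d / "set_ftrace_filter").write_text((func_filter or "") + "\n")
    (d / "current_tracer").write_text("function\n")
    (d / "tracing_on").write_text("1\n")            # resume tracing


# Dry run: a scratch directory stands in for the tracefs mount point.
scratch = tempfile.mkdtemp()
enable_function_trace(scratch, "*drm*")             # partial tracing as above
print((Path(scratch) / "set_ftrace_filter").read_text().strip())
```

Calling the helper without a filter corresponds to the full function tracing case, in which the freeze no longer occurred.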

This is the location where the driver hangs:
[25104.509258] INFO: task Xorg.bin:16591 blocked for more than 120 seconds.
[25104.516322]   Not tainted 3.18.9-rt5 #2
[25104.520715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25104.528853] Xorg.bin D 8171ed90 0 16591  16239 0x10400080
[25104.536102]  8800ba0bb8d8 0002 8800ba0bbfd8 0006
[25104.536103]  dc08 880626d0dc08 8800ba0bbfd8 dc08
[25104.536104]  88061b2cdcd0 880616d3a940 880035c1 880616d3a940
[25104.559274] Call Trace:
[25104.561844]  [] schedule+0x34/0xa0
[25104.561846]  [] schedule_timeout+0x23c/0x2a0
[25104.561870]  [] ? radeon_fence_process+0x16/0x40 [radeon]
[25104.561879]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[25104.561887]  [] radeon_fence_wait_seq_timeout.constprop.8+0x327/0x380 [radeon]
[25104.561889]  [] ? __wake_up_sync+0x20/0x20
[25104.561898]  [] radeon_fence_wait_any+0x57/0x70 [radeon]
[25104.561914]  [] radeon_sa_bo_new+0x2af/0x4b0 [radeon]
[25104.561916]  [] ? debug_smp_processor_id+0x17/0x20
[25104.561918]  [] ? __kmalloc+0x8a/0x300
[25104.561932]  [] radeon_ib_get+0x37/0xe0 [radeon]
[25104.561943]  [] radeon_cs_ioctl+0x22e/0x860 [radeon]
[25104.561952]  [] drm_ioctl+0x197/0x670 [drm]
[25104.561954]  [] ? debug_smp_processor_id+0x17/0x20
[25104.561956]  [] ? 

[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-17 Michel Dänzer
On 16.03.2015 23:52, Carsten Emde wrote:
> Hi Michel,
> 
>>> [..]
>>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>>> are equipped with Radeon graphics (irrespective whether PCIe cards or
>>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>>> The block occurs when accelerated graphics load is created by x11perf or
>>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>>> is possible, sometimes the entire box is no longer accessible at all. In
>>> any case, a reboot is needed to recover from this situation.
>>>
>>> Here is a selection of kernel messages:
>> [...]
>> The commits from
>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=f957063fee6392bb9365370db6db74dc0b2dce0a
>>
>> to
>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>
>> and
>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=b6610101718d4ab90d793c482625e98eb1262cad
>>
>> might help for this.
> 
> Thanks a lot. I have applied these patches to a number of systems:
> # quilt applied | tail -7
> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
> 
> 
>  The graphic boards still crash and freeze the screen, but in contrast
> to the earlier situation the systems remain accessible, and the X
> Window server can be restarted after the offensive programs are
> removed. The crashes were reliably triggered by
> - gltestperf
>   or
> - x11perf -repeat 3 -subs 25 -time 2 -rect10
> but the crashes also occur several times per day during normal work
> such as browsing the Internet or writing a text document. If you wish
> me to provide additional diagnostic information such as running test
> programs while the graphic boards are unresponsive, I certainly can do
> that.

Does it also happen with a kernel built from a current drm-fixes tree?
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

I might have missed other needed fixes.


> Rack #0/Slot #3 [AMD/ATI] RV730 XT [Radeon HD 4670]:
> 
> [21001.244036] INFO: task kworker/u24:6:267 blocked for more than 120 seconds.
> [21001.257773]   Not tainted 3.18.9-rt4 #27
> [21001.266284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [21001.281911] kworker/u24:6   D 88081ed8b340 0   267  2 0x1000
> [21001.281937] Workqueue: radeon-crtc radeon_flip_work_func [radeon]
> [21001.281940]  880805d2fbe8 0046 88081ed0c700
> [21001.281941]  9000 c920 8808112fb420 880035254e30
> [21001.281943]  c280 0100c280 0003 880035254e30
> [21001.281945] Call Trace:
> [21001.281950]  [] schedule+0x34/0xa0
> [21001.281953]  [] schedule_timeout+0x22c/0x2d0
> [21001.281962]  [] ? radeon_fence_process+0x16/0x40 [radeon]
> [21001.281971]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [21001.281979]  [] radeon_fence_wait_seq_timeout.constprop.8+0x2e7/0x340 [radeon]
> [21001.281982]  [] ? __wake_up_sync+0x20/0x20
> [21001.281991]  [] radeon_fence_wait+0x86/0xc0 [radeon]
> [21001.282000]  [] radeon_flip_work_func+0x15c/0x190 [radeon]
> [21001.282003]  [] process_one_work+0x154/0x450
> [21001.282004]  [] worker_thread+0x6b/0x4d0
> [21001.282006]  [] ? rescuer_thread+0x290/0x290
> [21001.282007]  [] ? rescuer_thread+0x290/0x290
> [21001.282009]  [] kthread+0xcd/0xf0
> [21001.282010]  [] ? kthread_worker_fn+0x1d0/0x1d0
> [21001.282013]  [] ret_from_fork+0x7c/0xb0
> [21001.282014]  [] ? kthread_worker_fn+0x1d0/0x1d0
> 
> 
> Rack #0/Slot #7 [AMD/ATI] Cayman XT [Radeon HD 6970]
> 
> [  481.091132] INFO: task Xorg:3459 blocked for more than 120 seconds.
> [  481.103594]   Not tainted 3.18.9-rt4 #28
> [  481.112101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  481.127746] Xorg D 88041e68ab40 0  3459   3452 0x1044
> [  481.141882]  880413da38e8 0002 88041e60c460 8800c3ea3380
> [  481.141882]  880413da38d8 8108603f c5a8 c5c8
> [  481.141883]  81c19460 8800c3ea3380 000c 8800c3ea3380
> [  481.186228] Call Trace:
> [  481.191114]  [] ? queue_delayed_work_on+0xff/0x110
> [  481.191118]  [] schedule+0x34/0xa0
> [  481.191119]  [] schedule_timeout+0x204/0x270
> [  481.191148]  [] ? radeon_fence_process+0x16/0x40 [radeon]
> [  481.191157]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [  481.191165]  [] 
> 

[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-16 Carsten Emde
Hi Michel,

>> [..]
>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>> are equipped with Radeon graphics (irrespective whether PCIe cards or
>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>> The block occurs when accelerated graphics load is created by x11perf or
>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>> is possible, sometimes the entire box is no longer accessible at all. In
>> any case, a reboot is needed to recover from this situation.
>>
>> Here is a selection of kernel messages:
> [...]
> The commits from
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=f957063fee6392bb9365370db6db74dc0b2dce0a
> to
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=cffefd9bb31cd35ab745d3b49005d10616d25bdc
> and
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=b6610101718d4ab90d793c482625e98eb1262cad
> might help for this.

Thanks a lot. I have applied these patches to a number of systems:
# quilt applied | tail -7
patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch

  The graphic boards still crash and freeze the screen, but in contrast
to the earlier situation the systems remain accessible, and the X
Window server can be restarted after the offensive programs are
removed. The crashes were reliably triggered by
- gltestperf
   or
- x11perf -repeat 3 -subs 25 -time 2 -rect10
but the crashes also occur several times per day during normal work
such as browsing the Internet or writing a text document. If you wish
me to provide additional diagnostic information such as running test
programs while the graphic boards are unresponsive, I certainly can do
that.

Below are the related kernel messages.

Thanks,
-Carsten.


Rack #0/Slot #3 [AMD/ATI] RV730 XT [Radeon HD 4670]:

[21001.244036] INFO: task kworker/u24:6:267 blocked for more than 120 seconds.
[21001.257773]   Not tainted 3.18.9-rt4 #27
[21001.266284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21001.281911] kworker/u24:6   D 88081ed8b340 0   267  2 0x1000
[21001.281937] Workqueue: radeon-crtc radeon_flip_work_func [radeon]
[21001.281940]  880805d2fbe8 0046 88081ed0c700
[21001.281941]  9000 c920 8808112fb420 880035254e30
[21001.281943]  c280 0100c280 0003 880035254e30
[21001.281945] Call Trace:
[21001.281950]  [] schedule+0x34/0xa0
[21001.281953]  [] schedule_timeout+0x22c/0x2d0
[21001.281962]  [] ? radeon_fence_process+0x16/0x40 [radeon]
[21001.281971]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[21001.281979]  [] radeon_fence_wait_seq_timeout.constprop.8+0x2e7/0x340 [radeon]
[21001.281982]  [] ? __wake_up_sync+0x20/0x20
[21001.281991]  [] radeon_fence_wait+0x86/0xc0 [radeon]
[21001.282000]  [] radeon_flip_work_func+0x15c/0x190 [radeon]
[21001.282003]  [] process_one_work+0x154/0x450
[21001.282004]  [] worker_thread+0x6b/0x4d0
[21001.282006]  [] ? rescuer_thread+0x290/0x290
[21001.282007]  [] ? rescuer_thread+0x290/0x290
[21001.282009]  [] kthread+0xcd/0xf0
[21001.282010]  [] ? kthread_worker_fn+0x1d0/0x1d0
[21001.282013]  [] ret_from_fork+0x7c/0xb0
[21001.282014]  [] ? kthread_worker_fn+0x1d0/0x1d0


Rack #0/Slot #7 [AMD/ATI] Cayman XT [Radeon HD 6970]

[  481.091132] INFO: task Xorg:3459 blocked for more than 120 seconds.
[  481.103594]   Not tainted 3.18.9-rt4 #28
[  481.112101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  481.127746] Xorg D 88041e68ab40 0  3459   3452 0x1044
[  481.141882]  880413da38e8 0002 88041e60c460 8800c3ea3380
[  481.141882]  880413da38d8 8108603f c5a8 c5c8
[  481.141883]  81c19460 8800c3ea3380 000c 8800c3ea3380
[  481.186228] Call Trace:
[  481.191114]  [] ? queue_delayed_work_on+0xff/0x110
[  481.191118]  [] schedule+0x34/0xa0
[  481.191119]  [] schedule_timeout+0x204/0x270
[  481.191148]  [] ? radeon_fence_process+0x16/0x40 [radeon]
[  481.191157]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[  481.191165]  [] radeon_fence_wait_seq_timeout.constprop.7+0x227/0x330 [radeon]
[  481.191167]  [] ? prepare_to_wait_event+0x110/0x110
[  481.191175]  [] radeon_fence_wait_any+0x57/0x70 [radeon]
[  481.191191]  [] radeon_sa_bo_new+0x2cf/0x4e0 [radeon]
[  481.191194]  [] ? debug_smp_processor_id+0x17/0x20
[  481.191207]  [] radeon_ib_get+0x37/0xf0 [radeon]
[  481.191218]  [] 

[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-13 Sebastian Andrzej Siewior
On 03/13/2015 03:23 AM, Michel Dänzer wrote:
> The commits from
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=f957063fee6392bb9365370db6db74dc0b2dce0a
> to
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=cffefd9bb31cd35ab745d3b49005d10616d25bdc
> and
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=b6610101718d4ab90d793c482625e98eb1262cad
> might help for this.
Thanks.

I can't reproduce this myself but I pulled in the commits you mentioned
and "drm/radeon: only enable kv/kb dpm interrupts once v3" to avoid a
reject. The box runs, glxgears and so on seem to do something, can't
look at the screen :) All of those commits (and a ton more) are marked
stable so I will probably get them anyway…

Sebastian


[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-13 Michel Dänzer
On 13.03.2015 08:23, Carsten Emde wrote:
> (About 30 OSADL QA Farm systems are now running 3.18.9-rt4. BTW: To
> check out what kernels are under test you may sort the kernel list
> (https://www.osadl.org/?id=933) by kernel version
> (https://www.osadl.org/?id=1001) and scroll down the page.)
> 
> The most striking problem of kernel 3.18.9-rt4 affects all systems that
> are equipped with Radeon graphics (irrespective whether PCIe cards or
> APUs with on-chip graphics). They suffer from a hanging radeon driver.
> The block occurs when accelerated graphics load is created by x11perf or
> gltestperf. Sometimes only the graphics are frozen while ssh login still
> is possible, sometimes the entire box is no longer accessible at all. In
> any case, a reboot is needed to recover from this situation.
> 
> Here is a selection of kernel messages:

[...]

The commits from
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=f957063fee6392bb9365370db6db74dc0b2dce0a
to
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=cffefd9bb31cd35ab745d3b49005d10616d25bdc
and
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes=b6610101718d4ab90d793c482625e98eb1262cad
might help for this.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[OSADL QA 3.18.9-rt4 #1] Radeon driver hangs

2015-03-13 Carsten Emde
(About 30 OSADL QA Farm systems are now running 3.18.9-rt4. BTW: To 
check out what kernels are under test you may sort the kernel list 
(https://www.osadl.org/?id=933) by kernel version 
(https://www.osadl.org/?id=1001) and scroll down the page.)

The most striking problem of kernel 3.18.9-rt4 affects all systems that 
are equipped with Radeon graphics (irrespective whether PCIe cards or 
APUs with on-chip graphics). They suffer from a hanging radeon driver. 
The block occurs when accelerated graphics load is created by x11perf or 
gltestperf. Sometimes only the graphics are frozen while ssh login still 
is possible, sometimes the entire box is no longer accessible at all. In
any case, a reboot is needed to recover from this situation.

Here is a selection of kernel messages:

Rack #0/Slot #3 [AMD/ATI] RV730 XT [Radeon HD 4670]:
[16081.272035] INFO: task kworker/u24:4:268 blocked for more than 120 seconds.
[16081.285776]   Not tainted 3.18.9-rt4 #26
[16081.294286] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[16081.309901] kworker/u24:4   D 88081ed8b340 0   268  2 0x1000
[16081.309938] Workqueue: radeon-crtc radeon_flip_work_func [radeon]
[16081.309960]  880805ccfbe8 0046 88081ed0c700
[16081.309962]  9000 c920 8808112fb420 880805cc1a10
[16081.309963]  880805ccfbf8 01008108a0da 880805ccfc98 880805cc1a10
[16081.309966] Call Trace:
[16081.309972]  [] schedule+0x34/0xa0
[16081.309974]  [] schedule_timeout+0x22c/0x2d0
[16081.309984]  [] ? radeon_fence_process+0x16/0x40 [radeon]
[16081.309993]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[16081.310001]  [] radeon_fence_wait_seq_timeout.constprop.8+0x2e7/0x340 [radeon]
[16081.310004]  [] ? __wake_up_sync+0x20/0x20
[16081.310013]  [] radeon_fence_wait+0x86/0xc0 [radeon]
[16081.310023]  [] radeon_flip_work_func+0x15c/0x190 [radeon]
[16081.310025]  [] process_one_work+0x154/0x450
[16081.310026]  [] worker_thread+0x6b/0x4d0
[16081.310028]  [] ? rescuer_thread+0x290/0x290
[16081.310029]  [] kthread+0xcd/0xf0
[16081.310031]  [] ? kthread_worker_fn+0x1d0/0x1d0
[16081.310034]  [] ret_from_fork+0x7c/0xb0
[16081.310035]  [] ? kthread_worker_fn+0x1d0/0x1d0


Rack #0/Slot #7 [AMD/ATI] Cayman XT [Radeon HD 6970]:
INFO: task Xorg:10038 blocked for more than 120 seconds.
  Not tainted 3.18.9-rt4 #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg D 816b7f88 0 10038  10032 0x1044
8800c5ad78e8 0002 88041e80c460 c5c8
88041e80c5c8 0002 c5a8 c5c8
880417728000 88041401 000c 88041401
Call Trace:
[] schedule+0x34/0xa0
[] schedule_timeout+0x204/0x270
[] ? radeon_fence_process+0x16/0x40 [radeon]
[] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[] radeon_fence_wait_seq_timeout.constprop.7+0x227/0x330 [radeon]
[] ? prepare_to_wait_event+0x110/0x110
[] radeon_fence_wait_any+0x57/0x70 [radeon]
[] radeon_sa_bo_new+0x2cf/0x4e0 [radeon]
[] ? debug_smp_processor_id+0x17/0x20
[] radeon_ib_get+0x37/0xf0 [radeon]
[] radeon_cs_ioctl+0x22d/0x820 [radeon]
[] drm_ioctl+0x1a4/0x630 [drm]
[] ? debug_smp_processor_id+0x17/0x20
[] ? unpin_current_cpu+0x1a/0x70
[] ? migrate_enable+0xb0/0x1b0
[] radeon_drm_ioctl+0x4b/0x80 [radeon]
[] do_vfs_ioctl+0x2e0/0x4d0
[] ? __fget+0x72/0xa0
[] SyS_ioctl+0x81/0xa0
[] tracesys_phase2+0xd4/0xd9
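When comparing hung-task reports from several boards, it can help to reduce each call trace to its reliable frames, i.e. the entries not marked with "?". The sketch below does this for an excerpt of the trace above; the function name `reliable_frames` and the regular expression are illustrative assumptions, and the pattern tolerates frames that the mail client wrapped onto the next line.

```python
import re

# One frame: "[address] [? ]symbol+0xoff/0xsize"; \s+ may span a line wrap.
FRAME = re.compile(
    r"\[[^\]\n]*\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Excerpt of the Cayman XT trace above, including one wrapped frame.
SAMPLE = """\
Call Trace:
[] schedule+0x34/0xa0
[] schedule_timeout+0x204/0x270
[] ? radeon_fence_process+0x16/0x40 [radeon]
[] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[]
radeon_fence_wait_seq_timeout.constprop.7+0x227/0x330 [radeon]
[] ? prepare_to_wait_event+0x110/0x110
[] radeon_fence_wait_any+0x57/0x70 [radeon]
[] radeon_sa_bo_new+0x2cf/0x4e0 [radeon]
"""


def reliable_frames(trace_text):
    """Return symbol names of frames that are not marked with '?'."""
    return [m.group(2) for m in FRAME.finditer(trace_text) if not m.group(1)]


for name in reliable_frames(SAMPLE):
    print(name)
```

Applied to the traces in this report, the innermost driver frame is radeon_fence_wait_seq_timeout in each case, reached either via radeon_fence_wait_any (command submission) or via radeon_fence_wait (page flip worker).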


Rack #4/Slot #1 Chipset: "KAVERI" (ChipID = 0x130c)
[  600.266245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.281856] Xorg D 0002 0  3821   3812 0x00400080
[  600.281865]  880223ddf908 0082 c1c0 c328
[  600.281867]  88023720c328 0002 c308 c328
[  600.281869]  81c1b480 880036cfcb60 000c 880036cfcb60
[  600.281873] Call Trace:
[  600.281882]  [] schedule+0x34/0xa0
[  600.281885]  [] schedule_timeout+0x204/0x270
[  600.281929]  [] ? radeon_fence_process+0x16/0x40 [radeon]
[  600.281949]  [] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[  600.281968]  [] radeon_fence_wait_seq_timeout.constprop.7+0x227/0x330 [radeon]
[  600.281972]  [] ? prepare_to_wait_event+0x110/0x110
[  600.281992]  [] radeon_fence_wait_any+0x57/0x70 [radeon]
[  600.282023]  [] radeon_sa_bo_new+0x2cf/0x4e0 [radeon]
[  600.282027]  [] ? dequeue_task_fair+0x43e/0x650
[  600.282055]  [] radeon_ib_get+0x37/0xf0 [radeon]
[  600.282078]  [] radeon_cs_ioctl+0x22d/0x820 [radeon]
[  600.282098]  [] drm_ioctl+0x1a4/0x630 [drm]
[  600.282104]  [] ? do_futex+0x109/0xb20
[  600.282106]  [] ? put_prev_entity+0x96/0x3f0
[  600.282122]  [] radeon_drm_ioctl+0xe/0x10 [radeon]
[  600.282125]  [] do_vfs_ioctl+0x2e0/0x4d0
[  600.282128]  [] ? __fget+0x72/0xa0
[  600.282131]  [] SyS_ioctl+0x81/0xa0
[  600.282134]  [] ?