Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2019-09-01 Thread Luís Mendes
Good news! I've re-tested this with Linux kernel 5.3.0-rc6 on Ubuntu
19.04 with Mesa 19.0.2 and Mesa 19.0.8, using the same Polaris RX 460 and
the same TYAN S7002 and TYAN S7025 boards, and it now works properly: I can
enter the desktop and run glmark2 as well as OpenCL.




On Sat, Oct 20, 2018 at 6:58 PM Luís Mendes  wrote:

> The problems remains with Linux 4.18 and Linux 4.19 kernels. I am unable
> to use AMD RX 460 and AMD RX 550 on my x64 Linux platforms.
>
> I've installed Windows 10 in the same machine along with
> win10-64bit-radeon-software-adrenalin-edition-18.10.1-oct18.exe and under
> Windows the same RX 460 card *works fine* and I am able to run OpenCL
> applications.
>
> The driver is hanging since kernel 4.15, I am getting:
> [   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
> *ERROR* [CRTC:42:crtc-0] flip_done timed out
> [   43.744094] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed
> out
> [   53.984089] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed
> out
> [   64.224036] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed
> out
> [   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR*
> amdgpu_dm_commit_planes: acrtc 0, already busy
>
> And after commit: drm/amdgpu: defer test IBs on the rings at boot (V3)
>
> 2c773de2ecb8c327f2448bd1eecad224e9227087
> 
> I get with kernels 4.18 and 4.19 as well as Ubuntu 18.10 stock kernel
> (can't even install Ubuntu 18.10 because it hangs with amdgpu):
> [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
> [drm_amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on
> GFX ring (-110).
> [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring
> test failed (-110).
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last
> signalled seq=0, last emitted seq=1
> and the kernel blocks indefinitely:
> task plymouthd:449 blocked for more than 120 seconds.
>
> Is there any hope on getting this fixed?
>
>
> On Thu, Jul 12, 2018 at 2:56 PM Luís Mendes 
> wrote:
>
>> Hi Jim,
>>
>> Replies in between.
>>
>> Regards,
>> Luís
>>
>> On Thu, Jul 12, 2018 at 3:16 AM, jimqu  wrote:
>>
>>>
>>>
>>> On 2018-07-12 05:27, Luís Mendes wrote:
>>>
>>> Hi Jim,
>>>
>>> I followed your suggestion and was able to bisect the kernel patches.
>>> The offending patch is: drm/amdgpu: defer test IBs on the rings at boot
>>> (V3)
>>> commit:
>>>
>>> 2c773de2ecb8c327f2448bd1eecad224e9227087
>>> 
>>>
>>> After reverting this patch the IB test succeeded with kernel v4.18-rc4
>>> on both systems and the amdgpu driver was correctly loaded both on SAPPHIRE
>>> RX550 4GB and on SAPPHIRE RX460 2GB.
>>>
>>>
>>> Alex, Christian, What do you think about the patch?
>>>
>>> The GPU hang remains, however.
>>>  I will try to configure a remote IPMI connection to see what is
>>> happening with the kernel boot or setup a serial console for the Kernel.
>>>
>>>
>>> *You can set up remote connection by ssh, and also you can add amdgpu to
>>> blacklist first, and manually modprobe amdgpu.*
>>>
>> R: I was able to set up a remote serial console with the
>> console=ttyS0,115200n8 kernel parameter.
>> Boot log follows attached as file kernel_bisected_v4.18-rc4_log.txt.
>> First noticeable issue seems to be:
>> [6.131989] amdgpu: [powerplay]
>> [6.131989]  last message was failed ret is 65535
>> ...
>> and later hangs with:
>> [   33.504100] [drm:drm_atomic_helper_wait_for_flip_done
>> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
>> [   43.744094] [drm:drm_atomic_helper_wait_for_dependencies
>> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed
>> out
>> [   53.984089] [drm:drm_atomic_helper_wait_for_dependencies
>> [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed
>> out
>> [   64.224036] [drm:drm_atomic_helper_wait_for_dependencies
>> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed
>> out
>> [   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR*
>> amdgpu_dm_commit_planes: acrtc 0, already
>> busy
>>
>>
>>
>>> *What's about xinit? What is MESA driver version on your platform?*
>>>
>> R: I am running Ubuntu 18.04 with bisected kernel 4.18-rc4 using
>> libdrm-2.4.92 and mesa-18.1.0.
>> xinit output follows attached as xinit_log.txt
>>
>>
>>>
>>> Thanks & Regards,
>>> Luís
>>>
>>> On Wed, Jul 11, 2018 at 10:56 AM, jimqu  wrote:
>>>
>>>> HI Luis,
>>>>
>>>>
>>>> Let us trace the issue one by one.
>>>>
>>>>
>>>> IB test fail:
>>>>
>>>> This should be regression issue on 4.18, you can bisect the kernel
 

Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-10-20 Thread Luís Mendes
The problem remains with the Linux 4.18 and 4.19 kernels. I am unable to
use the AMD RX 460 and AMD RX 550 on my x64 Linux platforms.

I've installed Windows 10 on the same machine along with
win10-64bit-radeon-software-adrenalin-edition-18.10.1-oct18.exe, and under
Windows the same RX 460 card *works fine* and I am able to run OpenCL
applications.

The driver has been hanging since kernel 4.15. I am getting:
[   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
[   43.744094] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
[   53.984089] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed out
[   64.224036] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out
[   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy

After commit 2c773de2ecb8c327f2448bd1eecad224e9227087, "drm/amdgpu: defer
test IBs on the rings at boot (V3)", I get the following with kernels 4.18
and 4.19 as well as the Ubuntu 18.10 stock kernel (I can't even install
Ubuntu 18.10 because it hangs with amdgpu):
[drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signalled seq=0, last emitted seq=1
and the kernel blocks indefinitely:
task plymouthd:449 blocked for more than 120 seconds.
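
For anyone decoding the logs: the negative numbers in parentheses are
kernel error codes. As a minimal sketch (the helper name
decode_kernel_error and the two-entry table are illustrative only, not
part of any driver or tool), -110 is ETIMEDOUT on Linux, which matches the
"IB test timed out" message, and the -22 reported earlier in this thread
is EINVAL:

```python
# Linux errno values for the two codes that appear in this thread.
# The table is deliberately tiny; it is not a complete errno list.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_kernel_error(code: int) -> str:
    """Return a readable name for a negative kernel return code such as -110."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


print(decode_kernel_error(-110))
print(decode_kernel_error(-22))
```

The table is hard-coded rather than taken from the running system so that
the result does not depend on the platform the script is run on.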

Is there any hope of getting this fixed?


On Thu, Jul 12, 2018 at 2:56 PM Luís Mendes  wrote:

> Hi Jim,
>
> Replies in between.
>
> Regards,
> Luís
>
> On Thu, Jul 12, 2018 at 3:16 AM, jimqu  wrote:
>
>>
>>
>> On 2018-07-12 05:27, Luís Mendes wrote:
>>
>> Hi Jim,
>>
>> I followed your suggestion and was able to bisect the kernel patches.
>> The offending patch is: drm/amdgpu: defer test IBs on the rings at boot
>> (V3)
>> commit:
>>
>> 2c773de2ecb8c327f2448bd1eecad224e9227087
>> 
>>
>> After reverting this patch the IB test succeeded with kernel v4.18-rc4 on
>> both systems and the amdgpu driver was correctly loaded both on SAPPHIRE
>> RX550 4GB and on SAPPHIRE RX460 2GB.
>>
>>
>> Alex, Christian, What do you think about the patch?
>>
>> The GPU hang remains, however.
>>  I will try to configure a remote IPMI connection to see what is
>> happening with the kernel boot or setup a serial console for the Kernel.
>>
>>
>> *You can set up remote connection by ssh, and also you can add amdgpu to
>> blacklist first, and manually modprobe amdgpu.*
>>
> R: I was able to set up a remote serial console with the
> console=ttyS0,115200n8 kernel parameter.
> Boot log follows attached as file kernel_bisected_v4.18-rc4_log.txt.
> First noticeable issue seems to be:
> [6.131989] amdgpu: [powerplay]
> [6.131989]  last message was failed ret is 65535
> ...
> and later hangs with:
> [   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
> *ERROR* [CRTC:42:crtc-0] flip_done timed out
> [   43.744094] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed
> out
> [   53.984089] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed
> out
> [   64.224036] [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed
> out
> [   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR*
> amdgpu_dm_commit_planes: acrtc 0, already
> busy
>
>
>
>> *What's about xinit? What is MESA driver version on your platform?*
>>
> R: I am running Ubuntu 18.04 with bisected kernel 4.18-rc4 using
> libdrm-2.4.92 and mesa-18.1.0.
> xinit output follows attached as xinit_log.txt
>
>
>>
>> Thanks & Regards,
>> Luís
>>
>> On Wed, Jul 11, 2018 at 10:56 AM, jimqu  wrote:
>>
>>> HI Luis,
>>>
>>>
>>> Let us trace the issue one by one.
>>>
>>>
>>> IB test fail:
>>>
>>> This should be regression issue on 4.18, you can bisect the kernel
>>> patches.
>>>
>>> GPU hang:
>>>
>>> Fix IB test fail first.
>>>
>>>
>>> Thanks
>>>
>>> JimQu
>>>
>>>
>>>
>>> On 2018-07-11 17:34, Luís Mendes wrote:
>>>
>>> Hi Jim,
>>>
>>> Thanks for your interest in this issue. Actually this is a multiple
>>> issue... not only the IB ring test is failing... as I am having quite some
>>> trouble getting the cards SAPPHIRE RX 550 4GB on a Tyan S7025 and SAPPHIRE
>>> RX 460 2GB on a TYAN S7002 to work, both systems using same Ubuntu 18.04
>>> with vanilla kernel.
>>>
>>> 

Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-07-12 Thread Luís Mendes
Hi Jim,

Replies in between.

Regards,
Luís

On Thu, Jul 12, 2018 at 3:16 AM, jimqu  wrote:

>
>
> On 2018-07-12 05:27, Luís Mendes wrote:
>
> Hi Jim,
>
> I followed your suggestion and was able to bisect the kernel patches.
> The offending patch is: drm/amdgpu: defer test IBs on the rings at boot
> (V3)
> commit:
>
> 2c773de2ecb8c327f2448bd1eecad224e9227087
> 
>
> After reverting this patch the IB test succeeded with kernel v4.18-rc4 on
> both systems and the amdgpu driver was correctly loaded both on SAPPHIRE
> RX550 4GB and on SAPPHIRE RX460 2GB.
>
>
> Alex, Christian, What do you think about the patch?
>
> The GPU hang remains, however.
>  I will try to configure a remote IPMI connection to see what is happening
> with the kernel boot or setup a serial console for the Kernel.
>
>
> *You can set up remote connection by ssh, and also you can add amdgpu to
> blacklist first, and manually modprobe amdgpu.*
>
R: I was able to set up a remote serial console with the
console=ttyS0,115200n8 kernel parameter.
The boot log is attached as kernel_bisected_v4.18-rc4_log.txt.
The first noticeable issue seems to be:
[6.131989] amdgpu: [powerplay]
[6.131989]  last message was failed ret is 65535
...
and later hangs with:
[   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
[   43.744094] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
[   53.984089] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed out
[   64.224036] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out
[   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy
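
Because log excerpts like the one above tend to get wrapped by mail
clients, a small script can help pull the *ERROR* entries out of a saved
dmesg log. This is only a sketch: the function name extract_errors is made
up for illustration, and it assumes that every log entry starts with a
bracketed timestamp and that lines without one are wrapped continuations
of the previous entry.

```python
import re

# Matches the "[   33.504100] " timestamp prefix printed by the kernel log.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def extract_errors(log_text):
    """Return (timestamp, message) pairs for entries containing *ERROR*.

    Lines without a timestamp are appended to the previous entry, which
    undoes the line wrapping seen in mailed log excerpts.
    """
    entries = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            entries.append([float(match.group(1)), match.group(2)])
        elif entries:
            entries[-1][1] += " " + line
    return [(ts, msg) for ts, msg in entries if "*ERROR*" in msg]


sample = """\
[    6.131989] amdgpu: [powerplay]
[   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:42:crtc-0] flip_done timed out
"""
for ts, msg in extract_errors(sample):
    print(f"{ts:10.6f}  {msg}")
```

Saving the serial console output to a file and passing its contents to
extract_errors would give one line per error regardless of how the log
was wrapped.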



> *What's about xinit? What is MESA driver version on your platform?*
>
R: I am running Ubuntu 18.04 with the bisected 4.18-rc4 kernel, using
libdrm-2.4.92 and mesa-18.1.0.
The xinit output is attached as xinit_log.txt.


>
> Thanks & Regards,
> Luís
>
> On Wed, Jul 11, 2018 at 10:56 AM, jimqu  wrote:
>
>> HI Luis,
>>
>>
>> Let us trace the issue one by one.
>>
>>
>> IB test fail:
>>
>> This should be regression issue on 4.18, you can bisect the kernel
>> patches.
>>
>> GPU hang:
>>
>> Fix IB test fail first.
>>
>>
>> Thanks
>>
>> JimQu
>>
>>
>>
>> On 2018-07-11 17:34, Luís Mendes wrote:
>>
>> Hi Jim,
>>
>> Thanks for your interest in this issue. Actually this is a multiple
>> issue... not only the IB ring test is failing... as I am having quite some
>> trouble getting the cards SAPPHIRE RX 550 4GB on a Tyan S7025 and SAPPHIRE
>> RX 460 2GB on a TYAN S7002 to work, both systems using same Ubuntu 18.04
>> with vanilla kernel.
>>
>> *1. May you also test earlier kernel? v4.17 or v4.16.*
>> I've tested kernels v4.17.5 and v4.16.6 with same system and both are
>> able to pass the IB ring test and system boots into X using NVIDIA as the
>> display connected card.
>> dmesg log attached for kernel 4.17.5, file
>> TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.
>>
>> *2. May you test the issue only with amdgpu?*
>> - I've tested on a TYAN S7002 system with a single SAPPHIRE RX 460 2GB,
>> on-board VGA enabled and used as primary display.
>> Kernel v4.18-rc4 fails the IB ring test, system is able to enter X
>> through the on-board VGA.
>> dmesg log attached for kernel 4.18-rc4, file
>> TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.
>>
>> - Same TYAN S7002 system, but now with on-board VGA disabled and using RX
>> 460 as display connected card.
>> Kernels v4.17.5 and v4.16.6 are able to pass the IB ring test, but GPU
>> hangs before entering X. Don't have logs for these yet.
>>
>> Regards,
>> Luís Mendes
>> Aparapi contributor and MSc Researcher
>>
>>
>>
>>
>>
>> On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim  wrote:
>>
>>> Hi Luis,
>>>
>>> 1. May you also test earlier kernel? v4.17 or v4.16.
>>> 2. May you test the issue only with amdgpu?
>>>
>>> Thanks
>>> JimQu
>>>
>>> 
>>> From: amd-gfx on behalf of Luís Mendes <luis.p.men...@gmail.com>
>>> Sent: July 11, 2018 6:04:00
>>> To: Michel Dänzer; Koenig, Christian; amd-gfx list
>>> Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on
>>> power-up
>>>
>>> Hi,
>>>
>>> Issue remains in kernel 4.18-rc4 using SAPPHIRE RX 550 4GB.
>>>
>>> Logs follow attached.
>>>
>>> Regards,
>>> Luis
>>>
>>> On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes wrote:
>>> Hi,
>>>
>>> I've tried kernel 4.18-rc2 on a system with a NVIDIA GTX 1050 Ti and an
>>> AMD RX 550 4GB and the RX 550 card is failing the IB ring test.
>>>
>>> [5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: ib
>>> test failed (scratch(0xC040)=0x)
>>> [5.033264] 

Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-07-12 Thread Luís Mendes
Hi Christian,

Sure, how can I help to fix that?

Regards,
Luís

On Thu, Jul 12, 2018 at 8:13 AM, Christian König 
wrote:

> Hi Luis,
>
> well, what "drm/amdgpu: defer test IBs on the rings at boot (V3)" does is
> delay the IB test a bit and run it asynchronously to the rest of the bootup.
>
> So what most likely happens is that some hardware feature (like power or
> clock gating) which doesn't work correctly on your system kicks in and
> causes the IB test to fail.
>
> It's rather likely that this problem is also responsible for the crashes
> you experience later on. So I think we should concentrate on fixing that.
>
> Regards,
> Christian.
>
>
> Am 11.07.2018 um 23:27 schrieb Luís Mendes:
>
> Hi Jim,
>
> I followed your suggestion and was able to bisect the kernel patches.
> The offending patch is: drm/amdgpu: defer test IBs on the rings at boot
> (V3)
> commit:
>
> 2c773de2ecb8c327f2448bd1eecad224e9227087
> 
>
> After reverting this patch the IB test succeeded with kernel v4.18-rc4 on
> both systems and the amdgpu driver was correctly loaded both on SAPPHIRE
> RX550 4GB and on SAPPHIRE RX460 2GB.
>
> The GPU hang remains, however.
>  I will try to configure a remote IPMI connection to see what is happening
> with the kernel boot or setup a serial console for the Kernel.
>
> Thanks & Regards,
> Luís
>
> On Wed, Jul 11, 2018 at 10:56 AM, jimqu  wrote:
>
>> HI Luis,
>>
>>
>> Let us trace the issue one by one.
>>
>>
>> IB test fail:
>>
>> This should be regression issue on 4.18, you can bisect the kernel
>> patches.
>>
>> GPU hang:
>>
>> Fix IB test fail first.
>>
>>
>> Thanks
>>
>> JimQu
>>
>>
>>
>> On 2018-07-11 17:34, Luís Mendes wrote:
>>
>> Hi Jim,
>>
>> Thanks for your interest in this issue. Actually this is a multiple
>> issue... not only the IB ring test is failing... as I am having quite some
>> trouble getting the cards SAPPHIRE RX 550 4GB on a Tyan S7025 and SAPPHIRE
>> RX 460 2GB on a TYAN S7002 to work, both systems using same Ubuntu 18.04
>> with vanilla kernel.
>>
>> *1. May you also test earlier kernel? v4.17 or v4.16.*
>> I've tested kernels v4.17.5 and v4.16.6 with same system and both are
>> able to pass the IB ring test and system boots into X using NVIDIA as the
>> display connected card.
>> dmesg log attached for kernel 4.17.5, file
>> TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.
>>
>> *2. May you test the issue only with amdgpu?*
>> - I've tested on a TYAN S7002 system with a single SAPPHIRE RX 460 2GB,
>> on-board VGA enabled and used as primary display.
>> Kernel v4.18-rc4 fails the IB ring test, system is able to enter X
>> through the on-board VGA.
>> dmesg log attached for kernel 4.18-rc4, file
>> TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.
>>
>> - Same TYAN S7002 system, but now with on-board VGA disabled and using RX
>> 460 as display connected card.
>> Kernels v4.17.5 and v4.16.6 are able to pass the IB ring test, but GPU
>> hangs before entering X. Don't have logs for these yet.
>>
>> Regards,
>> Luís Mendes
>> Aparapi contributor and MSc Researcher
>>
>>
>>
>>
>>
>> On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim  wrote:
>>
>>> Hi Luis,
>>>
>>> 1. May you also test earlier kernel? v4.17 or v4.16.
>>> 2. May you test the issue only with amdgpu?
>>>
>>> Thanks
>>> JimQu
>>>
>>> 
>>> From: amd-gfx on behalf of Luís Mendes <luis.p.men...@gmail.com>
>>> Sent: July 11, 2018 6:04:00
>>> To: Michel Dänzer; Koenig, Christian; amd-gfx list
>>> Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on
>>> power-up
>>>
>>> Hi,
>>>
>>> Issue remains in kernel 4.18-rc4 using SAPPHIRE RX 550 4GB.
>>>
>>> Logs follow attached.
>>>
>>> Regards,
>>> Luis
>>>
>>> On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes wrote:
>>> Hi,
>>>
>>> I've tried kernel 4.18-rc2 on a system with a NVIDIA GTX 1050 Ti and an
>>> AMD RX 550 4GB and the RX 550 card is failing the IB ring test.
>>>
>>> [5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: ib
>>> test failed (scratch(0xC040)=0x)
>>> [5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu:
>>> failed testing IB on ring 6 (-22).
>>>
>>> Please see the attached log.
>>>
>>> Regards,
>>> Luís
>>>
>>>
>>
>>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-07-12 Thread Christian König

Hi Luis,

well, what "drm/amdgpu: defer test IBs on the rings at boot (V3)" does is
delay the IB test a bit and run it asynchronously to the rest of the bootup.


So what most likely happens is that some hardware feature (like power or
clock gating) which doesn't work correctly on your system kicks in and
causes the IB test to fail.


It's rather likely that this problem is also responsible for the crashes
you experience later on. So I think we should concentrate on fixing that.
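
To make the ordering argument above concrete, here is a toy model of it.
It is not driver code: the event names (hw_init, ib_test, enable_gating)
and the assumption that gating is switched on during late init are
hypothetical, chosen only to illustrate how deferring the test can change
what state the hardware is in when the test runs.

```python
def run_boot(defer_ib_test):
    """Return the order of events for a toy boot sequence (hypothetical model)."""
    events = ["hw_init"]
    if not defer_ib_test:
        events.append("ib_test")      # test runs before gating is enabled
    events.append("enable_gating")    # assumed: late init enables power/clock gating
    if defer_ib_test:
        events.append("ib_test")      # deferred test runs after gating is enabled
    return events


def ib_test_sees_gating(events):
    """True if gating was already enabled when the IB test ran."""
    return events.index("enable_gating") < events.index("ib_test")


for defer in (False, True):
    order = run_boot(defer)
    print(f"deferred={defer}: {order} gating active during test: "
          f"{ib_test_sees_gating(order)}")
```

In this model the deferred test is the only one that runs with gating
active, which is the situation suspected above; whether that matches the
real init order on a given board would need to be checked against the
driver itself.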


Regards,
Christian.

Am 11.07.2018 um 23:27 schrieb Luís Mendes:

Hi Jim,

I followed your suggestion and was able to bisect the kernel patches.
The offending patch is: drm/amdgpu: defer test IBs on the rings at 
boot (V3)

commit:

	2c773de2ecb8c327f2448bd1eecad224e9227087 
 




After reverting this patch the IB test succeeded with kernel v4.18-rc4 
on both systems and the amdgpu driver was correctly loaded both on 
SAPPHIRE RX550 4GB and on SAPPHIRE RX460 2GB.


The GPU hang remains, however.
 I will try to configure a remote IPMI connection to see what is 
happening with the kernel boot or setup a serial console for the Kernel.


Thanks & Regards,
Luís

On Wed, Jul 11, 2018 at 10:56 AM, jimqu wrote:


HI Luis,


Let us trace the issue one by one.


IB test fail:

This should be regression issue on 4.18, you can bisect the kernel
patches.

GPU hang:

Fix IB test fail first.


Thanks

JimQu



On 2018-07-11 17:34, Luís Mendes wrote:

Hi Jim,

Thanks for your interest in this issue. Actually this is a
multiple issue... not only the IB ring test is failing... as I am
having quite some trouble getting the cards SAPPHIRE RX 550 4GB
on a Tyan S7025 and SAPPHIRE RX 460 2GB on a TYAN S7002 to work,
both systems using same Ubuntu 18.04 with vanilla kernel.

*1. May you also test earlier kernel? v4.17 or v4.16.*
I've tested kernels v4.17.5 and v4.16.6 with same system and both
are able to pass the IB ring test and system boots into X using
NVIDIA as the display connected card.
dmesg log attached for kernel 4.17.5, file
TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.

*2. May you test the issue only with amdgpu?*
- I've tested on a TYAN S7002 system with a single SAPPHIRE RX
460 2GB, on-board VGA enabled and used as primary display.
Kernel v4.18-rc4 fails the IB ring test, system is able to enter
X through the on-board VGA.
dmesg log attached for kernel 4.18-rc4, file
TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.

- Same TYAN S7002 system, but now with on-board VGA disabled and
using RX 460 as display connected card.
Kernels v4.17.5 and v4.16.6 are able to pass the IB ring test,
but GPU hangs before entering X. Don't have logs for these yet.

Regards,
Luís Mendes
Aparapi contributor and MSc Researcher





On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim <jim...@amd.com> wrote:

Hi Luis,

1. May you also test earlier kernel? v4.17 or v4.16.
2. May you test the issue only with amdgpu?

Thanks
JimQu


From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Luís
Mendes <luis.p.men...@gmail.com>
Sent: July 11, 2018 6:04:00
To: Michel Dänzer; Koenig, Christian; amd-gfx list
Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB
ring test on power-up

Hi,

Issue remains in kernel 4.18-rc4 using SAPPHIRE RX 550 4GB.

Logs follow attached.

Regards,
Luis

On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes
<luis.p.men...@gmail.com> wrote:
Hi,

I've tried kernel 4.18-rc2 on a system with a NVIDIA GTX 1050
Ti and an AMD RX 550 4GB and the RX 550 card is failing the
IB ring test.

[    5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
amdgpu: ib test failed (scratch(0xC040)=0x)
[    5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
amdgpu: failed testing IB on ring 6 (-22).

Please see the attached log.

Regards,
Luís









Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-07-11 Thread jimqu



On 2018-07-12 05:27, Luís Mendes wrote:

Hi Jim,

I followed your suggestion and was able to bisect the kernel patches.
The offending patch is: drm/amdgpu: defer test IBs on the rings at 
boot (V3)

commit:

	2c773de2ecb8c327f2448bd1eecad224e9227087 
 




After reverting this patch the IB test succeeded with kernel v4.18-rc4 
on both systems and the amdgpu driver was correctly loaded both on 
SAPPHIRE RX550 4GB and on SAPPHIRE RX460 2GB.




Alex, Christian, what do you think about the patch?


The GPU hang remains, however.
 I will try to configure a remote IPMI connection to see what is 
happening with the kernel boot or setup a serial console for the Kernel.




You can set up a remote connection via ssh. You can also add amdgpu to the
blacklist first and then modprobe amdgpu manually.

What about xinit? What is the Mesa driver version on your platform?


Thanks & Regards,
Luís

On Wed, Jul 11, 2018 at 10:56 AM, jimqu wrote:


HI Luis,


Let us trace the issue one by one.


IB test fail:

This should be regression issue on 4.18, you can bisect the kernel
patches.

GPU hang:

Fix IB test fail first.


Thanks

JimQu



On 2018-07-11 17:34, Luís Mendes wrote:

Hi Jim,

Thanks for your interest in this issue. Actually this is a
multiple issue... not only the IB ring test is failing... as I am
having quite some trouble getting the cards SAPPHIRE RX 550 4GB
on a Tyan S7025 and SAPPHIRE RX 460 2GB on a TYAN S7002 to work,
both systems using same Ubuntu 18.04 with vanilla kernel.

*1. May you also test earlier kernel? v4.17 or v4.16.*
I've tested kernels v4.17.5 and v4.16.6 with same system and both
are able to pass the IB ring test and system boots into X using
NVIDIA as the display connected card.
dmesg log attached for kernel 4.17.5, file
TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.

*2. May you test the issue only with amdgpu?*
- I've tested on a TYAN S7002 system with a single SAPPHIRE RX
460 2GB, on-board VGA enabled and used as primary display.
Kernel v4.18-rc4 fails the IB ring test, system is able to enter
X through the on-board VGA.
dmesg log attached for kernel 4.18-rc4, file
TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.

- Same TYAN S7002 system, but now with on-board VGA disabled and
using RX 460 as display connected card.
Kernels v4.17.5 and v4.16.6 are able to pass the IB ring test,
but GPU hangs before entering X. Don't have logs for these yet.

Regards,
Luís Mendes
Aparapi contributor and MSc Researcher





On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim <jim...@amd.com> wrote:

Hi Luis,

1. May you also test earlier kernel? v4.17 or v4.16.
2. May you test the issue only with amdgpu?

Thanks
JimQu


From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Luís
Mendes <luis.p.men...@gmail.com>
Sent: July 11, 2018 6:04:00
To: Michel Dänzer; Koenig, Christian; amd-gfx list
Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB
ring test on power-up

Hi,

Issue remains in kernel 4.18-rc4 using SAPPHIRE RX 550 4GB.

Logs follow attached.

Regards,
Luis

On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes
<luis.p.men...@gmail.com> wrote:
Hi,

I've tried kernel 4.18-rc2 on a system with a NVIDIA GTX 1050
Ti and an AMD RX 550 4GB and the RX 550 card is failing the
IB ring test.

[    5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
amdgpu: ib test failed (scratch(0xC040)=0x)
[    5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
amdgpu: failed testing IB on ring 6 (-22).

Please see the attached log.

Regards,
Luís









Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-07-11 Thread Luís Mendes
Hi Jim,

I followed your suggestion and was able to bisect the kernel patches.
The offending patch is: drm/amdgpu: defer test IBs on the rings at boot (V3)
commit:

2c773de2ecb8c327f2448bd1eecad224e9227087


After reverting this patch, the IB test succeeded with kernel v4.18-rc4 on
both systems, and the amdgpu driver loaded correctly on both the SAPPHIRE
RX 550 4GB and the SAPPHIRE RX 460 2GB.

The GPU hang remains, however.
I will try to configure a remote IPMI connection to see what is happening
during the kernel boot, or set up a serial console for the kernel.
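
For readers unfamiliar with bisecting: it is essentially a binary search
over the commit history. The following is a rough sketch under stated
assumptions, not a description of how git bisect is implemented. It
assumes a linear history whose first commit is known good and whose last
commit is known bad; the commit names and the ib_test_fails predicate are
hypothetical stand-ins for building and booting each kernel.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered history for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # regression is at mid or earlier
        else:
            good = mid     # regression is after mid
    return commits[bad]


history = [f"c{i}" for i in range(16)]   # hypothetical linear history
culprit = "c11"                          # hypothetical offending commit
tested = []


def ib_test_fails(commit):
    """Stand-in for 'build this kernel, boot it, check the IB ring test'."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


print(first_bad(history, ib_test_fails), "found after", len(tested), "boots")
```

The point of the sketch is the cost: each step halves the remaining range,
so even a long release-candidate window needs only a handful of
build-and-boot cycles.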

Thanks & Regards,
Luís

On Wed, Jul 11, 2018 at 10:56 AM, jimqu  wrote:

> HI Luis,
>
>
> Let us trace the issue one by one.
>
>
> IB test fail:
>
> This should be regression issue on 4.18, you can bisect the kernel patches.
>
> GPU hang:
>
> Fix IB test fail first.
>
>
> Thanks
>
> JimQu
>
>
>
> On 2018-07-11 17:34, Luís Mendes wrote:
>
> Hi Jim,
>
> Thanks for your interest in this issue. Actually this is a multiple
> issue... not only the IB ring test is failing... as I am having quite some
> trouble getting the cards SAPPHIRE RX 550 4GB on a Tyan S7025 and SAPPHIRE
> RX 460 2GB on a TYAN S7002 to work, both systems using same Ubuntu 18.04
> with vanilla kernel.
>
> *1. May you also test earlier kernel? v4.17 or v4.16.*
> I've tested kernels v4.17.5 and v4.16.6 with same system and both are able
> to pass the IB ring test and system boots into X using NVIDIA as the
> display connected card.
> dmesg log attached for kernel 4.17.5, file
> TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.
>
> *2. May you test the issue only with amdgpu?*
> - I've tested on a TYAN S7002 system with a single SAPPHIRE RX 460 2GB,
> on-board VGA enabled and used as primary display.
> Kernel v4.18-rc4 fails the IB ring test, system is able to enter X through
> the on-board VGA.
> dmesg log attached for kernel 4.18-rc4, file
> TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.
>
> - Same TYAN S7002 system, but now with on-board VGA disabled and using RX
> 460 as display connected card.
> Kernels v4.17.5 and v4.16.6 are able to pass the IB ring test, but GPU
> hangs before entering X. Don't have logs for these yet.
>
> Regards,
> Luís Mendes
> Aparapi contributor and MSc Researcher
>
>
>
>
>
> On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim  wrote:
>
>> Hi Luis,
>>
>> 1. May you also test earlier kernel? v4.17 or v4.16.
>> 2. May you test the issue only with amdgpu?
>>
>> Thanks
>> JimQu
>>
>> 
>> From: amd-gfx on behalf of Luís Mendes <luis.p.men...@gmail.com>
>> Sent: July 11, 2018 6:04:00
>> To: Michel Dänzer; Koenig, Christian; amd-gfx list
>> Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on
>> power-up
>>
>> Hi,
>>
>> Issue remains in kernel 4.18-rc4 using SAPPHIRE RX 550 4GB.
>>
>> Logs follow attached.
>>
>> Regards,
>> Luis
>>
>> On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes wrote:
>> Hi,
>>
>> I've tried kernel 4.18-rc2 on a system with a NVIDIA GTX 1050 Ti and an
>> AMD RX 550 4GB and the RX 550 card is failing the IB ring test.
>>
>> [5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: ib
>> test failed (scratch(0xC040)=0x)
>> [5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed
>> testing IB on ring 6 (-22).
>>
>> Please see the attached log.
>>
>> Regards,
>> Luís
>>
>>
>
>


Re: 答复: Regression with kernel 4.18 - AMD RX 550 fails IB ring test on power-up

2018-07-11 Thread jimqu

Hi Luis,


Let us trace the issues one by one.


IB test failure:

This should be a regression in 4.18; you can bisect the kernel patches.

GPU hang:

Fix the IB test failure first.


Thanks

JimQu



On 2018-07-11 17:34, Luís Mendes wrote:

Hi Jim,

Thanks for your interest in this issue. Actually this is a multiple 
issue... not only the IB ring test is failing... as I am having quite 
some trouble getting the cards SAPPHIRE RX 550 4GB on a Tyan S7025 and 
SAPPHIRE RX 460 2GB on a TYAN S7002 to work, both systems using same 
Ubuntu 18.04 with vanilla kernel.


*1. May you also test earlier kernel? v4.17 or v4.16.*
I've tested kernels v4.17.5 and v4.16.6 with same system and both are 
able to pass the IB ring test and system boots into X using NVIDIA as 
the display connected card.
dmesg log attached for kernel 4.17.5, file 
TYAN_S7025_kernelv4.17.5_amdgpu_IB_ring_test_OK.txt.


*2. May you test the issue only with amdgpu?*
- I've tested on a TYAN S7002 system with a single SAPPHIRE RX 460 
2GB, on-board VGA enabled and used as primary display.
Kernel v4.18-rc4 fails the IB ring test, system is able to enter X 
through the on-board VGA.
dmesg log attached for kernel 4.18-rc4, file 
TYAN_S7002_kernel_v4.18-rc4_IB_ring_test_fail.txt.


- Same TYAN S7002 system, but now with on-board VGA disabled and using 
RX 460 as display connected card.
Kernels v4.17.5 and v4.16.6 are able to pass the IB ring test, but GPU 
hangs before entering X. Don't have logs for these yet.


Regards,
Luís Mendes
Aparapi contributor and MSc Researcher





On Wed, Jul 11, 2018 at 3:49 AM, Qu, Jim wrote:


Hi Luis,

1. May you also test earlier kernel? v4.17 or v4.16.
2. May you test the issue only with amdgpu?

Thanks
JimQu


From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Luís Mendes
<luis.p.men...@gmail.com>
Sent: July 11, 2018 6:04:00
To: Michel Dänzer; Koenig, Christian; amd-gfx list
Subject: Re: Regression with kernel 4.18 - AMD RX 550 fails IB ring
test on power-up

Hi,

Issue remains in kernel 4.18-rc4 using SAPPHIRE RX 550 4GB.

Logs follow attached.

Regards,
Luis

On Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes
<luis.p.men...@gmail.com> wrote:
Hi,

I've tried kernel 4.18-rc2 on a system with a NVIDIA GTX 1050 Ti
and an AMD RX 550 4GB and the RX 550 card is failing the IB ring test.

[    5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
amdgpu: ib test failed (scratch(0xC040)=0x)
[    5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu:
failed testing IB on ring 6 (-22).

Please see the attached log.

Regards,
Luís



