RE: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)

2019-06-04 Thread S, Shirish
Thanks Christian.
Have sent the patch for uvd & vcn.
(https://patchwork.freedesktop.org/patch/308575/)


Regards,
Shirish S

-Original Message-
From: Christian König  
Sent: Tuesday, June 4, 2019 4:38 PM
To: S, Shirish ; Deucher, Alexander 
; Koenig, Christian ; 
jerry.zh...@amd.com; Deng, Emily ; Liu, Leo 

Cc: amd-gfx@lists.freedesktop.org; Li, Ching-shih (Louis) 

Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0 (V2)

Am 04.06.19 um 10:36 schrieb S, Shirish:
> From: Louis Li 
>
> [What]
> vce ring test fails consistently during resume in s3 cycle, due to 
> mismatch read & write pointers.
> On debug/analysis its found that rptr to be compared is not being 
> correctly updated/read, which leads to this failure.
> Below is the failure signature:
>   [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
>   [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
>  failed -110
>   [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
> (-110).
>
> [How]
> fetch rptr appropriately, meaning move its read location further down 
> in the code flow.
> With this patch applied the s3 failure is no more seen for >5k s3 
> cycles, which otherwise is pretty consistent.
>
> V2: remove reduntant fetch of rptr
>
> Signed-off-by: Louis Li 

Reviewed-by: Christian König 
CC: stable...

Who does the same patch for UVD and VCN? Exactly the same thing is wrong there 
as well.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index c021b11..f7189e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -1072,7 +1072,7 @@ void amdgpu_vce_ring_emit_fence(struct amdgpu_ring 
> *ring, u64 addr, u64 seq,
>   int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
>   {
>   struct amdgpu_device *adev = ring->adev;
> - uint32_t rptr = amdgpu_ring_get_rptr(ring);
> + uint32_t rptr;
>   unsigned i;
>   int r, timeout = adev->usec_timeout;
>   
> @@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
>   if (r)
>   return r;
>   
> + rptr = amdgpu_ring_get_rptr(ring);
> +
>   amdgpu_ring_write(ring, VCE_CMD_END);
>   amdgpu_ring_commit(ring);
>   

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)

2019-06-04 Thread Christian König

Am 04.06.19 um 10:36 schrieb S, Shirish:

From: Louis Li 

[What]
vce ring test fails consistently during resume in s3 cycle, due to
mismatch read & write pointers.
On debug/analysis its found that rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
 failed -110
[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
(-110).

[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied the s3 failure is no more seen for >5k s3 cycles,
which otherwise is pretty consistent.

V2: remove reduntant fetch of rptr

Signed-off-by: Louis Li 


Reviewed-by: Christian König 
CC: stable...

Who does the same patch for UVD and VCN? Exactly the same thing is wrong 
there as well.


Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index c021b11..f7189e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -1072,7 +1072,7 @@ void amdgpu_vce_ring_emit_fence(struct amdgpu_ring *ring, 
u64 addr, u64 seq,
  int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
  {
struct amdgpu_device *adev = ring->adev;
-   uint32_t rptr = amdgpu_ring_get_rptr(ring);
+   uint32_t rptr;
unsigned i;
int r, timeout = adev->usec_timeout;
  
@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

if (r)
return r;
  
+	rptr = amdgpu_ring_get_rptr(ring);

+
amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)

2019-06-04 Thread S, Shirish
From: Louis Li 

[What]
vce ring test fails consistently during resume in s3 cycle, due to
mismatch read & write pointers.
On debug/analysis its found that rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
 failed -110
[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
(-110).

[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied the s3 failure is no more seen for >5k s3 cycles,
which otherwise is pretty consistent.

V2: remove reduntant fetch of rptr

Signed-off-by: Louis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index c021b11..f7189e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -1072,7 +1072,7 @@ void amdgpu_vce_ring_emit_fence(struct amdgpu_ring *ring, 
u64 addr, u64 seq,
 int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
 {
struct amdgpu_device *adev = ring->adev;
-   uint32_t rptr = amdgpu_ring_get_rptr(ring);
+   uint32_t rptr;
unsigned i;
int r, timeout = adev->usec_timeout;
 
@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
if (r)
return r;
 
+   rptr = amdgpu_ring_get_rptr(ring);
+
amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-29 Thread Koenig, Christian
Hi Louis,

please don't create tickets for this, that is just overkill.

Those are obvious and simple to fix bugs and should go upstream immediately.

Regards,
Christian.

Am 29.05.19 um 03:14 schrieb Li, Ching-shih (Louis):
Hi Christian,

Your explanation matches my code study and test results. Well, I will remove 
original read. Shirish and I will re-test it. I will submit it after test done.
As for other related code, yes, I assumed there might be similar issues as 
well. My plan is to create internal ticket and write test code to check if any 
problem can be hit. Then we can do fix. I have the current patch focus on the 
current Chrome issue.

Thanks for your review and information.

BR,
Louis

From: Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>
Sent: Tuesday, May 28, 2019 3:24 PM
To: Li, Ching-shih (Louis) 
<mailto:ching-shih...@amd.com>; Liu, Leo 
<mailto:leo@amd.com>; S, Shirish 
<mailto:shiris...@amd.com>; Grodzovsky, Andrey 
<mailto:andrey.grodzov...@amd.com>; Zhang, Jerry 
<mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0

[CAUTION: External Email]
Wow, really good catch!

The underlying problem is most likely that VCE block is either power or clock 
gated and because of this the readptr read always returns zero.

Now amdgpu_ring_alloc() informs the power management code that the block is 
about to be used and so the gating is turned off.

Mhm, that is probably wrong at a hole bunch of other places, at least the UVD 
and VCN code comes to mind.

I agree with Leo that you should remove the original read (so to not read 
twice) and it would be realy nice if you could double check the other code 
(UVD/VCN) for similar problems as well.

Regards,
Christian.

Am 27.05.19 um 19:20 schrieb Li, Ching-shih (Louis):
I don’t mean to read it twice. The solution is to make first read later. I 
didn’t modify the original code to make code difference less and simple. I 
guess it should work to remove the original read there.


From: Liu, Leo <mailto:leo@amd.com>
Sent: Tuesday, May 28, 2019 12:40 AM
To: Li, Ching-shih (Louis) 
<mailto:ching-shih...@amd.com>; S, Shirish 
<mailto:shiris...@amd.com>; Grodzovsky, Andrey 
<mailto:andrey.grodzov...@amd.com>; Zhang, Jerry 
<mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0


int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
{
struct amdgpu_device *adev = ring->adev;

uint32_t rptr = amdgpu_ring_get_rptr(ring);

unsigned i;
int r, timeout = adev->usec_timeout;

/* skip ring test for sriov*/
if (amdgpu_sriov_vf(adev))
return 0;

r = amdgpu_ring_alloc(ring, 16);
if (r)
return r;

amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);



Above is original code, rptr is updated when called, and below is your patch, 
my question is why do you need to get rptr twice?

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

if (r)

   return r;



+   rptr = amdgpu_ring_get_rptr(ring);

+

amdgpu_ring_write(ring, VCE_CMD_END);

amdgpu_ring_commit(ring);




On 5/27/19 12:22 PM, Li, Ching-shih (Louis) wrote:
Hi Leo,

Yes, I confirm it is the root cause for the Chrome S3 issue. Whenever system is 
resumed, the original instruction always gets zero. However, I have no idea why 
it fails, and didn’t verify this problem on CRB or any other Linux platform yet.
Although I think the ideal solution is an indicator, e.g. a register, for 
driver to check if related firmware and hardware are ready to work. So driver 
can make sure it is ok to read rptr. Without any reference document, I can only 
try to solve the problem by modifying driver. Debug traces reveal that only 
first rptr read fails, but the read in check loop is ok. Therefore, a solution 
comes to mind: to update rptr later for initial rptr value. Tests prove it 
working in Chrome platforms. Fyi~

BR,
Louis

From: Liu, Leo <mailto:leo@amd.com>
Sent: Monday, May 27, 2019 9:01 PM
To: S, Shirish <mailto:shiris...@amd.com>; Grodzovsky, 
Andrey <mailto:andrey.grodzov...@amd.com>; Zhang, 
Jerry <mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Li, 
Ching-shih (Louis) <mailto:ching-shih...@amd.com>
Subject: Re: [PATC

RE: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-28 Thread Li, Ching-shih (Louis)
Hi Christian,

Your explanation matches my code study and test results. Well, I will remove 
original read. Shirish and I will re-test it. I will submit it after test done.
As for other related code, yes, I assumed there might be similar issues as 
well. My plan is to create internal ticket and write test code to check if any 
problem can be hit. Then we can do fix. I have the current patch focus on the 
current Chrome issue.

Thanks for your review and information.

BR,
Louis

From: Christian König 
Sent: Tuesday, May 28, 2019 3:24 PM
To: Li, Ching-shih (Louis) ; Liu, Leo ; 
S, Shirish ; Grodzovsky, Andrey ; 
Zhang, Jerry ; Deng, Emily ; Deucher, 
Alexander 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0

[CAUTION: External Email]
Wow, really good catch!

The underlying problem is most likely that VCE block is either power or clock 
gated and because of this the readptr read always returns zero.

Now amdgpu_ring_alloc() informs the power management code that the block is 
about to be used and so the gating is turned off.

Mhm, that is probably wrong at a hole bunch of other places, at least the UVD 
and VCN code comes to mind.

I agree with Leo that you should remove the original read (so to not read 
twice) and it would be realy nice if you could double check the other code 
(UVD/VCN) for similar problems as well.

Regards,
Christian.

Am 27.05.19 um 19:20 schrieb Li, Ching-shih (Louis):
I don’t mean to read it twice. The solution is to make first read later. I 
didn’t modify the original code to make code difference less and simple. I 
guess it should work to remove the original read there.


From: Liu, Leo <mailto:leo@amd.com>
Sent: Tuesday, May 28, 2019 12:40 AM
To: Li, Ching-shih (Louis) 
<mailto:ching-shih...@amd.com>; S, Shirish 
<mailto:shiris...@amd.com>; Grodzovsky, Andrey 
<mailto:andrey.grodzov...@amd.com>; Zhang, Jerry 
<mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0


int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
{
struct amdgpu_device *adev = ring->adev;

uint32_t rptr = amdgpu_ring_get_rptr(ring);

unsigned i;
int r, timeout = adev->usec_timeout;

/* skip ring test for sriov*/
if (amdgpu_sriov_vf(adev))
return 0;

r = amdgpu_ring_alloc(ring, 16);
if (r)
return r;

amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);



Above is original code, rptr is updated when called, and below is your patch, 
my question is why do you need to get rptr twice?

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

if (r)

   return r;



+   rptr = amdgpu_ring_get_rptr(ring);

+

amdgpu_ring_write(ring, VCE_CMD_END);

amdgpu_ring_commit(ring);




On 5/27/19 12:22 PM, Li, Ching-shih (Louis) wrote:
Hi Leo,

Yes, I confirm it is the root cause for the Chrome S3 issue. Whenever system is 
resumed, the original instruction always gets zero. However, I have no idea why 
it fails, and didn’t verify this problem on CRB or any other Linux platform yet.
Although I think the ideal solution is an indicator, e.g. a register, for 
driver to check if related firmware and hardware are ready to work. So driver 
can make sure it is ok to read rptr. Without any reference document, I can only 
try to solve the problem by modifying driver. Debug traces reveal that only 
first rptr read fails, but the read in check loop is ok. Therefore, a solution 
comes to mind: to update rptr later for initial rptr value. Tests prove it 
working in Chrome platforms. Fyi~

BR,
Louis

From: Liu, Leo <mailto:leo@amd.com>
Sent: Monday, May 27, 2019 9:01 PM
To: S, Shirish <mailto:shiris...@amd.com>; Grodzovsky, 
Andrey <mailto:andrey.grodzov...@amd.com>; Zhang, 
Jerry <mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Li, 
Ching-shih (Louis) <mailto:ching-shih...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0



On 5/27/19 3:42 AM, S, Shirish wrote:

From: Louis Li <mailto:ching-shih...@amd.com>



[What]

vce ring test fails consistently during resume in s3 cycle, due to

mismatch read & write pointers.

On debug/analysis its found that rptr to be compared is not being

correctly updated/read, which leads to this failure.

Below is the failure signature:

  [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed

  [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block

Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-28 Thread Christian König

Wow, really good catch!

The underlying problem is most likely that VCE block is either power or 
clock gated and because of this the readptr read always returns zero.


Now amdgpu_ring_alloc() informs the power management code that the block 
is about to be used and so the gating is turned off.


Mhm, that is probably wrong at a hole bunch of other places, at least 
the UVD and VCN code comes to mind.


I agree with Leo that you should remove the original read (so to not 
read twice) and it would be realy nice if you could double check the 
other code (UVD/VCN) for similar problems as well.


Regards,
Christian.

Am 27.05.19 um 19:20 schrieb Li, Ching-shih (Louis):


I don’t mean to read it twice. The solution is to make first read 
later. I didn’t modify the original code to make code difference less 
and simple. I guess it should work to remove the original read there.


*From:*Liu, Leo 
*Sent:* Tuesday, May 28, 2019 12:40 AM
*To:* Li, Ching-shih (Louis) ; S, Shirish 
; Grodzovsky, Andrey ; 
Zhang, Jerry ; Deng, Emily ; 
Deucher, Alexander 

*Cc:* amd-gfx@lists.freedesktop.org
*Subject:* Re: [PATCH] drm/amdgpu: fix ring test failure issue during 
s3 in vce 3.0


int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
{
    struct amdgpu_device *adev = ring->adev;

    uint32_t rptr = amdgpu_ring_get_rptr(ring);

    unsigned i;
    int r, timeout = adev->usec_timeout;

    /* skip ring test for sriov*/
    if (amdgpu_sriov_vf(adev))
        return 0;

    r = amdgpu_ring_alloc(ring, 16);
    if (r)
        return r;

    amdgpu_ring_write(ring, VCE_CMD_END);
    amdgpu_ring_commit(ring);

Above is original code, rptr is updated when called, and below is your 
patch, my question is why do you need to get rptr twice?


@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
     if (r)
    return r;
  
+   rptr = amdgpu_ring_get_rptr(ring);

+
     amdgpu_ring_write(ring, VCE_CMD_END);
     amdgpu_ring_commit(ring);
  


On 5/27/19 12:22 PM, Li, Ching-shih (Louis) wrote:

Hi Leo,

Yes, I confirm it is the root cause *for the Chrome S3 issue*.
Whenever system is resumed, the original instruction always gets
zero. However, I have no idea why it fails, and didn’t verify this
problem on CRB or any other Linux platform yet.

Although I think the ideal solution is an indicator, e.g. a
register, for driver to check if related firmware and hardware are
ready to work. So driver can make sure it is ok to read rptr.
Without any reference document, I can only try to solve the
problem by modifying driver. Debug traces reveal that only first
rptr read fails, but the read in check loop is ok. Therefore, a
solution comes to mind: to update rptr later for initial rptr
value. Tests prove it working in Chrome platforms. Fyi~

BR,

Louis

*From:*Liu, Leo  <mailto:leo@amd.com>
*Sent:* Monday, May 27, 2019 9:01 PM
*To:* S, Shirish  <mailto:shiris...@amd.com>;
Grodzovsky, Andrey 
<mailto:andrey.grodzov...@amd.com>; Zhang, Jerry
 <mailto:jerry.zh...@amd.com>; Deng, Emily
 <mailto:emily.d...@amd.com>; Deucher,
Alexander 
<mailto:alexander.deuc...@amd.com>
*Cc:* amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>; Li, Ching-shih (Louis)
 <mailto:ching-shih...@amd.com>
*Subject:* Re: [PATCH] drm/amdgpu: fix ring test failure issue
during s3 in vce 3.0

On 5/27/19 3:42 AM, S, Shirish wrote:

From: Louis Li  <mailto:ching-shih...@amd.com>

  


[What]

vce ring test fails consistently during resume in s3 cycle, due to

mismatch read & write pointers.

On debug/analysis its found that rptr to be compared is not being

correctly updated/read, which leads to this failure.

Below is the failure signature:

   [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed

   [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
 failed -110

   [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
(-110).

  


[How]

fetch rptr appropriately, meaning move its read location further down

in the code flow.

With this patch applied the s3 failure is no more seen for >5k s3 
cycles,

which otherwise is pretty consistent.

  


Signed-off-by: Louis Li  
<mailto:ching-shih...@amd.com>

---

  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++

  1 file changed, 2 insertions(+)

  


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

index c021b11..92f9d46 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

@@ -1084,6 +1084,8 @@ i

RE: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-27 Thread Li, Ching-shih (Louis)
I don’t mean to read it twice. The solution is to make first read later. I 
didn’t modify the original code to make code difference less and simple. I 
guess it should work to remove the original read there.


From: Liu, Leo 
Sent: Tuesday, May 28, 2019 12:40 AM
To: Li, Ching-shih (Louis) ; S, Shirish 
; Grodzovsky, Andrey ; Zhang, 
Jerry ; Deng, Emily ; Deucher, 
Alexander 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0


int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
{
struct amdgpu_device *adev = ring->adev;

uint32_t rptr = amdgpu_ring_get_rptr(ring);

unsigned i;
int r, timeout = adev->usec_timeout;

/* skip ring test for sriov*/
if (amdgpu_sriov_vf(adev))
return 0;

r = amdgpu_ring_alloc(ring, 16);
if (r)
return r;

amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);



Above is original code, rptr is updated when called, and below is your patch, 
my question is why do you need to get rptr twice?

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

if (r)

   return r;



+   rptr = amdgpu_ring_get_rptr(ring);

+

amdgpu_ring_write(ring, VCE_CMD_END);

amdgpu_ring_commit(ring);




On 5/27/19 12:22 PM, Li, Ching-shih (Louis) wrote:
Hi Leo,

Yes, I confirm it is the root cause for the Chrome S3 issue. Whenever system is 
resumed, the original instruction always gets zero. However, I have no idea why 
it fails, and didn’t verify this problem on CRB or any other Linux platform yet.
Although I think the ideal solution is an indicator, e.g. a register, for 
driver to check if related firmware and hardware are ready to work. So driver 
can make sure it is ok to read rptr. Without any reference document, I can only 
try to solve the problem by modifying driver. Debug traces reveal that only 
first rptr read fails, but the read in check loop is ok. Therefore, a solution 
comes to mind: to update rptr later for initial rptr value. Tests prove it 
working in Chrome platforms. Fyi~

BR,
Louis

From: Liu, Leo <mailto:leo@amd.com>
Sent: Monday, May 27, 2019 9:01 PM
To: S, Shirish <mailto:shiris...@amd.com>; Grodzovsky, 
Andrey <mailto:andrey.grodzov...@amd.com>; Zhang, 
Jerry <mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Li, 
Ching-shih (Louis) <mailto:ching-shih...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0



On 5/27/19 3:42 AM, S, Shirish wrote:

From: Louis Li <mailto:ching-shih...@amd.com>



[What]

vce ring test fails consistently during resume in s3 cycle, due to

mismatch read & write pointers.

On debug/analysis its found that rptr to be compared is not being

correctly updated/read, which leads to this failure.

Below is the failure signature:

  [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed

  [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block  
failed -110

  [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).



[How]

fetch rptr appropriately, meaning move its read location further down

in the code flow.

With this patch applied the s3 failure is no more seen for >5k s3 cycles,

which otherwise is pretty consistent.



Signed-off-by: Louis Li <mailto:ching-shih...@amd.com>

---

 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++

 1 file changed, 2 insertions(+)



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

index c021b11..92f9d46 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

  if (r)

 return r;



+ rptr = amdgpu_ring_get_rptr(ring);

+

The rptr update is there:

uint32_t rptr = amdgpu_ring_get_rptr(ring);



Are you sure this is the root cause?



Regards,

Leo







  amdgpu_ring_write(ring, VCE_CMD_END);

  amdgpu_ring_commit(ring);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-27 Thread Liu, Leo
int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
{
struct amdgpu_device *adev = ring->adev;

uint32_t rptr = amdgpu_ring_get_rptr(ring);

unsigned i;
int r, timeout = adev->usec_timeout;

/* skip ring test for sriov*/
if (amdgpu_sriov_vf(adev))
return 0;

r = amdgpu_ring_alloc(ring, 16);
if (r)
return r;

amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);


Above is original code, rptr is updated when called, and below is your patch, 
my question is why do you need to get rptr twice?

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
if (r)
return r;

+   rptr = amdgpu_ring_get_rptr(ring);
+
amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);



On 5/27/19 12:22 PM, Li, Ching-shih (Louis) wrote:
Hi Leo,

Yes, I confirm it is the root cause for the Chrome S3 issue. Whenever system is 
resumed, the original instruction always gets zero. However, I have no idea why 
it fails, and didn’t verify this problem on CRB or any other Linux platform yet.
Although I think the ideal solution is an indicator, e.g. a register, for 
driver to check if related firmware and hardware are ready to work. So driver 
can make sure it is ok to read rptr. Without any reference document, I can only 
try to solve the problem by modifying driver. Debug traces reveal that only 
first rptr read fails, but the read in check loop is ok. Therefore, a solution 
comes to mind: to update rptr later for initial rptr value. Tests prove it 
working in Chrome platforms. Fyi~

BR,
Louis

From: Liu, Leo <mailto:leo@amd.com>
Sent: Monday, May 27, 2019 9:01 PM
To: S, Shirish <mailto:shiris...@amd.com>; Grodzovsky, 
Andrey <mailto:andrey.grodzov...@amd.com>; Zhang, 
Jerry <mailto:jerry.zh...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>; Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Li, 
Ching-shih (Louis) <mailto:ching-shih...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0



On 5/27/19 3:42 AM, S, Shirish wrote:

From: Louis Li <mailto:ching-shih...@amd.com>



[What]

vce ring test fails consistently during resume in s3 cycle, due to

mismatch read & write pointers.

On debug/analysis its found that rptr to be compared is not being

correctly updated/read, which leads to this failure.

Below is the failure signature:

  [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed

  [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block  
failed -110

  [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).



[How]

fetch rptr appropriately, meaning move its read location further down

in the code flow.

With this patch applied the s3 failure is no more seen for >5k s3 cycles,

which otherwise is pretty consistent.



Signed-off-by: Louis Li <mailto:ching-shih...@amd.com>

---

 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++

 1 file changed, 2 insertions(+)



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

index c021b11..92f9d46 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

  if (r)

 return r;



+ rptr = amdgpu_ring_get_rptr(ring);

+

The rptr update is there:

uint32_t rptr = amdgpu_ring_get_rptr(ring);



Are you sure this is the root cause?



Regards,

Leo







  amdgpu_ring_write(ring, VCE_CMD_END);

  amdgpu_ring_commit(ring);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-27 Thread Li, Ching-shih (Louis)
Hi Leo,

Yes, I confirm it is the root cause for the Chrome S3 issue. Whenever system is 
resumed, the original instruction always gets zero. However, I have no idea why 
it fails, and didn’t verify this problem on CRB or any other Linux platform yet.
Although I think the ideal solution is an indicator, e.g. a register, for 
driver to check if related firmware and hardware are ready to work. So driver 
can make sure it is ok to read rptr. Without any reference document, I can only 
try to solve the problem by modifying driver. Debug traces reveal that only 
first rptr read fails, but the read in check loop is ok. Therefore, a solution 
comes to mind: to update rptr later for initial rptr value. Tests prove it 
working in Chrome platforms. Fyi~

BR,
Louis

From: Liu, Leo 
Sent: Monday, May 27, 2019 9:01 PM
To: S, Shirish ; Grodzovsky, Andrey 
; Zhang, Jerry ; Deng, Emily 
; Deucher, Alexander 
Cc: amd-gfx@lists.freedesktop.org; Li, Ching-shih (Louis) 

Subject: Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 
3.0



On 5/27/19 3:42 AM, S, Shirish wrote:

From: Louis Li <mailto:ching-shih...@amd.com>



[What]

vce ring test fails consistently during resume in s3 cycle, due to

mismatch read & write pointers.

On debug/analysis its found that rptr to be compared is not being

correctly updated/read, which leads to this failure.

Below is the failure signature:

  [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed

  [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block  
failed -110

  [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).



[How]

fetch rptr appropriately, meaning move its read location further down

in the code flow.

With this patch applied the s3 failure is no more seen for >5k s3 cycles,

which otherwise is pretty consistent.



Signed-off-by: Louis Li <mailto:ching-shih...@amd.com>

---

 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++

 1 file changed, 2 insertions(+)



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

index c021b11..92f9d46 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

  if (r)

 return r;



+ rptr = amdgpu_ring_get_rptr(ring);

+

The rptr update is there:

uint32_t rptr = amdgpu_ring_get_rptr(ring);



Are you sure this is the root cause?



Regards,

Leo







  amdgpu_ring_write(ring, VCE_CMD_END);

  amdgpu_ring_commit(ring);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-27 Thread Liu, Leo

On 5/27/19 3:42 AM, S, Shirish wrote:

From: Louis Li 

[What]
vce ring test fails consistently during resume in s3 cycle, due to
mismatch read & write pointers.
On debug/analysis its found that rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
 failed -110
[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
(-110).

[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied the s3 failure is no more seen for >5k s3 cycles,
which otherwise is pretty consistent.

Signed-off-by: Louis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index c021b11..92f9d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
if (r)
return r;

+   rptr = amdgpu_ring_get_rptr(ring);
+

The rptr update is there:

uint32_t rptr = amdgpu_ring_get_rptr(ring);

Are you sure this is the root cause?

Regards,
Leo





amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

2019-05-27 Thread S, Shirish
From: Louis Li 

[What]
vce ring test fails consistently during resume in s3 cycle, due to
mismatch read & write pointers.
On debug/analysis its found that rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
 failed -110
[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
(-110).

[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied the s3 failure is no more seen for >5k s3 cycles,
which otherwise is pretty consistent.

Signed-off-by: Louis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index c021b11..92f9d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
if (r)
return r;
 
+   rptr = amdgpu_ring_get_rptr(ring);
+
amdgpu_ring_write(ring, VCE_CMD_END);
amdgpu_ring_commit(ring);
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx