Re: [PATCH] drm/amdgpu: correctly report gpu recover status

2020-01-01 Thread Christian König

Hi Evan,


But still what I care more(which is also the easiest way to me) is the correct 
return value of the API.

Well exactly that's the point ther return value is not correct for the API.

For example when the GPU reset function would return -EFAULT your 
program which reads the debugfs file would crash with a segmentation 
fault. That is not correct behavior.


In other words the result of the GPU reset can't be used as result of 
the debugfs read.


Regards,
Christian.

Am 19.12.19 um 02:48 schrieb Quan, Evan:

Hi Christian,

Here is some background for this change:
I'm debugging a random failure issue on baco reset.
I used a while loop to run the continuous baco reset tests and hope it can exit 
immediately on failure occurred.
However, due to wrong return value, it did not. And as you can image, the 
failure scene was ruined.

I can add this "seq_printf(m, "gpu recover %d\n", r);".
But still what I care more(which is also the easiest way to me) is the correct 
return value of the API.

Regards,
Evan

-Original Message-
From: Christian König 
Sent: Wednesday, December 18, 2019 5:57 PM
To: Quan, Evan ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: correctly report gpu recover status

Am 18.12.19 um 04:25 schrieb Evan Quan:

Knowing whether gpu recovery was performed successfully or not is
important for our BACO development.

Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
Signed-off-by: Evan Quan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
   1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index e9efee04ca23..5dff5c0dd882 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct

seq_file *m, void *data)

struct amdgpu_device *adev = dev->dev_private;

seq_printf(m, "gpu recover\n");
-   amdgpu_device_gpu_recover(adev, NULL);
-
-   return 0;
+   return amdgpu_device_gpu_recover(adev, NULL);

NAK, what we could do here is the following:

r = amdgpu_device_gpu_recover();
seq_printf(m, "gpu recover %d\n", r);

But returning the error code from the GPU recovery to userspace doesn't make
to much sense.

Christian.


   }

   static const struct drm_info_list amdgpu_debugfs_fence_list[] = {


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: correctly report gpu recover status

2019-12-18 Thread Quan, Evan
Hi Christian,

Here is some background for this change:
I'm debugging a random failure issue on baco reset.
I used a while loop to run the continuous baco reset tests and hope it can exit 
immediately on failure occurred.
However, due to wrong return value, it did not. And as you can image, the 
failure scene was ruined.

I can add this "seq_printf(m, "gpu recover %d\n", r);".
But still what I care more(which is also the easiest way to me) is the correct 
return value of the API.

Regards,
Evan
> -Original Message-
> From: Christian König 
> Sent: Wednesday, December 18, 2019 5:57 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: correctly report gpu recover status
> 
> Am 18.12.19 um 04:25 schrieb Evan Quan:
> > Knowing whether gpu recovery was performed successfully or not is
> > important for our BACO development.
> >
> > Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
> > Signed-off-by: Evan Quan 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
> >   1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index e9efee04ca23..5dff5c0dd882 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct
> seq_file *m, void *data)
> > struct amdgpu_device *adev = dev->dev_private;
> >
> > seq_printf(m, "gpu recover\n");
> > -   amdgpu_device_gpu_recover(adev, NULL);
> > -
> > -   return 0;
> > +   return amdgpu_device_gpu_recover(adev, NULL);
> 
> NAK, what we could do here is the following:
> 
> r = amdgpu_device_gpu_recover();
> seq_printf(m, "gpu recover %d\n", r);
> 
> But returning the error code from the GPU recovery to userspace doesn't make
> to much sense.
> 
> Christian.
> 
> >   }
> >
> >   static const struct drm_info_list amdgpu_debugfs_fence_list[] = {

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: correctly report gpu recover status

2019-12-18 Thread Christian König

Am 18.12.19 um 04:25 schrieb Evan Quan:

Knowing whether gpu recovery was performed successfully or not
is important for our BACO development.

Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
Signed-off-by: Evan Quan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index e9efee04ca23..5dff5c0dd882 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct seq_file *m, 
void *data)
struct amdgpu_device *adev = dev->dev_private;
  
  	seq_printf(m, "gpu recover\n");

-   amdgpu_device_gpu_recover(adev, NULL);
-
-   return 0;
+   return amdgpu_device_gpu_recover(adev, NULL);


NAK, what we could do here is the following:

r = amdgpu_device_gpu_recover();
seq_printf(m, "gpu recover %d\n", r);

But returning the error code from the GPU recovery to userspace doesn't 
make to much sense.


Christian.


  }
  
  static const struct drm_info_list amdgpu_debugfs_fence_list[] = {


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: correctly report gpu recover status

2019-12-17 Thread Evan Quan
Knowing whether gpu recovery was performed successfully or not
is important for our BACO development.

Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index e9efee04ca23..5dff5c0dd882 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct seq_file *m, 
void *data)
struct amdgpu_device *adev = dev->dev_private;
 
seq_printf(m, "gpu recover\n");
-   amdgpu_device_gpu_recover(adev, NULL);
-
-   return 0;
+   return amdgpu_device_gpu_recover(adev, NULL);
 }
 
 static const struct drm_info_list amdgpu_debugfs_fence_list[] = {
-- 
2.24.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx