RE: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2023-03-20 Thread Quan, Evan
[AMD Official Use Only - General]

OK, I see. Thanks for the explanations.

BR
Evan
> -Original Message-
> From: Koenig, Christian 
> Sent: Tuesday, March 21, 2023 1:32 AM
> To: Kim, Jonathan ; Christian König
> ; Quan, Evan ;
> amd-gfx@lists.freedesktop.org
> Cc: Kuehling, Felix 
> Subject: Re: [PATCH] drm/amdgpu: improve debug VRAM access
> performance using sdma
> 
> Ah, yes! GART doesn't need to be ready to make a GTT allocation.
> 
> When GART becomes ready, it will be filled with all the buffers which
> were allocated before it was ready.
> 
> So this is perfectly fine.
> 
> Thanks,
> Christian.
> 
> On 20.03.23 at 18:24, Kim, Jonathan wrote:
> > [Public]
> >
> > This was a long time ago, but I think we agreed allocation was OK before
> GART was ready.
> > IIRC, a related scenario was also mentioned where APUs needed
> to work without VRAM but allocations were still required (but I don't know
> the details regarding that).
> > I vaguely remember that requiring GART readiness for the bounce
> buffer allocation caused some problems elsewhere.
> > Have any problems been observed with the bounce buffer being allocated
> before GART is ready?
> >
> > Thanks,
> >
> > Jon
> >> [snip]

Re: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2023-03-20 Thread Christian König

Ah, yes! GART doesn't need to be ready to make a GTT allocation.

When GART becomes ready, it will be filled with all the buffers which
were allocated before it was ready.


So this is perfectly fine.

Thanks,
Christian.
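
In other words, a GTT allocation made while adev->gart.ptr is still NULL is
not lost: its GART binding is simply deferred. A conceptual sketch of that
idea (hypothetical helper and list names, not the actual amdgpu code):

	/* Called once GART setup completes: bind everything that was
	 * allocated while GART was not yet ready. */
	static void gart_bind_pending(struct amdgpu_device *adev)
	{
		struct amdgpu_bo *bo;

		if (!adev->gart.ptr)
			return;	/* GART still not ready */

		list_for_each_entry(bo, &pending_gtt_bos, pending_node)
			gart_write_ptes(adev, bo);	/* fill in the PTEs now */
	}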

On 20.03.23 at 18:24, Kim, Jonathan wrote:

[Public]

This was a long time ago, but I think we agreed allocation was OK before GART
was ready.
IIRC, a related scenario was also mentioned where APUs needed to work
without VRAM but allocations were still required (but I don't know the details
regarding that).
I vaguely remember that requiring GART readiness for the bounce buffer
allocation caused some problems elsewhere.
Have any problems been observed with the bounce buffer being allocated before
GART is ready?

Thanks,

Jon

[snip]

RE: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2023-03-20 Thread Kim, Jonathan
[Public]

This was a long time ago, but I think we agreed allocation was OK before GART
was ready.
IIRC, a related scenario was also mentioned where APUs needed to work
without VRAM but allocations were still required (but I don't know the details
regarding that).
I vaguely remember that requiring GART readiness for the bounce buffer
allocation caused some problems elsewhere.
Have any problems been observed with the bounce buffer being allocated before
GART is ready?

Thanks,

Jon
> -Original Message-
> From: Christian König 
> Sent: Monday, March 20, 2023 1:02 PM
> To: Quan, Evan ; Kim, Jonathan
> ; amd-gfx@lists.freedesktop.org
> Cc: Kuehling, Felix ; Koenig, Christian
> 
> Subject: Re: [PATCH] drm/amdgpu: improve debug VRAM access performance
> using sdma
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> I don't think so. Have we recently re-ordered something here?
>
> Christian.
>
> On 20.03.23 at 08:05, Quan, Evan wrote:
> > [AMD Official Use Only - General]
> >
> > I happened to notice that the sdma_access_bo allocation from GTT seems
> to be performed before GART is ready.
> > That makes "amdgpu_gart_map" get skipped, since adev->gart.ptr is still
> NULL.
> > Is that done intentionally?
> >
> > Evan
> >> [snip]

Re: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2023-03-20 Thread Christian König

I don't think so. Have we recently re-ordered something here?

Christian.

On 20.03.23 at 08:05, Quan, Evan wrote:

[AMD Official Use Only - General]

I happened to notice that the sdma_access_bo allocation from GTT seems to be
performed before GART is ready.
That makes "amdgpu_gart_map" get skipped, since adev->gart.ptr is still NULL.
Is that done intentionally?

Evan

[snip]

RE: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2023-03-20 Thread Quan, Evan
[AMD Official Use Only - General]

I happened to notice that the sdma_access_bo allocation from GTT seems to be
performed before GART is ready.
That makes "amdgpu_gart_map" get skipped, since adev->gart.ptr is still NULL.
Is that done intentionally?

Evan
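
What Evan describes is the GART mapping step turning into a no-op while GART
is not yet initialized; the GTT allocation itself still succeeds. A
paraphrased sketch of that early-out (not the exact kernel source):

	/* In the GART mapping path: no page table to write into yet, so
	 * bail. The PTEs get filled in later, once adev->gart.ptr is
	 * valid. */
	if (!adev->gart.ptr)
		return 0;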
> [snip]

Re: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2022-01-12 Thread Christian König

Yeah, that's basically my fault.

I haven't even worked my way through all the mails that piled up over the
xmas break :(


Christian.

On 12.01.22 at 15:21, Kim, Jonathan wrote:

[Public]

Thanks, Christian.  I've already merged based on Felix's review.
I'll send your suggested cleanup out for review soon.

Jon


[snip]

RE: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2022-01-12 Thread Kim, Jonathan
[Public]

Thanks, Christian.  I've already merged based on Felix's review.
I'll send your suggested cleanup out for review soon.

Jon

> [snip]

Re: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2022-01-11 Thread Christian König

On 04.01.22 at 20:12, Jonathan Kim wrote:

For better performance during VRAM access for debugged processes, do
read/write copies over SDMA.

In order to fulfill post mortem debugging on a broken device, fall back to
stable MMIO access when gpu recovery is disabled or when job submission
timeouts are set to max.  Failed SDMA access should automatically fall
back to MMIO access.

Use a pre-allocated GTT bounce buffer pre-mapped into GART to avoid
page-table updates and TLB flushes on access.

Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  5 +-
  2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 367abed1d6e6..512df4c09772 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -48,6 +48,7 @@
  #include 
  
  #include 

+#include <drm/drm_drv.h>
  
  #include "amdgpu.h"

  #include "amdgpu_object.h"
@@ -1429,6 +1430,70 @@ static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
}
  }
  
+static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
+   unsigned long offset, void *buf, int len, int write)
+{
+   struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
+   struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+   struct amdgpu_job *job;
+   struct dma_fence *fence;
+   uint64_t src_addr, dst_addr;
+   unsigned int num_dw;
+   int r, idx;
+
+   if (len != PAGE_SIZE)
+   return -EINVAL;
+
+   if (!adev->mman.sdma_access_ptr)
+   return -EACCES;
+
+   r = drm_dev_enter(adev_to_drm(adev), &idx);
+   if (r)
+   return r;
+
+   if (write)
+   memcpy(adev->mman.sdma_access_ptr, buf, len);
+
+   num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
+   r = amdgpu_job_alloc_with_ib(adev, num_dw * 4, AMDGPU_IB_POOL_DELAYED, &job);
+   if (r)
+   goto out;
+
+   src_addr = write ? amdgpu_bo_gpu_offset(adev->mman.sdma_access_bo) :
+   amdgpu_bo_gpu_offset(abo);
+   dst_addr = write ? amdgpu_bo_gpu_offset(abo) :
+   amdgpu_bo_gpu_offset(adev->mman.sdma_access_bo);


I suggest to write this as

src_addr = a;
dst_addr = b;
if (write)
    swap(src_addr, dst_addr);

This way we are not duplicating getting the different offsets.
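
Applied to the patch, the suggestion would read roughly as follows (a sketch
of the proposed cleanup, not the final merged code):

	src_addr = amdgpu_bo_gpu_offset(abo);
	dst_addr = amdgpu_bo_gpu_offset(adev->mman.sdma_access_bo);
	if (write)
		swap(src_addr, dst_addr);

For a read, the copy goes VRAM BO -> bounce buffer; for a write, swap() just
flips the direction, so each amdgpu_bo_gpu_offset() call appears only once.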


+   amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr, PAGE_SIZE, false);
+
+   amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]);
+   WARN_ON(job->ibs[0].length_dw > num_dw);
+
+   r = amdgpu_job_submit(job, &adev->mman.entity, AMDGPU_FENCE_OWNER_UNDEFINED, &fence);
+   if (r) {
+   amdgpu_job_free(job);
+   goto out;
+   }
+
+   if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
+   r = -ETIMEDOUT;
+   dma_fence_put(fence);
+
+   if (!(r || write))
+   memcpy(buf, adev->mman.sdma_access_ptr, len);
+out:
+   drm_dev_exit(idx);
+   return r;
+}
+
+static inline bool amdgpu_ttm_allow_post_mortem_debug(struct amdgpu_device *adev)
+{
+   return amdgpu_gpu_recovery == 0 ||
+   adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
+   adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
+   adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
+   adev->video_timeout == MAX_SCHEDULE_TIMEOUT;
+}


This should probably be inside amdgpu_device.c
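
That is, the check would move next to the other device-wide policy helpers,
along these lines (hypothetical function name and placement, sketching the
suggested cleanup):

	/* amdgpu_device.c */
	bool amdgpu_device_allow_post_mortem_debug(struct amdgpu_device *adev)
	{
		return amdgpu_gpu_recovery == 0 ||
			adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
			adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
			adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
			adev->video_timeout == MAX_SCHEDULE_TIMEOUT;
	}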


+
  /**
   * amdgpu_ttm_access_memory - Read or Write memory that backs a buffer object.
   *
@@ -1453,6 +1518,10 @@ static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
if (bo->resource->mem_type != TTM_PL_VRAM)
return -EIO;
  
+	if (!amdgpu_ttm_allow_post_mortem_debug(adev) &&
+   !amdgpu_ttm_access_memory_sdma(bo, offset, buf, len, write))
+   return len;
+
amdgpu_res_first(bo->resource, offset, len, &cursor);
while (cursor.remaining) {
size_t count, size = cursor.size;
@@ -1793,6 +1862,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return r;
}
  
+	if (amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_GTT,
+   &adev->mman.sdma_access_bo, NULL,
+   adev->mman.sdma_access_ptr))
+   DRM_WARN("Debug VRAM access will use slowpath MM access\n");
+
return 0;
  }
  
@@ -1823,6 +1898,9 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)

ttm_range_man_fini(&adev->mman.bdev, AMDGPU_PL_OA);
ttm_device_fini(&adev->mman.bdev);
adev->mman.initialized = false;
+   if (adev->mman.sdma_access_ptr)
+   amdgpu_bo_free_kernel(&adev->mman.sdma_access_bo, NULL,
+   &adev->mman.sdma_access_ptr);

You can drop that if. Free functions can usually handle NULL pointers.
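
With that check dropped, the fini path would simply call the free
unconditionally; a sketch of the resulting code (assuming, as the comment
says, that amdgpu_bo_free_kernel tolerates a NULL BO):

	amdgpu_bo_free_kernel(&adev->mman.sdma_access_bo, NULL,
			      &adev->mman.sdma_access_ptr);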

Re: [PATCH] drm/amdgpu: improve debug VRAM access performance using sdma

2022-01-11 Thread Felix Kuehling
On 2022-01-04 at 2:12 p.m., Jonathan Kim wrote:
> For better performance during VRAM access for debugged processes, do
> read/write copies over SDMA.
>
> In order to fulfill post mortem debugging on a broken device, fall back to
> stable MMIO access when gpu recovery is disabled or when job submission
> timeouts are set to max.  Failed SDMA access should automatically fall
> back to MMIO access.
>
> Use a pre-allocated GTT bounce buffer pre-mapped into GART to avoid
> page-table updates and TLB flushes on access.
>
> Signed-off-by: Jonathan Kim 

Reviewed-by: Felix Kuehling 
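
For context, this access callback is typically reached when a debugger pokes
a VRAM-backed mapping of the debugged process: the core kernel falls back to
vma->vm_ops->access, which for TTM-managed BOs goes through ttm_bo_vm_access
and ends up in amdgpu_ttm_access_memory. A minimal userspace sketch of such
an access via /proc/<pid>/mem (pid and gpu_va are placeholders for the
debugged process and the address of its VRAM mapping):

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/types.h>
	#include <unistd.h>

	/* Read len bytes of the debuggee's VRAM-backed mapping.
	 * Requires ptrace attach rights to the target process. */
	static int read_debuggee_vram(pid_t pid, uint64_t gpu_va,
				      void *out, size_t len)
	{
		char path[64];
		ssize_t n;
		int fd;

		snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
		fd = open(path, O_RDONLY);
		if (fd < 0)
			return -1;
		n = pread(fd, out, len, (off_t)gpu_va);
		close(fd);
		return n == (ssize_t)len ? 0 : -1;
	}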


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  5 +-
>  2 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 367abed1d6e6..512df4c09772 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -48,6 +48,7 @@
>  #include 
>  
>  #include 
> +#include <drm/drm_drv.h>
>  
>  #include "amdgpu.h"
>  #include "amdgpu_object.h"
> @@ -1429,6 +1430,70 @@ static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
>   }
>  }
>  
> +static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
> + unsigned long offset, void *buf, int len, int write)
> +{
> + struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
> + struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
> + struct amdgpu_job *job;
> + struct dma_fence *fence;
> + uint64_t src_addr, dst_addr;
> + unsigned int num_dw;
> + int r, idx;
> +
> + if (len != PAGE_SIZE)
> + return -EINVAL;
> +
> + if (!adev->mman.sdma_access_ptr)
> + return -EACCES;
> +
> + r = drm_dev_enter(adev_to_drm(adev), &idx);
> + if (r)
> + return r;
> +
> + if (write)
> + memcpy(adev->mman.sdma_access_ptr, buf, len);
> +
> + num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
> + r = amdgpu_job_alloc_with_ib(adev, num_dw * 4, AMDGPU_IB_POOL_DELAYED, &job);
> + if (r)
> + goto out;
> +
> + src_addr = write ? amdgpu_bo_gpu_offset(adev->mman.sdma_access_bo) :
> + amdgpu_bo_gpu_offset(abo);
> + dst_addr = write ? amdgpu_bo_gpu_offset(abo) :
> + amdgpu_bo_gpu_offset(adev->mman.sdma_access_bo);
> + amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr, PAGE_SIZE, false);
> +
> + amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]);
> + WARN_ON(job->ibs[0].length_dw > num_dw);
> +
> + r = amdgpu_job_submit(job, &adev->mman.entity, AMDGPU_FENCE_OWNER_UNDEFINED, &fence);
> + if (r) {
> + amdgpu_job_free(job);
> + goto out;
> + }
> +
> + if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
> + r = -ETIMEDOUT;
> + dma_fence_put(fence);
> +
> + if (!(r || write))
> + memcpy(buf, adev->mman.sdma_access_ptr, len);
> +out:
> + drm_dev_exit(idx);
> + return r;
> +}
> +
> +static inline bool amdgpu_ttm_allow_post_mortem_debug(struct amdgpu_device *adev)
> +{
> + return amdgpu_gpu_recovery == 0 ||
> + adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
> + adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
> + adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
> + adev->video_timeout == MAX_SCHEDULE_TIMEOUT;
> +}
> +
>  /**
>   * amdgpu_ttm_access_memory - Read or Write memory that backs a buffer object.
>   *
> @@ -1453,6 +1518,10 @@ static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
>   if (bo->resource->mem_type != TTM_PL_VRAM)
>   return -EIO;
>  
> + if (!amdgpu_ttm_allow_post_mortem_debug(adev) &&
> + !amdgpu_ttm_access_memory_sdma(bo, offset, buf, len, write))
> + return len;
> +
>   amdgpu_res_first(bo->resource, offset, len, &cursor);
>   while (cursor.remaining) {
>   size_t count, size = cursor.size;
> @@ -1793,6 +1862,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>   return r;
>   }
>  
> + if (amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
> + AMDGPU_GEM_DOMAIN_GTT,
> + &adev->mman.sdma_access_bo, NULL,
> + adev->mman.sdma_access_ptr))
> + DRM_WARN("Debug VRAM access will use slowpath MM access\n");
> +
>   return 0;
>  }
>  
> @@ -1823,6 +1898,9 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
>   ttm_range_man_fini(&adev->mman.bdev, AMDGPU_PL_OA);
>   ttm_device_fini(&adev->mman.bdev);
>   adev->mman.initialized = false;
> + if (adev->mman.sdma_access_ptr)
> + amdgpu_bo_free_kernel(&adev->mman.sdma_access_bo, NULL,
> + &adev->mman.sdma_access_ptr);