Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-07 Thread Shengyu Qu

Hi Alexander,

在 2024/2/6 1:12, Deucher, Alexander 写道:

Are you only seeing the problem with this patch applied or in general?  If you 
are seeing it in general, it likely related to a firmware issue that was 
recently fixed that will be resolved with an update CP firmware image.
Driver side changes:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c

Alex
This problem is not affected by this patch, so possible the firmware 
issue. Where can I get the newest firmware image? Or is it already 
pushed to linux-firmware repo?


Best regards,
Shengyu



OpenPGP_0xE3520CC91929C8E7.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

The firmware has not been released yet, It's still undergoing regression 
testing.

Alex



From: Shengyu Qu
Sent: Tuesday, February 6, 2024 5:08 AM
To: Deucher, Alexander; Kuehling, Felix; amd-gfx@lists.freedesktop.org
Cc: wiagn...@outlook.com; Cornwall, Jay; Koenig, Christian; Paneer Selvam, 
Arunpravin
Subject: Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

Hi Alexander,

在 2024/2/6 1:12, Deucher, Alexander 写道:

Are you only seeing the problem with this patch applied or in general?  If you 
are seeing it in general, it likely related to a firmware issue that was 
recently fixed that will be resolved with an update CP firmware image.
Driver side changes:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c

Alex


This problem is not affected by this patch, so possible the firmware issue. 
Where can I get the newest firmware image? Or is it already pushed to 
linux-firmware repo?

Best regards,
Shengyu



RE: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-05 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> Shengyu Qu
> Sent: Saturday, February 3, 2024 8:05 AM
> To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
> Cc: wiagn...@outlook.com; Cornwall, Jay ;
> Koenig, Christian ; Paneer Selvam, Arunpravin
> 
> Subject: Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)
>
> Hi Felix,
> Sorry for my late reply. I was busy this week.
> I just did some more tests using next-20240202 branch. Testing using blender
> 4.0.2, when only one HIP render task is running, there's no problem.
> However, when two tasks run together, software always crashes, but not
> crashes the whole system. Dmesg reports gpu reset in most cases, for
> example:
>
> [  176.071823] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
> gfx_0.0.0 timeout, signaled seq=32608, emitted seq=32610 [  176.072000]
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process blender pid 4256 thread blender:cs0 pid 4297
> [  176.072143] amdgpu :03:00.0: amdgpu: GPU reset begin!
> [  176.073571] amdgpu :03:00.0: amdgpu: Guilty job already signaled,
> skipping HW reset [  176.073593] amdgpu :03:00.0: amdgpu: GPU
> reset(4) succeeded!
>
> And in some rare cases, there would be a page fault report, see dmesg.log.
> Do you have any idea? Can I make it print more detailed diagnostic
> information?

Are you only seeing the problem with this patch applied or in general?  If you 
are seeing it in general, it likely related to a firmware issue that was 
recently fixed that will be resolved with an update CP firmware image.
Driver side changes:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c

Alex


>
> Best regards,
> Shengyu
>
>
> 在 2024/1/30 01:47, Felix Kuehling 写道:
> > On 2024-01-29 10:24, Shengyu Qu wrote:
> >> Hello Felix,
> >> I think you are right. This problem has existed for years(just look
> >> at the issue creation time in my link), and is thought caused by
> >> OpenGL-ROCM interop(that's why I think this patch might help). It is
> >> very easy to trigger this problem in blender(method is also mentioned
> >> in the link).
> >
> > This doesn't help you, but it's unlikely that this has been the same
> > issue for two years for everybody who chimed into this bug report.
> > Different kernel versions, GPUs, user mode ROCm and Mesa versions etc.
> >
> > Case in point, it's possible that you're seeing an issue specific to
> > RDNA3, which hasn't even been around for that long.
> >
> >
> >> Do
> >> you have any idea about this?
> >
> > Not without seeing a lot more diagnostic information. A full backtrace
> > from your kernel log would be a good start.
> >
> > Regards,
> >   Felix
> >
> >
> >> Best regards,
> >> Shengyu
> >> 在 2024/1/29 22:51, Felix Kuehling 写道:
> >>> On 2024-01-29 8:58, Shengyu Qu wrote:
> >>>> Hi,
> >>>> Seems rocm-opengl interop hang problem still exists[1]. Btw have
> >>>> you discovered into this problem?
> >>>> Best regards,
> >>>> Shengyu
> >>>> [1]
> >>>> https://projects.blender.org/blender/blender/issues/100353#issuecom
> >>>> ment-599
> >>>
> >>> Maybe you're having a different problem. Do you see this issue also
> >>> without any version of the "Relocate TBA/TMA ..." patch?
> >>>
> >>> Regards,
> >>>   Felix
> >>>
> >>>
> >>>>
> >>>> 在 2024/1/27 03:15, Shengyu Qu 写道:
> >>>>> Hello Felix,
> >>>>> This patch seems working on my system, also it seems fixes the
> >>>>> ROCM/OpenGL interop problem.
> >>>>> Is this intended to happen or not? Maybe we need more users to
> >>>>> test it.
> >>>>> Besides,
> >>>>> Tested-by: Shengyu Qu  Best Regards,
> Shengyu
> >>>>>
> >>>>> 在 2024/1/26 06:27, Felix Kuehling 写道:
> >>>>>> The TBA and TMA, along with an unused IB allocation, reside at
> >>>>>> low addresses in the VM address space. A stray VM fault which
> >>>>>> hits these pages must be serviced by making their page table entries
> invalid.
> >>>>>> The scheduler depends upon these pages being resident and fails,
> >>>>>> preventing a debug

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-05 Thread Shengyu Qu

Hi Felix,
Sorry for my late reply. I was busy this week.
I just did some more tests using next-20240202 branch. Testing using 
blender 4.0.2, when only one HIP render task is running, there's no problem.
However, when two tasks run together, software always crashes, but not 
crashes the whole system. Dmesg reports gpu reset in most cases, for 
example:


[  176.071823] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 
timeout, signaled seq=32608, emitted seq=32610
[  176.072000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process 
information: process blender pid 4256 thread blender:cs0 pid 4297

[  176.072143] amdgpu :03:00.0: amdgpu: GPU reset begin!
[  176.073571] amdgpu :03:00.0: amdgpu: Guilty job already signaled, 
skipping HW reset

[  176.073593] amdgpu :03:00.0: amdgpu: GPU reset(4) succeeded!

And in some rare cases, there would be a page fault report, see dmesg.log.
Do you have any idea? Can I make it print more detailed diagnostic 
information?


Best regards,
Shengyu


在 2024/1/30 01:47, Felix Kuehling 写道:

On 2024-01-29 10:24, Shengyu Qu wrote:

Hello Felix,
I think you are right. This problem has existed for years(just look 
at the

issue creation time in my link), and is thought caused by OpenGL-ROCM
interop(that's why I think this patch might help). It is very easy to
trigger this problem in blender(method is also mentioned in the link).


This doesn't help you, but it's unlikely that this has been the same 
issue for two years for everybody who chimed into this bug report. 
Different kernel versions, GPUs, user mode ROCm and Mesa versions etc.


Case in point, it's possible that you're seeing an issue specific to 
RDNA3, which hasn't even been around for that long.




Do
you have any idea about this?


Not without seeing a lot more diagnostic information. A full backtrace 
from your kernel log would be a good start.


Regards,
  Felix



Best regards,
Shengyu
在 2024/1/29 22:51, Felix Kuehling 写道:

On 2024-01-29 8:58, Shengyu Qu wrote:

Hi,
Seems rocm-opengl interop hang problem still exists[1]. Btw have you
discovered into this problem?
Best regards,
Shengyu
[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


Maybe you're having a different problem. Do you see this issue also 
without any version of the "Relocate TBA/TMA ..." patch?


Regards,
  Felix




在 2024/1/27 03:15, Shengyu Qu 写道:

Hello Felix,
This patch seems working on my system, also it seems fixes the 
ROCM/OpenGL

interop problem.
Is this intended to happen or not? Maybe we need more users to 
test it.

Besides,
Tested-by: Shengyu Qu 
Best Regards,
Shengyu

在 2024/1/26 06:27, Felix Kuehling 写道:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it 
much

less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 
+++-

  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << 
AMDGPU_GPU_PAGE_SHIFT;

+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device 
*adev)

  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    

Re: [PATCH 2/2] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-31 Thread Christian König




Am 30.01.24 um 21:08 schrieb Felix Kuehling:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 10 ++-
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
  4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
  
  uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)

  {
-   uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+   uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);


Maybe move the "adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT" part 
into the macro. That should be identical for all use cases.


Apart from that looks really good to me.

Regards,
Christian.

  
-	addr -= AMDGPU_VA_RESERVED_CSA_SIZE;

addr = amdgpu_gmc_sign_extend(addr);
  
  	return addr;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
  {
-   u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-   addr -= AMDGPU_VA_RESERVED_TOP;
-
-   return addr;
+   return AMDGPU_VA_RESERVED_SEQ64_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
  
  /**

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2c4053b29bb3..c2407f6a7e83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -137,9 +137,17 @@ struct amdgpu_mem_stats;
  
  /* Reserve space at top/bottom of address space for kernel use */

  #define AMDGPU_VA_RESERVED_CSA_SIZE   (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)  ((top) \
+- AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
(AMDGPU_VA_RESERVED_CSA_START(top) \
+- 
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE   (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \
+- AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM (1ULL << 16)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + 
\
+AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
 AMDGPU_VA_RESERVED_CSA_SIZE)
  
  /* See vm_update_mode */

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include "amdgpu_vm.h"
  
  /*

   * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)
 * with small reserved space for kernel.
 * Set them to CANONICAL addresses.
 */
-   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_base = max(SVM_USER_BASE, AMDGPU_VA_RESERVED_BOTTOM);
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
  
+	/* dGPUs: the reserved space for kernel

+* before SVM
+*/
+   pdd->qpd.cwsr_base = SVM_CWSR_BASE;

[PATCH 2/2] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Felix Kuehling
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 10 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
 
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
 {
-   uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+   uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
 
-   addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
addr = amdgpu_gmc_sign_extend(addr);
 
return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
  */
 static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
 {
-   u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-   addr -= AMDGPU_VA_RESERVED_TOP;
-
-   return addr;
+   return AMDGPU_VA_RESERVED_SEQ64_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2c4053b29bb3..c2407f6a7e83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -137,9 +137,17 @@ struct amdgpu_mem_stats;
 
 /* Reserve space at top/bottom of address space for kernel use */
 #define AMDGPU_VA_RESERVED_CSA_SIZE(2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)  ((top) \
+- AMDGPU_VA_RESERVED_CSA_SIZE)
 #define AMDGPU_VA_RESERVED_SEQ64_SIZE  (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
(AMDGPU_VA_RESERVED_CSA_START(top) \
+- 
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE   (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \
+- AMDGPU_VA_RESERVED_TRAP_SIZE)
 #define AMDGPU_VA_RESERVED_BOTTOM  (1ULL << 16)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + 
\
+AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
 AMDGPU_VA_RESERVED_CSA_SIZE)
 
 /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include "amdgpu_vm.h"
 
 /*
  * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)
 * with small reserved space for kernel.
 * Set them to CANONICAL addresses.
 */
-   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_base = max(SVM_USER_BASE, AMDGPU_VA_RESERVED_BOTTOM);
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
 
+   /* dGPUs: the reserved space for kernel
+* before SVM
+*/
+   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+   pdd->qpd.ib_base = SVM_IB_BASE;
+
pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
 }
@@ -339,18 +346,19 @@ static void kfd_init_apertures_v9(struct 
kfd_process_device *pdd, 

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Shengyu Qu
Hi Felix,

Thanks for reply. I'll record a backtrace when I'm free. Besides, here is
a dmesg log from someone else in the issue discussion about this problem:
https://projects.blender.org/attachments/ea7b7db5-ac16-479d-935b-9e1da33cd6f0
Tested using next-20240129 with this patch applied, and setup is Plasma 6.0
RC1(Wayland) + RX 6600 XT.

Best regards,
Shengyu

在 2024/1/30 1:47, Felix Kuehling 写道:
> On 2024-01-29 10:24, Shengyu Qu wrote:
>> Hello Felix,
>> I think you are right. This problem has existed for years(just look at
>> the
>> issue creation time in my link), and is thought caused by OpenGL-ROCMTe
>> interop(that's why I think this patch might help). It is very easy to
>> trigger this problem in blender(method is also mentioned in the link).
> 
> This doesn't help you, but it's unlikely that this has been the same
> issue for two years for everybody who chimed into this bug report.
> Different kernel versions, GPUs, user mode ROCm and Mesa versions etc.
> 
> Case in point, it's possible that you're seeing an issue specific to
> RDNA3, which hasn't even been around for that long.
> 
> 
>> Do
>> you have any idea about this?
> 
> Not without seeing a lot more diagnostic information. A full backtrace
> from your kernel log would be a good start.
> 
> Regards,
>   Felix
> 
> 
>> Best regards,
>> Shengyu
>> 在 2024/1/29 22:51, Felix Kuehling 写道:
>>> On 2024-01-29 8:58, Shengyu Qu wrote:
 Hi,
 Seems rocm-opengl interop hang problem still exists[1]. Btw have you
 discovered into this problem?
 Best regards,
 Shengyu
 [1]
 https://projects.blender.org/blender/blender/issues/100353#issuecomment-599
>>>
>>> Maybe you're having a different problem. Do you see this issue also
>>> without any version of the "Relocate TBA/TMA ..." patch?
>>>
>>> Regards,
>>>   Felix
>>>
>>>

 在 2024/1/27 03:15, Shengyu Qu 写道:
> Hello Felix,
> This patch seems working on my system, also it seems fixes the
> ROCM/OpenGL
> interop problem.
> Is this intended to happen or not? Maybe we need more users to test
> it.
> Besides,
> Tested-by: Shengyu Qu 
> Best Regards,
> Shengyu
>
> 在 2024/1/26 06:27, Felix Kuehling 写道:
>> The TBA and TMA, along with an unused IB allocation, reside at low
>> addresses in the VM address space. A stray VM fault which hits these
>> pages must be serviced by making their page table entries invalid.
>> The scheduler depends upon these pages being resident and fails,
>> preventing a debugger from inspecting the failure state.
>>
>> By relocating these pages above 47 bits in the VM address space they
>> can only be reached when bits [63:48] are set to 1. This makes it
>> much
>> less likely for a misbehaving program to generate accesses to them.
>> The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
>> access with a small offset.
>>
>> v2:
>> - Move it to the reserved space to avoid concflicts with Mesa
>> - Add macros to make reserved space management easier
>>
>> Cc: Arunpravin Paneer Selvam 
>> Cc: Christian Koenig 
>> Signed-off-by: Jay Cornwall 
>> Signed-off-by: Felix Kuehling 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
>>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30
>> +++-
>>   4 files changed, 30 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> index 823d31f4a2a3..53d0a458d78e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> @@ -28,9 +28,9 @@
>>     uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
>>   {
>> -    uint64_t addr = adev->vm_manager.max_pfn <<
>> AMDGPU_GPU_PAGE_SHIFT;
>> +    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
>> +    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
>>   -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
>>   addr = amdgpu_gmc_sign_extend(addr);
>>     return addr;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> index 3d0d56087d41..9e769ef50f2e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>> @@ -45,11 +45,8 @@
>>    */
>>   static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device
>> *adev)
>>   {
>> -    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
>> -
>> -    addr -= AMDGPU_VA_RESERVED_TOP;
>> -
>> -    return addr;
>> +    return AMDGPU_VA_RESERVED_SEQ64_START(
>> +    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
>>   }
>>     

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Shengyu Qu

Hello Felix,
I think you are right. This problem has existed for years(just look at the
issue creation time in my link), and is thought caused by OpenGL-ROCM
interop(that's why I think this patch might help). It is very easy to
trigger this problem in blender(method is also mentioned in the link). Do
you have any idea about this?
Best regards,
Shengyu
在 2024/1/29 22:51, Felix Kuehling 写道:

On 2024-01-29 8:58, Shengyu Qu wrote:

Hi,
Seems rocm-opengl interop hang problem still exists[1]. Btw have you
discovered into this problem?
Best regards,
Shengyu
[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


Maybe you're having a different problem. Do you see this issue also 
without any version of the "Relocate TBA/TMA ..." patch?


Regards,
  Felix




在 2024/1/27 03:15, Shengyu Qu 写道:

Hello Felix,
This patch seems working on my system, also it seems fixes the 
ROCM/OpenGL

interop problem.
Is this intended to happen or not? Maybe we need more users to test it.
Besides,
Tested-by: Shengyu Qu 
Best Regards,
Shengyu

在 2024/1/26 06:27, Felix Kuehling 写道:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 
+++-

  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << 
AMDGPU_GPU_PAGE_SHIFT;

+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device 
*adev)

  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)

  -/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)    ((top) \
+ - AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top) 
(AMDGPU_VA_RESERVED_CSA_START(top) \

+ - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \

+ - AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM    (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+ AMDGPU_VA_RESERVED_SEQ64_SIZE + \
   AMDGPU_VA_RESERVED_CSA_SIZE)
    /* See vm_update_mode */
diff --git 

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-30 Thread Shengyu Qu

Hi,

Seems rocm-opengl interop hang problem still exists[1]. Btw have you

discovered into this problem?

Best regards,

Shengyu

[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


在 2024/1/27 03:15, Shengyu Qu 写道:

Hello Felix,

This patch seems working on my system, also it seems fixes the 
ROCM/OpenGL

interop problem.

Is this intended to happen or not? Maybe we need more users to test it.

Besides,

Tested-by: Shengyu Qu 

Best Regards,

Shengyu

在 2024/1/26 06:27, Felix Kuehling 写道:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)

  -/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)    ((top) \
+ - AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top) 
(AMDGPU_VA_RESERVED_CSA_START(top) \

+ - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \

+ - AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM    (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+ AMDGPU_VA_RESERVED_SEQ64_SIZE + \
   AMDGPU_VA_RESERVED_CSA_SIZE)
    /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c

index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include "amdgpu_vm.h"
    /*
   * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)

   * with small reserved space for kernel.
   * Set them to CANONICAL addresses.
   */
-    

Re: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Christian König

Am 29.01.24 um 18:48 schrieb Felix Kuehling:

On 2024-01-29 11:50, Arunpravin Paneer Selvam wrote:

@@ -339,18 +346,19 @@ static void kfd_init_apertures_v9(struct
kfd_process_device *pdd, uint8_t id)
   pdd->lds_base = MAKE_LDS_APP_BASE_V9();
   pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);

-    /* Raven needs SVM to support graphic handle, etc. Leave 
the small
- * reserved space before SVM on Raven as well, even 
though we don't

- * have to.
- * Set gpuvm_base and gpuvm_limit to CANONICAL addresses 
so that they

- * are used in Thunk to reserve SVM.
- */
-    pdd->gpuvm_base = SVM_USER_BASE;
+  pdd->gpuvm_base = AMDGPU_VA_RESERVED_BOTTOM;

Hi Felix,

pdd->gpuvm_base changes from 16KB to 2MB after this patch.

The default mmap_min_addr(/proc/sys/vm/mmap_min_addr) is 64KB.

That means user could get a CPU VA < 2MB while the corresponding 
GPU VA has been reserved. Will this break SVM?


It could break SVM if a process tries to map or access something 
below 2MB. I'm not sure what AMDGPU_VA_RESERVED_BOTTOM is used for 
in the GPU page tables. But if it's causing problems for real 
applications with SVM, we should look into lowering that reservation.


We have decided to keep AMDGPU_VA_RESERVED_BOTTOM free for catching 
NULL pointer dereferences.


Can this be made smaller? I think the 64KB minimum address used on the 
CPU side serves the same purpose. Could we match that on the GPU side?


Yeah I think that should work. There used to be a VCN firmware which was 
buggy and would write randomly into the first x MiB of the virtual 
address space.


Apart from that we never really encountered an issue with NULL pointers 
on the GFX side which a smaller reserved areas wouldn't have catched as 
well.


Regards,
Christian.



Regards,
  Felix




Regards,
Arun. 




Re: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling



On 2024-01-29 11:50, Arunpravin Paneer Selvam wrote:

@@ -339,18 +346,19 @@ static void kfd_init_apertures_v9(struct
kfd_process_device *pdd, uint8_t id)
   pdd->lds_base = MAKE_LDS_APP_BASE_V9();
   pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);

-    /* Raven needs SVM to support graphic handle, etc. Leave 
the small
- * reserved space before SVM on Raven as well, even though 
we don't

- * have to.
- * Set gpuvm_base and gpuvm_limit to CANONICAL addresses 
so that they

- * are used in Thunk to reserve SVM.
- */
-    pdd->gpuvm_base = SVM_USER_BASE;
+  pdd->gpuvm_base = AMDGPU_VA_RESERVED_BOTTOM;

Hi Felix,

pdd->gpuvm_base changes from 16KB to 2MB after this patch.

The default mmap_min_addr(/proc/sys/vm/mmap_min_addr) is 64KB.

That means user could get a CPU VA < 2MB while the corresponding GPU 
VA has been reserved. Will this break SVM?


It could break SVM if a process tries to map or access something 
below 2MB. I'm not sure what AMDGPU_VA_RESERVED_BOTTOM is used for in 
the GPU page tables. But if it's causing problems for real 
applications with SVM, we should look into lowering that reservation.


We have decided to keep AMDGPU_VA_RESERVED_BOTTOM free for catching 
NULL pointer dereferences.


Can this be made smaller? I think the 64KB minimum address used on the 
CPU side serves the same purpose. Could we match that on the GPU side?


Regards,
  Felix




Regards,
Arun. 


Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling

On 2024-01-29 10:24, Shengyu Qu wrote:

Hello Felix,
I think you are right. This problem has existed for years(just look at 
the

issue creation time in my link), and is thought caused by OpenGL-ROCM
interop(that's why I think this patch might help). It is very easy to
trigger this problem in blender(method is also mentioned in the link).


This doesn't help you, but it's unlikely that this has been the same 
issue for two years for everybody who chimed into this bug report. 
Different kernel versions, GPUs, user mode ROCm and Mesa versions etc.


Case in point, it's possible that you're seeing an issue specific to 
RDNA3, which hasn't even been around for that long.




Do
you have any idea about this?


Not without seeing a lot more diagnostic information. A full backtrace 
from your kernel log would be a good start.


Regards,
  Felix



Best regards,
Shengyu
在 2024/1/29 22:51, Felix Kuehling 写道:

On 2024-01-29 8:58, Shengyu Qu wrote:

Hi,
Seems rocm-opengl interop hang problem still exists[1]. Btw have you
discovered into this problem?
Best regards,
Shengyu
[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


Maybe you're having a different problem. Do you see this issue also 
without any version of the "Relocate TBA/TMA ..." patch?


Regards,
  Felix




在 2024/1/27 03:15, Shengyu Qu 写道:

Hello Felix,
This patch seems working on my system, also it seems fixes the 
ROCM/OpenGL

interop problem.
Is this intended to happen or not? Maybe we need more users to test 
it.

Besides,
Tested-by: Shengyu Qu 
Best Regards,
Shengyu

在 2024/1/26 06:27, Felix Kuehling 写道:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it 
much

less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 
+++-

  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << 
AMDGPU_GPU_PAGE_SHIFT;

+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device 
*adev)

  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)

  -/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)    ((top) \
+ - AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top) 
(AMDGPU_VA_RESERVED_CSA_START(top) \

+ - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 12)

Re: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Arunpravin Paneer Selvam

Hi Felix,

On 1/29/2024 8:24 PM, Felix Kuehling wrote:

On 2024-01-29 3:45, Yu, Lang wrote:

[AMD Official Use Only - General]


-Original Message-
From: amd-gfx  On Behalf Of 
Felix

Kuehling
Sent: Friday, January 26, 2024 6:28 AM
To: amd-gfx@lists.freedesktop.org
Cc: Cornwall, Jay ; Koenig, Christian
; Paneer Selvam, Arunpravin

Subject: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM 
hole (v2)


The TBA and TMA, along with an unused IB allocation, reside at low 
addresses in
the VM address space. A stray VM fault which hits these pages must 
be serviced

by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails, 
preventing a

debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they 
can only be
reached when bits [63:48] are set to 1. This makes it much less 
likely for a

misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL 
access with a

small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@

uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)  {
-  uint64_t addr = adev->vm_manager.max_pfn <<
AMDGPU_GPU_PAGE_SHIFT;
+  uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+  adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);

-  addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
   addr = amdgpu_gmc_sign_extend(addr);

   return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
  */
static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device 
*adev)  {

-  u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-  addr -= AMDGPU_VA_RESERVED_TOP;
-
-  return addr;
+  return AMDGPU_VA_RESERVED_SEQ64_START(
+  adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
}

/**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;  #define
AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) <
AMDGPU_MMHUB1_START)  #define AMDGPU_IS_MMHUB1(x) ((x) >=
AMDGPU_MMHUB1_START && (x) < AMDGPU_MAX_VMHUBS)

-/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
#define AMDGPU_VA_RESERVED_CSA_SIZE   (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top) ((top) \
+   -
AMDGPU_VA_RESERVED_CSA_SIZE)
#define AMDGPU_VA_RESERVED_SEQ64_SIZE (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
   (AMDGPU_VA_RESERVED_CSA_START(top) \
+   -
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE  (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top)
   (AMDGPU_VA_RESERVED_SEQ64_START(top) \
+   -
AMDGPU_VA_RESERVED_TRAP_SIZE)
#define AMDGPU_VA_RESERVED_BOTTOM (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP
   (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP
   (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+
AMDGPU_VA_RESERVED_SEQ64_SIZE + \

AMDGPU_VA_RESERVED_CSA_SIZE)

/* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
#include 
#include 
#include 
+#include "amdgpu_vm.h"

/*
  * The primary memory I/O features being added for revisions of 
gfxip @@ -
326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device

*pdd, uint8_t id)
    * with small reserved space for kernel.
    * Set them to CANONICAL addres

Re: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling

On 2024-01-29 3:45, Yu, Lang wrote:

[AMD Official Use Only - General]


-Original Message-
From: amd-gfx  On Behalf Of Felix
Kuehling
Sent: Friday, January 26, 2024 6:28 AM
To: amd-gfx@lists.freedesktop.org
Cc: Cornwall, Jay ; Koenig, Christian
; Paneer Selvam, Arunpravin

Subject: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

The TBA and TMA, along with an unused IB allocation, reside at low addresses in
the VM address space. A stray VM fault which hits these pages must be serviced
by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails, preventing a
debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they can only be
reached when bits [63:48] are set to 1. This makes it much less likely for a
misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL access with a
small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@

uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)  {
-  uint64_t addr = adev->vm_manager.max_pfn <<
AMDGPU_GPU_PAGE_SHIFT;
+  uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+  adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);

-  addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
   addr = amdgpu_gmc_sign_extend(addr);

   return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
  */
static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)  {
-  u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-  addr -= AMDGPU_VA_RESERVED_TOP;
-
-  return addr;
+  return AMDGPU_VA_RESERVED_SEQ64_START(
+  adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
}

/**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;  #define
AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) <
AMDGPU_MMHUB1_START)  #define AMDGPU_IS_MMHUB1(x) ((x) >=
AMDGPU_MMHUB1_START && (x) < AMDGPU_MAX_VMHUBS)

-/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
#define AMDGPU_VA_RESERVED_CSA_SIZE   (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top) ((top) \
+   -
AMDGPU_VA_RESERVED_CSA_SIZE)
#define AMDGPU_VA_RESERVED_SEQ64_SIZE (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
   (AMDGPU_VA_RESERVED_CSA_START(top) \
+   -
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE  (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top)
   (AMDGPU_VA_RESERVED_SEQ64_START(top) \
+   -
AMDGPU_VA_RESERVED_TRAP_SIZE)
#define AMDGPU_VA_RESERVED_BOTTOM (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP
   (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP
   (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+
AMDGPU_VA_RESERVED_SEQ64_SIZE + \

AMDGPU_VA_RESERVED_CSA_SIZE)

/* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
#include 
#include 
#include 
+#include "amdgpu_vm.h"

/*
  * The primary memory I/O features being added for revisions of gfxip @@ -
326,10 +327,16 @@ static void kfd_init_apertures_vi(struct kfd_process_device
*pdd, uint8_t id)
* with small reserved space for kernel.
* Set them to CANONICAL addresses.
*/
-  pdd->gpuvm_base = SVM_USER_BASE;
+  pdd->

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Felix Kuehling

On 2024-01-29 8:58, Shengyu Qu wrote:

Hi,

Seems rocm-opengl interop hang problem still exists[1]. Btw have you

discovered into this problem?

Best regards,

Shengyu

[1] 
https://projects.blender.org/blender/blender/issues/100353#issuecomment-599


Maybe you're having a different problem. Do you see this issue also 
without any version of the "Relocate TBA/TMA ..." patch?


Regards,
  Felix




在 2024/1/27 03:15, Shengyu Qu 写道:

Hello Felix,

This patch seems working on my system, also it seems fixes the 
ROCM/OpenGL

interop problem.

Is this intended to happen or not? Maybe we need more users to test it.

Besides,

Tested-by: Shengyu Qu 

Best Regards,

Shengyu

在 2024/1/26 06:27, Felix Kuehling 写道:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c    |  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 
+++-

  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c

index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
    uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
  {
-    uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+    uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  -    addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
  addr = amdgpu_gmc_sign_extend(addr);
    return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device 
*adev)

  {
-    u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-    addr -= AMDGPU_VA_RESERVED_TOP;
-
-    return addr;
+    return AMDGPU_VA_RESERVED_SEQ64_START(
+    adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
    /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)

  -/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)    ((top) \
+ - AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE    (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top) 
(AMDGPU_VA_RESERVED_CSA_START(top) \

+ - AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \

+ - AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM    (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + \
+ AMDGPU_VA_RESERVED_SEQ64_SIZE + \
   AMDGPU_VA_RESERVED_CSA_SIZE)
    /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c

index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include "amdgpu_vm.h"
    /*
   * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ 

Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Shengyu Qu

Hello Felix,

This patch seems working on my system, also it seems fixes the ROCM/OpenGL
interop problem.

Is this intended to happen or not? Maybe we need more users to test it.

Besides,

Tested-by: Shengyu Qu 

Best Regards,

Shengyu

在 2024/1/26 06:27, Felix Kuehling 写道:

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
  4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
  
  uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)

  {
-   uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+   uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  
-	addr -= AMDGPU_VA_RESERVED_CSA_SIZE;

addr = amdgpu_gmc_sign_extend(addr);
  
  	return addr;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
   */
  static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
  {
-   u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-   addr -= AMDGPU_VA_RESERVED_TOP;
-
-   return addr;
+   return AMDGPU_VA_RESERVED_SEQ64_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
  }
  
  /**

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
  #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
  #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)
  
-/* Reserve 2MB at top/bottom of address space for kernel use */

+/* Reserve space at top/bottom of address space for kernel use */
  #define AMDGPU_VA_RESERVED_CSA_SIZE   (2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)  ((top) \
+- AMDGPU_VA_RESERVED_CSA_SIZE)
  #define AMDGPU_VA_RESERVED_SEQ64_SIZE (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
(AMDGPU_VA_RESERVED_CSA_START(top) \
+- 
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE   (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \
+- AMDGPU_VA_RESERVED_TRAP_SIZE)
  #define AMDGPU_VA_RESERVED_BOTTOM (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + 
\
+AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
 AMDGPU_VA_RESERVED_CSA_SIZE)
  
  /* See vm_update_mode */

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include "amdgpu_vm.h"
  
  /*

   * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)
 * with small reserved space for kernel.
 * Set them to CANONICAL addresses.
 */
-   pdd->gpuvm_base = SVM_USER_BASE;
+   

RE: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-29 Thread Yu, Lang
[AMD Official Use Only - General]

>-Original Message-
>From: amd-gfx  On Behalf Of Felix
>Kuehling
>Sent: Friday, January 26, 2024 6:28 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Cornwall, Jay ; Koenig, Christian
>; Paneer Selvam, Arunpravin
>
>Subject: [PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)
>
>The TBA and TMA, along with an unused IB allocation, reside at low addresses in
>the VM address space. A stray VM fault which hits these pages must be serviced
>by making their page table entries invalid.
>The scheduler depends upon these pages being resident and fails, preventing a
>debugger from inspecting the failure state.
>
>By relocating these pages above 47 bits in the VM address space they can only 
>be
>reached when bits [63:48] are set to 1. This makes it much less likely for a
>misbehaving program to generate accesses to them.
>The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL access with 
>a
>small offset.
>
>v2:
>- Move it to the reserved space to avoid concflicts with Mesa
>- Add macros to make reserved space management easier
>
>Cc: Arunpravin Paneer Selvam 
>Cc: Christian Koenig 
>Signed-off-by: Jay Cornwall 
>Signed-off-by: Felix Kuehling 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
> drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
> 4 files changed, 30 insertions(+), 23 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>index 823d31f4a2a3..53d0a458d78e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>@@ -28,9 +28,9 @@
>
> uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)  {
>-  uint64_t addr = adev->vm_manager.max_pfn <<
>AMDGPU_GPU_PAGE_SHIFT;
>+  uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
>+  adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
>
>-  addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
>   addr = amdgpu_gmc_sign_extend(addr);
>
>   return addr;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>index 3d0d56087d41..9e769ef50f2e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
>@@ -45,11 +45,8 @@
>  */
> static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)  {
>-  u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
>-
>-  addr -= AMDGPU_VA_RESERVED_TOP;
>-
>-  return addr;
>+  return AMDGPU_VA_RESERVED_SEQ64_START(
>+  adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
> }
>
> /**
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>index 98a57192..f23b6153d310 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;  #define
>AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) <
>AMDGPU_MMHUB1_START)  #define AMDGPU_IS_MMHUB1(x) ((x) >=
>AMDGPU_MMHUB1_START && (x) < AMDGPU_MAX_VMHUBS)
>
>-/* Reserve 2MB at top/bottom of address space for kernel use */
>+/* Reserve space at top/bottom of address space for kernel use */
> #define AMDGPU_VA_RESERVED_CSA_SIZE   (2ULL << 20)
>+#define AMDGPU_VA_RESERVED_CSA_START(top) ((top) \
>+   -
>AMDGPU_VA_RESERVED_CSA_SIZE)
> #define AMDGPU_VA_RESERVED_SEQ64_SIZE (2ULL << 20)
>+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
>   (AMDGPU_VA_RESERVED_CSA_START(top) \
>+   -
>AMDGPU_VA_RESERVED_SEQ64_SIZE)
>+#define AMDGPU_VA_RESERVED_TRAP_SIZE  (2ULL << 12)
>+#define AMDGPU_VA_RESERVED_TRAP_START(top)
>   (AMDGPU_VA_RESERVED_SEQ64_START(top) \
>+   -
>AMDGPU_VA_RESERVED_TRAP_SIZE)
> #define AMDGPU_VA_RESERVED_BOTTOM (2ULL << 20)
>-#define AMDGPU_VA_RESERVED_TOP
>   (AMDGPU_VA_RESERVED_SEQ64_SIZE + \
>+#define AMDGPU_VA_RESERVED_TOP
>   (AMDGPU_VA_RESERVED_TRAP_SIZE + \
>+
>AMDGPU_VA_RESERVED_SEQ64_SIZE + \
>
>AMDGPU_VA_RESERVED_CSA_SIZE)
>
> /* See vm_update_mode */
>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>index 6604a3f99c5e..f899cce25b2a 100644
>--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>+++ b/drivers/gpu/drm

[PATCH] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-01-25 Thread Felix Kuehling
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

Cc: Arunpravin Paneer Selvam 
Cc: Christian Koenig 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c|  7 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 +++-
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 823d31f4a2a3..53d0a458d78e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -28,9 +28,9 @@
 
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
 {
-   uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
+   uint64_t addr = AMDGPU_VA_RESERVED_CSA_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
 
-   addr -= AMDGPU_VA_RESERVED_CSA_SIZE;
addr = amdgpu_gmc_sign_extend(addr);
 
return addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..9e769ef50f2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -45,11 +45,8 @@
  */
 static inline u64 amdgpu_seq64_get_va_base(struct amdgpu_device *adev)
 {
-   u64 addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
-
-   addr -= AMDGPU_VA_RESERVED_TOP;
-
-   return addr;
+   return AMDGPU_VA_RESERVED_SEQ64_START(
+   adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98a57192..f23b6153d310 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -135,11 +135,19 @@ struct amdgpu_mem_stats;
 #define AMDGPU_IS_MMHUB0(x) ((x) >= AMDGPU_MMHUB0_START && (x) < 
AMDGPU_MMHUB1_START)
 #define AMDGPU_IS_MMHUB1(x) ((x) >= AMDGPU_MMHUB1_START && (x) < 
AMDGPU_MAX_VMHUBS)
 
-/* Reserve 2MB at top/bottom of address space for kernel use */
+/* Reserve space at top/bottom of address space for kernel use */
 #define AMDGPU_VA_RESERVED_CSA_SIZE(2ULL << 20)
+#define AMDGPU_VA_RESERVED_CSA_START(top)  ((top) \
+- AMDGPU_VA_RESERVED_CSA_SIZE)
 #define AMDGPU_VA_RESERVED_SEQ64_SIZE  (2ULL << 20)
+#define AMDGPU_VA_RESERVED_SEQ64_START(top)
(AMDGPU_VA_RESERVED_CSA_START(top) \
+- 
AMDGPU_VA_RESERVED_SEQ64_SIZE)
+#define AMDGPU_VA_RESERVED_TRAP_SIZE   (2ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_START(top) 
(AMDGPU_VA_RESERVED_SEQ64_START(top) \
+- AMDGPU_VA_RESERVED_TRAP_SIZE)
 #define AMDGPU_VA_RESERVED_BOTTOM  (2ULL << 20)
-#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
+#define AMDGPU_VA_RESERVED_TOP (AMDGPU_VA_RESERVED_TRAP_SIZE + 
\
+AMDGPU_VA_RESERVED_SEQ64_SIZE 
+ \
 AMDGPU_VA_RESERVED_CSA_SIZE)
 
 /* See vm_update_mode */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 6604a3f99c5e..f899cce25b2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include "amdgpu_vm.h"
 
 /*
  * The primary memory I/O features being added for revisions of gfxip
@@ -326,10 +327,16 @@ static void kfd_init_apertures_vi(struct 
kfd_process_device *pdd, uint8_t id)
 * with small reserved space for kernel.
 * Set them to CANONICAL addresses.
 */
-   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_base = max(SVM_USER_BASE, AMDGPU_VA_RESERVED_BOTTOM);
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
 
+   /* dGPUs: the reserved space for kernel
+* before SVM
+*/
+   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+   pdd->qpd.ib_base =