from:"Jay Cornwall"

[PATCH] drm/amdkfd: Extend gfx12 trap handler fix to gfx10/11

2024-06-05 Jay Cornwall

In commit 6d1878882d2d
("drm/amdkfd: gfx12 context save/restore trap handler fixes") the
following fix was introduced but incorrectly restricted to gfx12.
The same issue and a corresponding fix apply to gfx10 and gfx11.

Do not overwrite TRAPSTS.{SAVECTX,HOST_TRAP} when restoring this
register. Both of these fields can assert while the wavefront is
running the trap handler.

Signed-off-by: Jay Cornwall 
Cc: Lancelot Six 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 16 +---
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 38 ++-
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 665122d1bbbd..02f7ba8c93cd 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -1136,7 +1136,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x705d, 0x807c817c,
0x8070ff70, 0x0080,
0xbf0a7b7c, 0xbf85fff8,
-   0xbf82013d, 0xbef4037e,
+   0xbf82013f, 0xbef4037e,
0x8775ff7f, 0x,
0x8875ff75, 0x0004,
0xbef60380, 0xbef703ff,
@@ -1275,7 +1275,8 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x80788478, 0xbf8c,
0xb9eef815, 0xbefc036f,
0xbefe0370, 0xbeff0371,
-   0xb9f9f816, 0xb9fbf803,
+   0xb9f9f816, 0xb9fb4803,
+   0x907b8b7b, 0xb9fba2c3,
0xb9f3f801, 0xb96e3a05,
0x806e816e, 0xbf0d9972,
0xbf850002, 0x8f6e896e,
@@ -2544,7 +2545,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xe0704000, 0x705d,
0x807c817c, 0x8070ff70,
0x0080, 0xbf0a7b7c,
-   0xbf85fff8, 0xbf820134,
+   0xbf85fff8, 0xbf820136,
0xbef4037e, 0x8775ff7f,
0x, 0x8875ff75,
0x0004, 0xbef60380,
@@ -2683,7 +2684,8 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xf000, 0x80788478,
0xbf8c, 0xb9eef815,
0xbefc036f, 0xbefe0370,
-   0xbeff0371, 0xb9fbf803,
+   0xbeff0371, 0xb9fb4803,
+   0x907b8b7b, 0xb9fba2c3,
0xb9f3f801, 0xb96e3a05,
0x806e816e, 0xbf0d9972,
0xbf850002, 0x8f6e896e,
@@ -2981,7 +2983,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0x701d, 0x807d817d,
0x8070ff70, 0x0080,
0xbf0a7b7d, 0xbfa2fff8,
-   0xbfa0013f, 0xbef4007e,
+   0xbfa00143, 0xbef4007e,
0x8b75ff7f, 0x,
0x8c75ff75, 0x0004,
0xbef60080, 0xbef700ff,
@@ -3123,7 +3125,9 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0x80788478, 0xbf89,
0xb96ef815, 0xbefd006f,
0xbefe0070, 0xbeff0071,
-   0xb97bf803, 0xb973f801,
+   0xb97b4803, 0x857b8b7b,
+   0xb97b22c3, 0x857b867b,
+   0xb97b7443, 0xb973f801,
0xb8ee3b05, 0x806e816e,
0xbf0d9972, 0xbfa20002,
0x846e896e, 0xbfa1,
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
index ac3702b8e3c4..44772eec9ef4 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
@@ -119,9 +119,12 @@ var SQ_WAVE_TRAPSTS_ADDR_WATCH_SHIFT   = 7
 var SQ_WAVE_TRAPSTS_MEM_VIOL_MASK  = 0x100
 var SQ_WAVE_TRAPSTS_MEM_VIOL_SHIFT = 8
 var SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK  = 0x800
+var SQ_WAVE_TRAPSTS_ILLEGAL_INST_SHIFT = 11
 var SQ_WAVE_TRAPSTS_EXCP_HI_MASK   = 0x7000
 #if ASIC_FAMILY >= CHIP_PLUM_BONITO
+var SQ_WAVE_TRAPSTS_HOST_TRAP_SHIFT= 16
 var SQ_WAVE_TRAPSTS_WAVE_START_MASK= 0x2
+var SQ_WAVE_TRAPSTS_WAVE_START_SHIFT   = 17
 var SQ_WAVE_TRAPSTS_WAVE_END_MASK  = 0x4
 var SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK   = 0x10
 #endif
@@ -137,14 +140,23 @@ var SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK = 0x003F8000
 
 var SQ_WAVE_MODE_DEBUG_EN_MASK = 0x800
 
+var S_TRAPSTS_RESTORE_PART_1_SIZE  = SQ_WAVE_TRAPSTS_SAVECTX_SHIFT
+var S_TRAPSTS_RESTORE_PART_2_SHIFT = SQ_WAVE_TRAPSTS_ILLEGAL_INST_SHIFT
+
 #if ASIC_FAMILY < CHIP_PLUM_BONITO
 var S_TRAPSTS_NON_MASKABLE_EXCP_MASK   = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
+var S_TRAPSTS_RESTORE_PART_2_SIZE  = 32 - S_TRAPSTS_RESTORE_PART_2_SHIFT
+var S_TRAPSTS_RESTORE_PART_3_SHIFT = 0
+var S_TRAPSTS_RESTORE_PART_3_SIZE  = 0
 #else
 var S_TRAPSTS_NON_MASKABLE_EXCP_MASK   = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK |\
                                          SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK |\
                                          SQ_WAVE_TRAPSTS_WAVE_START_MASK   |\
                                          SQ_WAVE_TRAPS

Re: [PATCH v2] drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler

2024-05-29 Jay Cornwall


On 5/29/2024 16:07, Lancelot SIX wrote:
>
> On 29/05/2024 20:35, Jay Cornwall wrote:
>> A wavefront may deallocate its VGPRs at the end of a program while
>> waiting for memory transactions to complete. If it subsequently
>> receives a context save exception it will be unable to save,
>> since this requires VGPRs. In this case the trap handler should
>> terminate the wavefront.
>>
>> Fixes intermittent VM faults under context switching load.
>>
>> V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters
>
> Hi Jay,
>
> Thanks for the V2.
>
> FYI, as far as I can see, the .h part of the patch does not seem to apply
> directly on current amd-staging-drm-next, but I guess we just have
> different bases.


Sorry, it's parented to the commit below. This has been submitted but is 
working its way through post-submit testing.


Thanks for the review.


commit d6449614e21cc166f888b3d5fc59cd1156ed7e7d
Author: Jay Cornwall 
Date:   Thu May 23 09:00:28 2024 -0500

drm/amdkfd: gfx12 context save/restore trap handler fixes

[PATCH v2] drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler

2024-05-29 Jay Cornwall

A wavefront may deallocate its VGPRs at the end of a program while
waiting for memory transactions to complete. If it subsequently
receives a context save exception it will be unable to save,
since this requires VGPRs. In this case the trap handler should
terminate the wavefront.

Fixes intermittent VM faults under context switching load.

V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters

Signed-off-by: Jay Cornwall 
Cc: Lancelot Six 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 695 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  17 +
 2 files changed, 366 insertions(+), 346 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 85a41e121cce..665122d1bbbd 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -2705,7 +2705,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx11_hex[] = {
-   0xbfa1, 0xbfa00224,
+   0xbfa1, 0xbfa00227,
0xb0804006, 0xb8f8f802,
0x9178ff78, 0x00020006,
0xb8fbf803, 0xbf0d9e6d,
@@ -2750,399 +2750,400 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0x8b6dff6d, 0x,
0x8bfe7e7e, 0x8bea6a6a,
0xb978f802, 0xbe804a6c,
-   0x8b6dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xbeee007e, 0xbeef007f,
-   0xbefe0180, 0xbefe4d84,
-   0xbf89, 0x8b7aff7f,
-   0x0400, 0x847a857a,
-   0x8c6d7a6d, 0xbefa007e,
-   0x8b7bff7f, 0x,
-   0xbefe00c1, 0xbeff00c1,
-   0xdca6c000, 0x007a,
-   0x7e000280, 0xbefe007a,
-   0xbeff007b, 0xb8fb02dc,
-   0x847b997b, 0xb8fa3b05,
-   0x807a817a, 0xbf0d997b,
-   0xbfa20002, 0x847a897a,
-   0xbfa1, 0x847a8a7a,
-   0xb8fb1e06, 0x847b8a7b,
-   0x807a7b7a, 0x8b7bff7f,
-   0x, 0x807aff7a,
-   0x0200, 0x807a7e7a,
-   0x827b807b, 0xd761,
-   0x00010870, 0xd761,
-   0x00010a71, 0xd761,
-   0x00010c72, 0xd761,
-   0x00010e73, 0xd761,
-   0x00011074, 0xd761,
-   0x00011275, 0xd761,
-   0x00011476, 0xd761,
-   0x00011677, 0xd761,
-   0x00011a79, 0xd761,
-   0x00011c7e, 0xd761,
-   0x00011e7f, 0xbefe00ff,
-   0x3fff, 0xbeff0080,
-   0xdca6c040, 0x007a,
-   0xd760007a, 0x00011d00,
-   0xd760007b, 0x00011f00,
+   0xbf0d9878, 0xbfa10001,
+   0xbfb0, 0x8b6dff6d,
+   0x, 0xbefa0080,
+   0xb97a0283, 0xbeee007e,
+   0xbeef007f, 0xbefe0180,
+   0xbefe4d84, 0xbf89,
+   0x8b7aff7f, 0x0400,
+   0x847a857a, 0x8c6d7a6d,
+   0xbefa007e, 0x8b7bff7f,
+   0x, 0xbefe00c1,
+   0xbeff00c1, 0xdca6c000,
+   0x007a, 0x7e000280,
0xbefe007a, 0xbeff007b,
-   0xbef4007e, 0x8b75ff7f,
-   0x, 0x8c75ff75,
-   0x0004, 0xbef60080,
-   0xbef700ff, 0x10807fac,
-   0xbef1007d, 0xbef00080,
-   0xb8f302dc, 0x84739973,
-   0xbefe00c1, 0x857d9973,
-   0x8b7d817d, 0xbf06817d,
-   0xbfa20002, 0xbeff0080,
-   0xbfa2, 0xbeff00c1,
-   0xbfa9, 0xbef600ff,
-   0x0100, 0xe0685080,
-   0x701d0100, 0xe0685100,
-   0x701d0200, 0xe0685180,
-   0x701d0300, 0xbfa8,
+   0xb8fb02dc, 0x847b997b,
+   0xb8fa3b05, 0x807a817a,
+   0xbf0d997b, 0xbfa20002,
+   0x847a897a, 0xbfa1,
+   0x847a8a7a, 0xb8fb1e06,
+   0x847b8a7b, 0x807a7b7a,
+   0x8b7bff7f, 0x,
+   0x807aff7a, 0x0200,
+   0x807a7e7a, 0x827b807b,
+   0xd761, 0x00010870,
+   0xd761, 0x00010a71,
+   0xd761, 0x00010c72,
+   0xd761, 0x00010e73,
+   0xd761, 0x00011074,
+   0xd761, 0x00011275,
+   0xd761, 0x00011476,
+   0xd761, 0x00011677,
+   0xd761, 0x00011a79,
+   0xd761, 0x00011c7e,
+   0xd761, 0x00011e7f,
+   0xbefe00ff, 0x3fff,
+   0xbeff0080, 0xdca6c040,
+   0x007a, 0xd760007a,
+   0x00011d00, 0xd760007b,
+   0x00011f00, 0xbefe007a,
+   0xbeff007b, 0xbef4007e,
+   0x8b75ff7f, 0x,
+   0x8c75ff75, 0x0004,
+   0xbef60080, 0xbef700ff,
+   0x10807fac, 0xbef1007d,
+   0xbef00080, 0xb8f302dc,
+   0x84739973, 0xbefe00c1,
+   0x857d9973, 0x8b7d817d,
+   0xbf06817d, 0xbfa20002,
+   0xbeff0080, 0xbfa2,
+   0xbeff00c1, 0xbfa9,
0xbef600ff, 0x0100,
-   0xe0685100, 0x701d0100,
-   0xe0685200, 0x701d0200,
-   0xe0685300, 0x701d0300,
+   0xe0685080, 0x701d0100,
+   0xe0685100, 0x701d0200,
+   0xe0685180, 0x701d0300,
+   0xbfa8, 0xbef600ff,
+   0x0100, 0xe0685100,
+   0x701d0100, 0xe0685200,
+   0x701d0200, 0xe0685300,
+   0x701d0300, 0xb8f03b05,
+   0x80708170, 0xbf0d9973,
+   0xbfa20002

[PATCH] drm/amdkfd: Handle deallocated VPGRs in gfx10+ trap handler

2024-05-28 Jay Cornwall

A wavefront may deallocate its VGPRs at the end of a program while
waiting for memory transactions to complete. If it subsequently
receives a context save exception it will be unable to save,
since this requires VGPRs. In this case the trap handler should
terminate the wavefront.

Fixes intermittent VM faults under context switching load.

Signed-off-by: Jay Cornwall 
Cc: Lancelot Six 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 744 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  13 +
 2 files changed, 386 insertions(+), 371 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 85a41e121cce..74228b3b4905 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -2705,7 +2705,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx11_hex[] = {
-   0xbfa1, 0xbfa00224,
+   0xbfa1, 0xbfa00226,
0xb0804006, 0xb8f8f802,
0x9178ff78, 0x00020006,
0xb8fbf803, 0xbf0d9e6d,
@@ -2750,6 +2750,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0x8b6dff6d, 0x,
0x8bfe7e7e, 0x8bea6a6a,
0xb978f802, 0xbe804a6c,
+   0xbf0d9878, 0xbfa2030b,
0x8b6dff6d, 0x,
0xbefa0080, 0xb97a0283,
0xbeee007e, 0xbeef007f,
@@ -3635,7 +3636,7 @@ static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx12_hex[] = {
-   0xbfa1, 0xbfa00247,
+   0xbfa1, 0xbfa0024a,
0xb0804009, 0xb8f8f804,
0x9178ff78, 0x8c00,
0xb8fbf811, 0x8b6eff78,
@@ -3675,155 +3676,123 @@ static const uint32_t cwsr_trap_gfx12_hex[] = {
0x8b6dff6d, 0x,
0x8bfe7e7e, 0x8bea6a6a,
0x85788978, 0xb9783244,
-   0xbe804a6c, 0x8b6dff6d,
-   0x, 0xbefa0080,
-   0xb97a0151, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbefe4d84, 0xbf8a,
-   0x8b7aff7f, 0x0400,
-   0x847a857a, 0x8c6d7a6d,
-   0xbefa007e, 0x8b7bff7f,
-   0x, 0xbefe00c1,
-   0xbeff00c1, 0xee0a407a,
-   0x000c, 0x,
-   0x7e000280, 0xbefe007a,
-   0xbeff007b, 0xb8fb0742,
-   0x847b997b, 0xb8fa3b05,
-   0x807a817a, 0xbf0d997b,
-   0xbfa20002, 0x847a897a,
-   0xbfa1, 0x847a8a7a,
-   0xb8fb1e06, 0x847b8a7b,
-   0x807a7b7a, 0x8b7bff7f,
-   0x, 0x807aff7a,
-   0x0200, 0x807a7e7a,
-   0x827b807b, 0xd761,
-   0x00010870, 0xd761,
-   0x00010a71, 0xd761,
-   0x00010c72, 0xd761,
-   0x00010e73, 0xd761,
-   0x00011074, 0xd761,
-   0x00011275, 0xd761,
-   0x00011476, 0xd761,
-   0x00011677, 0xd761,
-   0x00011a79, 0xd761,
-   0x00011c7e, 0xd761,
-   0x00011e7f, 0xbefe00ff,
-   0x3fff, 0xbeff0080,
+   0xbe804a6c, 0xb8faf802,
+   0xbf0d987a, 0xbfa20366,
+   0x8b6dff6d, 0x,
+   0xbefa0080, 0xb97a0151,
+   0xbeee007e, 0xbeef007f,
+   0xbefe0180, 0xbefe4d84,
+   0xbf8a, 0x8b7aff7f,
+   0x0400, 0x847a857a,
+   0x8c6d7a6d, 0xbefa007e,
+   0x8b7bff7f, 0x,
+   0xbefe00c1, 0xbeff00c1,
0xee0a407a, 0x000c,
-   0x4000, 0xd760007a,
-   0x00011d00, 0xd760007b,
-   0x00011f00, 0xbefe007a,
-   0xbeff007b, 0xbef4007e,
-   0x8b75ff7f, 0x,
-   0x8c75ff75, 0x0004,
-   0xbef60080, 0xbef700ff,
-   0x10807fac, 0xbef1007d,
-   0xbef00080, 0xb8f30742,
-   0x84739973, 0xbefe00c1,
-   0x857d9973, 0x8b7d817d,
-   0xbf06817d, 0xbfa20002,
-   0xbeff0080, 0xbfa2,
-   0xbeff00c1, 0xbfac,
-   0xbef600ff, 0x0100,
-   0xc4068070, 0x008ce801,
-   0x8000, 0xc4068070,
-   0x008ce802, 0x0001,
-   0xc4068070, 0x008ce803,
-   0x00018000, 0xbfab,
-   0xbef600ff, 0x0100,
-   0xc4068070, 0x008ce801,
+   0x, 0x7e000280,
+   0xbefe007a, 0xbeff007b,
+   0xb8fb0742, 0x847b997b,
+   0xb8fa3b05, 0x807a817a,
+   0xbf0d997b, 0xbfa20002,
+   0x847a897a, 0xbfa1,
+   0x847a8a7a, 0xb8fb1e06,
+   0x847b8a7b, 0x807a7b7a,
+   0x8b7bff7f, 0x,
+   0x807aff7a, 0x0200,
+   0x807a7e7a, 0x827b807b,
+   0xd761, 0x00010870,
+   0xd761, 0x00010a71,
+   0xd761, 0x00010c72,
+   0xd761, 0x00010e73,
+   0xd761, 0x00011074,
+   0xd761, 0x00011275,
+   0xd761, 0x00011476,
+   0xd761, 0x00011677,
+   0xd761, 0x00011a79,
+   0xd761, 0x00011c7e,
+   0xd761, 0x00011e7f,
+   0xbefe00ff, 0x3fff,
+   0xbeff0080, 0xee0a407a,
+   0x000c, 0x4000,
+   0xd760007a, 0x00011d00,
+   0xd760007b, 0x00011f00,
+   0xbefe007a, 0xbeff007b,
+   0xbef4007e, 0x8b75ff7f,
+   0x

Re: [PATCH 3/3] drm/amdkfd: gfx12 context save/restore trap handler fixes

2024-05-23 Jay Cornwall


On 5/23/2024 13:37, Lancelot SIX wrote:
>> @@ -622,8 +638,15 @@ L_SAVE_HWREG:
>>   #if ASIC_FAMILY >= CHIP_GFX12
>>   // Ensure no further changes to barrier or LDS state.
>> +    // STATE_PRIV.BARRIER_COMPLETE may change up to this point.
>>   s_barrier_signal    -2
>>   s_barrier_wait    -2
>> +
>> +    // Re-read final state of BARRIER_COMPLETE field for save.
>> +    s_getreg_b32    s_save_tmp, hwreg(S_STATUS_HWREG)
>> +    s_and_b32    s_save_tmp, s_save_tmp, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK
>> +    s_andn2_b32    s_save_status, s_save_status, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK
>
> Even if BARRIER_COMPLETE can be asserted while we are in the trap
> handler, I do not think it can be cleared. That being said, it might be
> easier to just replace the bit, making it clearer.

Yes, I chose to structure it this way to make the intent clearer. We
don't gain much from dropping the s_andn2. Most of the time spent in the
save handler is stalled on memory instructions.

>> @@ -1351,7 +1369,17 @@ L_SKIP_BARRIER_RESTORE:
>>   s_setreg_b32    hwreg(HW_REG_SHADER_XNACK_MASK), s_restore_xnack_mask
>>
>>   #endif
>> +#if ASIC_FAMILY < CHIP_GFX12
>>   s_setreg_b32    hwreg(S_TRAPSTS_HWREG), s_restore_trapsts
>
> Wouldn't other gfx1x architectures have a similar issue when writing
> TRAPSTS here? That is, if TRAPSTS.SAVECTX is set while we are restoring,
> wouldn't we lose it?
>
> And for gfx11, there is TRAPSTS.HOST_TRAP that could have the same issue
> to some degree (not sure if we would lose the host trap completely, or
> re-enter with trap ID + HT bit set in ttmp1).

Prior to gfx12, context save and host trap exceptions are not delivered
to a wave until STATUS.PRIV=0, i.e. until it leaves the trap handler.

The changes needed for gfx12 are due to a design change in this area.
Exceptions are now flagged immediately and cause re-entry to the trap if
any are non-zero.

[PATCH 3/3] drm/amdkfd: gfx12 context save/restore trap handler fixes

2024-05-23 Jay Cornwall

Fix LDS size interpretation: 512 bytes (>= gfx12) vs 256 (< gfx12).

Ensure STATE_PRIV.BARRIER_COMPLETE cannot change after reading or
before writing. Other waves in the threadgroup may cause this field
to assert if they complete the barrier.

Do not overwrite EXCP_FLAG_PRIV.{SAVE_CONTEXT,HOST_TRAP} when
restoring this register. Both of these fields can assert while the
wavefront is running the trap handler.

Signed-off-by: Jay Cornwall 
Cc: Lancelot Six 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 1191 +
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|   55 +-
 2 files changed, 639 insertions(+), 607 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index d61b2c3bd0ac..85a41e121cce 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -678,7 +678,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
-   0xbf820001, 0xbf820394,
+   0xbf820001, 0xbf820393,
0xb0804004, 0xb978f802,
0x8a78ff78, 0x00020006,
0xb97bf803, 0x876eff78,
@@ -932,23 +932,48 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbf850002, 0xbeff0380,
0xbf820001, 0xbeff03c1,
0xb97b4306, 0x877bc17b,
-   0xbf840086, 0xbf8a,
+   0xbf840085, 0xbf8a,
0x877aff6d, 0x8000,
-   0xbf840082, 0x8f7b867b,
-   0x8f7b827b, 0xbef6037b,
-   0xb9703a05, 0x80708170,
-   0xbf0d9973, 0xbf850002,
-   0x8f708970, 0xbf820001,
-   0x8f708a70, 0xb97a1e06,
-   0x8f7a8a7a, 0x80707a70,
-   0x8070ff70, 0x0200,
-   0x8070ff70, 0x0080,
-   0xbef603ff, 0x0100,
-   0xd765, 0x000100c1,
-   0xd766, 0x000200c1,
-   0x1684, 0x907c9973,
-   0x877c817c, 0xbf06817c,
-   0xbefc0380, 0xbf850033,
+   0xbf840081, 0x8f7b887b,
+   0xbef6037b, 0xb9703a05,
+   0x80708170, 0xbf0d9973,
+   0xbf850002, 0x8f708970,
+   0xbf820001, 0x8f708a70,
+   0xb97a1e06, 0x8f7a8a7a,
+   0x80707a70, 0x8070ff70,
+   0x0200, 0x8070ff70,
+   0x0080, 0xbef603ff,
+   0x0100, 0xd765,
+   0x000100c1, 0xd766,
+   0x000200c1, 0x1684,
+   0x907c9973, 0x877c817c,
+   0xbf06817c, 0xbefc0380,
+   0xbf850033, 0xb97af803,
+   0x8a7a7aff, 0x1000,
+   0xbf85001d, 0xd8d8,
+   0x0100, 0xbf8c,
+   0xbe840380, 0xd760,
+   0x0901, 0x80048104,
+   0xd761, 0x0901,
+   0x80048104, 0xd762,
+   0x0901, 0x80048104,
+   0xd763, 0x0901,
+   0x80048104, 0xf469003a,
+   0xe000, 0x80709070,
+   0xbf06a004, 0xbf84ffef,
+   0x807cff7c, 0x0080,
+   0xd525, 0x0001ff00,
+   0x0080, 0xbf0a7b7c,
+   0xbf85ffe4, 0xbf820044,
+   0xbe8303ff, 0x0080,
+   0xbf80, 0xbf80,
+   0xbf80, 0xd8d8,
+   0x0100, 0xbf8c,
+   0xe0704000, 0x705d0100,
+   0x807c037c, 0x80700370,
+   0xd525, 0x0001ff00,
+   0x0080, 0xbf0a7b7c,
+   0xbf85fff4, 0xbf820032,
0xb97af803, 0x8a7a7aff,
0x1000, 0xbf85001d,
0xd8d8, 0x0100,
@@ -960,24 +985,45 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x80048104, 0xd763,
0x0901, 0x80048104,
0xf469003a, 0xe000,
-   0x80709070, 0xbf06a004,
+   0x80709070, 0xbf06c004,
0xbf84ffef, 0x807cff7c,
-   0x0080, 0xd525,
-   0x0001ff00, 0x0080,
+   0x0100, 0xd525,
+   0x0001ff00, 0x0100,
0xbf0a7b7c, 0xbf85ffe4,
-   0xbf820044, 0xbe8303ff,
-   0x0080, 0xbf80,
+   0xbf820011, 0xbe8303ff,
+   0x0100, 0xbf80,
0xbf80, 0xbf80,
0xd8d8, 0x0100,
0xbf8c, 0xe0704000,
0x705d0100, 0x807c037c,
0x80700370, 0xd525,
-   0x0001ff00, 0x0080,
+   0x0001ff00, 0x0100,
0xbf0a7b7c, 0xbf85fff4,
-   0xbf820032, 0xb97af803,
-   0x8a7a7aff, 0x1000,
-   0xbf85001d, 0xd8d8,
-   0x0100, 0xbf8c,
+   0xbefe03c1, 0x907c9973,
+   0x877c817c, 0xbf06817c,
+   0xbf850004, 0xbef003ff,
+   0x0200, 0xbeff0380,
+   0xbf820003, 0xbef003ff,
+   0x0400, 0xbeff03c1,
+   0xb97b3a05, 0x807b817b,
+   0x8f7b827b, 0x907c9973,
+   0x877c817c, 0xbf06817c,
+   0xbf85006b, 0xbef603ff,
+   0x0100, 0xbefc0384,
+   0xbf0a7b7c, 0xbf8400fa,
+   0xb97af803, 0x8a7a7aff,
+   0x1000, 0xbf850050,
+   0x7e008700, 0x7e028701,
+   0x7e048702, 0x7e068703,
+   0xbe840380, 0xd760,
+   0x0900, 0x80048104,
+   0xd761, 0x0900,
+   0x80048104, 0xd762,
+   0x0900, 0x80048104,
+   0xd763, 0x0900,
+   0x80048104, 0xf469003a,
+   0xe0

[PATCH 2/3] drm/amdkfd: Replace deprecated gfx12 trap handler instructions

2024-05-23 Jay Cornwall

Newer assemblers reject S_WAITCNT when targeting gfx12. All instances of
S_WAITCNT can be replaced by S_WAITCNT 0 (< gfx12) or S_WAIT_IDLE
(>= gfx12), since the trap handler never has different classes of memory
instructions in flight at the same time.

Signed-off-by: Jay Cornwall 
Cc: Lancelot Six 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 140 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  52 +++
 2 files changed, 97 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 11d076eb770c..d61b2c3bd0ac 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -711,12 +711,12 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbf0d8f7b, 0xbf840002,
0x887bff7b, 0x,
0xf4011bbd, 0xfa10,
-   0xbf8cc07f, 0x8f6e976e,
+   0xbf8c, 0x8f6e976e,
0x8a77ff77, 0x0080,
0x88776e77, 0xf4051bbd,
-   0xfa00, 0xbf8cc07f,
+   0xfa00, 0xbf8c,
0xf4051ebd, 0xfa08,
-   0xbf8cc07f, 0x87ee6e6e,
+   0xbf8c, 0x87ee6e6e,
0xbf840001, 0xbe80206e,
0x876eff6d, 0x00ff,
0xbf850008, 0x876eff6d,
@@ -1185,7 +1185,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x785d, 0xe0304080,
0x785d0100, 0xe0304100,
0x785d0200, 0xe0304180,
-   0x785d0300, 0xbf8c3f70,
+   0x785d0300, 0xbf8c,
0x7e008500, 0x7e028501,
0x7e048502, 0x7e068503,
0x807c847c, 0x8078ff78,
@@ -1194,7 +1194,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x6e5d, 0xe0304080,
0x6e5d0100, 0xe0304100,
0x6e5d0200, 0xe0304180,
-   0x6e5d0300, 0xbf8c3f70,
+   0x6e5d0300, 0xbf8c,
0xbf820034, 0xbef603ff,
0x0100, 0xbeee0378,
0x8078ff78, 0x0400,
@@ -1203,7 +1203,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x785d, 0xe0304100,
0x785d0100, 0xe0304200,
0x785d0200, 0xe0304300,
-   0x785d0300, 0xbf8c3f70,
+   0x785d0300, 0xbf8c,
0x7e008500, 0x7e028501,
0x7e048502, 0x7e068503,
0x807c847c, 0x8078ff78,
@@ -1213,7 +1213,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x8f6f836f, 0x806f7c6f,
0xbefe03c1, 0xbeff0380,
0xe0304000, 0x785d,
-   0xbf8c3f70, 0x7e008500,
+   0xbf8c, 0x7e008500,
0x807c817c, 0x8078ff78,
0x0080, 0xbf0a6f7c,
0xbf85fff7, 0xbeff03c1,
@@ -1221,7 +1221,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xe0304100, 0x6e5d0100,
0xe0304200, 0x6e5d0200,
0xe0304300, 0x6e5d0300,
-   0xbf8c3f70, 0xb9783a05,
+   0xbf8c, 0xb9783a05,
0x80788178, 0xbf0d9972,
0xbf850002, 0x8f788978,
0xbf820001, 0x8f788a78,
@@ -1232,16 +1232,16 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x0100, 0xbefc03ff,
0x006c, 0x80f89078,
0xf429003a, 0xf000,
-   0xbf8cc07f, 0x80fc847c,
+   0xbf8c, 0x80fc847c,
0xbf80, 0xbe803100,
0xbe823102, 0x80f8a078,
0xf42d003a, 0xf000,
-   0xbf8cc07f, 0x80fc887c,
+   0xbf8c, 0x80fc887c,
0xbf80, 0xbe803100,
0xbe823102, 0xbe843104,
0xbe863106, 0x80f8c078,
0xf431003a, 0xf000,
-   0xbf8cc07f, 0x80fc907c,
+   0xbf8c, 0x80fc907c,
0xbf80, 0xbe803100,
0xbe823102, 0xbe843104,
0xbe863106, 0xbe883108,
@@ -1271,9 +1271,9 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xf4211cfa, 0xf000,
0x80788478, 0xf4211bba,
0xf000, 0x80788478,
-   0xbf8cc07f, 0xb9eef814,
+   0xbf8c, 0xb9eef814,
0xf4211bba, 0xf000,
-   0x80788478, 0xbf8cc07f,
+   0x80788478, 0xbf8c,
0xb9eef815, 0xbefc036f,
0xbefe0370, 0xbeff0371,
0xb9f9f816, 0xb9fbf803,
@@ -1288,7 +1288,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x, 0xf4091c37,
0xfa50, 0xf4091d37,
0xfa60, 0xf4011e77,
-   0xfa74, 0xbf8cc07f,
+   0xfa74, 0xbf8c,
0x906e8977, 0x876fff6e,
0x003f8000, 0x906e8677,
0x876eff6e, 0x0200,
@@ -2299,12 +2299,12 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf0d8f7b, 0xbf840002,
0x887bff7b, 0x,
0xf4011bbd, 0xfa10,
-   0xbf8cc07f, 0x8f6e976e,
+   0xbf8c, 0x8f6e976e,
0x8a77ff77, 0x0080,
0x88776e77, 0xf4051bbd,
-   0xfa00, 0xbf8cc07f,
+   0xfa00, 0xbf8c,
0xf4051ebd, 0xfa08,
-   0xbf8cc07f, 0x87ee6e6e,
+   0xbf8c, 0x87ee6e6e,
0xbf840001, 0xbe80206e,
0x876eff6d, 0x00ff,
0xbf850008, 0x876eff6d,
@@ -2319,7 +2319,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x, 0xbefa0380,
0xb9fa0283, 0xbeee037e,
0xbe

[PATCH 1/3] drm/amdkfd: Sync trap handler binary with source

2024-05-23 Jay Cornwall

Source and binary have become mismatched during branch activity.

Signed-off-by: Jay Cornwall 
Cc: Lancelot Six 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 57 ---
 1 file changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 73d3772cdb76..11d076eb770c 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -718,12 +718,12 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xf4051ebd, 0xfa08,
0xbf8cc07f, 0x87ee6e6e,
0xbf840001, 0xbe80206e,
-   0x876eff6d, 0x01ff,
-   0xbf850005, 0x8878ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x876eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
+   0x876eff6d, 0x00ff,
+   0xbf850008, 0x876eff6d,
+   0x0100, 0xbf850007,
+   0x8878ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820002, 0x806c846c,
0x826d806d, 0x876dff6d,
0x, 0x907a8977,
0x877bff7a, 0x003f8000,
@@ -1136,7 +1136,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xe0704000, 0x705d,
0x807c817c, 0x8070ff70,
0x0080, 0xbf0a7b7c,
-   0xbf85fff8, 0xbf820144,
+   0xbf85fff8, 0xbf82013e,
0xbef4037e, 0x8775ff7f,
0x, 0x8875ff75,
0x0004, 0xbef60380,
@@ -1276,10 +1276,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x80788478, 0xbf8cc07f,
0xb9eef815, 0xbefc036f,
0xbefe0370, 0xbeff0371,
-   0x876f7bff, 0x03ff,
-   0xb9ef4803, 0xb9f9f816,
-   0x876f7bff, 0xf800,
-   0x906f8b6f, 0xb9efa2c3,
+   0xb9f9f816, 0xb9fbf803,
0xb9f3f801, 0xb96e3a05,
0x806e816e, 0xbf0d9972,
0xbf850002, 0x8f6e896e,
@@ -2309,12 +2306,12 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xf4051ebd, 0xfa08,
0xbf8cc07f, 0x87ee6e6e,
0xbf840001, 0xbe80206e,
-   0x876eff6d, 0x01ff,
-   0xbf850005, 0x8878ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x876eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
+   0x876eff6d, 0x00ff,
+   0xbf850008, 0x876eff6d,
+   0x0100, 0xbf850007,
+   0x8878ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820002, 0x806c846c,
0x826d806d, 0x876dff6d,
0x, 0x87fe7e7e,
0x87ea6a6a, 0xb9f8f802,
@@ -2549,7 +2546,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x705d, 0x807c817c,
0x8070ff70, 0x0080,
0xbf0a7b7c, 0xbf85fff8,
-   0xbf82013b, 0xbef4037e,
+   0xbf820135, 0xbef4037e,
0x8775ff7f, 0x,
0x8875ff75, 0x0004,
0xbef60380, 0xbef703ff,
@@ -2688,10 +2685,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xf000, 0x80788478,
0xbf8cc07f, 0xb9eef815,
0xbefc036f, 0xbefe0370,
-   0xbeff0371, 0x876f7bff,
-   0x03ff, 0xb9ef4803,
-   0x876f7bff, 0xf800,
-   0x906f8b6f, 0xb9efa2c3,
+   0xbeff0371, 0xb9fbf803,
0xb9f3f801, 0xb96e3a05,
0x806e816e, 0xbf0d9972,
0xbf850002, 0x8f6e896e,
@@ -2749,11 +2743,11 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0xf808, 0xbf89fc07,
0x8bee6e6e, 0xbfa10001,
0xbe80486e, 0x8b6eff6d,
-   0x01ff, 0xbfa20005,
-   0x8c78ff78, 0x2000,
-   0x80ec886c, 0x82ed806d,
-   0xbfa5, 0x8b6eff6d,
-   0x0100, 0xbfa20002,
+   0x00ff, 0xbfa20008,
+   0x8b6eff6d, 0x0100,
+   0xbfa20007, 0x8c78ff78,
+   0x2000, 0x80ec886c,
+   0x82ed806d, 0xbfa2,
0x806c846c, 0x826d806d,
0x8b6dff6d, 0x,
0x8bfe7e7e, 0x8bea6a6a,
@@ -2988,7 +2982,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0x701d, 0x807d817d,
0x8070ff70, 0x0080,
0xbf0a7b7d, 0xbfa2fff8,
-   0xbfa00146, 0xbef4007e,
+   0xbfa00140, 0xbef4007e,
0x8b75ff7f, 0x,
0x8c75ff75, 0x0004,
0xbef60080, 0xbef700ff,
@@ -3130,10 +3124,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0xf000, 0x80788478,
0xbf89fc07, 0xb96ef815,
0xbefd006f, 0xbefe0070,
-   0xbeff0071, 0x8b6f7bff,
-   0x03ff, 0xb96f4803,
-   0x8b6f7bff, 0xf800,
-   0x856f8b6f, 0xb96fa2c3,
+   0xbeff0071, 0xb97bf803,
0xb973f801, 0xb8ee3b05,
0x806e816e, 0xbf0d9972,
0xbfa20002, 0x846e896e,
@@ -4119,7 +4110,7 @@ static const uint32_t cwsr_trap_gfx12_hex[] = {
0x8b6dff6d, 0x,
0x8bfe7e7e, 0x8bea6a6a,
0xb97af804, 0xbe804a6c,
-   0xbfb0, 0xbf9f,
+   0xbfb1, 0xbf9f,
0xbf9f, 0xbf9f,
0xbf9f, 0xbf9f,
 };
-- 
2.34.1

Re: [PATCH] drm/amdkfd: update buffer_{store,load}_* modifiers for gfx940

2024-04-29 Jay Cornwall


On 4/29/2024 06:06, Lancelot SIX wrote:

Instruction modifiers of the untyped vector memory buffer instructions
(MUBUF encoded) changed in gfx940. The glc, scc and slc modifiers have
been replaced with sc0, sc1 and nt respectively.

The current CWSR trap handler is written using pre-gfx940 modifier
names, making the source incompatible with a strict gfx940 assembler.

This patch updates the cwsr_trap_handler_gfx9.asm source file to be
compatible with all gfx9 variants of the ISA.  The binary assembled code
is unchanged (so the behaviour is unchanged as well), only the source
representation is updated.

Signed-off-by: Lancelot SIX 
---
  .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index bb26338204f4..a2d597d7fb57 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -48,6 +48,12 @@ var ACK_SQC_STORE=   1   //workaround for suspected SQC store bug causing
  var SAVE_AFTER_XNACK_ERROR=   1   //workaround for TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger
  var SINGLE_STEP_MISSED_WORKAROUND   = (ASIC_FAMILY <= CHIP_ALDEBARAN)  //workaround for lost MODE.DEBUG_EN exception when SAVECTX raised
  
+#if ASIC_FAMILY < CHIP_GC_9_4_3

+#define VMEM_MODIFIERS slc:1 glc:1
+#else
+#define VMEM_MODIFIERS sc0:1 nt:1
+#endif
+
  /**/
  /*variables */
  /**/
@@ -581,7 +587,7 @@ end
  L_SAVE_LDS_LOOP_VECTOR:
ds_read_b64 v[0:1], v2  //x =LDS[a], byte address
s_waitcnt lgkmcnt(0)
-  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset offen:1  glc:1  slc:1
+  buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset VMEM_MODIFIERS offen:1
  //s_waitcnt vmcnt(0)
  //v_add_u32 v2, vcc[0:1], v2, v3
v_add_u32 v2, v2, v3
@@ -979,17 +985,17 @@ L_TCP_STORE_CHECK_DONE:
  end
  
  function write_4vgprs_to_mem(s_rsrc, s_mem_offset)

-   buffer_store_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-   buffer_store_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256
-   buffer_store_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256*2
-   buffer_store_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1  offset:256*3
+   buffer_store_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+   buffer_store_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256
+   buffer_store_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*2
+   buffer_store_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3
  end
  
  function read_4vgprs_from_mem(s_rsrc, s_mem_offset)

-   buffer_load_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1
-   buffer_load_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256
-   buffer_load_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*2
-   buffer_load_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*3
+   buffer_load_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
+   buffer_load_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256
+   buffer_load_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*2
+   buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3
s_waitcnt vmcnt(0)
  end
  


base-commit: cf743996352e327f483dc7d66606c90276f57380


Reviewed-by: Jay Cornwall

Re: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-03-01 Jay Cornwall

On 3/1/2024 00:35, Kim, Jonathan wrote:

> The range check should probably flag any exception prefixed as
> EC_QUEUE_PACKET_* as valid, as defined in kfd_dbg_trap_exception_code:
> https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L857
> + Jay to confirm this is the correct exception range for CP_BAD_OPCODE

Yes, that covers the full range of possible values.
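A minimal sketch of the range check being discussed, written as standalone C. The two numeric bounds are assumptions intended to mirror the first and last EC_QUEUE_PACKET_* entries of kfd_dbg_trap_exception_code (16 through 23); verify them against include/uapi/linux/kfd_ioctl.h. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values mirroring the EC_QUEUE_PACKET_* range of
 * enum kfd_dbg_trap_exception_code in kfd_ioctl.h (verify there).
 */
#define SKETCH_EC_QUEUE_PACKET_FIRST 16 /* EC_QUEUE_PACKET_DISPATCH_DIM_INVALID */
#define SKETCH_EC_QUEUE_PACKET_LAST  23 /* EC_QUEUE_PACKET_VENDOR_UNSUPPORTED */

/* Hypothetical helper: accept only codes inside the queue-packet range. */
static bool is_queue_packet_ecode(uint32_t ecode)
{
	return ecode >= SKETCH_EC_QUEUE_PACKET_FIRST &&
	       ecode <= SKETCH_EC_QUEUE_PACKET_LAST;
}
```

Anything outside the range (including 0) would then be treated as a firmware problem rather than forwarded to the debugger.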

Re: [PATCH] drm/amdkfd: Use SQC when TCP would fail in gfx10.1 context save

2024-02-26 Thread Jay Cornwall

On 2/23/2024 16:08, Laurent Morichetti wrote:
> Similarly to gfx9, gfx10.1 drops vector stores when an xnack error is
> raised. To work around this issue, use scalar stores instead of vector
> stores when trapsts.xnack_error == 1.
> 
> Signed-off-by: Laurent Morichetti 

Reviewed-by: Jay Cornwall

Re: [PATCH] amdkfd: fix the cwsr trap handler for gfx11

2024-01-31 Thread Jay Cornwall

On 1/31/2024 12:50, Laurent Morichetti wrote:

> Call the 2nd level trap handler if the cwsr handler is entered with any
> one of wave_state, wave_end, or trap_after_inst exceptions.

^ wave_start

A more descriptive title would be helpful. Perhaps something like "Pass debug 
exceptions to second-level trap handler".

Besides that:

Reviewed-by: Jay Cornwall

Re: [PATCH] drm/amdkfd: Use S_ENDPGM_SAVED in trap handler

2024-01-24 Thread Jay Cornwall

On 1/15/2024 13:07, Jay Cornwall wrote:
> This instruction has no functional difference to S_ENDPGM
> but allows performance counters to track save events correctly.

Ping. Patch has been tested and verified, just looking for an Ack.

[PATCH] drm/amdkfd: Use S_ENDPGM_SAVED in trap handler

2024-01-15 Thread Jay Cornwall

This instruction has no functional difference to S_ENDPGM
but allows performance counters to track save events correctly.
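Because the hex arrays in cwsr_trap_handler.h hold raw instruction words, this change appears as a single-field difference in each array. The following is a small sketch, assuming the SOPP layout (bits [31:23] fixed at 0x17F, opcode in [22:16], 16-bit immediate in [15:0]). The opcode numbers 1/27 (gfx9/gfx10) and 48/49 (gfx11) are inferred from the words in the diff below and should be checked against the ISA guides. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SOPP encoding: [31:23] = 0x17F, [22:16] = opcode, [15:0] = simm16. */
static uint32_t sopp_encode(uint32_t opcode, uint16_t simm16)
{
	return (0x17Fu << 23) | ((opcode & 0x7Fu) << 16) | simm16;
}

/* Extract the 7-bit opcode field from a SOPP word. */
static uint32_t sopp_opcode(uint32_t word)
{
	return (word >> 16) & 0x7Fu;
}

/* Opcodes as implied by the instruction words in this patch (assumed). */
enum {
	GFX9_10_S_ENDPGM       = 1,  /* encodes as 0xbf810000 */
	GFX9_10_S_ENDPGM_SAVED = 27, /* encodes as 0xbf9b0000 */
	GFX11_S_ENDPGM         = 48, /* encodes as 0xbfb00000 */
	GFX11_S_ENDPGM_SAVED   = 49, /* encodes as 0xbfb10000 */
};
```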

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 14 +++---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm |  2 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index df75863393fc..d1caaf0e6a7c 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -674,7 +674,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x86ea6a6a, 0x8f6e837a,
0xb96ee0c2, 0xbf82,
0xb97a0002, 0xbf8a,
-   0xbe801f6c, 0xbf81,
+   0xbe801f6c, 0xbf9b,
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
@@ -1091,7 +1091,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb9eef807, 0x876dff6d,
0x, 0x87fe7e7e,
0x87ea6a6a, 0xb9faf802,
-   0xbe80226c, 0xbf81,
+   0xbe80226c, 0xbf9b,
0xbf9f, 0xbf9f,
0xbf9f, 0xbf9f,
0xbf9f, 0x,
@@ -1574,7 +1574,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0x86ea6a6a, 0x8f6e837a,
0xb96ee0c2, 0xbf82,
0xb97a0002, 0xbf8a,
-   0xbe801f6c, 0xbf81,
+   0xbe801f6c, 0xbf9b,
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
@@ -2065,7 +2065,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0x86ea6a6a, 0x8f6e837a,
0xb96ee0c2, 0xbf82,
0xb97a0002, 0xbf8a,
-   0xbe801f6c, 0xbf81,
+   0xbe801f6c, 0xbf9b,
 };
 
 static const uint32_t cwsr_trap_gfx10_hex[] = {
@@ -2500,7 +2500,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x876dff6d, 0x,
0x87fe7e7e, 0x87ea6a6a,
0xb9faf802, 0xbe80226c,
-   0xbf81, 0xbf9f,
+   0xbf9b, 0xbf9f,
0xbf9f, 0xbf9f,
0xbf9f, 0xbf9f,
 };
@@ -2944,7 +2944,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
0xb8eef802, 0xbf0d866e,
0xbfa20002, 0xb97af802,
0xbe80486c, 0xb97af802,
-   0xbe804a6c, 0xbfb0,
+   0xbe804a6c, 0xbfb1,
0xbf9f, 0xbf9f,
0xbf9f, 0xbf9f,
0xbf9f, 0x,
@@ -3436,5 +3436,5 @@ static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
0x86ea6a6a, 0x8f6e837a,
0xb96ee0c2, 0xbf82,
0xb97a0002, 0xbf8a,
-   0xbe801f6c, 0xbf81,
+   0xbe801f6c, 0xbf9b,
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
index e0140df0b0ec..71b3dc0c7363 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
@@ -1104,7 +1104,7 @@ L_RETURN_WITHOUT_PRIV:
s_rfe_b64   s_restore_pc_lo //Return to the main shader program and resume execution
 
 L_END_PGM:
-   s_endpgm
+   s_endpgm_saved
 end
 
 function write_hwreg_to_mem(s, s_rsrc, s_mem_offset)
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index e506411ad28a..bb26338204f4 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -921,7 +921,7 @@ L_RESTORE:
 /* the END   */
 /**/
 L_END_PGM:
-s_endpgm
+s_endpgm_saved
 
 end
 
-- 
2.25.1

Re: [PATCH] drm/amdkfd: Fix the shift-out-of-bounds warning

2024-01-11 Thread Jay Cornwall

On 1/11/2024 11:25, Kim, Jonathan wrote:

>> This looks OK. The compiler must be warning about a potential problem
>> here, not a definite one.
>>
>> Question for Jon, how does the firmware encode the error code in the
>> context ID? I see these macros:
>>
>> #define KFD_DEBUG_CP_BAD_OP_ECODE_MASK  0x3fffc00
>> #define KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT 10
>> #define KFD_DEBUG_CP_BAD_OP_ECODE(ctxid0) (((ctxid0) &  \
>>  KFD_DEBUG_CP_BAD_OP_ECODE_MASK) \
>>  >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT)
>>
>> It looks like we have 16 bits for the ECODE. That's enough to have a bit
>> mask. Do we really need KFD_EC_MASK to convert an error number into a
>> bitmask here?
> 
> Added Jay for confirmation.
> I could be wrong but IIRC (and I'm quite fuzzy on this ... probably should 
> document this), unlike the wave trap code interrupt mask (bit mask) the CP 
> bad op code is a single error code that directly points to one of the 
> exception code enums that we defined in the user API header.
> If that's the case, the KFD_EC_MASK is convenient for the kfd debugger code 
> to mask the payload to send to the debugger or runtime.
> If that's been wrong this whole time (i.e. the bad ops code is actually a 
> bitwise mask of ecodes), then I'm not sure how we were able to get away with 
> running the runtime negative tests for as long as we have and we'd need to 
> recheck those tests.

That's right. Queue errors are serialized. The error code is recorded directly.

Wavefront errors may occur concurrently within a wavefront. Those are recorded 
as a bitmask.

>> In the code above, if ecode is 0, that would lead to calling
>> kfd_set_dbg_ev_from_interrupt with a event mask of 0. Not sure if that
>> even makes sense. Jon, so we need special handling of cases where the
>> error code is 0 or out of range, so we can warn about buggy firmware
>> rather than creating nonsensical events for the debugger?
> 
> That makes sense.  Again, deferring to Jay if a NULL cp bad op code is 
> expected under any circumstances.
> Either way, raising undefined events to the debugger or runtime isn't useful 
> so range checking to filter out non-encoded cp bad op interrupts would be 
> needed.

On AQL queues this interrupt carries an error code beginning from 16.
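This is a minimal sketch of decoding a queue error as described in this thread. The firmware records one serialized code (not a bitmask) in ctxid0 bits [25:10], per the KFD_DEBUG_CP_BAD_OP_ECODE_* macros quoted above. On AQL queues, a code that is zero or below 16 is treated here as malformed. The single-bit conversion 1ULL << (ecode - 1) is an assumption about how KFD_EC_MASK builds an event bit. Wavefront errors, which already arrive as a bitmask, would not go through this path. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the KFD_DEBUG_CP_BAD_OP_ECODE_* macros above. */
#define CP_BAD_OP_ECODE_MASK  0x3fffc00u
#define CP_BAD_OP_ECODE_SHIFT 10

/* Lower bound for AQL queue error codes, per the reply above. */
#define AQL_FIRST_ECODE 16

/* Extract the 16-bit serialized error code from ctxid0. */
static uint32_t cp_bad_op_ecode(uint32_t ctxid0)
{
	return (ctxid0 & CP_BAD_OP_ECODE_MASK) >> CP_BAD_OP_ECODE_SHIFT;
}

/*
 * Hypothetical helper: turn one serialized error code into a single event
 * bit, assuming the bit for code N is 1ULL << (N - 1). Returns 0 for codes
 * that should be reported as firmware bugs rather than forwarded.
 */
static uint64_t cp_bad_op_event_mask(uint32_t ctxid0)
{
	uint32_t ecode = cp_bad_op_ecode(ctxid0);

	if (ecode < AQL_FIRST_ECODE || ecode > 64)
		return 0;
	return 1ULL << (ecode - 1);
}
```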

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Jay Cornwall

On 1/3/2024 12:58, Felix Kuehling wrote:

> A segfault in Mesa seems to be a different issue from what's mentioned 
> in the commit message. I'd let Christian or Marek comment on 
> compatibility with graphics UMDs. I'm not sure why this patch would 
> affect them at all.

I was referencing this issue in OpenCL/OpenGL interop, which certainly looked 
related:

[   91.769002] amdgpu :0a:00.0: amdgpu: bo 9bba4692 va 0x08-0x0801ff conflict with 0x08-0x080002
[   91.769141] ocltst[2781]: segfault at b2 ip 7f3fb90a7c39 sp 7ffd3c011ba0 error 4 in radeonsi_dri.so[7f3fb888e000+1196000] likely on CPU 15 (core 7, socket 0)

> 
> Looking at the logs in the tickets, it looks like a fence reference 
> counting error. I don't see how Jay's patch could have caused that. I 
> made another change in that code recently that could make a difference 
> for this issue:
> 
> commit 8f08c5b24ced1be7eb49692e4816c1916233c79b
> Author: Felix Kuehling 
> Date:   Fri Oct 27 18:21:55 2023 -0400
> 
>      drm/amdkfd: Run restore_workers on freezable WQs
> 
>      Make restore workers freezable so we don't have to explicitly flush them
>      in suspend and GPU reset code paths, and we don't accidentally try to
>      restore BOs while the GPU is suspended. Not having to flush restore_work
>      also helps avoid lock/fence dependencies in the GPU reset case where we're
>      not allowed to wait for fences.
> 
>      A side effect of this is, that we can now have multiple concurrent threads
>      trying to signal the same eviction fence. Rework eviction fence signaling
>      and replacement to account for that.
> 
>      The GPU reset path can no longer rely on restore_process_worker to resume
>      queues because evict/restore workers can run independently of it. Instead
>      call a new restore_process_helper directly.
> 
>      This is an RFC and request for testing.
> 
>      v2:
>      - Reworked eviction fence signaling
>      - Introduced restore_process_helper
> 
>      v3:
>      - Handle unsignaled eviction fences in restore_process_bos
> 
>      Signed-off-by: Felix Kuehling 
>      Acked-by: Christian König 
>      Tested-by: Emily Deng 
>      Signed-off-by: Alex Deucher 
> 
> 
> FWIW, I built a plain 6.6 kernel, and was not able to reproduce the 
> crash with some simple tests.
> 
> Regards,
>    Felix
> 
> 
>>
>> So I agree, let's revert it.
>>
>> Reviewed-by: Jay Cornwall

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Jay Cornwall

On 1/3/2024 09:19, Alex Deucher wrote:
> + Jay, Felix
> 
> On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma  wrote:
>>
>> That commit causes NULL pointer dereferences in dmesgs when
>> running applications using ROCm, including clinfo, blender,
>> and PyTorch, since v6.6.1. Revert it to fix blender again.
>>
>> This reverts commit 96c211f1f9ef82183493f4ceed4e347b52849149.
>>
>> Closes: https://github.com/ROCm/ROCm/issues/2596
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2991
>> Signed-off-by: Kaibo Ma 
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 26 ++--
>>  1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>> index 62b205dac..6604a3f99 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
>> @@ -330,12 +330,6 @@ static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
>> pdd->gpuvm_limit =
>> pdd->dev->kfd->shared_resources.gpuvm_size - 1;
>>
>> -   /* dGPUs: the reserved space for kernel
>> -* before SVM
>> -*/
>> -   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
>> -   pdd->qpd.ib_base = SVM_IB_BASE;
>> -
>> pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
>> pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
>>  }
>> @@ -345,18 +339,18 @@ static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
>> pdd->lds_base = MAKE_LDS_APP_BASE_V9();
>> pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
>>
>> -   pdd->gpuvm_base = PAGE_SIZE;
>> +/* Raven needs SVM to support graphic handle, etc. Leave the small
>> + * reserved space before SVM on Raven as well, even though we don't
>> + * have to.
>> + * Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
>> + * are used in Thunk to reserve SVM.
>> + */
>> +pdd->gpuvm_base = SVM_USER_BASE;
>> pdd->gpuvm_limit =
>> pdd->dev->kfd->shared_resources.gpuvm_size - 1;
>>
>> pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
>> pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
>> -
>> -   /*
>> -* Place TBA/TMA on opposite side of VM hole to prevent
>> -* stray faults from triggering SVM on these pages.
>> -*/
>> -   pdd->qpd.cwsr_base = pdd->dev->kfd->shared_resources.gpuvm_size;
>>  }
>>
>>  int kfd_init_apertures(struct kfd_process *process)
>> @@ -413,6 +407,12 @@ int kfd_init_apertures(struct kfd_process *process)
>> return -EINVAL;
>> }
>> }
>> +
>> +/* dGPUs: the reserved space for kernel
>> + * before SVM
>> + */
>> +pdd->qpd.cwsr_base = SVM_CWSR_BASE;
>> +pdd->qpd.ib_base = SVM_IB_BASE;
>> }
>>
>> dev_dbg(kfd_device, "node id %u\n", id);
>> --
>> 2.42.0
>>

I saw a segfault issue in Mesa yesterday. Not sure about the others, but I
don't know how to make this change while maintaining compatibility with older UMDs.

So I agree, let's revert it.

Reviewed-by: Jay Cornwall

Re: [PATCH] drm/amdkfd: Clear the VALU exception state in the trap handler

2023-11-08 Thread Jay Cornwall

On 11/8/2023 18:23, Laurent Morichetti wrote:

> The trap handler could be entered with pending VALU exceptions, so
> clear the exception state before issuing vector instructions.
> 
> Signed-off-by: Laurent Morichetti 

Reviewed-by: Jay Cornwall

[PATCH] drm/amdgpu: Improve MES responsiveness during oversubscription

2023-10-04 Thread Jay Cornwall

When MES is oversubscribed, it may check only infrequently for new
command submissions from the driver if the scheduling load is high.
Response latency as high as 5 seconds has been observed.

Enable a flag which adds a check for new commands between
scheduling quanta.

Signed-off-by: Jay Cornwall 
Cc: Alexandru Tudor 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 4a3020b5b30f..31b26e6f0b30 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -406,6 +406,7 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
+   mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
mes_set_hw_res_pkt.oversubscription_timer = 50;
 
return mes_v11_0_submit_pkt_and_poll_completion(mes,
-- 
2.25.1

[PATCH] drm/amdkfd: Add missing tba_hi programming on aldebaran

2023-08-09 Thread Jay Cornwall

Previously asymptomatic because the high 32 bits were zero.
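A little arithmetic shows why the missing write went unnoticed. This sketch assumes a TBA at VA 2*4096 before the relocation and at 1ULL << 47 afterwards (the latter is an assumed gpuvm_size). The local lo32/hi32 helpers are userspace stand-ins for the kernel's lower_32_bits()/upper_32_bits(), and the field helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's lower_32_bits()/upper_32_bits(). */
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/*
 * As in the diff, the packet stores the address shifted right by 8
 * (256-byte units), split into low and high dwords.
 */
static uint32_t tba_lo_field(uint64_t tba_addr) { return lo32(tba_addr >> 8); }
static uint32_t tba_hi_field(uint64_t tba_addr) { return hi32(tba_addr >> 8); }
```

With the old placement, the high field is 0. An unwritten sq_shader_tba_hi was therefore harmless, assuming the packet buffer starts zeroed. After the relocation, the same field must be 0x80.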

Fixes: 615222cfed20 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole")
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8fda16e6fee6..8ce6f5200905 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -121,6 +121,7 @@ static int pm_map_process_aldebaran(struct packet_manager *pm,
packet->sh_mem_bases = qpd->sh_mem_bases;
if (qpd->tba_addr) {
packet->sq_shader_tba_lo = lower_32_bits(qpd->tba_addr >> 8);
+   packet->sq_shader_tba_hi = upper_32_bits(qpd->tba_addr >> 8);
packet->sq_shader_tma_lo = lower_32_bits(qpd->tma_addr >> 8);
packet->sq_shader_tma_hi = upper_32_bits(qpd->tma_addr >> 8);
}
-- 
2.25.1

[PATCH 2/3] drm/amdkfd: Sign-extend TMA address in trap handler

2023-07-31 Thread Jay Cornwall

SMEM instructions can reach addresses above 47 bits but require
bit 47 to be sign-extended through bits [63:48].

This allows the TMA to be relocated in a following patch.
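Below is a minimal sketch of the canonicalization described above, written in C rather than shader assembly: bit 47 is replicated into bits [63:48]. The second helper shows the same step on the high dword alone, which is the form a 32-bit scalar register would hold. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Canonicalize a 48-bit VA: copy bit 47 into bits [63:48]. */
static uint64_t sign_extend_va48(uint64_t va)
{
	va &= (1ULL << 48) - 1;              /* keep bits [47:0] */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL; /* replicate bit 47 upward */
	return va;
}

/*
 * Same operation on the high dword alone (VA bits [63:32]);
 * bit 15 of this dword is VA bit 47.
 */
static uint32_t sign_extend_hi_dword(uint32_t hi)
{
	hi &= 0xffffu;
	if (hi & (1u << 15))
		hi |= 0xffff0000u;
	return hi;
}
```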

Signed-off-by: Jay Cornwall 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 58 ---
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  5 ++
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  5 ++
 3 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 717ad0633dbe..d7cd5fa313ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820254,
+   0xbf820001, 0xbf820258,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850051, 0xbf8e0010,
+   0xbf850055, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -294,13 +294,15 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850036,
+   0x0400, 0xbf85003a,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
0x8a77, 0xba7ff807,
0x, 0xb8faf812,
0xb8fbf813, 0x8efa887a,
+   0xbf0d8f7b, 0xbf840002,
+   0x877bff7b, 0x,
0xc0031bbd, 0x0010,
0xbf8cc07f, 0x8e6e976e,
0x8977ff77, 0x0080,
@@ -676,14 +678,14 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
-   0xbf820001, 0xbf8201f1,
+   0xbf820001, 0xbf8201f5,
0xb0804004, 0xb978f802,
0x8a78ff78, 0x00020006,
0xb97bf803, 0x876eff78,
0x2000, 0xbf840009,
0x876eff6d, 0x00ff,
0xbf85001e, 0x876eff7b,
-   0x0400, 0xbf850057,
+   0x0400, 0xbf85005b,
0xbf8e0010, 0xb97bf803,
0xbf82fffa, 0x876eff7b,
0x0900, 0xbf850015,
@@ -697,7 +699,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb96ef801, 0x876eff6e,
0x0800, 0xbf850003,
0x876eff7b, 0x0400,
-   0xbf85003c, 0x8a77ff77,
+   0xbf850040, 0x8a77ff77,
0xff00, 0xb97af807,
0x877bff7a, 0x0200,
0x8f7b867b, 0x88777b77,
@@ -706,6 +708,8 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x8a7aff7a, 0x023f8000,
0xb9faf807, 0xb97af812,
0xb97bf813, 0x8ffa887a,
+   0xbf0d8f7b, 0xbf840002,
+   0x887bff7b, 0x,
0xf4011bbd, 0xfa10,
0xbf8cc07f, 0x8f6e976e,
0x8a77ff77, 0x0080,
@@ -1094,14 +1098,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202d0,
+   0xbf820001, 0xbf8202d4,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850051, 0xbf8e0010,
+   0xbf850055, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1114,13 +1118,15 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850036,
+   0x0400, 0xbf85003a,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
0x8a77, 0xba7ff807,
0x, 0xb8faf812,
0xb8fbf813, 0x8efa887a,
+   0xbf0d8f7b, 0xbf840002,
+   0x877bff7b, 0x,
0xc0031bbd, 0x0010,
0xbf8cc07f, 0x8e6e976e,
0x8977ff77, 0x0080,
@@ -1572,14 +1578,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
-   0xbf820001, 0xbf8202db,
+   0xbf820001, 0xbf8202df,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850051, 0xbf8e0010,
+   0xbf850055, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1592,13 +1598,15 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850036,
+   0x0400, 0xbf85003a,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a

[PATCH 3/3] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole

2023-07-31 Thread Jay Cornwall

The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault that hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends on these pages being resident, so it then fails,
which prevents a debugger from inspecting the failure state.

After relocation above 47 bits in the VM address space, these pages
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.
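The effect of the relocation can be expressed as a classification of 64-bit VAs around the hole. This sketch assumes a 48-bit VA space with the hole starting at 1ULL << 47; that bound is an assumption, not something this patch states. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum va_region { VA_LOW, VA_HOLE, VA_HIGH };

/*
 * Assuming a 48-bit VA space:
 *   bits [63:47] all zero -> lower half, reachable via small offsets off NULL
 *   bits [63:47] all one  -> upper half, needs bits [63:48] set to 1
 *   anything else         -> non-canonical hole
 */
static enum va_region classify_va(uint64_t va)
{
	uint64_t top = va >> 47; /* the 17 bits [63:47] */

	if (top == 0)
		return VA_LOW;
	if (top == 0x1ffff)
		return VA_HIGH;
	return VA_HOLE;
}
```

Under these assumptions, the old VA (PAGE_SIZE*2) is VA_LOW. A cwsr_base of gpuvm_size = 1ULL << 47 is only usable in its sign-extended form, 0xffff800000000000, which is VA_HIGH.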

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 ++--
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index da2ca00d79e5..dd6984c785ad 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -330,6 +330,12 @@ static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
pdd->gpuvm_base = SVM_USER_BASE;
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
+
+   /* dGPUs: the reserved space for kernel
+* before SVM
+*/
+   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+   pdd->qpd.ib_base = SVM_IB_BASE;
} else {
/* set them to non CANONICAL addresses, and no SVM is
 * allocated.
@@ -348,18 +354,20 @@ static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
pdd->lds_base = MAKE_LDS_APP_BASE_V9();
pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
-   /* Raven needs SVM to support graphic handle, etc. Leave the small
-* reserved space before SVM on Raven as well, even though we don't
-* have to.
-* Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
-* are used in Thunk to reserve SVM.
-*/
-   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_base = PAGE_SIZE;
pdd->gpuvm_limit =
pdd->dev->kfd->shared_resources.gpuvm_size - 1;
 
pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+
+   if (!pdd->dev->kfd->use_iommu_v2) {
+   /*
+* Place TBA/TMA on opposite side of VM hole to prevent
+* stray faults from triggering SVM on these pages.
+*/
+   pdd->qpd.cwsr_base = pdd->dev->kfd->shared_resources.gpuvm_size;
+   }
 }
 
 int kfd_init_apertures(struct kfd_process *process)
@@ -416,14 +424,6 @@ int kfd_init_apertures(struct kfd_process *process)
return -EINVAL;
}
}
-
-   if (!dev->kfd->use_iommu_v2) {
-   /* dGPUs: the reserved space for kernel
-* before SVM
-*/
-   pdd->qpd.cwsr_base = SVM_CWSR_BASE;
-   pdd->qpd.ib_base = SVM_IB_BASE;
-   }
}
 
dev_dbg(kfd_device, "node id %u\n", id);
-- 
2.25.1

[PATCH 1/3] drm/amdkfd: Sync trap handler binaries with source

2023-07-31 Thread Jay Cornwall

Some changes were lost during rebases. Rebuild the binaries from source.

Signed-off-by: Jay Cornwall 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 741 +-
 1 file changed, 371 insertions(+), 370 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 73ca9aebf086..717ad0633dbe 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -283,7 +283,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x866eff7b, 0x0400,
0xbf850051, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
-   0x866eff7b, 0x0900,
+   0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
0x71ff, 0xbf840008,
0x866fff7b, 0x7080,
@@ -1103,7 +1103,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0x866eff7b, 0x0400,
0xbf850051, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
-   0x866eff7b, 0x0900,
+   0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
0x71ff, 0xbf840008,
0x866fff7b, 0x7080,
@@ -1581,7 +1581,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0x866eff7b, 0x0400,
0xbf850051, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
-   0x866eff7b, 0x0900,
+   0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
0x71ff, 0xbf840008,
0x866fff7b, 0x7080,
@@ -2494,6 +2494,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf9f, 0xbf9f,
0xbf9f, 0x,
 };
+
 static const uint32_t cwsr_trap_gfx11_hex[] = {
0xbfa1, 0xbfa00221,
0xb0804006, 0xb8f8f802,
@@ -2938,211 +2939,149 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
-   0xbf820001, 0xbf8202d6,
-   0xb8f8f802, 0x89788678,
-   0xb8fbf803, 0x866eff78,
-   0x2000, 0xbf840009,
-   0x866eff6d, 0x00ff,
-   0xbf85001a, 0x866eff7b,
-   0x0400, 0xbf85004d,
-   0xbf8e0010, 0xb8fbf803,
-   0xbf82fffa, 0x866eff7b,
-   0x03c00900, 0xbf850011,
-   0x866eff7b, 0x71ff,
-   0xbf840008, 0x866fff7b,
-   0x7080, 0xbf840001,
-   0xbeee1a87, 0xb8eff801,
-   0x8e6e8c6e, 0x866e6f6e,
-   0xbf850006, 0x866eff6d,
-   0x00ff, 0xbf850003,
+   0xbf820001, 0xbf8202d7,
+   0xb8f8f802, 0x8978ff78,
+   0x00020006, 0xb8fbf803,
+   0x866eff78, 0x2000,
+   0xbf840009, 0x866eff6d,
+   0x00ff, 0xbf85001a,
0x866eff7b, 0x0400,
-   0xbf850036, 0xb8faf807,
+   0xbf85004d, 0xbf8e0010,
+   0xb8fbf803, 0xbf82fffa,
+   0x866eff7b, 0x03c00900,
+   0xbf850011, 0x866eff7b,
+   0x71ff, 0xbf840008,
+   0x866fff7b, 0x7080,
+   0xbf840001, 0xbeee1a87,
+   0xb8eff801, 0x8e6e8c6e,
+   0x866e6f6e, 0xbf850006,
+   0x866eff6d, 0x00ff,
+   0xbf850003, 0x866eff7b,
+   0x0400, 0xbf850036,
+   0xb8faf807, 0x867aff7a,
+   0x001f8000, 0x8e7a8b7a,
+   0x8979ff79, 0xfc00,
+   0x87797a79, 0xba7ff807,
+   0x, 0xb8faf812,
+   0xb8fbf813, 0x8efa887a,
+   0xc0031bbd, 0x0010,
+   0xbf8cc07f, 0x8e6e976e,
+   0x8979ff79, 0x0080,
+   0x87796e79, 0xc0071bbd,
+   0x, 0xbf8cc07f,
+   0xc0071ebd, 0x0008,
+   0xbf8cc07f, 0x86ee6e6e,
+   0xbf840001, 0xbe801d6e,
+   0x866eff6d, 0x01ff,
+   0xbf850005, 0x8778ff78,
+   0x2000, 0x80ec886c,
+   0x82ed806d, 0xbf820005,
+   0x866eff6d, 0x0100,
+   0xbf850002, 0x806c846c,
+   0x826d806d, 0x866dff6d,
+   0x, 0x8f7a8b79,
0x867aff7a, 0x001f8000,
-   0x8e7a8b7a, 0x8979ff79,
-   0xfc00, 0x87797a79,
-   0xba7ff807, 0x,
-   0xb8faf812, 0xb8fbf813,
-   0x8efa887a, 0xc0031bbd,
-   0x0010, 0xbf8cc07f,
-   0x8e6e976e, 0x8979ff79,
-   0x0080, 0x87796e79,
-   0xc0071bbd, 0x,
-   0xbf8cc07f, 0xc0071ebd,
-   0x0008, 0xbf8cc07f,
-   0x86ee6e6e, 0xbf840001,
-   0xbe801d6e, 0x866eff6d,
-   0x01ff, 0xbf850005,
-   0x8778ff78, 0x2000,
-   0x80ec886c, 0x82ed806d,
-   0xbf820005, 0x866eff6d,
-   0x0100, 0xbf850002,
-   0x806c846c, 0x826d806d,
+   0xb97af807, 0x86fe7e7e,
+   0x86ea6a6a, 0x8f6e8378,
+   0xb96ee0c2, 0xbf82,
+   0xb9780002, 0xbe801f6c,
0x866dff6d, 0x,
-   0x8f7a8b79, 0x867aff7a,
-   0x001f8000, 0xb97af807,
-   0x86fe7e7e, 0x86ea6a6a,
-   0x8f6e8378, 0xb96ee0c2,
-   0xbf82, 0xb9780002,
-   0xbe801f6c, 0x866dff6d,
-   0x, 0xbefa0080,
-   0xb97a0283, 0xb8faf807,
-   0x867aff7a, 0x001f8000,
-   0x8e7a8b7a, 0x8979ff79,
-   0xfc00, 0x87797a79,
-   0xba7ff807, 0x,
-   0xbeee007e, 0xbeef007f,
-   0xbefe0180, 0xbf94,
-   0x877a8478

Re: [PATCH 1/1] drm/amdgpu: Read clock counter via MMIO to reduce delay (v4)

2021-06-30 Thread Jay Cornwall

On Wed, Jun 30, 2021, at 05:10, YuBiao Wang wrote:
> [Why]
> GPU timing counters are read via KIQ under sriov, which will introduce
> a delay.
> 
> [How]
> It could be directly read by MMIO.
> 
> v2: Add additional check to prevent carryover issue.
> v3: Only check for carryover for once to prevent performance issue.
> v4: Add comments of the rough frequency where carryover happens.
> 
> Signed-off-by: YuBiao Wang 
> Acked-by: Horace Chen 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index ff7e9f49040e..9355494002a1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -7609,7 +7609,7 @@ static int gfx_v10_0_soft_reset(void *handle)
>  
>  static uint64_t gfx_v10_0_get_gpu_clock_counter(struct amdgpu_device *adev)
>  {
> - uint64_t clock;
> + uint64_t clock, clock_lo, clock_hi, hi_check;
>  
>   amdgpu_gfx_off_ctrl(adev, false);

This clock can be read with gfxoff enabled.

>   mutex_lock(&adev->gfx.gpu_clock_mutex);

Is the mutex relevant to this clock? It doesn't snapshot like the RLC counter does.
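For reference, the carry check described in the v2/v3 notes is commonly written as a high/low/high read sequence. This is a generic sketch, not the driver's code. The register reads are modeled by a hypothetical simulated counter that advances on every access, so a wrap of the low word between reads can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated free-running counter; each register read advances it. */
struct sim_counter {
	uint64_t value;
	uint64_t step; /* ticks elapsed per register access */
};

static uint32_t sim_read_lo(struct sim_counter *c)
{
	uint32_t v = (uint32_t)c->value;

	c->value += c->step;
	return v;
}

static uint32_t sim_read_hi(struct sim_counter *c)
{
	uint32_t v = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return v;
}

/*
 * Read hi, lo, then hi again. If the high word changed, the low word may
 * have wrapped after the first hi read, so re-read lo once and pair it with
 * the newer high word. One retry suffices when wraps are rare (the v4 note
 * suggests roughly every 42 seconds).
 */
static uint64_t read_counter64(struct sim_counter *c)
{
	uint32_t hi = sim_read_hi(c);
	uint32_t lo = sim_read_lo(c);
	uint32_t hi_check = sim_read_hi(c);

	if (hi_check != hi) {
		lo = sim_read_lo(c);
		hi = hi_check;
	}
	return ((uint64_t)hi << 32) | lo;
}
```

Starting just below a wrap (0xffffffff), a naive hi/lo pair would return 0, far behind the real count. The sequence above pairs the re-read low word with the new high word instead.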
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdkfd: Move set_trap_handler out of dqm->ops

2021-03-04 Thread Jay Cornwall

The trap handler is set per-process, per-device and is unrelated
to queue management.

Move the implementation closer to the TMA setup code.
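The moved logic can be sketched from the removed set_trap_handler() body below. This is a userspace approximation using a mock qcm_process_device that holds only the fields used here. Two details are assumptions: the value of KFD_CWSR_TMA_OFFSET (one page), and the use of a non-NULL cwsr_kaddr as the CWSR test. The removed code tested dqm->dev->cwsr_enabled, which the new call site no longer passes in. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096
#define SKETCH_TMA_OFFSET SKETCH_PAGE_SIZE /* assumed KFD_CWSR_TMA_OFFSET */

/* Mock of the few qcm_process_device fields this sketch touches. */
struct mock_qpd {
	void *cwsr_kaddr; /* CPU mapping of the CWSR TBA/TMA area, or NULL */
	uint64_t tba_addr;
	uint64_t tma_addr;
};

/*
 * With the CWSR handler installed, the user handler becomes the second
 * level: its TBA/TMA are written into the first-level TMA so the CWSR
 * handler can jump to it. Otherwise the user handler is programmed directly.
 */
static void sketch_set_trap_handler(struct mock_qpd *qpd,
				    uint64_t tba_addr, uint64_t tma_addr)
{
	if (qpd->cwsr_kaddr) {
		uint64_t *tma = (uint64_t *)((char *)qpd->cwsr_kaddr +
					     SKETCH_TMA_OFFSET);
		tma[0] = tba_addr;
		tma[1] = tma_addr;
	} else {
		qpd->tba_addr = tba_addr;
		qpd->tma_addr = tma_addr;
	}
}
```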

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  6 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 ---
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  5 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  4 
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 19 
 5 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8cc51cec988a..6802c616e10e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -556,11 +556,7 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
goto out;
}
 
-   if (dev->dqm->ops.set_trap_handler(dev->dqm,
-   &pdd->qpd,
-   args->tba_addr,
-   args->tma_addr))
-   err = -EINVAL;
+   kfd_process_set_trap_handler(&pdd->qpd, args->tba_addr, args->tma_addr);
 
 out:
mutex_unlock(&p->mutex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c37e9c4b1fb4..6bb778f24441 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1596,26 +1596,6 @@ static bool set_cache_memory_policy(struct 
device_queue_manager *dqm,
return retval;
 }
 
-static int set_trap_handler(struct device_queue_manager *dqm,
-   struct qcm_process_device *qpd,
-   uint64_t tba_addr,
-   uint64_t tma_addr)
-{
-   uint64_t *tma;
-
-   if (dqm->dev->cwsr_enabled) {
-   /* Jump from CWSR trap handler to user trap */
-   tma = (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-   tma[0] = tba_addr;
-   tma[1] = tma_addr;
-   } else {
-   qpd->tba_addr = tba_addr;
-   qpd->tma_addr = tma_addr;
-   }
-
-   return 0;
-}
-
 static int process_termination_nocpsch(struct device_queue_manager *dqm,
struct qcm_process_device *qpd)
 {
@@ -1859,7 +1839,6 @@ struct device_queue_manager 
*device_queue_manager_init(struct kfd_dev *dev)
dqm->ops.create_kernel_queue = create_kernel_queue_cpsch;
dqm->ops.destroy_kernel_queue = destroy_kernel_queue_cpsch;
dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
-   dqm->ops.set_trap_handler = set_trap_handler;
dqm->ops.process_termination = process_termination_cpsch;
dqm->ops.evict_process_queues = evict_process_queues_cpsch;
dqm->ops.restore_process_queues = restore_process_queues_cpsch;
@@ -1878,7 +1857,6 @@ struct device_queue_manager 
*device_queue_manager_init(struct kfd_dev *dev)
dqm->ops.initialize = initialize_nocpsch;
dqm->ops.uninitialize = uninitialize;
dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
-   dqm->ops.set_trap_handler = set_trap_handler;
dqm->ops.process_termination = process_termination_nocpsch;
dqm->ops.evict_process_queues = evict_process_queues_nocpsch;
dqm->ops.restore_process_queues =
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 16262e5d93f5..aee033b1d148 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -121,11 +121,6 @@ struct device_queue_manager_ops {
   void __user *alternate_aperture_base,
   uint64_t alternate_aperture_size);
 
-   int (*set_trap_handler)(struct device_queue_manager *dqm,
-   struct qcm_process_device *qpd,
-   uint64_t tba_addr,
-   uint64_t tma_addr);
-
int (*process_termination)(struct device_queue_manager *dqm,
struct qcm_process_device *qpd);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e2ebd5a1d4de..8f839154bf1f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -944,6 +944,10 @@ bool interrupt_is_wanted(struct kfd_dev *dev,
 /* amdkfd Apertures */
 int kfd_init_apertures(struct kfd_process *process);
 
+void kfd_process_set_trap_handler(struct qcm_process_device *q
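
The helper declared above presumably keeps the body of the removed set_trap_handler(), minus the dqm argument. A sketch under that assumption follows. Because the new signature only receives qpd, it assumes a non-NULL qpd->cwsr_kaddr takes the place of the old dqm->dev->cwsr_enabled test; the trimmed struct and the KFD_CWSR_TMA_OFFSET value are placeholders, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value; the driver defines the real KFD_CWSR_TMA_OFFSET. */
#define KFD_CWSR_TMA_OFFSET 2048

/* Trimmed, hypothetical stand-in for struct qcm_process_device. */
struct qcm_process_device {
	void *cwsr_kaddr;	/* kernel mapping of the CWSR area, or NULL */
	uint64_t tba_addr;
	uint64_t tma_addr;
};

/* Sketch of the relocated helper. Assumption: cwsr_kaddr != NULL means
 * the CWSR trap handler is installed for this process/device. */
void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
				  uint64_t tba_addr, uint64_t tma_addr)
{
	assert(qpd);
	if (qpd->cwsr_kaddr) {
		/* Jump from CWSR trap handler to user trap: record the user
		 * TBA/TMA in the first two qwords of the CWSR TMA. */
		uint64_t *tma = (uint64_t *)((char *)qpd->cwsr_kaddr +
					     KFD_CWSR_TMA_OFFSET);
		tma[0] = tba_addr;
		tma[1] = tma_addr;
	} else {
		/* No CWSR handler: the user trap handler is programmed directly. */
		qpd->tba_addr = tba_addr;
		qpd->tma_addr = tma_addr;
	}
}
```

Returning void is consistent with the new call site in kfd_ioctl_set_trap_handler(), which no longer sets -EINVAL: the removed function could only ever return 0.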

[PATCH] drm/amdkfd: Use same SQ prefetch setting as amdgpu

2020-10-19 Thread Jay Cornwall

A prefetch setting of 0 causes instruction fetch stalls at cache line
boundaries under some conditions on Navi10. A non-zero prefetch is the
preferred default in any case.

Fixes soft hang in Luxmark.

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c
index 72e4d61ac752..ad0593342333 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c
@@ -58,8 +58,9 @@ static int update_qpd_v10(struct device_queue_manager *dqm,
/* check if sh_mem_config register already configured */
if (qpd->sh_mem_config == 0) {
qpd->sh_mem_config =
-   SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
-   SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT;
+   (SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
+   SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) |
+   (3 << SH_MEM_CONFIG__INITIAL_INST_PREFETCH__SHIFT);
 #if 0
/* TODO:
 *This shouldn't be an issue with Navi10.  Verify.
-- 
2.17.1
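
The fix builds the default SH_MEM_CONFIG value by OR-ing shifted field values together. The sketch below shows that pattern and reads the fields back out. The numeric shift and enum values are placeholders (the real ones come from the generated register headers), so only the composition technique should be taken from it; field_get() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration; real values live in the
 * generated gc_10_* register headers and may differ. */
#define SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
#define SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT		3
#define SH_MEM_CONFIG__INITIAL_INST_PREFETCH__SHIFT	14

/* Compose the default the way the patch does: each field value is
 * shifted into position and the results are OR-ed together. */
static uint32_t default_sh_mem_config(void)
{
	return (SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
		SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) |
	       (3u << SH_MEM_CONFIG__INITIAL_INST_PREFETCH__SHIFT);
}

/* Extract a field given its shift and its unshifted width mask. */
static uint32_t field_get(uint32_t reg, unsigned int shift, uint32_t width_mask)
{
	return (reg >> shift) & width_mask;
}
```

The added parentheses are for readability: << already binds tighter than |, so the expression means the same without them. What changes behavior is the new prefetch term, which was previously left at 0.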

[PATCH 4/4] drm/amdkfd: Save TTMPs on all ASICs in gfx10 trap handler

2020-10-01 Thread Jay Cornwall

Trap temporary GPRs are not currently saved/restored on ASICs
without scalar store instructions. They contain data useful to a
user-mode debugger.

Use vector store instructions to save TTMPs on these ASICs.

Signed-off-by: Jay Cornwall 
Cc: Laurent Morichetti 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 119 ++
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  46 ++-
 2 files changed, 114 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 9c903c38dd74..d674f6d798f6 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -1534,7 +1534,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_gfx10_hex[] = {
-   0xbf820001, 0xbf8201cb,
+   0xbf820001, 0xbf8201f5,
0xb0804004, 0xb978f802,
0x8a788678, 0xb96ef801,
0x876eff6e, 0x0800,
@@ -1563,6 +1563,11 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf94, 0xbf8cc07f,
0x877aff7f, 0x0400,
0x8f7a857a, 0x886d7a6d,
+   0xbefa037e, 0x877bff7f,
+   0x, 0xbefe03c1,
+   0xbeff03c1, 0xdc5f8000,
+   0x007a, 0x7e000280,
+   0xbefe037a, 0xbeff037b,
0xb97b02dc, 0x8f7b997b,
0xb97a2a05, 0x807a817a,
0xbf0d997b, 0xbf850002,
@@ -1570,58 +1575,74 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x8f7a8a7a, 0x877bff7f,
0x, 0x807aff7a,
0x0200, 0x807a7e7a,
-   0x827b807b, 0xbef4037e,
-   0x8775ff7f, 0x,
-   0x8875ff75, 0x0004,
-   0xbef60380, 0xbef703ff,
-   0x10807fac, 0xbef1037c,
-   0xbef00380, 0xb97302dc,
-   0x8f739973, 0xbefe03c1,
-   0x907c9973, 0x877c817c,
-   0xbf06817c, 0xbf850002,
-   0xbeff0380, 0xbf820002,
-   0xbeff03c1, 0xbf82000b,
+   0x827b807b, 0xd761,
+   0x00010870, 0xd761,
+   0x00010a71, 0xd761,
+   0x00010c72, 0xd761,
+   0x00010e73, 0xd761,
+   0x00011074, 0xd761,
+   0x00011275, 0xd761,
+   0x00011476, 0xd761,
+   0x00011677, 0xd761,
+   0x00011a79, 0xd761,
+   0x00011c7e, 0xd761,
+   0x00011e7f, 0xbefe03ff,
+   0x3fff, 0xbeff0380,
+   0xdc5f8040, 0x007a,
+   0xd760007a, 0x00011d00,
+   0xd760007b, 0x00011f00,
+   0xbefe037a, 0xbeff037b,
+   0xbef4037e, 0x8775ff7f,
+   0x, 0x8875ff75,
+   0x0004, 0xbef60380,
+   0xbef703ff, 0x10807fac,
+   0xbef1037c, 0xbef00380,
+   0xb97302dc, 0x8f739973,
+   0xbefe03c1, 0x907c9973,
+   0x877c817c, 0xbf06817c,
+   0xbf850002, 0xbeff0380,
+   0xbf820002, 0xbeff03c1,
+   0xbf820009, 0xbef603ff,
+   0x0100, 0xe0704080,
+   0x705d0100, 0xe0704100,
+   0x705d0200, 0xe0704180,
+   0x705d0300, 0xbf820008,
0xbef603ff, 0x0100,
-   0xe0704000, 0x705d,
-   0xe0704080, 0x705d0100,
-   0xe0704100, 0x705d0200,
-   0xe0704180, 0x705d0300,
-   0xbf82000a, 0xbef603ff,
-   0x0100, 0xe0704000,
-   0x705d, 0xe0704100,
-   0x705d0100, 0xe0704200,
-   0x705d0200, 0xe0704300,
-   0x705d0300, 0xb9702a05,
-   0x80708170, 0xbf0d9973,
-   0xbf850002, 0x8f708970,
-   0xbf820001, 0x8f708a70,
-   0xb97a1e06, 0x8f7a8a7a,
-   0x80707a70, 0x8070ff70,
-   0x0200, 0xbef603ff,
-   0x0100, 0x7e000280,
-   0x7e020280, 0x7e040280,
-   0xbefc0380, 0xd7610002,
-   0xf871, 0x807c817c,
-   0xd7610002, 0xf86c,
-   0x807c817c, 0x8a7aff6d,
-   0x8000, 0xd7610002,
-   0xf87a, 0x807c817c,
-   0xd7610002, 0xf86e,
+   0xe0704100, 0x705d0100,
+   0xe0704200, 0x705d0200,
+   0xe0704300, 0x705d0300,
+   0xb9702a05, 0x80708170,
+   0xbf0d9973, 0xbf850002,
+   0x8f708970, 0xbf820001,
+   0x8f708a70, 0xb97a1e06,
+   0x8f7a8a7a, 0x80707a70,
+   0x8070ff70, 0x0200,
+   0xbef603ff, 0x0100,
+   0x7e000280, 0x7e020280,
+   0x7e040280, 0xbefc0380,
+   0xd7610002, 0xf871,
0x807c817c, 0xd7610002,
-   0xf86f, 0x807c817c,
-   0xd7610002, 0xf878,
-   0x807c817c, 0xb97af803,
+   0xf86c, 0x807c817c,
+   0x8a7aff6d, 0x8000,
0xd7610002, 0xf87a,
0x807c817c, 0xd7610002,
-   0xf87b, 0x807c817c,
-   0xb971f801, 0xd7610002,
-   0xf871, 0x807c817c,
-   0xb971f814, 0xd7610002,
-   0xf871, 0x807c817c,
-   0xb971f815, 0xd7610002,
-   0xf871, 0x807c817c,
-   0xbeff0380, 0xe0704000,
-   0x705d0200, 0xb9702a05,
+   0xf86e, 0x807c817c,
+   0xd7610002, 0xf86f,
+   0x807c817c, 0xd7610002,
+   0xf878, 0x807c817c,
+   0xb97af803, 0xd7610002,
+   0xf87a, 0x807c817c,
+   0xd7610002, 0xf87b,
+   0x807c817c
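
The approach described for ASICs without scalar stores is to route TTMP values through a vector register and write them out with a vector store. The C model below illustrates only that idea: a VGPR is an array of per-lane slots, each scalar value is written into its own lane (the role a write-lane instruction plays), and a single store then copies the lanes to the save area. The wave width, lane assignment, and save layout are illustrative assumptions, and a real handler must also preserve the VGPR it borrows, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAVE_LANES 32	/* illustrative wave width */

/* Conceptual model of a VGPR: one 32-bit slot per lane. */
struct vgpr {
	uint32_t lane[WAVE_LANES];
};

/* Model of a write-lane step: place one scalar value into one lane. */
static void write_lane(struct vgpr *v, unsigned int lane, uint32_t sval)
{
	assert(lane < WAVE_LANES);
	v->lane[lane] = sval;
}

/* Gather n scalar registers into consecutive lanes, then emit them with
 * one "vector store", modelled here as a memcpy to the save area. */
static void save_scalars_via_vector(const uint32_t *sregs, unsigned int n,
				    uint32_t *save_area)
{
	struct vgpr v = {0};
	unsigned int i;

	assert(n <= WAVE_LANES);
	for (i = 0; i < n; i++)
		write_lane(&v, i, sregs[i]);
	memcpy(save_area, v.lane, n * sizeof(uint32_t));
}
```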

[PATCH 2/4] drm/amdkfd: Remove duplicated code from trap handler

2020-10-01 Thread Jay Cornwall

IB_STS bits are saved/restored in both PC and ttmp11 along different
code paths. Use ttmp11 on both paths to remove redundant code.

Signed-off-by: Jay Cornwall 
Cc: Laurent Morichetti 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 764 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  94 +--
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  66 +-
 3 files changed, 424 insertions(+), 500 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index aa2de525b2e0..9f435c777ba0 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820240,
+   0xbf820001, 0xbf82023e,
0xb8f8f802, 0x89788678,
0xb8eef801, 0x866eff6e,
0x0800, 0xbf840003,
0x866eff78, 0x2000,
0xbf840016, 0xb8fbf803,
0x866eff7b, 0x0400,
-   0xbf85003b, 0x866eff7b,
+   0xbf85003a, 0x866eff7b,
0x0800, 0xbf850003,
0x866eff7b, 0x0100,
0xbf84000c, 0x866eff78,
@@ -290,34 +290,33 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x866eff6e, 0x0400,
0xbf84fffb, 0x8778ff78,
0x2000, 0x80ec886c,
-   0x82ed806d, 0xb8eef807,
-   0x866fff6e, 0x001f8000,
-   0x8e6f8b6f, 0x8977ff77,
-   0xfc00, 0x87776f77,
-   0x896eff6e, 0x001f8000,
-   0xb96ef807, 0xb8faf812,
-   0xb8fbf813, 0x8efa887a,
-   0xc0071bbd, 0x,
-   0xbf8cc07f, 0xc0071ebd,
-   0x0008, 0xbf8cc07f,
-   0x86ee6e6e, 0xbf840001,
-   0xbe801d6e, 0xb8fbf803,
-   0x867bff7b, 0x01ff,
-   0xbf850002, 0x806c846c,
-   0x826d806d, 0x866dff6d,
-   0x, 0x8f6e8b77,
-   0x866eff6e, 0x001f8000,
-   0xb96ef807, 0x86fe7e7e,
-   0x86ea6a6a, 0x8f6e8378,
-   0xb96ee0c2, 0xbf82,
-   0xb9780002, 0xbe801f6c,
+   0x82ed806d, 0xb8faf807,
+   0x867aff7a, 0x001f8000,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
+   0xb8faf812, 0xb8fbf813,
+   0x8efa887a, 0xc0071bbd,
+   0x, 0xbf8cc07f,
+   0xc0071ebd, 0x0008,
+   0xbf8cc07f, 0x86ee6e6e,
+   0xbf840001, 0xbe801d6e,
+   0xb8fbf803, 0x867bff7b,
+   0x01ff, 0xbf850002,
+   0x806c846c, 0x826d806d,
0x866dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xb8fa2407, 0x8e7a9b7a,
-   0x876d7a6d, 0xb8fa03c7,
-   0x8e7a9a7a, 0x876d7a6d,
-   0xb8faf807, 0x867aff7a,
-   0x7fff, 0xb97af807,
+   0x8f7a8b77, 0x867aff7a,
+   0x001f8000, 0xb97af807,
+   0x86fe7e7e, 0x86ea6a6a,
+   0x8f6e8378, 0xb96ee0c2,
+   0xbf82, 0xb9780002,
+   0xbe801f6c, 0x866dff6d,
+   0x, 0xbefa0080,
+   0xb97a0283, 0xb8faf807,
+   0x867aff7a, 0x001f8000,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
0xbeee007e, 0xbeef007f,
0xbefe0180, 0xbf94,
0x877a8478, 0xb97af802,
@@ -562,7 +561,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x701d0300, 0x807c847c,
0x8070ff70, 0x0400,
0xbf0a7b7c, 0xbf85ffef,
-   0xbf9c, 0xbf8200cf,
+   0xbf9c, 0xbf8200c7,
0xbef4007e, 0x8675ff7f,
0x, 0x8775ff75,
0x0004, 0xbef60080,
@@ -655,12 +654,8 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xc00b1c37, 0x0050,
0xc00b1d37, 0x0060,
0xc0031e77, 0x0074,
-   0xbf8cc07f, 0x866fff6d,
-   0xf800, 0x8f6f9b6f,
-   0x8e6f906f, 0xbeee0080,
-   0x876e6f6e, 0x866fff6d,
-   0x0400, 0x8f6f9a6f,
-   0x8e6f8f6f, 0x876e6f6e,
+   0xbf8cc07f, 0x8f6e8b77,
+   0x866eff6e, 0x001f8000,
0xb96ef807, 0x866dff6d,
0x, 0x86fe7e7e,
0x86ea6a6a, 0x8f6e837a,
@@ -670,7 +665,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
-   0xbf820001, 0xbf8201c5,
+   0xbf820001, 0xbf8201c6,
0xb0804004, 0xb978f802,
0x8a788678, 0xb96ef801,
0x876eff6e, 0x0800,
@@ -681,13 +676,13 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x876eff7b, 0x0100,
0xbf840002, 0x8878ff78,
0x2000, 0x8a77ff77,
-   0xff00, 0xb96ef807,
-   0x876fff6e, 0x0200,
-   0x8f6f866f, 0x88776f77,
-   0x876fff6e, 0x003f8000,
-   0x8f6f896f, 0x88776f77,
-   0x8a6eff6e, 0x023f8000,
-   0xb9eef807, 0xb97af812,
+   0xff00, 0xb97af807,
+   0x877bff7a, 0x0200,
+   0x8f7b867b, 0x88777b77,
+   0x877bff7a, 0x003f8000,
+   0x8f7b897b, 0x88777b77,
+   0x8a7aff7a, 0x023f8000,
+   0xb9faf807, 0xb97af812,
0xb97bf813, 0x8ffa887a,
0xf4051bbd
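
The point of this cleanup is that both code paths read the saved IB_STS bits from one place. A sketch of such a stash/restore pair follows. The 0x001f8000 mask appears in the hex above; placing those six bits in ttmp11[31:26] is an assumption made for illustration, not taken from the .asm source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: IB_STS bits 15..20 are parked in ttmp11 bits 26..31. */
#define IB_STS_SAVE_MASK	0x001f8000u
#define IB_STS_SAVE_SHIFT	15
#define TTMP11_SAVE_MASK	0xfc000000u
#define TTMP11_SAVE_SHIFT	26

/* Trap entry: copy the IB_STS bits into ttmp11, keeping its low bits. */
static uint32_t stash_ib_sts(uint32_t ttmp11, uint32_t ib_sts)
{
	uint32_t bits = (ib_sts & IB_STS_SAVE_MASK) >> IB_STS_SAVE_SHIFT;

	return (ttmp11 & ~TTMP11_SAVE_MASK) | (bits << TTMP11_SAVE_SHIFT);
}

/* Shared exit path (trap return or context restore): merge the saved
 * bits back into IB_STS, leaving its other bits untouched. */
static uint32_t restore_ib_sts(uint32_t ib_sts, uint32_t ttmp11)
{
	uint32_t bits = (ttmp11 & TTMP11_SAVE_MASK) >> TTMP11_SAVE_SHIFT;

	return (ib_sts & ~IB_STS_SAVE_MASK) | (bits << IB_STS_SAVE_SHIFT);
}
```

With a single storage location, there is no second copy (in the PC) to keep in sync, which is where the duplicated code came from.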

[PATCH 1/4] drm/amdkfd: Remove legacy code from trap handler

2020-10-01 Thread Jay Cornwall

ATC and MTYPE fields do not exist in gfx9 or later.

Signed-off-by: Jay Cornwall 
Cc: Laurent Morichetti 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 93 ++-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 28 +-
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 30 +-
 3 files changed, 30 insertions(+), 121 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index affbca7c0050..aa2de525b2e0 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,7 +274,7 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820248,
+   0xbf820001, 0xbf820240,
0xb8f8f802, 0x89788678,
0xb8eef801, 0x866eff6e,
0x0800, 0xbf840003,
@@ -336,10 +336,6 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x, 0x8775ff75,
0x0004, 0xbef60080,
0xbef700ff, 0x00807fac,
-   0x867aff7f, 0x0800,
-   0x8f7a837a, 0x8a77,
-   0x867aff7f, 0x7000,
-   0x8f7a817a, 0x8a77,
0xbef1007c, 0xbef00080,
0xb8f02a05, 0x80708170,
0x8e708a70, 0xb8fa1605,
@@ -566,15 +562,11 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x701d0300, 0x807c847c,
0x8070ff70, 0x0400,
0xbf0a7b7c, 0xbf85ffef,
-   0xbf9c, 0xbf8200da,
+   0xbf9c, 0xbf8200cf,
0xbef4007e, 0x8675ff7f,
0x, 0x8775ff75,
0x0004, 0xbef60080,
0xbef700ff, 0x00807fac,
-   0x866eff7f, 0x0800,
-   0x8f6e836e, 0x87776e77,
-   0x866eff7f, 0x7000,
-   0x8f6e816e, 0x87776e77,
0x866eff7f, 0x0400,
0xbf84001e, 0xbefe00c1,
0xbeff00c1, 0xb8ef4306,
@@ -669,18 +661,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0x876e6f6e, 0x866fff6d,
0x0400, 0x8f6f9a6f,
0x8e6f8f6f, 0x876e6f6e,
-   0x866fff7a, 0x0080,
-   0x8f6f976f, 0xb96ef807,
-   0x866dff6d, 0x,
-   0x86fe7e7e, 0x86ea6a6a,
-   0x8f6e837a, 0xb96ee0c2,
-   0xbf82, 0xb97a0002,
-   0xbf8a, 0x95806f6c,
-   0xbf81, 0x,
+   0xb96ef807, 0x866dff6d,
+   0x, 0x86fe7e7e,
+   0x86ea6a6a, 0x8f6e837a,
+   0xb96ee0c2, 0xbf82,
+   0xb97a0002, 0xbf8a,
+   0xbe801f6c, 0xbf81,
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
-   0xbf820001, 0xbf8201cd,
+   0xbf820001, 0xbf8201c5,
0xb0804004, 0xb978f802,
0x8a788678, 0xb96ef801,
0x876eff6e, 0x0800,
@@ -740,10 +730,6 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x, 0x8875ff75,
0x0004, 0xbef60380,
0xbef703ff, 0x10807fac,
-   0x877aff7f, 0x0800,
-   0x907a837a, 0x88777a77,
-   0x877aff7f, 0x7000,
-   0x907a817a, 0x88777a77,
0xbef1037c, 0xbef00380,
0xb97302dc, 0x8f739973,
0x8873737f, 0xb97bf816,
@@ -911,15 +897,11 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x705d, 0x807c817c,
0x8070ff70, 0x0080,
0xbf0a7b7c, 0xbf85fff8,
-   0xbf820151, 0xbef4037e,
+   0xbf820146, 0xbef4037e,
0x8775ff7f, 0x,
0x8875ff75, 0x0004,
0xbef60380, 0xbef703ff,
-   0x10807fac, 0x876eff7f,
-   0x0800, 0x906e836e,
-   0x88776e77, 0x876eff7f,
-   0x7000, 0x906e816e,
-   0x88776e77, 0xb97202dc,
+   0x10807fac, 0xb97202dc,
0x8f729972, 0x8872727f,
0x876eff7f, 0x0400,
0xbf840034, 0xbefe03c1,
@@ -1075,18 +1057,17 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x886e6f6e, 0x876fff6d,
0x0100, 0x906f986f,
0x8f6f996f, 0x886e6f6e,
-   0x876fff7a, 0x0080,
-   0x906f976f, 0xb9eef807,
-   0x876dff6d, 0x,
-   0x87fe7e7e, 0x87ea6a6a,
-   0xb9faf802, 0xbe80226c,
-   0xbf81, 0xbf9f,
+   0xb9eef807, 0x876dff6d,
+   0x, 0x87fe7e7e,
+   0x87ea6a6a, 0xb9faf802,
+   0xbe80226c, 0xbf81,
0xbf9f, 0xbf9f,
0xbf9f, 0xbf9f,
+   0xbf9f, 0x,
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202c4,
+   0xbf820001, 0xbf8202bc,
0xb8f8f802, 0x89788678,
0xb8eef801, 0x866eff6e,
0x0800, 0xbf840003,
@@ -1148,11 +1129,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0x8675ff7f, 0x,
0x8775ff75, 0x0004,
0xbef60080, 0xbef700ff,
-   0x00807fac, 0x867aff7f,
-   0x0800, 0x8f7a837a,
-   0x8a77, 0x867aff7f,
-   0x7000, 0x8f7a817a,
-   0x8a77, 0xbef1007c,
+   0x00807fac, 0xbef1007c,
0xbef00080, 0xb8f02a05,
0x80708170, 0x8e708a70,
0x8e708170, 0xb8fa1605,
@@ -1440,15 +1417,11 @@ static const uint32_t

[PATCH 3/4] drm/amdkfd: Move first_wave bit in gfx10 trap handler

2020-10-01 Thread Jay Cornwall

Save first_wave bit from exec_hi to ttmp1. This allows the high bits
of exec_lo/exec_hi (which hold a 48-bit address) to be cleared in a
follow-up patch.

Signed-off-by: Jay Cornwall 
Cc: Laurent Morichetti 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 596 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  14 +-
 2 files changed, 310 insertions(+), 300 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 9f435c777ba0..9c903c38dd74 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -665,7 +665,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
-   0xbf820001, 0xbf8201c6,
+   0xbf820001, 0xbf8201ca,
0xb0804004, 0xb978f802,
0x8a788678, 0xb96ef801,
0x876eff6e, 0x0800,
@@ -710,24 +710,25 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb9faf807, 0xbeee037e,
0xbeef037f, 0xbefe0480,
0xbf94, 0xbf8e0002,
-   0xbf88fffe, 0xb97b02dc,
-   0x8f7b997b, 0x887b7b7f,
-   0xb97a2a05, 0x807a817a,
-   0xbf0d997b, 0xbf850002,
-   0x8f7a897a, 0xbf820001,
-   0x8f7a8a7a, 0x877bff7f,
-   0x, 0x807aff7a,
-   0x0200, 0x807a7e7a,
-   0x827b807b, 0xf4491c3d,
-   0xfa50, 0xf4491d3d,
-   0xfa60, 0xf4411e7d,
-   0xfa74, 0xbef4037e,
-   0x8775ff7f, 0x,
-   0x8875ff75, 0x0004,
-   0xbef60380, 0xbef703ff,
-   0x10807fac, 0xbef1037c,
-   0xbef00380, 0xb97302dc,
-   0x8f739973, 0x8873737f,
+   0xbf88fffe, 0x877aff7f,
+   0x0400, 0x8f7a857a,
+   0x886d7a6d, 0xb97b02dc,
+   0x8f7b997b, 0xb97a2a05,
+   0x807a817a, 0xbf0d997b,
+   0xbf850002, 0x8f7a897a,
+   0xbf820001, 0x8f7a8a7a,
+   0x877bff7f, 0x,
+   0x807aff7a, 0x0200,
+   0x807a7e7a, 0x827b807b,
+   0xf4491c3d, 0xfa50,
+   0xf4491d3d, 0xfa60,
+   0xf4411e7d, 0xfa74,
+   0xbef4037e, 0x8775ff7f,
+   0x, 0x8875ff75,
+   0x0004, 0xbef60380,
+   0xbef703ff, 0x10807fac,
+   0xbef1037c, 0xbef00380,
+   0xb97302dc, 0x8f739973,
0xb97bf816, 0xba80f816,
0x, 0xbefe03c1,
0x907c9973, 0x877c817c,
@@ -757,8 +758,9 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbefc037e, 0xbefe037c,
0xbefc0370, 0xf4611b3a,
0xf800, 0x80708470,
-   0xbefc037e, 0xbefe037c,
-   0xbefc0370, 0xf4611b7a,
+   0xbefc037e, 0x8a7aff6d,
+   0x8000, 0xbefe037c,
+   0xbefc0370, 0xf4611eba,
0xf800, 0x80708470,
0xbefc037e, 0xbefe037c,
0xbefc0370, 0xf4611bba,
@@ -819,8 +821,8 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbeff0380, 0xbf820001,
0xbeff03c1, 0xb97b4306,
0x877bc17b, 0xbf840044,
-   0xbf8a, 0x877aff73,
-   0x0400, 0xbf840040,
+   0xbf8a, 0x877aff6d,
+   0x8000, 0xbf840040,
0x8f7b867b, 0x8f7b827b,
0xbef6037b, 0xb9702a05,
0x80708170, 0xbf0d9973,
@@ -892,169 +894,168 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xe0704000, 0x705d,
0x807c817c, 0x8070ff70,
0x0080, 0xbf0a7b7c,
-   0xbf85fff8, 0xbf82013d,
+   0xbf85fff8, 0xbf82013c,
0xbef4037e, 0x8775ff7f,
0x, 0x8875ff75,
0x0004, 0xbef60380,
0xbef703ff, 0x10807fac,
0xb97202dc, 0x8f729972,
-   0x8872727f, 0x876eff7f,
-   0x0400, 0xbf840034,
+   0x876eff7f, 0x0400,
+   0xbf840034, 0xbefe03c1,
+   0x907c9972, 0x877c817c,
+   0xbf06817c, 0xbf850002,
+   0xbeff0380, 0xbf820001,
+   0xbeff03c1, 0xb96f4306,
+   0x876fc16f, 0xbf840029,
+   0x8f6f866f, 0x8f6f826f,
+   0xbef6036f, 0xb9782a05,
+   0x80788178, 0xbf0d9972,
+   0xbf850002, 0x8f788978,
+   0xbf820001, 0x8f788a78,
+   0xb96e1e06, 0x8f6e8a6e,
+   0x80786e78, 0x8078ff78,
+   0x0200, 0x8078ff78,
+   0x0080, 0xbef603ff,
+   0x0100, 0x907c9972,
+   0x877c817c, 0xbf06817c,
+   0xbefc0380, 0xbf850009,
+   0xe031, 0x781d,
+   0x807cff7c, 0x0080,
+   0x8078ff78, 0x0080,
+   0xbf0a6f7c, 0xbf85fff8,
+   0xbf820008, 0xe031,
+   0x781d, 0x807cff7c,
+   0x0100, 0x8078ff78,
+   0x0100, 0xbf0a6f7c,
+   0xbf85fff8, 0xbef80380,
0xbefe03c1, 0x907c9972,
0x877c817c, 0xbf06817c,
0xbf850002, 0xbeff0380,
0xbf820001, 0xbeff03c1,
-   0xb96f4306, 0x876fc16f,
-   0xbf840029, 0x8f6f866f,
-   0x8f6f826f, 0xbef6036f,
-   0xb9782a05, 0x80788178,
-   0xbf0d9972, 0xbf850002,
-   0x8f788978, 0xbf820001,
-   0x8f788a78, 0xb96e1e06,
-   0x8f6e8a6e, 0x80786e78,
-   0x8078ff78, 0x0200,
-   0x8078ff78, 0x0080
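
Moving a flag into a spare register field is what allows the original register's high bits to be cleared later. A sketch follows, assuming first_wave is bit 26 of exec_hi and that bit 31 of ttmp1 is free; both positions, the mask names, and the helper names are illustrative choices. The clearing step is included only to show why the move helps; per the commit message it belongs to a follow-up patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define EXEC_HI_FIRST_WAVE_MASK	0x04000000u
#define TTMP1_FIRST_WAVE_MASK	0x80000000u
#define ADDR_HI_MASK		0x0000ffffu	/* high 16 bits of a 48-bit address */

/* Copy first_wave from exec_hi into ttmp1, then clear exec_hi above the
 * address bits (the follow-up step this change makes possible). */
static void park_first_wave(uint32_t *exec_hi, uint32_t *ttmp1)
{
	if (*exec_hi & EXEC_HI_FIRST_WAVE_MASK)
		*ttmp1 |= TTMP1_FIRST_WAVE_MASK;
	else
		*ttmp1 &= ~TTMP1_FIRST_WAVE_MASK;
	*exec_hi &= ADDR_HI_MASK;
}

/* Later consumers test the flag in its new home. */
static int first_wave_set(uint32_t ttmp1)
{
	return (ttmp1 & TTMP1_FIRST_WAVE_MASK) != 0;
}
```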

[PATCH] drm/amdgpu: Update Arcturus golden registers

2019-11-20 Thread Jay Cornwall

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8073fcd..9f90448 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -692,6 +692,7 @@ static const struct soc15_reg_golden 
golden_settings_gc_9_4_1_arct[] =
SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CHAN_STEER_4_ARCT, 0x3fff, 
0xb90f5b1),
SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CHAN_STEER_5_ARCT, 0x3ff, 0x135),
SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_CONFIG, 0x, 0x011A),
+   SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_FIFO_SIZES, 0x, 0x0f00),
 };
 
 static const u32 GFX_RLC_SRM_INDEX_CNTL_ADDR_OFFSETS[] =
-- 
2.7.4
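
Each SOC15_REG_GOLDEN_VALUE entry pairs a register with a mask and a value. A minimal sketch of applying one entry follows, assuming the usual amdgpu convention: a full 0xffffffff mask writes the value outright, and any other mask clears the masked bits and ORs in the new ones. The struct and function names are hypothetical, and because the masks in the diff above are truncated in this copy, the test values are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical golden-register entry (register offset omitted). */
struct golden_entry {
	uint32_t and_mask;	/* bits to override */
	uint32_t or_mask;	/* new values for those bits */
};

/* Return the value to write back, given the register's current value. */
static uint32_t apply_golden(uint32_t current, const struct golden_entry *e)
{
	assert(e);
	if (e->and_mask == 0xffffffffu)
		return e->or_mask;	/* full override, no read needed */
	return (current & ~e->and_mask) | (e->or_mask & e->and_mask);
}
```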

Re: [PATCH 2/2] drm/amdkfd: Extend CU mask to 8 SEs (v2)

2019-08-01 Thread Jay Cornwall

On Thu, Aug 1, 2019, at 13:47, Alex Deucher wrote:
> From: Jay Cornwall 
> 
> Following bitmap layout logic introduced by:
> "drm/amdgpu: support get_cu_info for Arcturus".
> 
> v2: squash in fixup for gfx_v9_0.c (Alex)

There's a second patch to squash, which fixed breakage here (%# swapped):

> - pr_debug("update cu mask to %#x %#x %#x %#x\n",
> + pr_debug("update cu mask to #%x #%x #%x #%x #%x #%x #%x #%x\n",
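
The two format strings differ in more than appearance. In "%#x", the # flag selects the alternate form, which prefixes non-zero values with 0x; in "#%x", the # is just a literal character. A small userspace sketch using snprintf shows what the corrected eight-SE line would print; format_cu_mask() is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format eight per-SE CU masks with the alternate hex form (%#x). */
static int format_cu_mask(char *buf, size_t len, const unsigned int se[8])
{
	assert(buf && se);
	return snprintf(buf, len,
			"update cu mask to %#x %#x %#x %#x %#x %#x %#x %#x",
			se[0], se[1], se[2], se[3], se[4], se[5], se[6], se[7]);
}
```

Note that %#x prints a zero mask as plain "0", without the prefix, while the swapped "#%x" would print "#ff" for 0xff.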

Re: [PATCH 02/12] drm/amdgpu: send IVs to the KFD only after processing them

2018-09-26 Thread Jay Cornwall

On Wed, Sep 26, 2018, at 08:53, Christian König wrote:
> This allows us to filter out VM faults in the GMC code.
> 
> Signed-off-by: Christian König 

The KFD needs to receive notification of unhandled VM faults, i.e. when demand
paging is disabled or the address is not pageable. It propagates this to the
UMD (ROC runtime or the ROC debugger).

Does this patch change that behavior?
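
The requirement can be stated as a routing rule: the GMC gets the first look at a fault, and anything it does not resolve must still reach the KFD. A sketch of that rule follows; every type and function name in it is hypothetical and does not reflect the actual interrupt code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault descriptor with just the two conditions discussed. */
struct vm_fault {
	bool demand_paging_enabled;
	bool address_pageable;
};

enum fault_route { FAULT_HANDLED_BY_GMC, FAULT_FORWARDED_TO_KFD };

/* GMC resolves a fault only when demand paging applies to the address. */
static bool gmc_try_handle(const struct vm_fault *f)
{
	return f->demand_paging_enabled && f->address_pageable;
}

/* Filter first, then forward anything left over to the KFD, which in
 * turn notifies the UMD (runtime or debugger). */
static enum fault_route route_vm_fault(const struct vm_fault *f)
{
	assert(f);
	if (gmc_try_handle(f))
		return FAULT_HANDLED_BY_GMC;
	return FAULT_FORWARDED_TO_KFD;
}
```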

Re: KFD event handling questions

2017-10-03 Thread Jay Cornwall

On Mon, Oct 2, 2017, at 08:22, Kuehling, Felix wrote:
> Is the "new debug trap handler" already working? It seems right now I'm
> breaking the "old" debugger backend test. However, given the current
> status of that debugger, I guess we can disable those tests for now?
> 
> Can you speak on behalf of the debugger team, or should I consult someone
> else on their end as well?

The existing debug API was designed to interact with live wavefronts on
the device. This created problems for the scheduler, which needs to be
able to remove wavefronts at any time without notice. The unprivileged
debugger could interfere with privileged operations (e.g. debugging in
the presence of oversubscribed processes, world switch in SR-IOV). It
wasn't developed beyond internal test cases and the ioctls should not
have been upstreamed.

The debugger was redesigned to work with offline wavefront state
collected through wavefront context save (already implemented in the
scheduler and controlled through hsaKmtUpdateQueue). This respects the
privilege model and is robust in all scheduling scenarios.

There are still some yet-to-be-determined interfaces to control
per-process debugging features. This could extend/repurpose an existing
ioctl or require a new one.

Re: [PATCH] drm/amdgpu: set sched_hw_submission higher for KIQ

2017-08-22 Thread Jay Cornwall

On Tue, Aug 22, 2017, at 16:17, Felix Kuehling wrote:
> Thanks Alex!
> 
> Jay, do you think this is enough? This bumps the number of concurrent
> operations on KIQ to 4 by default.

I'm not sure what the best number is. Up to 8 KFD processes is common
(beyond that performance drops off due to VMID availability) but I'm not
sure how often they would need to submit to KIQ concurrently. If it's
not expensive, I'd just bump it up to, say, 16.

The performance problem isn't that bad since all the KIQ requests are
serialized, but the dmesg spam is not nice. Perhaps lowering the severity
of the 'rcu slot is busy' message would address that as well?
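
The submission limit also scales the ring allocation: the quoted patch sizes the ring as roundup_pow_of_two(max_dw * 4 * sched_hw_submission). A sketch of that arithmetic follows, with a local stand-in for the kernel's roundup_pow_of_two() and an arbitrary max_dw, to show what doubling the limit (to 4) or the suggested 16 would cost in ring space. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for roundup_pow_of_two(); valid for 1 <= n <= 2^31. */
static uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	assert(n >= 1 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/* Ring size in bytes as computed in the quoted patch: KIQ gets twice the
 * default limit, and each outstanding submission needs max_dw dwords. */
static uint32_t ring_size_bytes(uint32_t max_dw, uint32_t hw_submission,
				bool is_kiq)
{
	if (is_kiq)
		hw_submission *= 2;
	return round_up_pow2(max_dw * 4 * hw_submission);
}
```

For max_dw = 1024, the default of 2 gives an 8 KiB ring, the doubled KIQ limit 16 KiB, and 16 submissions 64 KiB.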

> 
> Regards,
>   Felix
> 
> 
> On 2017-08-22 04:49 PM, Alex Deucher wrote:
> > KIQ doesn't really use the GPU scheduler.  The base
> > drivers generally use the KIQ ring directly rather than
> > submitting IBs.  However, amdgpu_sched_hw_submission
> > (which defaults to 2) limits the number of outstanding
> > fences to 2.  KFD uses the KIQ for TLB flushes and the
> > 2 fence limit hurts performance when there are several KFD
> > processes running.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > index 6c5646b..f39b851 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -170,6 +170,16 @@ int amdgpu_ring_init(struct amdgpu_device *adev, 
> > struct amdgpu_ring *ring,
> >  unsigned irq_type)
> >  {
> > int r;
> > +   int sched_hw_submission = amdgpu_sched_hw_submission;
> > +
> > +   /* Set the hw submission limit higher for KIQ because
> > +* it's used for a number of gfx/compute tasks by both
> > +* KFD and KGD which may have outstanding fences and
> > +* it doesn't really use the gpu scheduler anyway;
> > +* KIQ tasks get submitted directly to the ring.
> > +*/
> > +   if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
> > +   sched_hw_submission *= 2;
> >  
> > if (ring->adev == NULL) {
> > if (adev->num_rings >= AMDGPU_MAX_RINGS)
> > @@ -179,7 +189,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> > amdgpu_ring *ring,
> > ring->idx = adev->num_rings++;
> > adev->rings[ring->idx] = ring;
> > r = amdgpu_fence_driver_init_ring(ring,
> > -   amdgpu_sched_hw_submission);
> > + sched_hw_submission);
> > if (r)
> > return r;
> > }
> > @@ -219,7 +229,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> > amdgpu_ring *ring,
> > }
> >  
> > ring->ring_size = roundup_pow_of_two(max_dw * 4 *
> > -amdgpu_sched_hw_submission);
> > +sched_hw_submission);
> >  
> > ring->buf_mask = (ring->ring_size / 4) - 1;
> > ring->ptr_mask = ring->funcs->support_64bit_ptrs ?

[PATCH 3/4] drm/radeon: Remove initialization of shared_resources.num_mec

2017-07-13 Thread Jay Cornwall

Dead code.

Change-Id: I2383e0b541ed55288570b6a0ec8a0d49cdd4df89
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/radeon/radeon_kfd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c 
b/drivers/gpu/drm/radeon/radeon_kfd.c
index 719ea51..8f8c7c1 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -251,7 +251,6 @@ void radeon_kfd_device_init(struct radeon_device *rdev)
if (rdev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = 0xFF00,
-   .num_mec = 1,
.num_pipe_per_mec = 4,
.num_queue_per_pipe = 8,
.gpuvm_size = (uint64_t)radeon_vm_size << 30
-- 
2.7.4

[PATCH v3 1/4] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-13 Thread Jay Cornwall

The number of compute queues available to the KFD was erroneously
calculated as 64. Only the first MEC can execute compute queues and
it has 32 queue slots.

This caused the oversubscription limit to be calculated incorrectly,
leading to a missing chained runlist command at the end of an
oversubscribed runlist.

v2: Remove unused num_mec field to avoid duplicate logic
v3: Split num_mec removal into separate patches

Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7060daf..aa4006a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
/* According to linux/bitmap.h we shouldn't use bitmap_clear if
 * nbits is not compile time constant
 */
-   last_valid_bit = adev->gfx.mec.num_mec
+   last_valid_bit = 1 /* only first MEC can have compute queues */
* adev->gfx.mec.num_pipe_per_mec
* adev->gfx.mec.num_queue_per_pipe;
for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
-- 
2.7.4
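
The fixed calculation counts queue slots on the first MEC only. A sketch of the resulting availability map follows, modelled as a byte array rather than the kernel bitmap API; the function name and the KGD_MAX_QUEUES value are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define KGD_MAX_QUEUES 128	/* placeholder bound for this sketch */

/* Mark the queue slots available to the KFD and return their count.
 * Only the first MEC runs compute queues, so num_mec is deliberately
 * ignored: the count is pipes_per_mec * queues_per_pipe. */
static unsigned int kfd_queue_slots(unsigned int num_mec,
				    unsigned int pipes_per_mec,
				    unsigned int queues_per_pipe,
				    uint8_t bitmap[KGD_MAX_QUEUES])
{
	unsigned int last_valid_bit = 1 * pipes_per_mec * queues_per_pipe;
	unsigned int i, count = 0;

	(void)num_mec;
	assert(last_valid_bit <= KGD_MAX_QUEUES);
	for (i = 0; i < KGD_MAX_QUEUES; i++) {
		bitmap[i] = i < last_valid_bit;	/* slots past the first MEC stay clear */
		count += bitmap[i];
	}
	return count;
}
```

With 4 pipes of 8 queues, the sketch reports 32 slots; the old num_mec-based product with two MECs gave 64, the overcount described in the commit message.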

[PATCH 2/4] drm/amdkfd: Remove unused references to shared_resources.num_mec

2017-07-13 Thread Jay Cornwall

Dead code.

Change-Id: Ic0bb1bcca87e96bc5e8fa9894727b0de152e8818
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 4 
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 ---
 2 files changed, 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 1cf00d4..95f9396 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -494,10 +494,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
} else
kfd->max_proc_per_quantum = hws_max_conc_proc;
 
-   /* We only use the first MEC */
-   if (kfd->shared_resources.num_mec > 1)
-   kfd->shared_resources.num_mec = 1;
-
/* calculate max size of mqds needed for queues */
size = max_num_of_queues_per_device *
kfd->device_info->mqd_size_aligned;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7607989..306144f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -82,13 +82,6 @@ static bool is_pipe_enabled(struct device_queue_manager 
*dqm, int mec, int pipe)
return false;
 }
 
-unsigned int get_mec_num(struct device_queue_manager *dqm)
-{
-   BUG_ON(!dqm || !dqm->dev);
-
-   return dqm->dev->shared_resources.num_mec;
-}
-
 unsigned int get_queues_num(struct device_queue_manager *dqm)
 {
BUG_ON(!dqm || !dqm->dev);
-- 
2.7.4

[PATCH 4/4] drm/amdgpu: Remove unused field kgd2kfd_shared_resources.num_mec

2017-07-13 Thread Jay Cornwall

Dead code.

Change-Id: I9575aa73b5741b80dc340f953cc773385c92b2be
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  | 1 -
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index aa4006a..8c710f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -116,7 +116,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
if (adev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = global_compute_vmid_bitmap,
-   .num_mec = adev->gfx.mec.num_mec,
.num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
.gpuvm_size = (uint64_t)amdgpu_vm_size << 30
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index a4d2fee..10794b3 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -107,9 +107,6 @@ struct kgd2kfd_shared_resources {
/* Bit n == 1 means VMID n is available for KFD. */
unsigned int compute_vmid_bitmap;
 
-   /* number of mec available from the hardware */
-   uint32_t num_mec;
-
/* number of pipes per mec */
uint32_t num_pipe_per_mec;
 
-- 
2.7.4

[PATCH v2] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-13 Thread Jay Cornwall

The number of compute queues available to the KFD was erroneously
calculated as 64. Only the first MEC can execute compute queues and
it has 32 queue slots.

This caused the oversubscription limit to be calculated incorrectly,
leading to a missing chained runlist command at the end of an
oversubscribed runlist.
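The queue count in this fix works out as follows. Below is a minimal userspace sketch, assuming 4 pipes per MEC and 8 queues per pipe (the values radeon_kfd.c passes in this series) and two MECs for the old figure (2 x 4 x 8 = 64). SKETCH_MAX_QUEUES, kfd_queue_slots(), init_queue_bitmap() and count_queues() are hypothetical stand-ins, not driver code. For simplicity, every bit starts set here. The driver's starting bitmap is not shown in this excerpt. Bits at or above last_valid_bit are then cleared one at a time, like the clear_bit() loop in amdgpu_amdkfd_device_init().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: stands in for KGD_MAX_QUEUES; the real value may differ. */
#define SKETCH_MAX_QUEUES 128
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((SKETCH_MAX_QUEUES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Queue slots usable by the KFD: only the first MEC runs compute queues. */
static unsigned int kfd_queue_slots(unsigned int num_pipe_per_mec,
				    unsigned int num_queue_per_pipe)
{
	return 1 /* only first MEC */ * num_pipe_per_mec * num_queue_per_pipe;
}

/* Mark every slot available, then clear bits from last_valid_bit upward
 * one at a time (no bitmap_clear(), since nbits is not a constant). */
static void init_queue_bitmap(unsigned long *bitmap, unsigned int last_valid_bit)
{
	assert(last_valid_bit <= SKETCH_MAX_QUEUES);
	for (size_t w = 0; w < BITMAP_WORDS; ++w)
		bitmap[w] = ~0UL;
	for (size_t i = last_valid_bit; i < SKETCH_MAX_QUEUES; ++i)
		bitmap[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

/* Count the queue slots left available in the bitmap. */
static unsigned int count_queues(const unsigned long *bitmap)
{
	unsigned int n = 0;

	for (size_t i = 0; i < SKETCH_MAX_QUEUES; ++i)
		n += (bitmap[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
	return n;
}
```

With 4 pipes and 8 queues per pipe, the corrected bound leaves 32 slots. The old num_mec-based bound (2 x 4 x 8) leaves 64.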

v2: Remove unused num_mec field to avoid duplicate logic

Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 3 +--
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 4 
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 ---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   | 3 ---
 drivers/gpu/drm/radeon/radeon_kfd.c   | 1 -
 5 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7060daf..8c710f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -116,7 +116,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
if (adev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = global_compute_vmid_bitmap,
-   .num_mec = adev->gfx.mec.num_mec,
.num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
.gpuvm_size = (uint64_t)amdgpu_vm_size << 30
@@ -140,7 +139,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
/* According to linux/bitmap.h we shouldn't use bitmap_clear if
 * nbits is not compile time constant
 */
-   last_valid_bit = adev->gfx.mec.num_mec
+   last_valid_bit = 1 /* only first MEC can have compute queues */
* adev->gfx.mec.num_pipe_per_mec
* adev->gfx.mec.num_queue_per_pipe;
for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 1cf00d4..95f9396 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -494,10 +494,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
} else
kfd->max_proc_per_quantum = hws_max_conc_proc;
 
-   /* We only use the first MEC */
-   if (kfd->shared_resources.num_mec > 1)
-   kfd->shared_resources.num_mec = 1;
-
/* calculate max size of mqds needed for queues */
size = max_num_of_queues_per_device *
kfd->device_info->mqd_size_aligned;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7607989..306144f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -82,13 +82,6 @@ static bool is_pipe_enabled(struct device_queue_manager *dqm, int mec, int pipe)
return false;
 }
 
-unsigned int get_mec_num(struct device_queue_manager *dqm)
-{
-   BUG_ON(!dqm || !dqm->dev);
-
-   return dqm->dev->shared_resources.num_mec;
-}
-
 unsigned int get_queues_num(struct device_queue_manager *dqm)
 {
BUG_ON(!dqm || !dqm->dev);
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index a4d2fee..10794b3 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -107,9 +107,6 @@ struct kgd2kfd_shared_resources {
/* Bit n == 1 means VMID n is available for KFD. */
unsigned int compute_vmid_bitmap;
 
-   /* number of mec available from the hardware */
-   uint32_t num_mec;
-
/* number of pipes per mec */
uint32_t num_pipe_per_mec;
 
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c 
b/drivers/gpu/drm/radeon/radeon_kfd.c
index 719ea51..8f8c7c1 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -251,7 +251,6 @@ void radeon_kfd_device_init(struct radeon_device *rdev)
if (rdev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = 0xFF00,
-   .num_mec = 1,
.num_pipe_per_mec = 4,
.num_queue_per_pipe = 8,
.gpuvm_size = (uint64_t)radeon_vm_size << 30
-- 
2.7.4

Re: [PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-13 Thread Jay Cornwall

On Thu, Jul 13, 2017, at 13:36, Andres Rodriguez wrote:
> On 2017-07-12 02:26 PM, Jay Cornwall wrote:
> > The number of compute queues available to the KFD was erroneously
> > calculated as 64. Only the first MEC can execute compute queues and
> > it has 32 queue slots.
> > 
> > This caused the oversubscription limit to be calculated incorrectly,
> > leading to a missing chained runlist command at the end of an
> > oversubscribed runlist.
> > 
> > Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
> > Signed-off-by: Jay Cornwall 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > index 7060daf..aa4006a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
> > /* According to linux/bitmap.h we shouldn't use bitmap_clear if
> >  * nbits is not compile time constant
> >  */
> > -   last_valid_bit = adev->gfx.mec.num_mec
> > +   last_valid_bit = 1 /* only first MEC can have compute queues */
> 
> Hey Jay,
> 
> Minor nitpick. We already have some similar resource patching in 
> kgd2kfd_device_init(), and I think it would be good to keep all of these 
> together.

OK. I see shared_resources.num_mec is set to 1 in kgd2kfd_device_init.
That's not very clear (the number of MECs doesn't change) and num_mec
doesn't appear to be used anywhere except in dead code in kfd_device.c.
That code also runs after the queue bitmap setup.

How about I remove that field entirely?

[PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-12 Thread Jay Cornwall

The number of compute queues available to the KFD was erroneously
calculated as 64. Only the first MEC can execute compute queues and
it has 32 queue slots.

This caused the oversubscription limit to be calculated incorrectly,
leading to a missing chained runlist command at the end of an
oversubscribed runlist.

Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7060daf..aa4006a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
/* According to linux/bitmap.h we shouldn't use bitmap_clear if
 * nbits is not compile time constant
 */
-   last_valid_bit = adev->gfx.mec.num_mec
+   last_valid_bit = 1 /* only first MEC can have compute queues */
* adev->gfx.mec.num_pipe_per_mec
* adev->gfx.mec.num_queue_per_pipe;
for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
-- 
2.7.4

Re: [PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

2017-07-12 Thread Jay Cornwall

On Wed, Jul 12, 2017, at 12:37, Felix Kuehling wrote:
> On 17-07-12 11:59 AM, Alex Deucher wrote:
> > On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling  
> > wrote:
> >> Any comments?
> >>
> >> I believe this is a nice stability improvement. In case of VM faults
> >> they don't take down the whole GPU with an interrupt storm. With KFD we
> >> can recover without a GPU reset in many cases just by unmapping the
> >> offending process' queues.
> > Will this cause any problems with enabling recoverable page faults
> > later?  If not,
> > Acked-by: Alex Deucher 
> 
> Like John said, this will need to be backed out when we enable
> recoverable page faults. The nice thing on Vega10 is, that it's a
> per-VMID setting. That will allow us for example to enable recoverable
> page faults for KFD VMIDs for implementing a real HSA memory model,
> without affecting the graphics VMIDs.

Right, the plan is to re-enable this feature once the interrupt storm
has been resolved. There are a few options for this discussed internally
but not currently implemented as far as I know.

I have a backup plan for implementing recoverable page faults with
no-retry XNACK if that doesn't pan out.
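A per-VMID fault policy, as Felix describes for Vega10, can be modelled as a bitmask. The following hypothetical sketch does not use real register fields. A set bit means the VMID retries faults (recoverable), and a clear bit means no-retry XNACK. The example assumes that VMIDs 8-15 belong to the KFD, matching compute_vmid_bitmap = 0xFF00 in radeon_kfd.c. retry_mask_for_kfd() and vmid_retries_faults() are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: enable retry (recoverable faults) only on KFD VMIDs. */
static uint16_t retry_mask_for_kfd(uint16_t compute_vmid_bitmap,
				   bool enable_recoverable)
{
	return enable_recoverable ? compute_vmid_bitmap : 0;
}

/* Bit n set: VMID n retries faults; clear: no-retry XNACK. */
static bool vmid_retries_faults(uint16_t retry_mask, unsigned int vmid)
{
	assert(vmid < 16);
	return (retry_mask >> vmid) & 1u;
}
```

Graphics VMIDs (0-7 here) keep no-retry behaviour whichever way the KFD VMIDs are configured.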

Re: [PATCH] drm/amdgpu: Added more hqd debug messages

2017-03-02 Thread Jay Cornwall

On Wed, Mar 1, 2017, at 16:28, Zeng, Oak wrote:
> COMPUTE_PGM* registers are per pipe per queue - each queue of each pipe
> has a copy of those registers.

COMPUTE_* are ADC registers. These are instantiated once per pipe. The
values they hold correspond to the most recent values written from the
connected queue (the one selected for execution at a given time) on the
pipe. They're saved to the MQD of the connected queue before a different
queue is selected for execution.

Alex is right. They're not indexed via SRBM_GFX_CNTL.QUEUE.
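The per-pipe behaviour described here can be captured in a small hypothetical model. It is not hardware or driver code. The pipe holds one live copy of a COMPUTE_PGM_LO-like register. Selecting a different queue first saves that copy into the MQD of the queue that was connected, and reads ignore any SRBM queue selector. Restoring from the MQD on reconnect is not modelled. All struct and function names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QUEUES_PER_PIPE 8  /* assumption for illustration */

/* Hypothetical MQD holding one saved COMPUTE_* register value. */
struct mqd_sketch {
	uint32_t compute_pgm_lo;
};

/* One pipe: a single register instance shared by all its queues. */
struct pipe_sketch {
	uint32_t compute_pgm_lo;    /* the only live copy on this pipe */
	unsigned int connected;     /* queue currently selected for execution */
	struct mqd_sketch mqd[SKETCH_QUEUES_PER_PIPE];
};

/* A dispatch from the connected queue overwrites the per-pipe register. */
static void pipe_dispatch(struct pipe_sketch *p, uint32_t pgm_lo)
{
	p->compute_pgm_lo = pgm_lo;
}

/* Save the register to the connected queue's MQD before switching queues. */
static void pipe_select_queue(struct pipe_sketch *p, unsigned int queue)
{
	assert(queue < SKETCH_QUEUES_PER_PIPE);
	p->mqd[p->connected].compute_pgm_lo = p->compute_pgm_lo;
	p->connected = queue;
}

/* Reads are not indexed by queue: every selector sees the pipe's copy. */
static uint32_t read_compute_pgm_lo(const struct pipe_sketch *p,
				    unsigned int srbm_queue)
{
	(void)srbm_queue;
	return p->compute_pgm_lo;
}
```

After queue 3 dispatches, reading with either queue selector returns queue 3's value. Queue 0's last value survives only in its MQD.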

Re: [PATCH] drm/amdgpu/gfx7: move eop programming per queue

2016-11-23 Thread Jay Cornwall

On Wed, Nov 23, 2016, at 14:27, Alex Deucher wrote:
> It's per queue not per pipe.

Are you sure? I was under the impression that EOP queues were per-pipe
on Gfx7 and per-queue on Gfx8 onwards (to support context save/restore).
It's also hinted at by the register name (HPD == Hardware Pipe
Descriptor, HQD == Hardware Queue Descriptor).
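The practical difference between the two readings is how many EOP buffers the driver must allocate and program. The hypothetical sketch below just counts them. Which layout Gfx7 actually uses is the open question in this thread, and the per-buffer size is a parameter, not a hardware value. eop_buffer_count() and eop_total_bytes() are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* The two layouts under discussion. */
enum eop_granularity { EOP_PER_PIPE, EOP_PER_QUEUE };

/* Number of EOP buffers to program for one MEC. */
static size_t eop_buffer_count(enum eop_granularity g, size_t pipes,
			       size_t queues_per_pipe)
{
	assert(g == EOP_PER_PIPE || g == EOP_PER_QUEUE);
	return g == EOP_PER_PIPE ? pipes : pipes * queues_per_pipe;
}

/* Total EOP memory for a chosen per-buffer size. */
static size_t eop_total_bytes(enum eop_granularity g, size_t pipes,
			      size_t queues_per_pipe, size_t eop_size)
{
	return eop_buffer_count(g, pipes, queues_per_pipe) * eop_size;
}
```

With 4 pipes of 8 queues, per-pipe EOP needs 4 buffers and per-queue EOP needs 32. Per-queue buffers let each queue's EOP state be saved and restored independently, which is the context save/restore motivation mentioned above.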

Re: [PATCH] drm/amdgpu: Fix memory trashing if UVD ring test fails

2016-08-10 Thread Jay Cornwall


On 2016-08-10 11:10, Alex Deucher wrote:

On Wed, Aug 3, 2016 at 2:39 PM, Jay Cornwall  wrote:

fence_put was called on an uninitialized variable.

Signed-off-by: Jay Cornwall 


Can you commit this internally or do you need one of us to?

Alex


I'm less familiar with the amdgpu branches. You could point me to the 
right one or commit this from your side.



---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c

index b11f4e8..4aa993d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1187,7 +1187,8 @@ int amdgpu_uvd_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = 0;
}

-error:
fence_put(fence);
+
+error:
return r;
 }
--
2.9.2

--
Jay Cornwall

Re: [PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

2016-08-09 Thread Jay Cornwall


On 2016-08-09 11:35, Christian König wrote:

On 09.08.2016 at 17:49, Jay Cornwall wrote:

On 2016-08-09 07:52, Christian König wrote:

From: Christian König 

We align to 64KB, but when userspace aligns even more we can easily use more.


Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e6c030b..88f4109 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -817,13 +817,13 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
  * allocation size to the fragment size.
  */

-/* SI and newer are optimized for 64KB */
-uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
-uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
+const uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;

 uint64_t frag_start = ALIGN(start, frag_align);
 uint64_t frag_end = end & ~(frag_align - 1);

+uint32_t frag;
+
 /* system pages are non continuously */
 if (params->src || params->pages_addr || !(flags & AMDGPU_PTE_VALID) ||
     (frag_start >= frag_end)) {
@@ -832,6 +832,10 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
 return;
 }

+/* use more than 64KB fragment size if possible */
+frag = lower_32_bits(frag_start | frag_end);
+frag = likely(frag) ? __ffs(frag) : 31;
+
 /* handle the 4K area at the beginning */
 if (start != frag_start) {
 amdgpu_vm_update_ptes(params, vm, start, frag_start,
@@ -841,7 +845,7 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,

 /* handle the area in the middle */
 amdgpu_vm_update_ptes(params, vm, frag_start, frag_end, dst,
-  flags | frag_flags);
+  flags | AMDGPU_PTE_FRAG(frag));

 /* handle the 4K area at the end */
 if (frag_end != end) {


Would this change not direct larger fragments away from the BigK TLB 
partition?


My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an 
exact match and not a minimum size. I can't find any immediate 
documentation on that topic to confirm.


Yeah I was questioning that myself as well, especially since you wrote
in the initial patch that SI and later are optimized for 64K.


The 64K figure came from VM documentation. It was otherwise unqualified 
but my guess is it was based on VidMM's page size (64K), the average 
gaming work set size, and the number of BigK entries. Apparently it's 
still good as we haven't changed it since.



So I tested it on Tonga and Polaris10 and it seems to work as
expected, e.g. a 1MB fragment size really results in not reading the
other page table entries as soon as it is cached.

But I'm not sure how exactly this partitioning of the L2 works and
what effect it should have.


OK. As long as there's no regression on e.g. Heaven, where I benchmarked
the original change at a gain of several percent, it should be fine.
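For reference, here is the fragment selection from Christian's patch restated in userspace C. This is a minimal sketch that assumes 4 KiB GPU pages and AMDGPU_LOG2_PAGES_PER_FRAG == 4, which matches the 64KB minimum. SKETCH_LOG2_PAGES_PER_FRAG and pick_frag() are hypothetical names. ALIGN() and lower_32_bits() are open-coded, and the GCC/Clang builtin __builtin_ctz() stands in for the kernel's __ffs(). The result is the log2 of the fragment size in pages, so the 1MB fragments tested on Tonga and Polaris10 correspond to frag = 8 (256 x 4 KiB).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64 KiB minimum fragment = 16 pages of 4 KiB. */
#define SKETCH_LOG2_PAGES_PER_FRAG 4

/* Largest power-of-two fragment (log2, in pages) dividing both the aligned
 * start and end page numbers, as in the patch. */
static unsigned int pick_frag(uint64_t start, uint64_t end)
{
	const uint64_t frag_align = 1ULL << SKETCH_LOG2_PAGES_PER_FRAG;
	uint64_t frag_start = (start + frag_align - 1) & ~(frag_align - 1); /* ALIGN() */
	uint64_t frag_end = end & ~(frag_align - 1);
	uint32_t frag;

	/* The patch falls back to 4K PTEs before reaching this point. */
	assert(frag_start < frag_end);

	frag = (uint32_t)(frag_start | frag_end);            /* lower_32_bits() */
	return frag ? (unsigned int)__builtin_ctz(frag) : 31; /* __ffs() */
}
```

A range from page 0x100 to 0x300 yields 8 (1MB). A range that is only 64KB aligned, such as 0x11 to 0x35 (aligned inward to 0x20..0x30), still yields 4.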


--
Jay Cornwall

Re: [PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

2016-08-09 Thread Jay Cornwall


On 2016-08-09 07:52, Christian König wrote:

From: Christian König 

We align to 64KB, but when userspace aligns even more we can easily use more.


Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e6c030b..88f4109 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -817,13 +817,13 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
 * allocation size to the fragment size.
 */

-   /* SI and newer are optimized for 64KB */
-   uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
-   uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
+   const uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;

uint64_t frag_start = ALIGN(start, frag_align);
uint64_t frag_end = end & ~(frag_align - 1);

+   uint32_t frag;
+
/* system pages are non continuously */
if (params->src || params->pages_addr || !(flags & AMDGPU_PTE_VALID) ||
    (frag_start >= frag_end)) {
@@ -832,6 +832,10 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
return;
}

+   /* use more than 64KB fragment size if possible */
+   frag = lower_32_bits(frag_start | frag_end);
+   frag = likely(frag) ? __ffs(frag) : 31;
+
/* handle the 4K area at the beginning */
if (start != frag_start) {
amdgpu_vm_update_ptes(params, vm, start, frag_start,
@@ -841,7 +845,7 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,

/* handle the area in the middle */
amdgpu_vm_update_ptes(params, vm, frag_start, frag_end, dst,
- flags | frag_flags);
+ flags | AMDGPU_PTE_FRAG(frag));

/* handle the 4K area at the end */
if (frag_end != end) {


Would this change not direct larger fragments away from the BigK TLB 
partition?


My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an exact 
match and not a minimum size. I can't find any immediate documentation 
on that topic to confirm.
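The two readings of VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE differ only in the comparison applied to a PTE's fragment field. The sketch below is hypothetical and describes no documented hardware behaviour. enum bigk_rule and uses_bigk_partition() are invented names, and fragment values are log2 page counts as in AMDGPU_PTE_FRAG(). Under the exact-match reading, the 1MB fragments (frag = 8) that this patch can produce would miss a BigK partition configured for 64KB (frag = 4).

```c
#include <assert.h>
#include <stdbool.h>

/* The two hypotheses about how BigK matching works. */
enum bigk_rule { BIGK_EXACT, BIGK_MINIMUM };

/* Would a PTE with this fragment value be cached in the BigK partition? */
static bool uses_bigk_partition(enum bigk_rule rule, unsigned int pte_frag,
				unsigned int bigk_frag)
{
	assert(rule == BIGK_EXACT || rule == BIGK_MINIMUM);
	return rule == BIGK_EXACT ? pte_frag == bigk_frag
				  : pte_frag >= bigk_frag;
}
```

Only the minimum-size reading keeps larger fragments in the BigK partition.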


--
Jay Cornwall

[PATCH] drm/amdgpu: Fix memory trashing if UVD ring test fails

2016-08-03 Thread Jay Cornwall

fence_put was called on an uninitialized variable.
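The bug comes from an early "goto error" that jumps to a label above fence_put() before the fence pointer has been assigned. Below is a minimal userspace sketch of the control flow after the fix, with the label moved below the put. struct fence_sketch, fence_put_sketch() and ring_test_sketch() are hypothetical stand-ins, and -EIO is just an example early-failure code. Like the kernel's fence_put(), the stand-in ignores NULL, so initializing the pointer to NULL would have been an alternative fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct fence. */
struct fence_sketch {
	int refcount;
};

static int fences_released;  /* counts frees, for demonstration */

/* Drop a reference; a NULL argument is ignored. */
static void fence_put_sketch(struct fence_sketch *f)
{
	if (!f)
		return;
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		free(f);
		fences_released++;
	}
}

/* "error:" sits below the put, so an early failure jumps past it and never
 * reads the uninitialized pointer. */
static int ring_test_sketch(int fail_early)
{
	struct fence_sketch *fence;  /* uninitialized, as in the original */
	int r;

	if (fail_early) {
		r = -EIO;
		goto error;
	}

	fence = malloc(sizeof(*fence));
	if (!fence)
		return -ENOMEM;
	fence->refcount = 1;
	r = 0;

	fence_put_sketch(fence);

error:
	return r;
}
```

An early failure now returns without touching the fence, while the success path still releases its reference exactly once.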

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b11f4e8..4aa993d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1187,7 +1187,8 @@ int amdgpu_uvd_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = 0;
}
 
-error:
fence_put(fence);
+
+error:
return r;
 }
-- 
2.9.2
