from:"Rajneesh Bhardwaj"

[PATCH] drm/ttm: Make ttm shrinkers NUMA aware

2024-04-08 Thread Rajneesh Bhardwaj

Otherwise the nid is always passed as 0 during memory reclaim so
make TTM shrinkers NUMA aware.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index dbc96984d331..514261f44b78 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -794,7 +794,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
_pool_debugfs_shrink_fops);
 #endif
 
-   mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+   mm_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "drm-ttm_pool");
if (!mm_shrinker)
return -ENOMEM;
 
-- 
2.34.1

[PATCH] drm/ttm: Implement strict NUMA pool allocations

2024-03-22 Thread Rajneesh Bhardwaj

This change allows TTM to be flexible to honor NUMA localized
allocations which can result in significant performance improvement on a
multi socket NUMA system. On GFXIP 9.4.3 based AMD APUs, we see
manyfold benefits of this change resulting not only in ~10% performance
improvement in certain benchmarks but also generating more consistent
and less sporadic results specially when the NUMA balancing is not
explecitely disabled. In certain scenarios, workloads show a run-to-run
variability e.g. HPL would show a ~10x performance drop after running
back to back 4-5 times and would recover later on a subsequent run. This
is seen with memory intensive other workloads too. It was seen that when
CPU caches were dropped e.g. sudo sysctl -w vm.drop_caches=1, the
variability reduced but the performance was still well below a good run.

Use of __GFP_THISNODE flag ensures that during memory allocation, kernel
prioritizes memory allocations from the local or closest NUMA node
thereby reducing memory access latency. When memory is allocated using
__GFP_THISNODE flag, memory allocations will predominantly be done on
the local node, consequency, the shrinkers may priotitize reclaiming
memory from caches assocoated with local node to maintain memory
locality and minimize latency, thereby provide better shinker targeting.

Reduced memory pressure on remote nodes, can also indirectly influence
shrinker behavior by potentially reducing the frequency and intensity of
memory reclamation operation on remote nodes and could provide improved
overall system performance.

While this change could be more beneficial in general, i.e., without the
use of a module parameter, but in absence of widespread testing, limit
it to the AMD GFXIP 9.4.3 based ttm pool initializations only.


Cc: Joe Greathouse 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  7 ++-
 drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 10 +-
 drivers/gpu/drm/ttm/ttm_device.c  |  2 +-
 drivers/gpu/drm/ttm/ttm_pool.c|  7 ++-
 include/drm/ttm/ttm_pool.h|  4 +++-
 7 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c62552bec34..96532cfc6230 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -253,6 +253,7 @@ extern int amdgpu_user_partt_mode;
 extern int amdgpu_agp;
 
 extern int amdgpu_wbrf;
+extern bool strict_numa_alloc;
 
 #define AMDGPU_VM_MAX_NUM_CTX  4096
 #define AMDGPU_SG_THRESHOLD(256*1024*1024)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 80b9642f2bc4..a183a6b4493d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -781,6 +781,14 @@ int queue_preemption_timeout_ms = 9000;
 module_param(queue_preemption_timeout_ms, int, 0644);
 MODULE_PARM_DESC(queue_preemption_timeout_ms, "queue preemption timeout in ms 
(1 = Minimum, 9000 = default)");
 
+/**
+ * DOC: strict_numa_alloc(bool)
+ * Policy to force NUMA allocation requests from the proximity NUMA domain 
only.
+ */
+bool strict_numa_alloc;
+module_param(strict_numa_alloc, bool, 0444);
+MODULE_PARM_DESC(strict_numa_alloc, "Force NUMA allocation requests to be 
satisfied from the closest node only (false = default)");
+
 /**
  * DOC: debug_evictions(bool)
  * Enable extra debug messages to help determine the cause of evictions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b0ed10f4de60..a9f78f85e28c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1768,6 +1768,7 @@ static int amdgpu_ttm_reserve_tmr(struct amdgpu_device 
*adev)
 
 static int amdgpu_ttm_pools_init(struct amdgpu_device *adev)
 {
+   bool policy = true;
int i;
 
if (!adev->gmc.is_app_apu || !adev->gmc.num_mem_partitions)
@@ -1779,11 +1780,15 @@ static int amdgpu_ttm_pools_init(struct amdgpu_device 
*adev)
if (!adev->mman.ttm_pools)
return -ENOMEM;
 
+   /* Policy not only depends on the module param but also on the ASIC
+* setting use_strict_numa_alloc as well.
+*/
for (i = 0; i < adev->gmc.num_mem_partitions; i++) {
ttm_pool_init(>mman.ttm_pools[i], adev->dev,
  adev->gmc.mem_partitions[i].numa.node,
- false, false);
+ false, false, policy && strict_numa_alloc);
}
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c 
b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
index 2d9cae8cd984..6ff47aac570a 100644
--- a/drivers/gpu/drm/ttm/t

[PATCH] drm/ttm: set max_active to recommened default

2023-11-11 Thread Rajneesh Bhardwaj

To maximize per cpu execution context for the work items, use the
recommended settings i.e. WQ_DFL_ACTIVE(256). There is no apparent
reason to throttle to 16 while process tear down.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index bc97e3dd40f0..5443c0f19213 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -205,7 +205,7 @@ int ttm_device_init(struct ttm_device *bdev, struct 
ttm_device_funcs *funcs,
return ret;
 
bdev->wq = alloc_workqueue("ttm",
-  WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 
16);
+  WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0);
if (!bdev->wq) {
ttm_global_release();
return -ENOMEM;
-- 
2.34.1

[Patch v3] drm/ttm: Schedule delayed_delete worker closer

2023-11-11 Thread Rajneesh Bhardwaj

Try to allocate system memory on the NUMA node the device is closest to
and try to run delayed_delete workers on a CPU of this node as well.

To optimize the memory clearing operation when a TTM BO gets freed by
the delayed_delete worker, scheduling it closer to a NUMA node where the
memory was initially allocated helps avoid the cases where the worker
gets randomly scheduled on the CPU cores that are across interconnect
boundaries such as xGMI, PCIe etc.

This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
APU platforms such as GFXIP9.4.3.

Acked-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
---
Changes in v3:
 * Use WQ_UNBOUND to address the warning reported by CI pipeline.

 drivers/gpu/drm/ttm/ttm_bo.c | 8 +++-
 drivers/gpu/drm/ttm/ttm_device.c | 6 --
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 5757b9415e37..6f28a77a565b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -370,7 +370,13 @@ static void ttm_bo_release(struct kref *kref)
spin_unlock(>bdev->lru_lock);
 
INIT_WORK(>delayed_delete, ttm_bo_delayed_delete);
-   queue_work(bdev->wq, >delayed_delete);
+
+   /* Schedule the worker on the closest NUMA node. This
+* improves performance since system memory might be
+* cleared on free and that is best done on a CPU core
+* close to it.
+*/
+   queue_work_node(bdev->pool.nid, bdev->wq, 
>delayed_delete);
return;
}
 
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 43e27ab77f95..bc97e3dd40f0 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -204,7 +204,8 @@ int ttm_device_init(struct ttm_device *bdev, struct 
ttm_device_funcs *funcs,
if (ret)
return ret;
 
-   bdev->wq = alloc_workqueue("ttm", WQ_MEM_RECLAIM | WQ_HIGHPRI, 16);
+   bdev->wq = alloc_workqueue("ttm",
+  WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 
16);
if (!bdev->wq) {
ttm_global_release();
return -ENOMEM;
@@ -213,7 +214,8 @@ int ttm_device_init(struct ttm_device *bdev, struct 
ttm_device_funcs *funcs,
bdev->funcs = funcs;
 
ttm_sys_man_init(bdev);
-   ttm_pool_init(>pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+   ttm_pool_init(>pool, dev, dev_to_node(dev), use_dma_alloc, 
use_dma32);
 
bdev->vma_manager = vma_manager;
spin_lock_init(>lru_lock);
-- 
2.34.1

[Patch v2] drm/ttm: Schedule delayed_delete worker closer

2023-11-08 Thread Rajneesh Bhardwaj

Try to allocate system memory on the NUMA node the device is closest to
and try to run delayed_delete workers on a CPU of this node as well.

To optimize the memory clearing operation when a TTM BO gets freed by
the delayed_delete worker, scheduling it closer to a NUMA node where the
memory was initially allocated helps avoid the cases where the worker
gets randomly scheduled on the CPU cores that are across interconnect
boundaries such as xGMI, PCIe etc.

This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
APU platforms such as GFXIP9.4.3.

Acked-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
---

Changes in v2:
 - Absorbed the feedback provided by Christian in the commit message and
   the comment.

 drivers/gpu/drm/ttm/ttm_bo.c | 8 +++-
 drivers/gpu/drm/ttm/ttm_device.c | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 5757b9415e37..6f28a77a565b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -370,7 +370,13 @@ static void ttm_bo_release(struct kref *kref)
spin_unlock(>bdev->lru_lock);
 
INIT_WORK(>delayed_delete, ttm_bo_delayed_delete);
-   queue_work(bdev->wq, >delayed_delete);
+
+   /* Schedule the worker on the closest NUMA node. This
+* improves performance since system memory might be
+* cleared on free and that is best done on a CPU core
+* close to it.
+*/
+   queue_work_node(bdev->pool.nid, bdev->wq, 
>delayed_delete);
return;
}
 
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 43e27ab77f95..72b81a2ee6c7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -213,7 +213,8 @@ int ttm_device_init(struct ttm_device *bdev, struct 
ttm_device_funcs *funcs,
bdev->funcs = funcs;
 
ttm_sys_man_init(bdev);
-   ttm_pool_init(>pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+   ttm_pool_init(>pool, dev, dev_to_node(dev), use_dma_alloc, 
use_dma32);
 
bdev->vma_manager = vma_manager;
spin_lock_init(>lru_lock);
-- 
2.34.1

[PATCH] drm/ttm: Schedule delayed_delete worker closer

2023-11-07 Thread Rajneesh Bhardwaj

When a TTM BO is getting freed, to optimize the clearing operation on
the workqueue, schedule it closer to a NUMA node where the memory was
allocated. This avoids the cases where the ttm_bo_delayed_delete gets
scheduled on the CPU cores that are across interconnect boundaries such
as xGMI, PCIe etc.

This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
APU platforms such as GFXIP9.4.3.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 10 +-
 drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 5757b9415e37..0d608441a112 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -370,7 +370,15 @@ static void ttm_bo_release(struct kref *kref)
spin_unlock(>bdev->lru_lock);
 
INIT_WORK(>delayed_delete, ttm_bo_delayed_delete);
-   queue_work(bdev->wq, >delayed_delete);
+   /* Schedule the worker on the closest NUMA node, if no
+* CPUs are available, this falls back to any CPU core
+* available system wide. This helps avoid the
+* bottleneck to clear memory in cases where the worker
+* is scheduled on a CPU which is remote to the node
+* where the memory is getting freed.
+*/
+
+   queue_work_node(bdev->pool.nid, bdev->wq, 
>delayed_delete);
return;
}
 
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 43e27ab77f95..72b81a2ee6c7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -213,7 +213,8 @@ int ttm_device_init(struct ttm_device *bdev, struct 
ttm_device_funcs *funcs,
bdev->funcs = funcs;
 
ttm_sys_man_init(bdev);
-   ttm_pool_init(>pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+   ttm_pool_init(>pool, dev, dev_to_node(dev), use_dma_alloc, 
use_dma32);
 
bdev->vma_manager = vma_manager;
spin_lock_init(>lru_lock);
-- 
2.34.1

[Patch v2] drm/ttm: Use init_on_free to delay release TTM BOs

2023-07-07 Thread Rajneesh Bhardwaj

Delay release TTM BOs when the kernel default setting is init_on_free.
This offloads the overhead of clearing the system memory to the work
item and potentially a different CPU. This could be very beneficial when
the application does a lot of malloc/free style allocations of system
memory.

Reviewed-by: Christian König .
Signed-off-by: Rajneesh Bhardwaj 
---
Changes in v2:
- Updated commit message as per Christian's feedback

 drivers/gpu/drm/ttm/ttm_bo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 326a3d13a829..bd2e7e4f497a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -347,6 +347,7 @@ static void ttm_bo_release(struct kref *kref)
 
if (!dma_resv_test_signaled(bo->base.resv,
DMA_RESV_USAGE_BOOKKEEP) ||
+   (want_init_on_free() && (bo->ttm != NULL)) ||
!dma_resv_trylock(bo->base.resv)) {
/* The BO is not idle, resurrect it for delayed destroy 
*/
ttm_bo_flush_all_fences(bo);
-- 
2.17.1

[PATCH] drm/ttm: Use init_on_free to early release TTM BOs

2023-07-05 Thread Rajneesh Bhardwaj

Early release TTM BOs when the kernel default setting is init_on_free to
wipe out and reinitialize system memory chunks. This could potentially
optimize performance when an application does a lot of malloc/free style
allocations with unified system memory.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 326a3d13a829..bd2e7e4f497a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -347,6 +347,7 @@ static void ttm_bo_release(struct kref *kref)
 
if (!dma_resv_test_signaled(bo->base.resv,
DMA_RESV_USAGE_BOOKKEEP) ||
+   (want_init_on_free() && (bo->ttm != NULL)) ||
!dma_resv_trylock(bo->base.resv)) {
/* The BO is not idle, resurrect it for delayed destroy 
*/
ttm_bo_flush_all_fences(bo);
-- 
2.17.1

[PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-03 Thread Rajneesh Bhardwaj

Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.

[  +0.03] WARNING: possible recursive locking detected
[  +0.03] 5.13.0-kfd-rajneesh #1030 Not tainted
[  +0.04] 
[  +0.02] python/4822 is trying to acquire lock:
[  +0.04] 932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000203]
  but task is already holding lock:
[  +0.03] 932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.17]
  other info that might help us debug this:
[  +0.02]  Possible unsafe locking scenario:

[  +0.03]CPU0
[  +0.02]
[  +0.02]   lock(reservation_ww_class_mutex);
[  +0.04]   lock(reservation_ww_class_mutex);
[  +0.03]
   *** DEADLOCK ***

[  +0.02]  May be due to missing lock nesting notation

[  +0.03] 7 locks held by python/4822:
[  +0.03]  #0: 932c4ac028d0 (>mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[  +0.000232]  #1: 932c55e830a8 (>lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[  +0.000241]  #2: 932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[  +0.000236]  #3: b2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[  +0.000235]  #4: 932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.15]  #5: c045f700 (*(sspp++)){}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[  +0.38]  #6: 932c52da7078 (>eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[  +0.000195]
  stack backtrace:
[  +0.03] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[  +0.05] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[  +0.03] Call Trace:
[  +0.03]  dump_stack+0x6d/0x89
[  +0.10]  __lock_acquire+0xb93/0x1a90
[  +0.09]  lock_acquire+0x25d/0x2d0
[  +0.05]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  ? lock_is_held_type+0xa2/0x110
[  +0.06]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
[  +0.07]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  ? lock_release+0x13f/0x270
[  +0.05]  ? lock_is_held_type+0xa2/0x110
[  +0.06]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
[  +0.10]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
[  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[  +0.000190]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
[  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[  +0.000216]  ? lock_release+0x13f/0x270
[  +0.06]  ? __fget_files+0x107/0x1e0
[  +0.07]  __x64_sys_ioctl+0x8b/0xd0
[  +0.07]  do_syscall_64+0x36/0x70
[  +0.04]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.07] RIP: 0033:0x7fbff90a7317
[  +0.04] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[  +0.05] RSP: 002b:7fbe301fe648 EFLAGS: 0246 ORIG_RAX:
0010
[  +0.06] RAX: ffda RBX: 7fbcc402d820 RCX:
7fbff90a7317
[  +0.03] RDX: 7fbe301fe690 RSI: c0184b18 RDI:
0004
[  +0.03] RBP: 7fbe301fe690 R08:  R09:
7fbcc402d880
[  +0.03] R10: 02001000 R11: 0246 R12:
c0184b18
[  +0.03] R13: 0004 R14: 7fbf689593a0 R15:
7fbcc402d820

Cc: Christian König 
Cc: Felix Kuehling 
Cc: Alex Deucher 

Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is
enabled")
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 36bb41b027ec..6ccd2be685f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

[Patch v5 14/24] drm/amdkfd: CRIU checkpoint and restore events

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Add support to existing CRIU ioctl's to save and restore events during
criu checkpoint and restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  70 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 272 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  27 ++-
 3 files changed, 280 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 608214ea634d..a4be758647f9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1008,57 +1008,11 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
 * through the event_page_offset field.
 */
if (args->event_page_offset) {
-   struct kfd_dev *kfd;
-   struct kfd_process_device *pdd;
-   void *mem, *kern_addr;
-   uint64_t size;
-
-   kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
-   if (!kfd) {
-   pr_err("Getting device by id failed in %s\n", __func__);
-   return -EINVAL;
-   }
-
mutex_lock(>mutex);
-
-   if (p->signal_page) {
-   pr_err("Event page is already set\n");
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   pdd = kfd_bind_process_to_device(kfd, p);
-   if (IS_ERR(pdd)) {
-   err = PTR_ERR(pdd);
-   goto out_unlock;
-   }
-
-   mem = kfd_process_device_translate_handle(pdd,
-   GET_IDR_HANDLE(args->event_page_offset));
-   if (!mem) {
-   pr_err("Can't find BO, offset is 0x%llx\n",
-  args->event_page_offset);
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->adev,
-   mem, _addr, );
-   if (err) {
-   pr_err("Failed to map event page to kernel\n");
-   goto out_unlock;
-   }
-
-   err = kfd_event_page_set(p, kern_addr, size);
-   if (err) {
-   pr_err("Failed to set event page\n");
-   amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(kfd->adev, 
mem);
-   goto out_unlock;
-   }
-
-   p->signal_handle = args->event_page_offset;
-
+   err = kfd_kmap_event_page(p, args->event_page_offset);
mutex_unlock(>mutex);
+   if (err)
+   return err;
}
 
err = kfd_event_create(filp, p, args->event_type,
@@ -1067,10 +1021,7 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
>event_page_offset,
>event_slot_index);
 
-   return err;
-
-out_unlock:
-   mutex_unlock(>mutex);
+   pr_debug("Created event (id:0x%08x) (%s)\n", args->event_id, __func__);
return err;
 }
 
@@ -2031,7 +1982,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
if (ret)
return ret;
 
-   num_events = 0; /* TODO: Implement Events */
+   num_events = kfd_get_num_events(p);
num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
 
*num_objects = num_queues + num_events + num_svm_ranges;
@@ -2040,7 +1991,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
priv_size = sizeof(struct kfd_criu_process_priv_data);
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
priv_size += queues_priv_data_size;
-   /* TODO: Add Events priv size */
+   priv_size += num_events * sizeof(struct 
kfd_criu_event_priv_data);
/* TODO: Add SVM ranges priv size */
*objs_priv_size = priv_size;
}
@@ -2102,7 +2053,10 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto exit_unlock;
 
-   /* TODO: Dump Events */
+   ret = kfd_criu_checkpoint_events(p, (uint8_t __user 
*)args->priv_data,
+_offset);
+   if (ret)
+   goto exit_unlock;
 
/* TODO: Dump SVM-Ranges */
}
@@ -2410,8 +2364,8 @@ static int criu_restore_objects(struct file *filep,
goto exit;
break;
case KFD_CRIU_OBJE

[Patch v5 24/24] drm/amdkfd: Bump up KFD API version for CRIU

2022-02-03 Thread Rajneesh Bhardwaj

 - Change KFD minor version to 7 for CRIU

Proposed userspace changes:
https://github.com/RadeonOpenCompute/criu

Signed-off-by: Rajneesh Bhardwaj 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 49429a6c42fc..e6a56c146920 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -32,9 +32,10 @@
  * - 1.4 - Indicate new SRAM EDC bit in device properties
  * - 1.5 - Add SVM API
  * - 1.6 - Query clear flags in SVM get_attr API
+ * - 1.7 - Checkpoint Restore (CRIU) API
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 6
+#define KFD_IOCTL_MINOR_VERSION 7
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.17.1

[Patch v5 19/24] drm/amdkfd: use user_gpu_id for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj

Currently the SVM ranges use actual_gpu_id but with Checkpoint Restore
support its possible that the SVM ranges can be resumed on another node
where the actual_gpu_id may not be same as the original (user_gpu_id)
gpu id. So modify svm code to use user_gpu_id.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 06e6e9180fbc..8e2780d2f735 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1797,7 +1797,7 @@ int kfd_process_gpuidx_from_gpuid(struct kfd_process *p, 
uint32_t gpu_id)
int i;
 
for (i = 0; i < p->n_pdds; i++)
-   if (p->pdds[i] && gpu_id == p->pdds[i]->dev->id)
+   if (p->pdds[i] && gpu_id == p->pdds[i]->user_gpu_id)
return i;
return -EINVAL;
 }
@@ -1810,7 +1810,7 @@ kfd_process_gpuid_from_adev(struct kfd_process *p, struct 
amdgpu_device *adev,
 
for (i = 0; i < p->n_pdds; i++)
if (p->pdds[i] && p->pdds[i]->dev->adev == adev) {
-   *gpuid = p->pdds[i]->dev->id;
+   *gpuid = p->pdds[i]->user_gpu_id;
*gpuidx = i;
return 0;
}
-- 
2.17.1

[Patch v5 15/24] drm/amdkfd: CRIU implement gpu_id remapping

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When doing a restore on a different node, the gpu_id's on the restore
node may be different. But the user space application will still refer
use the original gpu_id's in the ioctl calls. Adding code to create a
gpu id mapping so that kfd can determine actual gpu_id during the user
ioctl's.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 468 --
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  45 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  11 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  32 ++
 .../amd/amdkfd/kfd_process_queue_manager.c|  18 +-
 5 files changed, 414 insertions(+), 160 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a4be758647f9..69edeaf3893e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -293,14 +293,17 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
return err;
 
pr_debug("Looking for gpu id 0x%x\n", args->gpu_id);
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev) {
-   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
-   return -EINVAL;
-   }
 
mutex_lock(>mutex);
 
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto err_pdd;
+   }
+   dev = pdd->dev;
+
pdd = kfd_bind_process_to_device(dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
@@ -345,6 +348,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
 
 err_create_queue:
 err_bind_process:
+err_pdd:
mutex_unlock(>mutex);
return err;
 }
@@ -491,7 +495,6 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_memory_policy_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
enum cache_policy default_policy, alternate_policy;
@@ -506,13 +509,15 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
return -EINVAL;
}
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
mutex_lock(>mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto err_pdd;
+   }
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -525,7 +530,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
(args->alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
   ? cache_policy_coherent : cache_policy_noncoherent;
 
-   if (!dev->dqm->ops.set_cache_memory_policy(dev->dqm,
+   if (!pdd->dev->dqm->ops.set_cache_memory_policy(pdd->dev->dqm,
>qpd,
default_policy,
alternate_policy,
@@ -534,6 +539,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
err = -EINVAL;
 
 out:
+err_pdd:
mutex_unlock(>mutex);
 
return err;
@@ -543,17 +549,18 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_trap_handler_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
mutex_lock(>mutex);
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   err = -EINVAL;
+   goto err_pdd;
+   }
+
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -562,6 +569,7 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
kfd_process_set_trap_handler(>qpd, args->tba_addr, args->tma_addr);
 
 out:
+err_pdd:
mutex_unlock(>mutex);
 
return err;
@@ -577,16 +585,20 @@ static int kfd_ioctl_dbg_register(struct file *filep,
bool create_ok;
long status = 0;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -E

[Patch v5 22/24] drm/amdkfd: CRIU prepare for svm resume

2022-02-03 Thread Rajneesh Bhardwaj

During CRIU restore phase, the VMAs for the virtual address ranges are
not at their final location yet so in this stage, only cache the data
required to successfully resume the svm ranges during an imminent CRIU
resume phase.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 58 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 12 +
 4 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 721c86ceba22..c143f242a84d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2643,8 +2643,8 @@ static int criu_restore_objects(struct file *filep,
goto exit;
break;
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
-   /* TODO: Implement SVM range */
-   *priv_offset += sizeof(struct 
kfd_criu_svm_range_priv_data);
+   ret = kfd_criu_restore_svm(p, (uint8_t __user 
*)args->priv_data,
+priv_offset, 
max_priv_data_size);
if (ret)
goto exit;
break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 715dd0d4fac5..74ff4132a163 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -790,6 +790,7 @@ struct svm_range_list {
struct list_headlist;
struct work_struct  deferred_list_work;
struct list_headdeferred_range_list;
+   struct list_headcriu_svm_metadata_list;
spinlock_t  deferred_list_lock;
atomic_tevicted_ranges;
atomic_tdrain_pagefaults;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7cf63995c079..41ac049b3316 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -45,6 +45,11 @@
  */
 #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING   2000
 
+struct criu_svm_metadata {
+   struct list_head list;
+   struct kfd_criu_svm_range_priv_data data;
+};
+
 static void svm_range_evict_svm_bo_worker(struct work_struct *work);
 static bool
 svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
@@ -2875,6 +2880,7 @@ int svm_range_list_init(struct kfd_process *p)
INIT_DELAYED_WORK(>restore_work, svm_range_restore_work);
INIT_WORK(>deferred_list_work, svm_range_deferred_list_work);
INIT_LIST_HEAD(>deferred_range_list);
+   INIT_LIST_HEAD(>criu_svm_metadata_list);
spin_lock_init(>deferred_list_lock);
 
for (i = 0; i < p->n_pdds; i++)
@@ -3481,6 +3487,58 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int kfd_criu_restore_svm(struct kfd_process *p,
+uint8_t __user *user_priv_ptr,
+uint64_t *priv_data_offset,
+uint64_t max_priv_data_size)
+{
+   uint64_t svm_priv_data_size, svm_object_md_size, svm_attrs_size;
+   int nattr_common = 4, nattr_accessibility = 1;
+   struct criu_svm_metadata *criu_svm_md = NULL;
+   struct svm_range_list *svms = >svms;
+   uint32_t num_devices;
+   int ret = 0;
+
+   num_devices = p->n_pdds;
+   /* Handle one SVM range object at a time, also the number of gpus are
+* assumed to be same on the restore node, checking must be done while
+* evaluating the topology earlier */
+
+   svm_attrs_size = sizeof(struct kfd_ioctl_svm_attribute) *
+   (nattr_common + nattr_accessibility * num_devices);
+   svm_object_md_size = sizeof(struct criu_svm_metadata) + svm_attrs_size;
+
+   svm_priv_data_size = sizeof(struct kfd_criu_svm_range_priv_data) +
+   svm_attrs_size;
+
+   criu_svm_md = kzalloc(svm_object_md_size, GFP_KERNEL);
+   if (!criu_svm_md) {
+   pr_err("failed to allocate memory to store svm metadata\n");
+   return -ENOMEM;
+   }
+   if (*priv_data_offset + svm_priv_data_size > max_priv_data_size) {
+   ret = -EINVAL;
+   goto exit;
+   }
+
+   ret = copy_from_user(_svm_md->data, user_priv_ptr + 
*priv_data_offset,
+svm_priv_data_size);
+   if (ret) {
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_data_offset += svm_priv_data_size;
+
+   list_add_tail(_svm_md->list, >criu_svm_metadata_list);
+
+   return 0;
+

[Patch v5 21/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2022-02-03 Thread Rajneesh Bhardwaj

During checkpoint stage, save the shared virtual memory ranges and
attributes for the target process. A process may contain a number of svm
ranges and each range might contain a number of attributes. While not
all attributes may be applicable for a given prange but during
checkpoint we store all possible values for the max possible attribute
types.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a755ea68a428..721c86ceba22 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2196,7 +2196,9 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto close_bo_fds;
 
-   /* TODO: Dump SVM-Ranges */
+   ret = kfd_criu_checkpoint_svm(p, (uint8_t __user 
*)args->priv_data, _offset);
+   if (ret)
+   goto close_bo_fds;
}
 
 close_bo_fds:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 64cd7712c098..7cf63995c079 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3540,6 +3540,101 @@ int svm_range_get_info(struct kfd_process *p, uint32_t 
*num_svm_ranges,
return 0;
 }
 
+int kfd_criu_checkpoint_svm(struct kfd_process *p,
+   uint8_t __user *user_priv_data,
+   uint64_t *priv_data_offset)
+{
+   struct kfd_criu_svm_range_priv_data *svm_priv = NULL;
+   struct kfd_ioctl_svm_attribute *query_attr = NULL;
+   uint64_t svm_priv_data_size, query_attr_size = 0;
+   int index, nattr_common = 4, ret = 0;
+   struct svm_range_list *svms;
+   int num_devices = p->n_pdds;
+   struct svm_range *prange;
+   struct mm_struct *mm;
+
+   svms = >svms;
+   if (!svms)
+   return -EINVAL;
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_err("failed to get mm for the target process\n");
+   return -ESRCH;
+   }
+
+   query_attr_size = sizeof(struct kfd_ioctl_svm_attribute) *
+   (nattr_common + num_devices);
+
+   query_attr = kzalloc(query_attr_size, GFP_KERNEL);
+   if (!query_attr) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   query_attr[0].type = KFD_IOCTL_SVM_ATTR_PREFERRED_LOC;
+   query_attr[1].type = KFD_IOCTL_SVM_ATTR_PREFETCH_LOC;
+   query_attr[2].type = KFD_IOCTL_SVM_ATTR_SET_FLAGS;
+   query_attr[3].type = KFD_IOCTL_SVM_ATTR_GRANULARITY;
+
+   for (index = 0; index < num_devices; index++) {
+   struct kfd_process_device *pdd = p->pdds[index];
+
+   query_attr[index + nattr_common].type =
+   KFD_IOCTL_SVM_ATTR_ACCESS;
+   query_attr[index + nattr_common].value = pdd->user_gpu_id;
+   }
+
+   svm_priv_data_size = sizeof(*svm_priv) + query_attr_size;
+
+   svm_priv = kzalloc(svm_priv_data_size, GFP_KERNEL);
+   if (!svm_priv) {
+   ret = -ENOMEM;
+   goto exit_query;
+   }
+
+   index = 0;
+   list_for_each_entry(prange, >list, list) {
+
+   svm_priv->object_type = KFD_CRIU_OBJECT_TYPE_SVM_RANGE;
+   svm_priv->start_addr = prange->start;
+   svm_priv->size = prange->npages;
+   memcpy(_priv->attrs, query_attr, query_attr_size);
+   pr_debug("CRIU: prange: 0x%p start: 0x%lx\t npages: 0x%llx end: 
0x%llx\t size: 0x%llx\n",
+prange, prange->start, prange->npages,
+prange->start + prange->npages - 1,
+prange->npages * PAGE_SIZE);
+
+   ret = svm_range_get_attr(p, mm, svm_priv->start_addr,
+svm_priv->size,
+(nattr_common + num_devices),
+svm_priv->attrs);
+   if (ret) {
+   pr_err("CRIU: failed to obtain range attributes\n");
+   goto exit_priv;
+   }
+
+   ret = copy_to_user(user_priv_data + *priv_data_offset,
+  svm_priv, svm_priv_data_size);
+   if (ret) {
+   pr_err("Failed to copy svm priv to user\n");
+   goto exit_priv;
+   }
+
+   *priv_data_offset += svm_priv_data_size;
+
+   }
+
+
+exit_priv:
+   kfree(svm_priv);
+exit_query:
+   kfree(query_attr);
+exit

[Patch v5 17/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2022-02-03 Thread Rajneesh Bhardwaj

Recoverable page faults are represented by the xnack mode setting inside
a kfd process and are used to represent the device page faults. For CR,
we don't consider negative values which are typically used for querying
the current xnack mode without modifying it.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 15 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ab5107a3fe36..3ec44f71307d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1848,6 +1848,11 @@ static int criu_checkpoint_process(struct kfd_process *p,
memset(_priv, 0, sizeof(process_priv));
 
process_priv.version = KFD_CRIU_PRIV_VERSION;
+   /* For CR, we don't consider negative xnack mode which is used for
+* querying without changing it, here 0 simply means disabled and 1
+* means enabled so retry for finding a valid PTE.
+*/
+   process_priv.xnack_mode = p->xnack_enabled ? 1 : 0;
 
ret = copy_to_user(user_priv_data + *priv_offset,
_priv, sizeof(process_priv));
@@ -2241,6 +2246,16 @@ static int criu_restore_process(struct kfd_process *p,
return -EINVAL;
}
 
+   pr_debug("Setting XNACK mode\n");
+   if (process_priv.xnack_mode && !kfd_process_xnack_mode(p, true)) {
+   pr_err("xnack mode cannot be set\n");
+   ret = -EPERM;
+   goto exit;
+   } else {
+   pr_debug("set xnack mode: %d\n", process_priv.xnack_mode);
+   p->xnack_enabled = process_priv.xnack_mode;
+   }
+
 exit:
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index df68c4274bd9..903ad4a263f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1056,6 +1056,7 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
 
 struct kfd_criu_process_priv_data {
uint32_t version;
+   uint32_t xnack_mode;
 };
 
 struct kfd_criu_device_priv_data {
-- 
2.17.1

[Patch v5 20/24] drm/amdkfd: CRIU Discover svm ranges

2022-02-03 Thread Rajneesh Bhardwaj

A KFD process may contain a number of virtual address ranges for shared
virtual memory management and each such range can have many SVM
attributes spanning across various nodes within the process boundary.
This change reports the total number of such SVM ranges and
their total private data size by extending the PROCESS_INFO op of the the
CRIU IOCTL to discover the svm ranges in the target process and a future
patches brings in the required support for checkpoint and restore for
SVM ranges.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 59 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 11 +
 4 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3ec44f71307d..a755ea68a428 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2099,10 +2099,9 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
uint32_t *num_objects,
uint64_t *objs_priv_size)
 {
-   int ret;
-   uint64_t priv_size;
+   uint64_t queues_priv_data_size, svm_priv_data_size, priv_size;
uint32_t num_queues, num_events, num_svm_ranges;
-   uint64_t queues_priv_data_size;
+   int ret;
 
*num_devices = p->n_pdds;
*num_bos = get_process_num_bos(p);
@@ -2112,7 +2111,10 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
return ret;
 
num_events = kfd_get_num_events(p);
-   num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
+
+   ret = svm_range_get_info(p, _svm_ranges, _priv_data_size);
+   if (ret)
+   return ret;
 
*num_objects = num_queues + num_events + num_svm_ranges;
 
@@ -2122,7 +2124,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
priv_size += queues_priv_data_size;
priv_size += num_events * sizeof(struct 
kfd_criu_event_priv_data);
-   /* TODO: Add SVM ranges priv size */
+   priv_size += svm_priv_data_size;
*objs_priv_size = priv_size;
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 903ad4a263f0..715dd0d4fac5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1082,7 +1082,10 @@ enum kfd_criu_object_type {
 
 struct kfd_criu_svm_range_priv_data {
uint32_t object_type;
-   uint32_t reserved;
+   uint64_t start_addr;
+   uint64_t size;
+   /* Variable length array of attributes */
+   struct kfd_ioctl_svm_attribute attrs[0];
 };
 
 struct kfd_criu_queue_priv_data {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d34508f5e88b..64cd7712c098 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3481,6 +3481,65 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int svm_range_get_info(struct kfd_process *p, uint32_t *num_svm_ranges,
+  uint64_t *svm_priv_data_size)
+{
+   uint64_t total_size, accessibility_size, common_attr_size;
+   int nattr_common = 4, nattr_accessibility = 1;
+   int num_devices = p->n_pdds;
+   struct svm_range_list *svms;
+   struct svm_range *prange;
+   uint32_t count = 0;
+
+   *svm_priv_data_size = 0;
+
+   svms = >svms;
+   if (!svms)
+   return -EINVAL;
+
+   mutex_lock(>lock);
+   list_for_each_entry(prange, >list, list) {
+   pr_debug("prange: 0x%p start: 0x%lx\t npages: 0x%llx\t end: 
0x%llx\n",
+prange, prange->start, prange->npages,
+prange->start + prange->npages - 1);
+   count++;
+   }
+   mutex_unlock(>lock);
+
+   *num_svm_ranges = count;
+   /* Only the accessbility attributes need to be queried for all the gpus
+* individually, remaining ones are spanned across the entire process
+* regardless of the various gpu nodes. Of the remaining attributes,
+* KFD_IOCTL_SVM_ATTR_CLR_FLAGS need not be saved.
+*
+* KFD_IOCTL_SVM_ATTR_PREFERRED_LOC
+* KFD_IOCTL_SVM_ATTR_PREFETCH_LOC
+* KFD_IOCTL_SVM_ATTR_SET_FLAGS
+* KFD_IOCTL_SVM_ATTR_GRANULARITY
+*
+* ** ACCESSBILITY ATTRIBUTES **
+* (Considered as one, type is altered during query, value is gpuid)
+* KFD_IOCTL_SVM_ATTR_ACCESS
+* KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE
+* KFD_IOCTL_SVM_ATTR_NO_ACCESS

[Patch v5 13/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Checkpoint contents of queue control stacks on CRIU dump and restore them
during CRIU restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 ---
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  9 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  | 11 +++-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 13 ++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 14 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 29 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 22 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 +-
 .../amd/amdkfd/kfd_process_queue_manager.c| 62 +--
 11 files changed, 138 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 999672602252..608214ea634d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -311,7 +311,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL, NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 3a5303ebcabf..8eca9ed3ab36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL, NULL);
+   , , NULL, NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 42933610d4e1..63b3c7af681b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -323,7 +323,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
struct queue *q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void 
*restore_ctl_stack)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -385,7 +385,8 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
 
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1342,7 +1343,7 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void *restore_ctl_stack)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1391,7 +1392,8 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
 
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1799,7 +1801,8 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
 
 static void get_queue_checkpoint_info(struct device_queue_manager *dqm,
const struct queue *q,
-   u32 *mqd_size)
+   u32 *mqd_size,
+   u32 *ctl_stack_size)
 {
struct mqd_manager *mqd_mgr

[Patch v5 23/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2022-02-03 Thread Rajneesh Bhardwaj

In CRIU resume stage, resume all the shared virtual memory ranges from
the data stored inside the resuming kfd process during CRIU restore
phase. Also setup xnack mode and free up the resources.

KFD_IOCTL_SVM_ATTR_CLR_FLAGS is not available for querying via get_attr
interface but we must clear the flags during restore as there might be
some default flags set when the prange is created. Also handle the
invalid PREFETCH atribute values saved during checkpoint by replacing
them with another dummy KFD_IOCTL_SVM_ATTR_SET_FLAGS attribute.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  10 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 102 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |   6 ++
 3 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index c143f242a84d..64e3b4e3a712 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2766,7 +2766,17 @@ static int criu_resume(struct file *filep,
}
 
mutex_lock(>mutex);
+   ret = kfd_criu_resume_svm(target);
+   if (ret) {
+   pr_err("kfd_criu_resume_svm failed for %i\n", args->pid);
+   goto exit;
+   }
+
ret =  amdgpu_amdkfd_criu_resume(target->kgd_process_info);
+   if (ret)
+   pr_err("amdgpu_amdkfd_criu_resume failed for %i\n", args->pid);
+
+exit:
mutex_unlock(>mutex);
 
kfd_unref_process(target);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 41ac049b3316..30ae21953da5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3487,6 +3487,108 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int kfd_criu_resume_svm(struct kfd_process *p)
+{
+   struct kfd_ioctl_svm_attribute *set_attr = NULL;
+   int nattr_common = 4, nattr_accessibility = 1;
+   struct criu_svm_metadata *criu_svm_md = NULL;
+   struct svm_range_list *svms = >svms;
+   struct criu_svm_metadata *next = NULL;
+   uint32_t set_flags = 0x;
+   int i, j, num_attrs, ret = 0;
+   uint64_t set_attr_size;
+   struct mm_struct *mm;
+
+   if (list_empty(>criu_svm_metadata_list)) {
+   pr_debug("No SVM data from CRIU restore stage 2\n");
+   return ret;
+   }
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_err("failed to get mm for the target process\n");
+   return -ESRCH;
+   }
+
+   num_attrs = nattr_common + (nattr_accessibility * p->n_pdds);
+
+   i = j = 0;
+   list_for_each_entry(criu_svm_md, >criu_svm_metadata_list, list) {
+   pr_debug("criu_svm_md[%d]\n\tstart: 0x%llx size: 0x%llx 
(npages)\n",
+i, criu_svm_md->data.start_addr, 
criu_svm_md->data.size);
+
+   for (j = 0; j < num_attrs; j++) {
+   pr_debug("\ncriu_svm_md[%d]->attrs[%d].type : 0x%x 
\ncriu_svm_md[%d]->attrs[%d].value : 0x%x\n",
+i,j, criu_svm_md->data.attrs[j].type,
+i,j, criu_svm_md->data.attrs[j].value);
+   switch (criu_svm_md->data.attrs[j].type) {
+   /* During Checkpoint operation, the query for
+* KFD_IOCTL_SVM_ATTR_PREFETCH_LOC attribute might
+* return KFD_IOCTL_SVM_LOCATION_UNDEFINED if they were
+* not used by the range which was checkpointed. Care
+* must be taken to not restore with an invalid value
+* otherwise the gpuidx value will be invalid and
+* set_attr would eventually fail so just replace those
+* with another dummy attribute such as
+* KFD_IOCTL_SVM_ATTR_SET_FLAGS.
+*/
+   case KFD_IOCTL_SVM_ATTR_PREFETCH_LOC:
+   if (criu_svm_md->data.attrs[j].value ==
+   KFD_IOCTL_SVM_LOCATION_UNDEFINED) {
+   criu_svm_md->data.attrs[j].type =
+   KFD_IOCTL_SVM_ATTR_SET_FLAGS;
+   criu_svm_md->data.attrs[j].value = 0;
+   }
+   break;
+   case KFD_IOCTL_SVM_ATTR_SET_FLAGS:
+   set_flags = criu_svm_md->data.attrs[j].value;
+   break;
+   default:
+   break;
+   }
+   }
+
+

[Patch v5 18/24] drm/amdkfd: CRIU allow external mm for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj

Both svm_range_get_attr and svm_range_set_attr helpers use mm struct
from current but for a Checkpoint or Restore operation, the current->mm
will fetch the mm for the CRIU master process. So modify these helpers to
accept the task mm for a target kfd process to support Checkpoint
Restore.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ffec25e642e2..d34508f5e88b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3203,10 +3203,10 @@ static void svm_range_evict_svm_bo_worker(struct 
work_struct *work)
 }
 
 static int
-svm_range_set_attr(struct kfd_process *p, uint64_t start, uint64_t size,
-  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
+  uint64_t start, uint64_t size, uint32_t nattr,
+  struct kfd_ioctl_svm_attribute *attrs)
 {
-   struct mm_struct *mm = current->mm;
struct list_head update_list;
struct list_head insert_list;
struct list_head remove_list;
@@ -3305,8 +3305,9 @@ svm_range_set_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
 }
 
 static int
-svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size,
-  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+svm_range_get_attr(struct kfd_process *p, struct mm_struct *mm,
+  uint64_t start, uint64_t size, uint32_t nattr,
+  struct kfd_ioctl_svm_attribute *attrs)
 {
DECLARE_BITMAP(bitmap_access, MAX_GPU_INSTANCE);
DECLARE_BITMAP(bitmap_aip, MAX_GPU_INSTANCE);
@@ -3316,7 +3317,6 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
bool get_accessible = false;
bool get_flags = false;
uint64_t last = start + size - 1UL;
-   struct mm_struct *mm = current->mm;
uint8_t granularity = 0xff;
struct interval_tree_node *node;
struct svm_range_list *svms;
@@ -3485,6 +3485,7 @@ int
 svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op op, uint64_t start,
  uint64_t size, uint32_t nattrs, struct kfd_ioctl_svm_attribute *attrs)
 {
+   struct mm_struct *mm = current->mm;
int r;
 
start >>= PAGE_SHIFT;
@@ -3492,10 +3493,10 @@ svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op 
op, uint64_t start,
 
switch (op) {
case KFD_IOCTL_SVM_OP_SET_ATTR:
-   r = svm_range_set_attr(p, start, size, nattrs, attrs);
+   r = svm_range_set_attr(p, mm, start, size, nattrs, attrs);
break;
case KFD_IOCTL_SVM_OP_GET_ATTR:
-   r = svm_range_get_attr(p, start, size, nattrs, attrs);
+   r = svm_range_get_attr(p, mm, start, size, nattrs, attrs);
break;
default:
r = EINVAL;
-- 
2.17.1

[Patch v5 09/24] drm/amdkfd: CRIU restore queue ids

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 37 +++
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d049f9cbbc79..d35911550792 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -311,7 +311,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 1e30717b5253..0c50e67e2b51 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL);
+   , , NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 41aa7b150a96..59125d8f16a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -461,6 +461,7 @@ enum KFD_QUEUE_PRIORITY {
  * it's user mode or kernel mode queue.
  *
  */
+
 struct queue_properties {
enum kfd_queue_type type;
enum kfd_queue_format format;
@@ -1156,6 +1157,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue_properties(struct process_queue_manager *pqm, unsigned 
int qid,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 38d3217f0f67..75bad4381421 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -42,6 +42,20 @@ static inline struct process_queue_node *get_queue_by_qid(
return NULL;
 }
 
+static int assign_queue_slot_by_qid(struct process_queue_manager *pqm,
+   unsigned int qid)
+{
+   if (qid >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+   return -EINVAL;
+
+   if (__test_and_set_bit(qid, pqm->queue_slot_bitmap)) {
+   pr_err("Cannot create new queue because requested qid(%u) is in 
use\n", qid);
+   return -ENOSPC;
+   }
+
+   return 0;
+}
+
 static int find_available_queue_slot(struct process_queue_manager *pqm,
unsigned int *qid)
 {
@@ -193,6 +207,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process)
 {
int retval;
@@ -224,7 +239,12 @@ int pqm_create_queue(struct process_queue_manager *pqm,
if (pdd->qpd.queue_count >= max_queues)
return -ENOSPC;
 
-   retval = find_available_queue_slot(pqm, qid);
+   if (q_data) {
+   retval = assign_queue_slot_by_qid(pqm, q_data->q_id);
+   *qid = q_data->q_id;
+   } else
+   retval = find_available_queue_slot(pqm, qid);
+
if (retval != 0)
return retval;
 
@@ -527,7 +547,7 @@ int kfd_process_get_queue_info(struct kfd_process *p,
return 0;
 }
 
-static void criu_dump_queue(struct kfd_process_device *pdd,
+static void criu_checkpoint_queue(struct kfd_process_device *pdd,
   struct queue *q,
   struct kfd_criu_queue_priv_data *q_data)
 {
@@ -559,7 +579,7 @@ static void criu_dump_queue(struct kfd_process

[Patch v5 10/24] drm/amdkfd: CRIU restore sdma id for queues

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  4 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 4b6814949aad..15fa2dc6dcba 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -58,7 +58,7 @@ static inline void deallocate_hqd(struct device_queue_manager 
*dqm,
struct queue *q);
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q);
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q);
+   struct queue *q, const uint32_t 
*restore_sdma_id);
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
@@ -299,7 +299,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
-   struct qcm_process_device *qpd)
+   struct qcm_process_device *qpd,
+   const struct kfd_criu_queue_priv_data *qd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -339,7 +340,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
q->pipe, q->queue);
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
-   retval = allocate_sdma_queue(dqm, q);
+   retval = allocate_sdma_queue(dqm, q, qd ? >sdma_id : NULL);
if (retval)
goto deallocate_vmid;
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1034,7 +1035,7 @@ static void pre_reset(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q)
+   struct queue *q, const uint32_t 
*restore_sdma_id)
 {
int bit;
 
@@ -1044,9 +1045,21 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
return -ENOMEM;
}
 
-   bit = __ffs64(dqm->sdma_bitmap);
-   dqm->sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->sdma_bitmap & (1ULL << *restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   /* Find first available sdma_id */
+   bit = __ffs64(dqm->sdma_bitmap);
+   dqm->sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
+
q->properties.sdma_engine_id = q->sdma_id %
kfd_get_num_sdma_engines(dqm->dev);
q->properties.sdma_queue_id = q->sdma_id /
@@ -1056,9 +1069,19 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
pr_err("No more XGMI SDMA queue to allocate\n");
return -ENOMEM;
}
-   bit = __ffs64(dqm->xgmi_sdma_bitmap);
-   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->xgmi_sdma_bitmap & (1ULL << 
*restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   bit = __ffs64(dqm->xgmi_sdma_bitmap);
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
/* sdma_engine_id is sdma id including
 * both

[Patch v5 11/24] drm/amdkfd: CRIU restore queue doorbell id

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.

Signed-off-by: David Yat Sin 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +--
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 15fa2dc6dcba..13317d2c8959 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -144,7 +144,13 @@ static void decrement_queue_count(struct 
device_queue_manager *dqm,
dqm->active_cp_queue_count--;
 }
 
-static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+/*
+ * Allocate a doorbell ID to this queue.
+ * If doorbell_id is passed in, make sure requested ID is valid then allocate 
it.
+ */
+static int allocate_doorbell(struct qcm_process_device *qpd,
+struct queue *q,
+uint32_t const *restore_id)
 {
struct kfd_dev *dev = qpd->dqm->dev;
 
@@ -152,6 +158,10 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
/* On pre-SOC15 chips we need to use the queue ID to
 * preserve the user mode ABI.
 */
+
+   if (restore_id && *restore_id != q->properties.queue_id)
+   return -EINVAL;
+
q->doorbell_id = q->properties.queue_id;
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
@@ -160,25 +170,37 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
 * The doobell index distance between RLC (2*i) and (2*i+1)
 * for a SDMA engine is 512.
 */
-   uint32_t *idx_offset =
-   dev->shared_resources.sdma_doorbell_idx;
 
-   q->doorbell_id = idx_offset[q->properties.sdma_engine_id]
-   + (q->properties.sdma_queue_id & 1)
-   * KFD_QUEUE_DOORBELL_MIRROR_OFFSET
-   + (q->properties.sdma_queue_id >> 1);
+   uint32_t *idx_offset = dev->shared_resources.sdma_doorbell_idx;
+   uint32_t valid_id = idx_offset[q->properties.sdma_engine_id]
+   + (q->properties.sdma_queue_id 
& 1)
+   * 
KFD_QUEUE_DOORBELL_MIRROR_OFFSET
+   + (q->properties.sdma_queue_id 
>> 1);
+
+   if (restore_id && *restore_id != valid_id)
+   return -EINVAL;
+   q->doorbell_id = valid_id;
} else {
-   /* For CP queues on SOC15 reserve a free doorbell ID */
-   unsigned int found;
-
-   found = find_first_zero_bit(qpd->doorbell_bitmap,
-   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
-   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
-   pr_debug("No doorbells available");
-   return -EBUSY;
+   /* For CP queues on SOC15 */
+   if (restore_id) {
+   /* make sure that ID is free  */
+   if (__test_and_set_bit(*restore_id, 
qpd->doorbell_bitmap))
+   return -EINVAL;
+
+   q->doorbell_id = *restore_id;
+   } else {
+   /* or reserve a free doorbell ID */
+   unsigned int found;
+
+   found = find_first_zero_bit(qpd->doorbell_bitmap,
+   
KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+   pr_debug("No doorbells available");
+   return -EBUSY;
+   }
+   set_bit(found, qpd->doorbell_bitmap);
+   q->doorbell_id = found;
}
-   set_bit(found, qpd->doorbell_bitmap);
-   q->doorbell_id = found;
}
 
q->properties.doorbell_off =
@@ -346,7 +368,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
}
 
-   retval = allocate_doorbell(qpd, q);
+   retval = allocate_doorbell(qpd, q, qd ? >doorbell_id : NULL);
if (retval)
goto out_deallocate_hqd;
 
@@ -1333,7 +1355,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
goto out;
}
 
-   retval = allocate_doorbell(qpd, q);
+   retval = allocate_doorbell(qpd, q, qd ? >doorbell_id :

[Patch v5 16/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2022-02-03 Thread Rajneesh Bhardwaj

KFD buffer objects do not associate a GEM handle with them so cannot
directly be used with libdrm to initiate a system dma (sDMA) operation
to speedup the checkpoint and restore operation so export them as dmabuf
objects and use with libdrm helper (amdgpu_bo_import) to further process
the sdma command submissions.

With sDMA, we see huge improvement in checkpoint and restore operations
compared to the generic pci based access via host data path.

Suggested-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71 +++-
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 69edeaf3893e..ab5107a3fe36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
@@ -42,6 +43,7 @@
 #include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "amdgpu_dma_buf.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1936,6 +1938,33 @@ uint32_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_get_prime_handle(struct drm_gem_object *gobj, int flags,
+ u32 *shared_fd)
+{
+   struct dma_buf *dmabuf;
+   int ret;
+
+   dmabuf = amdgpu_gem_prime_export(gobj, flags);
+   if (IS_ERR(dmabuf)) {
+   ret = PTR_ERR(dmabuf);
+   pr_err("dmabuf export failed for the BO\n");
+   return ret;
+   }
+
+   ret = dma_buf_fd(dmabuf, flags);
+   if (ret < 0) {
+   pr_err("dmabuf create fd failed, ret:%d\n", ret);
+   goto out_free_dmabuf;
+   }
+
+   *shared_fd = ret;
+   return 0;
+
+out_free_dmabuf:
+   dma_buf_put(dmabuf);
+   return ret;
+}
+
 static int criu_checkpoint_bos(struct kfd_process *p,
   uint32_t num_bos,
   uint8_t __user *user_bos,
@@ -1997,6 +2026,14 @@ static int criu_checkpoint_bos(struct kfd_process *p,
goto exit;
}
}
+   if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+   ret = 
criu_get_prime_handle(_bo->tbo.base,
+   bo_bucket->alloc_flags &
+   
KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DRM_RDWR : 0,
+   _bucket->dmabuf_fd);
+   if (ret)
+   goto exit;
+   }
if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
KFD_MMAP_GPU_ID(pdd->dev->id);
@@ -2041,6 +2078,10 @@ static int criu_checkpoint_bos(struct kfd_process *p,
*priv_offset += num_bos * sizeof(*bo_privs);
 
 exit:
+   while (ret && bo_index--) {
+   if (bo_buckets[bo_index].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[bo_index].dmabuf_fd);
+   }
 
kvfree(bo_buckets);
kvfree(bo_privs);
@@ -2141,16 +2182,28 @@ static int criu_checkpoint(struct file *filep,
ret = kfd_criu_checkpoint_queues(p, (uint8_t __user 
*)args->priv_data,
 _offset);
if (ret)
-   goto exit_unlock;
+   goto close_bo_fds;
 
ret = kfd_criu_checkpoint_events(p, (uint8_t __user 
*)args->priv_data,
 _offset);
if (ret)
-   goto exit_unlock;
+   goto close_bo_fds;
 
/* TODO: Dump SVM-Ranges */
}
 
+close_bo_fds:
+   if (ret) {
+   /* If IOCTL returns err, user assumes all FDs opened in 
criu_dump_bos are closed */
+   uint32_t i;
+   struct kfd_criu_bo_bucket *bo_buckets = (struct 
kfd_criu_bo_bucket *) args->bos;
+
+   for (i = 0; i < num_bos; i++) {
+   if (bo_buckets[i].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[i].dmabuf_fd);
+   }
+   }
+
 exit_unlock:
mutex_unlock(>mutex);
if (ret)
@@ -2345,6 +2398,7 @@ static int criu

[Patch v5 02/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2022-02-03 Thread Rajneesh Bhardwaj

Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on same or a remote
machine but expects the processes that have a device file (e.g. GPU)
associated with them, provide necessary driver support to assist CRIU
and its extensible plugin interface. Thus, In order to support the
Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute
Kernel driver, needs to provide a set of new APIs that provide
necessary VRAM metadata and its contents to a userspace component
(CRIU plugin) that can store it in form of image files.

This introduces some new ioctls which will be used to checkpoint-Restore
any KFD bound user process. KFD only allows ioctl calls from the same
process that opened the KFD file descriptor. Since these ioctls are
expected to be called from a KFD criu plugin which has elevated ptrace
attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with
the file descriptors so modify KFD to allow such calls.

(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 98 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 65 +++-
 include/uapi/linux/kfd_ioctl.h   | 81 +++-
 3 files changed, 241 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 214a2c67fba4..90e6d9e335a5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "kfd_priv.h"
@@ -1859,6 +1860,75 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int criu_checkpoint(struct file *filep,
+  struct kfd_process *p,
+  struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_restore(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_unpause(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_resume(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_process_info(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void 
*data)
+{
+   struct kfd_ioctl_criu_args *args = data;
+   int ret;
+
+   dev_dbg(kfd_device, "CRIU operation: %d\n", args->op);
+   switch (args->op) {
+   case KFD_CRIU_OP_PROCESS_INFO:
+   ret = criu_process_info(filep, p, args);
+   break;
+   case KFD_CRIU_OP_CHECKPOINT:
+   ret = criu_checkpoint(filep, p, args);
+   break;
+   case KFD_CRIU_OP_UNPAUSE:
+   ret = criu_unpause(filep, p, args);
+   break;
+   case KFD_CRIU_OP_RESTORE:
+   ret = criu_restore(filep, p, args);
+   break;
+   case KFD_CRIU_OP_RESUME:
+   ret = criu_resume(filep, p, args);
+   break;
+   default:
+   dev_dbg(kfd_device, "Unsupported CRIU operation:%d\n", 
args->op);
+   ret = -EINVAL;
+   break;
+   }
+
+   if (ret)
+   dev_dbg(kfd_device, "CRIU operation:%d err:%d\n", args->op, 
ret);
+
+   return ret;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -1962,6 +2032,10 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
kfd_ioctl_set_xnack_mode, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_OP,
+   kfd_ioctl_criu, KFD_IOC_FLAG_CHECKPOINT_RESTORE),
+
 };
 
 #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls)
@@ -1976,6 +2050,7 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
char *kdata = NULL;
unsigned int usize, asize;
int retcode = -EINVAL;
+   bool ptrace_attached = false;
 
if (nr >= AMDKFD_CORE_IOCTL_COUNT)
goto err_i1;
@@ -2001,7 +2076,15 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
 * processes need to create their own KFD device con

[Patch v5 06/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2022-02-03 Thread Rajneesh Bhardwaj

This adds support to create userptr BOs on restore and introduces a new
ioctl op to restart memory notifiers for the restored userptr BOs.
When doing CRIU restore MMU notifications can happen anytime after we call
amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of the
restore process i.e. criu_resume ioctl op is received, and the process is
ready to be resumed. This ioctl is different from other KFD CRIU ioctls
since its called by CRIU master restore process for all the target
processes being resumed by CRIU.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  6 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 53 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 41 --
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 35 ++--
 5 files changed, 122 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 395ba9566afe..4cb14c2fe53f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -131,6 +131,7 @@ struct amdkfd_process_info {
atomic_t evicted_bos;
struct delayed_work restore_userptr_work;
struct pid *pid;
+   bool block_mmu_notifications;
 };
 
 int amdgpu_amdkfd_init(void);
@@ -268,7 +269,7 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct amdgpu_device *adev, uint64_t va, uint64_t size,
void *drm_priv, struct kgd_mem **mem,
-   uint64_t *offset, uint32_t flags);
+   uint64_t *offset, uint32_t flags, bool criu_resume);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv,
uint64_t *size);
@@ -298,6 +299,9 @@ int amdgpu_amdkfd_get_tile_config(struct amdgpu_device 
*adev,
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev,
bool reset);
 bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem 
*mem);
+void amdgpu_amdkfd_block_mmu_notifications(void *p);
+int amdgpu_amdkfd_criu_resume(void *p);
+
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 3485ef856860..69dc9e4d841c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -842,7 +842,8 @@ static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem 
*mem,
  *
  * Returns 0 for success, negative errno for errors.
  */
-static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
+static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
+  bool criu_resume)
 {
struct amdkfd_process_info *process_info = mem->process_info;
struct amdgpu_bo *bo = mem->bo;
@@ -864,6 +865,18 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
goto out;
}
 
+   if (criu_resume) {
+   /*
+* During a CRIU restore operation, the userptr buffer objects
+* will be validated in the restore_userptr_work worker at a
+* later stage when it is scheduled by another ioctl called by
+* CRIU master process for the target pid for restore.
+*/
+   atomic_inc(>invalid);
+   mutex_unlock(_info->lock);
+   return 0;
+   }
+
ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
@@ -1452,10 +1465,39 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv)
return avm->pd_phys_addr;
 }
 
+void amdgpu_amdkfd_block_mmu_notifications(void *p)
+{
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   mutex_lock(>lock);
+   WRITE_ONCE(pinfo->block_mmu_notifications, true);
+   mutex_unlock(>lock);
+}
+
+int amdgpu_amdkfd_criu_resume(void *p)
+{
+   int ret = 0;
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   mutex_lock(>lock);
+   pr_debug("scheduling work\n");
+   atomic_inc(>evicted_bos);
+   if (!READ_ONCE(pinfo->block_mmu_notifications)) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   WRITE_ONCE(pinfo->block_mmu_notifications, false);
+   schedule_delayed_work(>restore_userptr_work, 0);
+
+out_u

[Patch v5 12/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Checkpoint contents of queue MQD's on CRIU dump and restore them during
CRIU restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |   2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  73 +++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   7 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  70 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  71 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  71 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  72 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 157 --
 11 files changed, 516 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d35911550792..999672602252 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -311,7 +311,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 0c50e67e2b51..3a5303ebcabf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL);
+   , , NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 13317d2c8959..42933610d4e1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -322,7 +322,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -381,8 +382,14 @@ static int create_queue_nocpsch(struct 
device_queue_manager *dqm,
retval = -ENOMEM;
goto out_deallocate_doorbell;
}
-   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
-   >gart_mqd_addr, >properties);
+
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
+>properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
+   >gart_mqd_addr, >properties);
+
if (q->properties.is_active) {
if (!dqm->sched_running) {
WARN_ONCE(1, "Load non-HWS mqd while stopped\n");
@@ -1334,7 +1341,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1380,8 +1388,13 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
-   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
-   >gart_mqd_addr, >properties);
+
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
+>properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
+   >gart_mqd_

[Patch v5 07/24] drm/amdkfd: CRIU Implement KFD unpause operation

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Introducing UNPAUSE op. After CRIU amdgpu plugin performs a PROCESS_INFO
op the queues will be stay in an evicted state. Once the plugin is done
draining BO contents, it is safe to perform an UNPAUSE op for the queues
to resume.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 37 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  1 +
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 95fc5668195c..6af6deeda523 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2049,6 +2049,14 @@ static int criu_checkpoint(struct file *filep,
goto exit_unlock;
}
 
+   /* Confirm all process queues are evicted */
+   if (!p->queues_paused) {
+   pr_err("Cannot dump process when queues are not in evicted 
state\n");
+   /* CRIU plugin did not call op PROCESS_INFO before 
checkpointing */
+   ret = -EINVAL;
+   goto exit_unlock;
+   }
+
criu_get_process_object_info(p, _bos, _size);
 
if (num_bos != args->num_bos ||
@@ -2388,7 +2396,24 @@ static int criu_unpause(struct file *filep,
struct kfd_process *p,
struct kfd_ioctl_criu_args *args)
 {
-   return 0;
+   int ret;
+
+   mutex_lock(>mutex);
+
+   if (!p->queues_paused) {
+   mutex_unlock(>mutex);
+   return -EINVAL;
+   }
+
+   ret = kfd_process_restore_queues(p);
+   if (ret)
+   pr_err("Failed to unpause queues ret:%d\n", ret);
+   else
+   p->queues_paused = false;
+
+   mutex_unlock(>mutex);
+
+   return ret;
 }
 
 static int criu_resume(struct file *filep,
@@ -2440,6 +2465,12 @@ static int criu_process_info(struct file *filep,
goto err_unlock;
}
 
+   ret = kfd_process_evict_queues(p);
+   if (ret)
+   goto err_unlock;
+
+   p->queues_paused = true;
+
args->pid = task_pid_nr_ns(p->lead_thread,
task_active_pid_ns(p->lead_thread));
 
@@ -2447,6 +2478,10 @@ static int criu_process_info(struct file *filep,
 
dev_dbg(kfd_device, "Num of bos:%u\n", args->num_bos);
 err_unlock:
+   if (ret) {
+   kfd_process_restore_queues(p);
+   p->queues_paused = false;
+   }
mutex_unlock(>mutex);
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 9b347247055c..677f21447112 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -877,6 +877,8 @@ struct kfd_process {
bool xnack_enabled;
 
atomic_t poison;
+   /* Queues are in paused stated because we are in the process of doing a 
CRIU checkpoint */
+   bool queues_paused;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b3198e186622..0649064b8e95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1384,6 +1384,7 @@ static struct kfd_process *create_process(const struct 
task_struct *thread)
process->mm = thread->mm;
process->lead_thread = thread->group_leader;
process->n_pdds = 0;
+   process->queues_paused = false;
INIT_DELAYED_WORK(>eviction_work, evict_process_worker);
INIT_DELAYED_WORK(>restore_work, restore_process_worker);
process->last_restore_timestamp = get_jiffies_64();
-- 
2.17.1

[Patch v5 05/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2022-02-03 Thread Rajneesh Bhardwaj

This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to the checkpointed image.
This ioctl creates various types of buffer objects such as VRAM,
MMIO, Doorbell, GTT based on the date sent from the userspace plugin.
The data mostly contains the previously checkpointed KFD images from
some KFD processs.

While restoring a criu process, attach old IDR values to newly
created BOs. This also adds the minimal gpu mapping support for a single
gpu checkpoint restore use case.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++-
 1 file changed, 297 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 17a937b7139f..342fc56b1940 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2078,11 +2078,307 @@ static int criu_checkpoint(struct file *filep,
return ret;
 }
 
+static int criu_restore_process(struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   int ret = 0;
+   struct kfd_criu_process_priv_data process_priv;
+
+   if (*priv_offset + sizeof(process_priv) > max_priv_data_size)
+   return -EINVAL;
+
+   ret = copy_from_user(_priv,
+   (void __user *)(args->priv_data + *priv_offset),
+   sizeof(process_priv));
+   if (ret) {
+   pr_err("Failed to copy process private information from 
user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_offset += sizeof(process_priv);
+
+   if (process_priv.version != KFD_CRIU_PRIV_VERSION) {
+   pr_err("Invalid CRIU API version (checkpointed:%d 
current:%d)\n",
+   process_priv.version, KFD_CRIU_PRIV_VERSION);
+   return -EINVAL;
+   }
+
+exit:
+   return ret;
+}
+
+static int criu_restore_bos(struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   struct kfd_criu_bo_priv_data *bo_privs;
+   bool flush_tlbs = false;
+   int ret = 0, j = 0;
+   uint32_t i;
+
+   if (*priv_offset + (args->num_bos * sizeof(*bo_privs)) > 
max_priv_data_size)
+   return -EINVAL;
+
+   bo_buckets = kvmalloc_array(args->num_bos, sizeof(*bo_buckets), 
GFP_KERNEL);
+   if (!bo_buckets)
+   return -ENOMEM;
+
+   ret = copy_from_user(bo_buckets, (void __user *)args->bos,
+args->num_bos * sizeof(*bo_buckets));
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+
+   bo_privs = kvmalloc_array(args->num_bos, sizeof(*bo_privs), GFP_KERNEL);
+   if (!bo_privs) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   ret = copy_from_user(bo_privs, (void __user *)args->priv_data + 
*priv_offset,
+args->num_bos * sizeof(*bo_privs));
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_offset += args->num_bos * sizeof(*bo_privs);
+
+   /* Create and map new BOs */
+   for (i = 0; i < args->num_bos; i++) {
+   struct kfd_criu_bo_bucket *bo_bucket;
+   struct kfd_criu_bo_priv_data *bo_priv;
+   struct kfd_dev *dev;
+   struct kfd_process_device *pdd;
+   void *mem;
+   u64 offset;
+   int idr_handle;
+
+   bo_bucket = _buckets[i];
+   bo_priv = _privs[i];
+
+   dev = kfd_device_by_id(bo_bucket->gpu_id);
+   if (!dev) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+   pdd = kfd_get_process_device_data(dev, p);
+   if (!pdd) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+
+   pr_debug("kfd restore ioctl - bo_bucket[%d]:\n", i);
+   pr_debug("size = 0x%llx, bo_addr = 0x%llx bo_offset = 0x%llx\n"
+   "gpu_id = 0x%x alloc_flags = 0x%x\n&q

[Patch v5 08/24] drm/amdkfd: CRIU add queues support

2022-02-03 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Add support to existing CRIU ioctl's to save number of queues and queue
properties for each queue during checkpoint and re-create queues on
restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 110 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  43 +++-
 .../amd/amdkfd/kfd_process_queue_manager.c| 208 ++
 3 files changed, 353 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6af6deeda523..d049f9cbbc79 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2015,19 +2015,36 @@ static int criu_checkpoint_bos(struct kfd_process *p,
return ret;
 }
 
-static void criu_get_process_object_info(struct kfd_process *p,
-uint32_t *num_bos,
-uint64_t *objs_priv_size)
+static int criu_get_process_object_info(struct kfd_process *p,
+   uint32_t *num_bos,
+   uint32_t *num_objects,
+   uint64_t *objs_priv_size)
 {
+   int ret;
uint64_t priv_size;
+   uint32_t num_queues, num_events, num_svm_ranges;
+   uint64_t queues_priv_data_size;
 
*num_bos = get_process_num_bos(p);
 
+   ret = kfd_process_get_queue_info(p, _queues, 
_priv_data_size);
+   if (ret)
+   return ret;
+
+   num_events = 0; /* TODO: Implement Events */
+   num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
+
+   *num_objects = num_queues + num_events + num_svm_ranges;
+
if (objs_priv_size) {
priv_size = sizeof(struct kfd_criu_process_priv_data);
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
+   priv_size += queues_priv_data_size;
+   /* TODO: Add Events priv size */
+   /* TODO: Add SVM ranges priv size */
*objs_priv_size = priv_size;
}
+   return 0;
 }
 
 static int criu_checkpoint(struct file *filep,
@@ -2035,7 +2052,7 @@ static int criu_checkpoint(struct file *filep,
   struct kfd_ioctl_criu_args *args)
 {
int ret;
-   uint32_t num_bos;
+   uint32_t num_bos, num_objects;
uint64_t priv_size, priv_offset = 0;
 
if (!args->bos || !args->priv_data)
@@ -2057,9 +2074,12 @@ static int criu_checkpoint(struct file *filep,
goto exit_unlock;
}
 
-   criu_get_process_object_info(p, _bos, _size);
+   ret = criu_get_process_object_info(p, _bos, _objects, 
_size);
+   if (ret)
+   goto exit_unlock;
 
if (num_bos != args->num_bos ||
+   num_objects != args->num_objects ||
priv_size != args->priv_data_size) {
 
ret = -EINVAL;
@@ -2076,6 +2096,17 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto exit_unlock;
 
+   if (num_objects) {
+   ret = kfd_criu_checkpoint_queues(p, (uint8_t __user 
*)args->priv_data,
+_offset);
+   if (ret)
+   goto exit_unlock;
+
+   /* TODO: Dump Events */
+
+   /* TODO: Dump SVM-Ranges */
+   }
+
 exit_unlock:
mutex_unlock(>mutex);
if (ret)
@@ -2344,6 +2375,62 @@ static int criu_restore_bos(struct kfd_process *p,
return ret;
 }
 
+static int criu_restore_objects(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   int ret = 0;
+   uint32_t i;
+
+   BUILD_BUG_ON(offsetof(struct kfd_criu_queue_priv_data, object_type));
+   BUILD_BUG_ON(offsetof(struct kfd_criu_event_priv_data, object_type));
+   BUILD_BUG_ON(offsetof(struct kfd_criu_svm_range_priv_data, 
object_type));
+
+   for (i = 0; i < args->num_objects; i++) {
+   uint32_t object_type;
+
+   if (*priv_offset + sizeof(object_type) > max_priv_data_size) {
+   pr_err("Invalid private data size\n");
+   return -EINVAL;
+   }
+
+   ret = get_user(object_type, (uint32_t __user *)(args->priv_data 
+ *priv_offset));
+   if (ret) {
+   pr_err("Failed to copy private information from 
user\n");
+   goto exit;
+   }
+
+   switch (object_type) {
+   case KFD_CRIU_OBJECT_TYPE_QUEUE:
+   ret = kfd_criu_restore_queue(p, (uint8_t __user 
*)args->priv_data,
+

[Patch v5 04/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2022-02-03 Thread Rajneesh Bhardwaj

This adds support to discover the  buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to user space plugin running under criu master
context which then stores this info to recreate these buffer objects
during a restore operation.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  11 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  20 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 177 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   4 +-
 6 files changed, 213 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ac841ae8f5cc..395ba9566afe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -297,6 +297,7 @@ int amdgpu_amdkfd_get_tile_config(struct amdgpu_device 
*adev,
struct tile_config *config);
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev,
bool reset);
+bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem 
*mem);
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 5df387c4d7fb..3485ef856860 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2629,3 +2629,14 @@ int amdgpu_amdkfd_get_tile_config(struct amdgpu_device 
*adev,
 
return 0;
 }
+
+bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem 
*mem)
+{
+   struct kfd_mem_attachment *entry;
+
+   list_for_each_entry(entry, >attachments, list) {
+   if (entry->is_mapped && entry->adev == adev)
+   return true;
+   }
+   return false;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b9637d1cf147..5a32ee66d8c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1127,6 +1127,26 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device 
*bdev,
return ttm_pool_free(>mman.bdev.pool, ttm);
 }
 
+/**
+ * amdgpu_ttm_tt_get_userptr - Return the userptr GTT ttm_tt for the current
+ * task
+ *
+ * @tbo: The ttm_buffer_object that contains the userptr
+ * @user_addr:  The returned value
+ */
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr)
+{
+   struct amdgpu_ttm_tt *gtt;
+
+   if (!tbo->ttm)
+   return -EINVAL;
+
+   gtt = (void *)tbo->ttm;
+   *user_addr = gtt->userptr;
+   return 0;
+}
+
 /**
  * amdgpu_ttm_tt_set_userptr - Initialize userptr GTT ttm_tt for the current
  * task
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index d9691f262f16..39d966e7185d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -181,6 +181,8 @@ static inline bool amdgpu_ttm_tt_get_user_pages_done(struct 
ttm_tt *ttm)
 #endif
 
 void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr);
 int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo,
  uint64_t addr, uint32_t flags);
 bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 29443419bbf0..17a937b7139f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1860,6 +1860,29 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int criu_checkpoint_process(struct kfd_process *p,
+uint8_t __user *user_priv_data,
+uint64_t *priv_offset)
+{
+   struct kfd_criu_process_priv_data process_priv;
+   int ret;
+
+   memset(_priv, 0, sizeof(process_priv));
+
+   process_priv.version = KFD_CRIU_PRIV_VERSION;
+
+   ret = copy_to_user(user_priv_data + *priv_offset,
+   _priv, sizeof(process_priv));
+
+   if (ret) {
+   pr_err("Failed to copy process information to user\n");
+   ret = -EFAULT;
+   }
+
+   *priv_offset += sizeof(process_priv);
+   return ret;
+}
+
 uint32_t get_process_num_bos(struct kfd_pro

[Patch v5 03/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2022-02-03 Thread Rajneesh Bhardwaj

This IOCTL op is expected to be called as a precursor to the actual
Checkpoint operation. This does the basic discovery into the target
process seized by CRIU and relays the information to the userspace that
utilizes it to start the Checkpoint operation via another dedicated
IOCTL op.

The process_info IOCTL op determines the number of GPUs, buffer objects
that are associated with the target process, its process id in
caller's namespace since /proc/pid/mem interface maybe used to drain
the contents of the discovered buffer objects in userspace and getpid
returns the pid of CRIU dumper process. Also the pid of a process
inside a container might be different than its global pid so return
the ns pid.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 90e6d9e335a5..29443419bbf0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1860,6 +1860,42 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+uint32_t get_process_num_bos(struct kfd_process *p)
+{
+   uint32_t num_of_bos = 0;
+   int i;
+
+   /* Run over all PDDs of the process */
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+   void *mem;
+   int id;
+
+   idr_for_each_entry(>alloc_idr, mem, id) {
+   struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;
+
+   if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
+   num_of_bos++;
+   }
+   }
+   return num_of_bos;
+}
+
+static void criu_get_process_object_info(struct kfd_process *p,
+uint32_t *num_bos,
+uint64_t *objs_priv_size)
+{
+   uint64_t priv_size;
+
+   *num_bos = get_process_num_bos(p);
+
+   if (objs_priv_size) {
+   priv_size = sizeof(struct kfd_criu_process_priv_data);
+   priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
+   *objs_priv_size = priv_size;
+   }
+}
+
 static int criu_checkpoint(struct file *filep,
   struct kfd_process *p,
   struct kfd_ioctl_criu_args *args)
@@ -1892,7 +1928,25 @@ static int criu_process_info(struct file *filep,
struct kfd_process *p,
struct kfd_ioctl_criu_args *args)
 {
-   return 0;
+   int ret = 0;
+
+   mutex_lock(>mutex);
+
+   if (!p->n_pdds) {
+   pr_err("No pdd for given process\n");
+   ret = -ENODEV;
+   goto err_unlock;
+   }
+
+   args->pid = task_pid_nr_ns(p->lead_thread,
+   task_active_pid_ns(p->lead_thread));
+
+   criu_get_process_object_info(p, >num_bos, >priv_data_size);
+
+   dev_dbg(kfd_device, "Num of bos:%u\n", args->num_bos);
+err_unlock:
+   mutex_unlock(>mutex);
+   return ret;
 }
 
 static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void 
*data)
-- 
2.17.1

[Patch v5 01/24] x86/configs: CRIU update debug rock defconfig

2022-02-03 Thread Rajneesh Bhardwaj

 - Update debug config for Checkpoint-Restore (CR) support
 - Also include necessary options for CR with docker containers.

Reviewed-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
---
 arch/x86/configs/rock-dbg_defconfig | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/arch/x86/configs/rock-dbg_defconfig 
b/arch/x86/configs/rock-dbg_defconfig
index 4877da183599..bc2a34666c1d 100644
--- a/arch/x86/configs/rock-dbg_defconfig
+++ b/arch/x86/configs/rock-dbg_defconfig
@@ -249,6 +249,7 @@ CONFIG_KALLSYMS_ALL=y
 CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
 CONFIG_KALLSYMS_BASE_RELATIVE=y
 # CONFIG_USERFAULTFD is not set
+CONFIG_USERFAULTFD=y
 CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
 CONFIG_KCMP=y
 CONFIG_RSEQ=y
@@ -1015,6 +1016,11 @@ CONFIG_PACKET_DIAG=y
 CONFIG_UNIX=y
 CONFIG_UNIX_SCM=y
 CONFIG_UNIX_DIAG=y
+CONFIG_SMC_DIAG=y
+CONFIG_XDP_SOCKETS_DIAG=y
+CONFIG_INET_MPTCP_DIAG=y
+CONFIG_TIPC_DIAG=y
+CONFIG_VSOCKETS_DIAG=y
 # CONFIG_TLS is not set
 CONFIG_XFRM=y
 CONFIG_XFRM_ALGO=y
@@ -1052,15 +1058,17 @@ CONFIG_SYN_COOKIES=y
 # CONFIG_NET_IPVTI is not set
 # CONFIG_NET_FOU is not set
 # CONFIG_NET_FOU_IP_TUNNELS is not set
-# CONFIG_INET_AH is not set
-# CONFIG_INET_ESP is not set
-# CONFIG_INET_IPCOMP is not set
-CONFIG_INET_TUNNEL=y
-CONFIG_INET_DIAG=y
-CONFIG_INET_TCP_DIAG=y
-# CONFIG_INET_UDP_DIAG is not set
-# CONFIG_INET_RAW_DIAG is not set
-# CONFIG_INET_DIAG_DESTROY is not set
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_ESP_OFFLOAD=m
+CONFIG_INET_TUNNEL=m
+CONFIG_INET_XFRM_TUNNEL=m
+CONFIG_INET_DIAG=m
+CONFIG_INET_TCP_DIAG=m
+CONFIG_INET_UDP_DIAG=m
+CONFIG_INET_RAW_DIAG=m
+CONFIG_INET_DIAG_DESTROY=y
 CONFIG_TCP_CONG_ADVANCED=y
 # CONFIG_TCP_CONG_BIC is not set
 CONFIG_TCP_CONG_CUBIC=y
@@ -1085,12 +1093,14 @@ CONFIG_TCP_MD5SIG=y
 CONFIG_IPV6=y
 # CONFIG_IPV6_ROUTER_PREF is not set
 # CONFIG_IPV6_OPTIMISTIC_DAD is not set
-CONFIG_INET6_AH=y
-CONFIG_INET6_ESP=y
-# CONFIG_INET6_ESP_OFFLOAD is not set
-# CONFIG_INET6_ESPINTCP is not set
-# CONFIG_INET6_IPCOMP is not set
-# CONFIG_IPV6_MIP6 is not set
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_ESP_OFFLOAD=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_INET6_XFRM_TUNNEL=m
+CONFIG_INET_DCCP_DIAG=m
+CONFIG_INET_SCTP_DIAG=m
 # CONFIG_IPV6_ILA is not set
 # CONFIG_IPV6_VTI is not set
 CONFIG_IPV6_SIT=y
@@ -1146,8 +1156,13 @@ CONFIG_NF_CT_PROTO_UDPLITE=y
 # CONFIG_NF_CONNTRACK_SANE is not set
 # CONFIG_NF_CONNTRACK_SIP is not set
 # CONFIG_NF_CONNTRACK_TFTP is not set
-# CONFIG_NF_CT_NETLINK is not set
-# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
+CONFIG_COMPAT_NETLINK_MESSAGES=y
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NF_CT_NETLINK_TIMEOUT=m
+CONFIG_NF_CT_NETLINK_HELPER=m
+CONFIG_NETFILTER_NETLINK_GLUE_CT=y
+CONFIG_SCSI_NETLINK=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
 CONFIG_NF_NAT=m
 CONFIG_NF_NAT_REDIRECT=y
 CONFIG_NF_NAT_MASQUERADE=y
@@ -1992,7 +2007,7 @@ CONFIG_NETCONSOLE_DYNAMIC=y
 CONFIG_NETPOLL=y
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_RIONET is not set
-# CONFIG_TUN is not set
+CONFIG_TUN=y
 # CONFIG_TUN_VNET_CROSS_LE is not set
 CONFIG_VETH=y
 # CONFIG_NLMON is not set
@@ -3990,7 +4005,7 @@ CONFIG_MANDATORY_FILE_LOCKING=y
 CONFIG_FSNOTIFY=y
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY_USER=y
-# CONFIG_FANOTIFY is not set
+CONFIG_FANOTIFY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
-- 
2.17.1

[Patch v5 00/24] CHECKPOINT RESTORE WITH ROCm

2022-02-03 Thread Rajneesh Bhardwaj

V5: Proposed IOCTL APIs for CRIU with consolidated feedback

CRIU is a user space tool which is very popular for container live
migration in datacentres. It can checkpoint a running application, save
its complete state, memory contents and all system resources to images
on disk which can be migrated to another m achine and restored later.
More information on CRIU can be found at https://criu.org/Main_Page

CRIU currently does not support Checkpoint / Restore with applications
that have devices files open so it cannot perform checkpoint and restore
on GPU devices which are very complex and have their own VRAM managed
privately. CRIU, however can support external devices by using a plugin
architecture. We feel that we are getting close to finalizing our IOCTL
APIs which were again changed since V3 for an improved modular design.

Our changes to CRIU user space  are can be obtained from here:
https://github.com/RadeonOpenCompute/criu/tree/amdgpu_rfc-211222

We have tested the following scenarios:
 - Checkpoint / Restore of a Pytorch (BERT) workload
 - kfdtests with queues and events
 - Gfx9 and Gfx10 based multi GPU test systems 
 - On baremetal and inside a docker container
 - Restoring on a different system

V1: Initial
V2: Addressed review comments
V3: Rebased on latest amd-staging-drm-next (5.15 based)
v4: New API design and basic support for SVM, however there is an
outstanding issue with SVM restore which is currently under debug and
hopefully that won't impact the ioctl APIs as SVMs are treated as
private data hidden from user space like queues and events with the new
approch.
V5: Fix the SVM related issues and finalize the APIs. 

David Yat Sin (9):
  drm/amdkfd: CRIU Implement KFD unpause operation
  drm/amdkfd: CRIU add queues support
  drm/amdkfd: CRIU restore queue ids
  drm/amdkfd: CRIU restore sdma id for queues
  drm/amdkfd: CRIU restore queue doorbell id
  drm/amdkfd: CRIU checkpoint and restore queue mqds
  drm/amdkfd: CRIU checkpoint and restore queue control stack
  drm/amdkfd: CRIU checkpoint and restore events
  drm/amdkfd: CRIU implement gpu_id remapping

Rajneesh Bhardwaj (15):
  x86/configs: CRIU update debug rock defconfig
  drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  drm/amdkfd: CRIU Implement KFD process_info ioctl
  drm/amdkfd: CRIU Implement KFD checkpoint ioctl
  drm/amdkfd: CRIU Implement KFD restore ioctl
  drm/amdkfd: CRIU Implement KFD resume ioctl
  drm/amdkfd: CRIU export BOs as prime dmabuf objects
  drm/amdkfd: CRIU checkpoint and restore xnack mode
  drm/amdkfd: CRIU allow external mm for svm ranges
  drm/amdkfd: use user_gpu_id for svm ranges
  drm/amdkfd: CRIU Discover svm ranges
  drm/amdkfd: CRIU Save Shared Virtual Memory ranges
  drm/amdkfd: CRIU prepare for svm resume
  drm/amdkfd: CRIU resume shared virtual memory ranges
  drm/amdkfd: Bump up KFD API version for CRIU

 arch/x86/configs/rock-dbg_defconfig   |   53 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|7 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   64 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   20 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 1471 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  185 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  313 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   14 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |   75 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |   77 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |   92 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |   84 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  160 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   72 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  372 -
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  331 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h  |   39 +
 include/uapi/linux/kfd_ioctl.h|   84 +-
 21 files changed, 3193 insertions(+), 340 deletions(-)

-- 
2.17.1

[Patch v4 12/24] drm/amdkfd: CRIU restore queue doorbell id

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.

Signed-off-by: David Yat Sin 

---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +--
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7e49f70b81b9..a0f5b8533a03 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -153,7 +153,13 @@ static void decrement_queue_count(struct 
device_queue_manager *dqm,
dqm->active_cp_queue_count--;
 }
 
-static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+/*
+ * Allocate a doorbell ID to this queue.
+ * If doorbell_id is passed in, make sure requested ID is valid then allocate 
it.
+ */
+static int allocate_doorbell(struct qcm_process_device *qpd,
+struct queue *q,
+uint32_t const *restore_id)
 {
struct kfd_dev *dev = qpd->dqm->dev;
 
@@ -161,6 +167,10 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
/* On pre-SOC15 chips we need to use the queue ID to
 * preserve the user mode ABI.
 */
+
+   if (restore_id && *restore_id != q->properties.queue_id)
+   return -EINVAL;
+
q->doorbell_id = q->properties.queue_id;
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
@@ -169,25 +179,37 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
 * The doobell index distance between RLC (2*i) and (2*i+1)
 * for a SDMA engine is 512.
 */
-   uint32_t *idx_offset =
-   dev->shared_resources.sdma_doorbell_idx;
 
-   q->doorbell_id = idx_offset[q->properties.sdma_engine_id]
-   + (q->properties.sdma_queue_id & 1)
-   * KFD_QUEUE_DOORBELL_MIRROR_OFFSET
-   + (q->properties.sdma_queue_id >> 1);
+   uint32_t *idx_offset = dev->shared_resources.sdma_doorbell_idx;
+   uint32_t valid_id = idx_offset[q->properties.sdma_engine_id]
+   + (q->properties.sdma_queue_id 
& 1)
+   * 
KFD_QUEUE_DOORBELL_MIRROR_OFFSET
+   + (q->properties.sdma_queue_id 
>> 1);
+
+   if (restore_id && *restore_id != valid_id)
+   return -EINVAL;
+   q->doorbell_id = valid_id;
} else {
-   /* For CP queues on SOC15 reserve a free doorbell ID */
-   unsigned int found;
-
-   found = find_first_zero_bit(qpd->doorbell_bitmap,
-   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
-   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
-   pr_debug("No doorbells available");
-   return -EBUSY;
+   /* For CP queues on SOC15 */
+   if (restore_id) {
+   /* make sure that ID is free  */
+   if (__test_and_set_bit(*restore_id, 
qpd->doorbell_bitmap))
+   return -EINVAL;
+
+   q->doorbell_id = *restore_id;
+   } else {
+   /* or reserve a free doorbell ID */
+   unsigned int found;
+
+   found = find_first_zero_bit(qpd->doorbell_bitmap,
+   
KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+   pr_debug("No doorbells available");
+   return -EBUSY;
+   }
+   set_bit(found, qpd->doorbell_bitmap);
+   q->doorbell_id = found;
}
-   set_bit(found, qpd->doorbell_bitmap);
-   q->doorbell_id = found;
}
 
q->properties.doorbell_off =
@@ -355,7 +377,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
}
 
-   retval = allocate_doorbell(qpd, q);
+   retval = allocate_doorbell(qpd, q, qd ? >doorbell_id : NULL);
if (retval)
goto out_deallocate_hqd;
 
@@ -1338,7 +1360,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
goto out;
}
 
-   retval = allocate_doorbell(qpd, q);
+   retval = allocate_doorbell(qpd, q, qd ? >doorbell_id :

[Patch v4 19/24] drm/amdkfd: CRIU allow external mm for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj

Both svm_range_get_attr and svm_range_set_attr helpers use mm struct
from current but for a Checkpoint or Restore operation, the current->mm
will fetch the mm for the CRIU master process. So modify these helpers to
accept the task mm for a target kfd process to support Checkpoint
Restore.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 88360f23eb61..7c92116153fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3134,11 +3134,11 @@ static void svm_range_evict_svm_bo_worker(struct 
work_struct *work)
 }
 
 static int
-svm_range_set_attr(struct kfd_process *p, uint64_t start, uint64_t size,
-  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
+  uint64_t start, uint64_t size, uint32_t nattr,
+  struct kfd_ioctl_svm_attribute *attrs)
 {
struct amdkfd_process_info *process_info = p->kgd_process_info;
-   struct mm_struct *mm = current->mm;
struct list_head update_list;
struct list_head insert_list;
struct list_head remove_list;
@@ -3242,8 +3242,9 @@ svm_range_set_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
 }
 
 static int
-svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size,
-  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+svm_range_get_attr(struct kfd_process *p, struct mm_struct *mm,
+  uint64_t start, uint64_t size, uint32_t nattr,
+  struct kfd_ioctl_svm_attribute *attrs)
 {
DECLARE_BITMAP(bitmap_access, MAX_GPU_INSTANCE);
DECLARE_BITMAP(bitmap_aip, MAX_GPU_INSTANCE);
@@ -3253,7 +3254,6 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
bool get_accessible = false;
bool get_flags = false;
uint64_t last = start + size - 1UL;
-   struct mm_struct *mm = current->mm;
uint8_t granularity = 0xff;
struct interval_tree_node *node;
struct svm_range_list *svms;
@@ -3422,6 +3422,7 @@ int
 svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op op, uint64_t start,
  uint64_t size, uint32_t nattrs, struct kfd_ioctl_svm_attribute *attrs)
 {
+   struct mm_struct *mm = current->mm;
int r;
 
start >>= PAGE_SHIFT;
@@ -3429,10 +3430,10 @@ svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op 
op, uint64_t start,
 
switch (op) {
case KFD_IOCTL_SVM_OP_SET_ATTR:
-   r = svm_range_set_attr(p, start, size, nattrs, attrs);
+   r = svm_range_set_attr(p, mm, start, size, nattrs, attrs);
break;
case KFD_IOCTL_SVM_OP_GET_ATTR:
-   r = svm_range_get_attr(p, start, size, nattrs, attrs);
+   r = svm_range_get_attr(p, mm, start, size, nattrs, attrs);
break;
default:
r = EINVAL;
-- 
2.17.1

[Patch v4 16/24] drm/amdkfd: CRIU implement gpu_id remapping

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When doing a restore on a different node, the gpu_id's on the restore
node may be different. But the user space application will still refer
use the original gpu_id's in the ioctl calls. Adding code to create a
gpu id mapping so that kfd can determine actual gpu_id during the user
ioctl's.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 465 --
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  45 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  11 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  32 ++
 .../amd/amdkfd/kfd_process_queue_manager.c|  18 +-
 5 files changed, 412 insertions(+), 159 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 08467fa2f514..20652d488cde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -294,18 +294,20 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
return err;
 
pr_debug("Looking for gpu id 0x%x\n", args->gpu_id);
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev) {
-   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
-   return -EINVAL;
-   }
 
mutex_lock(>mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto err_unlock;
+   }
+   dev = pdd->dev;
 
pdd = kfd_bind_process_to_device(dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
-   goto err_bind_process;
+   goto err_unlock;
}
 
pr_debug("Creating queue for PASID 0x%x on gpu 0x%x\n",
@@ -315,7 +317,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL, NULL,
_offset_in_process);
if (err != 0)
-   goto err_create_queue;
+   goto err_unlock;
 
args->queue_id = queue_id;
 
@@ -344,8 +346,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
 
return 0;
 
-err_create_queue:
-err_bind_process:
+err_unlock:
mutex_unlock(>mutex);
return err;
 }
@@ -492,7 +493,6 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_memory_policy_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
enum cache_policy default_policy, alternate_policy;
@@ -507,13 +507,15 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
return -EINVAL;
}
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
mutex_lock(>mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto out;
+   }
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -526,7 +528,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
(args->alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
   ? cache_policy_coherent : cache_policy_noncoherent;
 
-   if (!dev->dqm->ops.set_cache_memory_policy(dev->dqm,
+   if (!pdd->dev->dqm->ops.set_cache_memory_policy(pdd->dev->dqm,
>qpd,
default_policy,
alternate_policy,
@@ -544,17 +546,18 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_trap_handler_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
mutex_lock(>mutex);
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   err = -EINVAL;
+   goto out;
+   }
+
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -578,16 +581,20 @@ static int kfd_ioctl_dbg_register(struct file *filep,

[Patch v4 22/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2021-12-22 Thread Rajneesh Bhardwaj

During checkpoint stage, save the shared virtual memory ranges and
attributes for the target process. A process may contain a number of svm
ranges and each range might contain a number of arrtibutes. While not
all attributes may be applicable for a given prange but during
checkpoint we store all possible values for the max possible attribute
types.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1c25d5e9067c..916b8d000317 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2186,7 +2186,9 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto close_bo_fds;
 
-   /* TODO: Dump SVM-Ranges */
+   ret = kfd_criu_checkpoint_svm(p, (uint8_t __user 
*)args->priv_data, _offset);
+   if (ret)
+   goto close_bo_fds;
}
 
 close_bo_fds:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 49e05fb5c898..6d59f1bedcf2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3478,6 +3478,101 @@ int svm_range_get_info(struct kfd_process *p, uint32_t 
*num_svm_ranges,
return 0;
 }
 
+int kfd_criu_checkpoint_svm(struct kfd_process *p,
+   uint8_t __user *user_priv_data,
+   uint64_t *priv_data_offset)
+{
+   struct kfd_criu_svm_range_priv_data *svm_priv = NULL;
+   struct kfd_ioctl_svm_attribute *query_attr = NULL;
+   uint64_t svm_priv_data_size, query_attr_size = 0;
+   int index, nattr_common = 4, ret = 0;
+   struct svm_range_list *svms;
+   int num_devices = p->n_pdds;
+   struct svm_range *prange;
+   struct mm_struct *mm;
+
+   svms = >svms;
+   if (!svms)
+   return -EINVAL;
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_err("failed to get mm for the target process\n");
+   return -ESRCH;
+   }
+
+   query_attr_size = sizeof(struct kfd_ioctl_svm_attribute) *
+   (nattr_common + num_devices);
+
+   query_attr = kzalloc(query_attr_size, GFP_KERNEL);
+   if (!query_attr) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   query_attr[0].type = KFD_IOCTL_SVM_ATTR_PREFERRED_LOC;
+   query_attr[1].type = KFD_IOCTL_SVM_ATTR_PREFETCH_LOC;
+   query_attr[2].type = KFD_IOCTL_SVM_ATTR_SET_FLAGS;
+   query_attr[3].type = KFD_IOCTL_SVM_ATTR_GRANULARITY;
+
+   for (index = 0; index < num_devices; index++) {
+   struct kfd_process_device *pdd = p->pdds[index];
+
+   query_attr[index + nattr_common].type =
+   KFD_IOCTL_SVM_ATTR_ACCESS;
+   query_attr[index + nattr_common].value = pdd->user_gpu_id;
+   }
+
+   svm_priv_data_size = sizeof(*svm_priv) + query_attr_size;
+
+   svm_priv = kzalloc(svm_priv_data_size, GFP_KERNEL);
+   if (!svm_priv) {
+   ret = -ENOMEM;
+   goto exit_query;
+   }
+
+   index = 0;
+   list_for_each_entry(prange, >list, list) {
+
+   svm_priv->object_type = KFD_CRIU_OBJECT_TYPE_SVM_RANGE;
+   svm_priv->start_addr = prange->start;
+   svm_priv->size = prange->npages;
+   memcpy(_priv->attrs, query_attr, query_attr_size);
+   pr_debug("CRIU: prange: 0x%p start: 0x%lx\t npages: 0x%llx end: 
0x%llx\t size: 0x%llx\n",
+prange, prange->start, prange->npages,
+prange->start + prange->npages - 1,
+prange->npages * PAGE_SIZE);
+
+   ret = svm_range_get_attr(p, mm, svm_priv->start_addr,
+svm_priv->size,
+(nattr_common + num_devices),
+svm_priv->attrs);
+   if (ret) {
+   pr_err("CRIU: failed to obtain range attributes\n");
+   goto exit_priv;
+   }
+
+   ret = copy_to_user(user_priv_data + *priv_data_offset,
+  svm_priv, svm_priv_data_size);
+   if (ret) {
+   pr_err("Failed to copy svm priv to user\n");
+   goto exit_priv;
+   }
+
+   *priv_data_offset += svm_priv_data_size;
+
+   }
+
+
+exit_priv:
+   kfree(svm_priv);
+exit_query:
+   kfree(query_attr);
+exit

[Patch v4 23/24] drm/amdkfd: CRIU prepare for svm resume

2021-12-22 Thread Rajneesh Bhardwaj

During CRIU restore phase, the VMAs for the virtual address ranges are
not at their final location yet so in this stage, only cache the data
required to successfully resume the svm ranges during an imminent CRIU
resume phase.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 99 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 12 +++
 4 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 916b8d000317..f7aa15b18f95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2638,8 +2638,8 @@ static int criu_restore_objects(struct file *filep,
goto exit;
break;
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
-   /* TODO: Implement SVM range */
-   *priv_offset += sizeof(struct 
kfd_criu_svm_range_priv_data);
+   ret = kfd_criu_restore_svm(p, (uint8_t __user 
*)args->priv_data,
+priv_offset, 
max_priv_data_size);
if (ret)
goto exit;
break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 87eb6739a78e..92191c541c29 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -790,6 +790,7 @@ struct svm_range_list {
struct list_headlist;
struct work_struct  deferred_list_work;
struct list_headdeferred_range_list;
+   struct list_headcriu_svm_metadata_list;
spinlock_t  deferred_list_lock;
atomic_tevicted_ranges;
booldrain_pagefaults;
@@ -1148,6 +1149,10 @@ int kfd_criu_restore_event(struct file *devkfd,
   uint8_t __user *user_priv_data,
   uint64_t *priv_data_offset,
   uint64_t max_priv_data_size);
+int kfd_criu_restore_svm(struct kfd_process *p,
+uint8_t __user *user_priv_data,
+uint64_t *priv_data_offset,
+uint64_t max_priv_data_size);
 /* CRIU - End */
 
 /* Queue Context Management */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 6d59f1bedcf2..e9f6c63c2a26 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -45,6 +45,14 @@
  */
 #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING   2000
 
+struct criu_svm_metadata {
+   struct list_head list;
+   __u64 start_addr;
+   __u64 size;
+   /* Variable length array of attributes */
+   struct kfd_ioctl_svm_attribute attrs[0];
+};
+
 static void svm_range_evict_svm_bo_worker(struct work_struct *work);
 static bool
 svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
@@ -2753,6 +2761,7 @@ int svm_range_list_init(struct kfd_process *p)
INIT_DELAYED_WORK(>restore_work, svm_range_restore_work);
INIT_WORK(>deferred_list_work, svm_range_deferred_list_work);
INIT_LIST_HEAD(>deferred_range_list);
+   INIT_LIST_HEAD(>criu_svm_metadata_list);
spin_lock_init(>deferred_list_lock);
 
for (i = 0; i < p->n_pdds; i++)
@@ -3418,6 +3427,96 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int svm_criu_prepare_for_resume(struct kfd_process *p,
+   struct kfd_criu_svm_range_priv_data *svm_priv)
+{
+   int nattr_common = 4, nattr_accessibility = 1;
+   struct criu_svm_metadata *criu_svm_md = NULL;
+   uint64_t svm_attrs_size, svm_object_md_size;
+   struct svm_range_list *svms = >svms;
+   int num_devices = p->n_pdds;
+   int i, ret = 0;
+
+   svm_attrs_size = sizeof(struct kfd_ioctl_svm_attribute) *
+   (nattr_common + nattr_accessibility * num_devices);
+   svm_object_md_size = sizeof(struct criu_svm_metadata) + svm_attrs_size;
+
+   criu_svm_md = kzalloc(svm_object_md_size, GFP_KERNEL);
+   if (!criu_svm_md) {
+   pr_err("failed to allocate memory to store svm metadata\n");
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   criu_svm_md->start_addr = svm_priv->start_addr;
+   criu_svm_md->size = svm_priv->size;
+   for (i = 0; i < svm_attrs_size; i++)
+   {
+   criu_svm_md->attrs[i].type = svm_priv->attrs[i].type;
+   criu_svm_md->attrs[i].value = svm_priv->attrs[i].value;
+   }
+
+   list_add_tail

[Patch v4 14/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Checkpoint contents of queue control stacks on CRIU dump and restore them
during CRIU restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 23 ---
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  9 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  | 11 +++-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 13 ++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 14 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 29 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 22 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 +-
 .../amd/amdkfd/kfd_process_queue_manager.c| 62 +--
 11 files changed, 139 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 146879cd3f2b..582b4a393f95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL, NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 3a5303ebcabf..8eca9ed3ab36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL, NULL);
+   , , NULL, NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a92274f9f1f7..248e69c7960b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -332,7 +332,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
struct queue *q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void 
*restore_ctl_stack)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -394,7 +394,8 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
 
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1347,7 +1348,7 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void *restore_ctl_stack)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1393,9 +1394,11 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
+
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1788,7 +1791,8 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
 
 static void get_queue_checkpoint_info(struct device_queue_manager *dqm,
const struct queue *q,
-

[Patch v4 18/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2021-12-22 Thread Rajneesh Bhardwaj

Recoverable page faults are represented by the xnack mode setting inside
a kfd process and are used to represent the device page faults. For CR,
we don't consider negative values which are typically used for querying
the current xnack mode without modifying it.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 15 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 178b0ccfb286..446eb9310915 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1845,6 +1845,11 @@ static int criu_checkpoint_process(struct kfd_process *p,
memset(_priv, 0, sizeof(process_priv));
 
process_priv.version = KFD_CRIU_PRIV_VERSION;
+   /* For CR, we don't consider negative xnack mode which is used for
+* querying without changing it, here 0 simply means disabled and 1
+* means enabled so retry for finding a valid PTE.
+*/
+   process_priv.xnack_mode = p->xnack_enabled ? 1 : 0;
 
ret = copy_to_user(user_priv_data + *priv_offset,
_priv, sizeof(process_priv));
@@ -2231,6 +2236,16 @@ static int criu_restore_process(struct kfd_process *p,
return -EINVAL;
}
 
+   pr_debug("Setting XNACK mode\n");
+   if (process_priv.xnack_mode && !kfd_process_xnack_mode(p, true)) {
+   pr_err("xnack mode cannot be set\n");
+   ret = -EPERM;
+   goto exit;
+   } else {
+   pr_debug("set xnack mode: %d\n", process_priv.xnack_mode);
+   p->xnack_enabled = process_priv.xnack_mode;
+   }
+
 exit:
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 855c162b85ea..d72dda84c18c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1057,6 +1057,7 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
 
 struct kfd_criu_process_priv_data {
uint32_t version;
+   uint32_t xnack_mode;
 };
 
 struct kfd_criu_device_priv_data {
-- 
2.17.1

[Patch v4 11/24] drm/amdkfd: CRIU restore sdma id for queues

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.

Signed-off-by: David Yat Sin 

---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  4 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 62fe28244a80..7e49f70b81b9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -58,7 +58,7 @@ static inline void deallocate_hqd(struct device_queue_manager 
*dqm,
struct queue *q);
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q);
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q);
+   struct queue *q, const uint32_t 
*restore_sdma_id);
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
@@ -308,7 +308,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
-   struct qcm_process_device *qpd)
+   struct qcm_process_device *qpd,
+   const struct kfd_criu_queue_priv_data *qd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -348,7 +349,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
q->pipe, q->queue);
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
-   retval = allocate_sdma_queue(dqm, q);
+   retval = allocate_sdma_queue(dqm, q, qd ? >sdma_id : NULL);
if (retval)
goto deallocate_vmid;
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1040,7 +1041,7 @@ static void pre_reset(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q)
+   struct queue *q, const uint32_t 
*restore_sdma_id)
 {
int bit;
 
@@ -1050,9 +1051,21 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
return -ENOMEM;
}
 
-   bit = __ffs64(dqm->sdma_bitmap);
-   dqm->sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->sdma_bitmap & (1ULL << *restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   /* Find first available sdma_id */
+   bit = __ffs64(dqm->sdma_bitmap);
+   dqm->sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
+
q->properties.sdma_engine_id = q->sdma_id %
get_num_sdma_engines(dqm);
q->properties.sdma_queue_id = q->sdma_id /
@@ -1062,9 +1075,19 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
pr_err("No more XGMI SDMA queue to allocate\n");
return -ENOMEM;
}
-   bit = __ffs64(dqm->xgmi_sdma_bitmap);
-   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->xgmi_sdma_bitmap & (1ULL << 
*restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   bit = __ffs64(dqm->xgmi_sdma_bitmap);
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
/* sdma_engine_id is sdma id including
 * both PCIe-optimized SDMAs and XGMI-
 * optimized SDMAs. The calculation below
@@ -1293,7 +1316,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 }
 
 static int create_queue_cpsch(struct

[Patch v4 21/24] drm/amdkfd: CRIU Discover svm ranges

2021-12-22 Thread Rajneesh Bhardwaj

A KFD process may contain a number of virtual address ranges for shared
virtual memory management and each such range can have many SVM
attributes spanning across various nodes within the process boundary.
This change reports the total number of such SVM ranges and
their total private data size by extending the PROCESS_INFO op of the the
CRIU IOCTL to discover the svm ranges in the target process and a future
patches brings in the required support for checkpoint and restore for
SVM ranges.


Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 60 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 11 +
 4 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 446eb9310915..1c25d5e9067c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2089,10 +2089,9 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
uint32_t *num_objects,
uint64_t *objs_priv_size)
 {
-   int ret;
-   uint64_t priv_size;
+   uint64_t queues_priv_data_size, svm_priv_data_size, priv_size;
uint32_t num_queues, num_events, num_svm_ranges;
-   uint64_t queues_priv_data_size;
+   int ret;
 
*num_devices = p->n_pdds;
*num_bos = get_process_num_bos(p);
@@ -2102,7 +2101,10 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
return ret;
 
num_events = kfd_get_num_events(p);
-   num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
+
+   ret = svm_range_get_info(p, _svm_ranges, _priv_data_size);
+   if (ret)
+   return ret;
 
*num_objects = num_queues + num_events + num_svm_ranges;
 
@@ -2112,7 +2114,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
priv_size += queues_priv_data_size;
priv_size += num_events * sizeof(struct 
kfd_criu_event_priv_data);
-   /* TODO: Add SVM ranges priv size */
+   priv_size += svm_priv_data_size;
*objs_priv_size = priv_size;
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d72dda84c18c..87eb6739a78e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1082,7 +1082,10 @@ enum kfd_criu_object_type {
 
 struct kfd_criu_svm_range_priv_data {
uint32_t object_type;
-   uint64_t reserved;
+   uint64_t start_addr;
+   uint64_t size;
+   /* Variable length array of attributes */
+   struct kfd_ioctl_svm_attribute attrs[0];
 };
 
 struct kfd_criu_queue_priv_data {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7c92116153fe..49e05fb5c898 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3418,6 +3418,66 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int svm_range_get_info(struct kfd_process *p, uint32_t *num_svm_ranges,
+  uint64_t *svm_priv_data_size)
+{
+   uint64_t total_size, accessibility_size, common_attr_size;
+   int nattr_common = 4, naatr_accessibility = 1;
+   int num_devices = p->n_pdds;
+   struct svm_range_list *svms;
+   struct svm_range *prange;
+   uint32_t count = 0;
+
+   *svm_priv_data_size = 0;
+
+   svms = >svms;
+   if (!svms)
+   return -EINVAL;
+
+   mutex_lock(>lock);
+   list_for_each_entry(prange, >list, list) {
+   pr_debug("prange: 0x%p start: 0x%lx\t npages: 0x%llx\t end: 
0x%llx\n",
+prange, prange->start, prange->npages,
+prange->start + prange->npages - 1);
+   count++;
+   }
+   mutex_unlock(>lock);
+
+   *num_svm_ranges = count;
+   /* Only the accessbility attributes need to be queried for all the gpus
+* individually, remaining ones are spanned across the entire process
+* regardless of the various gpu nodes. Of the remaining attributes,
+* KFD_IOCTL_SVM_ATTR_CLR_FLAGS need not be saved.
+*
+* KFD_IOCTL_SVM_ATTR_PREFERRED_LOC
+* KFD_IOCTL_SVM_ATTR_PREFETCH_LOC
+* KFD_IOCTL_SVM_ATTR_SET_FLAGS
+* KFD_IOCTL_SVM_ATTR_GRANULARITY
+*
+* ** ACCESSBILITY ATTRIBUTES **
+* (Considered as one, type is altered during query, value is gpuid)
+* KFD_IOCTL_SVM_ATTR_ACCESS
+* KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE
+* KFD_IOCTL_SVM_ATTR_NO_ACCESS

[Patch v4 15/24] drm/amdkfd: CRIU checkpoint and restore events

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Add support to existing CRIU ioctl's to save and restore events during
criu checkpoint and restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  70 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 272 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  27 ++-
 3 files changed, 280 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 582b4a393f95..08467fa2f514 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1009,57 +1009,11 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
 * through the event_page_offset field.
 */
if (args->event_page_offset) {
-   struct kfd_dev *kfd;
-   struct kfd_process_device *pdd;
-   void *mem, *kern_addr;
-   uint64_t size;
-
-   kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
-   if (!kfd) {
-   pr_err("Getting device by id failed in %s\n", __func__);
-   return -EINVAL;
-   }
-
mutex_lock(>mutex);
-
-   if (p->signal_page) {
-   pr_err("Event page is already set\n");
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   pdd = kfd_bind_process_to_device(kfd, p);
-   if (IS_ERR(pdd)) {
-   err = PTR_ERR(pdd);
-   goto out_unlock;
-   }
-
-   mem = kfd_process_device_translate_handle(pdd,
-   GET_IDR_HANDLE(args->event_page_offset));
-   if (!mem) {
-   pr_err("Can't find BO, offset is 0x%llx\n",
-  args->event_page_offset);
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->adev,
-   mem, _addr, );
-   if (err) {
-   pr_err("Failed to map event page to kernel\n");
-   goto out_unlock;
-   }
-
-   err = kfd_event_page_set(p, kern_addr, size);
-   if (err) {
-   pr_err("Failed to set event page\n");
-   amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(kfd->adev, 
mem);
-   goto out_unlock;
-   }
-
-   p->signal_handle = args->event_page_offset;
-
+   err = kfd_kmap_event_page(p, args->event_page_offset);
mutex_unlock(>mutex);
+   if (err)
+   return err;
}
 
err = kfd_event_create(filp, p, args->event_type,
@@ -1068,10 +1022,7 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
>event_page_offset,
>event_slot_index);
 
-   return err;
-
-out_unlock:
-   mutex_unlock(>mutex);
+   pr_debug("Created event (id:0x%08x) (%s)\n", args->event_id, __func__);
return err;
 }
 
@@ -2022,7 +1973,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
if (ret)
return ret;
 
-   num_events = 0; /* TODO: Implement Events */
+   num_events = kfd_get_num_events(p);
num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
 
*num_objects = num_queues + num_events + num_svm_ranges;
@@ -2031,7 +1982,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
priv_size = sizeof(struct kfd_criu_process_priv_data);
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
priv_size += queues_priv_data_size;
-   /* TODO: Add Events priv size */
+   priv_size += num_events * sizeof(struct 
kfd_criu_event_priv_data);
/* TODO: Add SVM ranges priv size */
*objs_priv_size = priv_size;
}
@@ -2093,7 +2044,10 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto exit_unlock;
 
-   /* TODO: Dump Events */
+   ret = kfd_criu_checkpoint_events(p, (uint8_t __user 
*)args->priv_data,
+_offset);
+   if (ret)
+   goto exit_unlock;
 
/* TODO: Dump SVM-Ranges */
}
@@ -2406,8 +2360,8 @@ static int criu_restore_objects(struct file *filep,
goto exit;
break;
case KFD_CRIU_OBJECT_TYPE_EVENT:
-   /* TODO: Implement Events */
-   *priv_offset += sizeof(struct kfd_criu_event_priv_data);
+

[Patch v4 17/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2021-12-22 Thread Rajneesh Bhardwaj

KFD buffer objects do not associate a GEM handle with them so cannot
directly be used with libdrm to initiate a system dma (sDMA) operation
to speedup the checkpoint and restore operation so export them as dmabuf
objects and use with libdrm helper (amdgpu_bo_import) to further process
the sdma command submissions.

With sDMA, we see huge improvement in checkpoint and restore operations
compared to the generic pci based access via host data path.

Suggested-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71 +++-
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 20652d488cde..178b0ccfb286 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
@@ -43,6 +44,7 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_object.h"
+#include "amdgpu_dma_buf.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1932,6 +1934,33 @@ uint64_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_get_prime_handle(struct drm_gem_object *gobj, int flags,
+ u32 *shared_fd)
+{
+   struct dma_buf *dmabuf;
+   int ret;
+
+   dmabuf = amdgpu_gem_prime_export(gobj, flags);
+   if (IS_ERR(dmabuf)) {
+   ret = PTR_ERR(dmabuf);
+   pr_err("dmabuf export failed for the BO\n");
+   return ret;
+   }
+
+   ret = dma_buf_fd(dmabuf, flags);
+   if (ret < 0) {
+   pr_err("dmabuf create fd failed, ret:%d\n", ret);
+   goto out_free_dmabuf;
+   }
+
+   *shared_fd = ret;
+   return 0;
+
+out_free_dmabuf:
+   dma_buf_put(dmabuf);
+   return ret;
+}
+
 static int criu_checkpoint_bos(struct kfd_process *p,
   uint32_t num_bos,
   uint8_t __user *user_bos,
@@ -1992,6 +2021,14 @@ static int criu_checkpoint_bos(struct kfd_process *p,
goto exit;
}
}
+   if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+   ret = 
criu_get_prime_handle(_bo->tbo.base,
+   bo_bucket->alloc_flags &
+   
KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DRM_RDWR : 0,
+   _bucket->dmabuf_fd);
+   if (ret)
+   goto exit;
+   }
if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
KFD_MMAP_GPU_ID(pdd->dev->id);
@@ -2031,6 +2068,10 @@ static int criu_checkpoint_bos(struct kfd_process *p,
*priv_offset += num_bos * sizeof(*bo_privs);
 
 exit:
+   while (ret && bo_index--) {
+   if (bo_buckets[bo_index].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[bo_index].dmabuf_fd);
+   }
 
kvfree(bo_buckets);
kvfree(bo_privs);
@@ -2131,16 +2172,28 @@ static int criu_checkpoint(struct file *filep,
ret = kfd_criu_checkpoint_queues(p, (uint8_t __user 
*)args->priv_data,
 _offset);
if (ret)
-   goto exit_unlock;
+   goto close_bo_fds;
 
ret = kfd_criu_checkpoint_events(p, (uint8_t __user 
*)args->priv_data,
 _offset);
if (ret)
-   goto exit_unlock;
+   goto close_bo_fds;
 
/* TODO: Dump SVM-Ranges */
}
 
+close_bo_fds:
+   if (ret) {
+   /* If IOCTL returns err, user assumes all FDs opened in 
criu_dump_bos are closed */
+   uint32_t i;
+   struct kfd_criu_bo_bucket *bo_buckets = (struct 
kfd_criu_bo_bucket *) args->bos;
+
+   for (i = 0; i < num_bos; i++) {
+   if (bo_buckets[i].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[i].dmabuf_fd);
+   }
+   }
+
 exit_unlock:
mutex_unlock(>mutex);
if (ret)
@@ -2335,6 +2388,7 @@ static int criu

[Patch v4 09/24] drm/amdkfd: CRIU add queues support

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Add support to existing CRIU ioctl's to save number of queues and queue
properties for each queue during checkpoint and re-create queues on
restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 110 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  43 +++-
 .../amd/amdkfd/kfd_process_queue_manager.c| 212 ++
 3 files changed, 357 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index db2bb302a8d4..9665c8657929 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2006,19 +2006,36 @@ static int criu_checkpoint_bos(struct kfd_process *p,
return ret;
 }
 
-static void criu_get_process_object_info(struct kfd_process *p,
-uint32_t *num_bos,
-uint64_t *objs_priv_size)
+static int criu_get_process_object_info(struct kfd_process *p,
+   uint32_t *num_bos,
+   uint32_t *num_objects,
+   uint64_t *objs_priv_size)
 {
+   int ret;
uint64_t priv_size;
+   uint32_t num_queues, num_events, num_svm_ranges;
+   uint64_t queues_priv_data_size;
 
*num_bos = get_process_num_bos(p);
 
+   ret = kfd_process_get_queue_info(p, _queues, 
_priv_data_size);
+   if (ret)
+   return ret;
+
+   num_events = 0; /* TODO: Implement Events */
+   num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
+
+   *num_objects = num_queues + num_events + num_svm_ranges;
+
if (objs_priv_size) {
priv_size = sizeof(struct kfd_criu_process_priv_data);
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
+   priv_size += queues_priv_data_size;
+   /* TODO: Add Events priv size */
+   /* TODO: Add SVM ranges priv size */
*objs_priv_size = priv_size;
}
+   return 0;
 }
 
 static int criu_checkpoint(struct file *filep,
@@ -2026,7 +2043,7 @@ static int criu_checkpoint(struct file *filep,
   struct kfd_ioctl_criu_args *args)
 {
int ret;
-   uint32_t num_bos;
+   uint32_t num_bos, num_objects;
uint64_t priv_size, priv_offset = 0;
 
if (!args->bos || !args->priv_data)
@@ -2048,9 +2065,12 @@ static int criu_checkpoint(struct file *filep,
goto exit_unlock;
}
 
-   criu_get_process_object_info(p, _bos, _size);
+   ret = criu_get_process_object_info(p, _bos, _objects, 
_size);
+   if (ret)
+   goto exit_unlock;
 
if (num_bos != args->num_bos ||
+   num_objects != args->num_objects ||
priv_size != args->priv_data_size) {
 
ret = -EINVAL;
@@ -2067,6 +2087,17 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto exit_unlock;
 
+   if (num_objects) {
+   ret = kfd_criu_checkpoint_queues(p, (uint8_t __user 
*)args->priv_data,
+_offset);
+   if (ret)
+   goto exit_unlock;
+
+   /* TODO: Dump Events */
+
+   /* TODO: Dump SVM-Ranges */
+   }
+
 exit_unlock:
mutex_unlock(>mutex);
if (ret)
@@ -2340,6 +2371,62 @@ static int criu_restore_bos(struct kfd_process *p,
return ret;
 }
 
+static int criu_restore_objects(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   int ret = 0;
+   uint32_t i;
+
+   BUILD_BUG_ON(offsetof(struct kfd_criu_queue_priv_data, object_type));
+   BUILD_BUG_ON(offsetof(struct kfd_criu_event_priv_data, object_type));
+   BUILD_BUG_ON(offsetof(struct kfd_criu_svm_range_priv_data, 
object_type));
+
+   for (i = 0; i < args->num_objects; i++) {
+   uint32_t object_type;
+
+   if (*priv_offset + sizeof(object_type) > max_priv_data_size) {
+   pr_err("Invalid private data size\n");
+   return -EINVAL;
+   }
+
+   ret = get_user(object_type, (uint32_t __user *)(args->priv_data 
+ *priv_offset));
+   if (ret) {
+   pr_err("Failed to copy private information from 
user\n");
+   goto exit;
+   }
+
+   switch (object_type) {
+   case KFD_CRIU_OBJECT_TYPE_QUEUE:
+   ret = kfd_criu_restore_queue(p, (uint8_t __user 
*)args->priv_data,
+priv_offset, 
max_priv_data_size);
+

[Patch v4 02/24] x86/configs: Add rock-rel_defconfig for amd-feature-criu branch

2021-12-22 Thread Rajneesh Bhardwaj

 - Add rock-rel_defconfig for release builds.

Signed-off-by: Rajneesh Bhardwaj 
---
 arch/x86/configs/rock-rel_defconfig | 4927 +++
 1 file changed, 4927 insertions(+)
 create mode 100644 arch/x86/configs/rock-rel_defconfig

diff --git a/arch/x86/configs/rock-rel_defconfig 
b/arch/x86/configs/rock-rel_defconfig
new file mode 100644
index ..f038ce7a0d06
--- /dev/null
+++ b/arch/x86/configs/rock-rel_defconfig
@@ -0,0 +1,4927 @@
+#
+# Automatically generated file; DO NOT EDIT.
+# Linux/x86 5.13.0 Kernel Configuration
+#
+CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
+CONFIG_CC_IS_GCC=y
+CONFIG_GCC_VERSION=70500
+CONFIG_CLANG_VERSION=0
+CONFIG_AS_IS_GNU=y
+CONFIG_AS_VERSION=23000
+CONFIG_LD_IS_BFD=y
+CONFIG_LD_VERSION=23000
+CONFIG_LLD_VERSION=0
+CONFIG_CC_CAN_LINK=y
+CONFIG_CC_CAN_LINK_STATIC=y
+CONFIG_CC_HAS_ASM_GOTO=y
+CONFIG_CC_HAS_ASM_INLINE=y
+CONFIG_IRQ_WORK=y
+CONFIG_BUILDTIME_TABLE_SORT=y
+CONFIG_THREAD_INFO_IN_TASK=y
+
+#
+# General setup
+#
+CONFIG_INIT_ENV_ARG_LIMIT=32
+# CONFIG_COMPILE_TEST is not set
+CONFIG_LOCALVERSION="-kfd"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_BUILD_SALT=""
+CONFIG_HAVE_KERNEL_GZIP=y
+CONFIG_HAVE_KERNEL_BZIP2=y
+CONFIG_HAVE_KERNEL_LZMA=y
+CONFIG_HAVE_KERNEL_XZ=y
+CONFIG_HAVE_KERNEL_LZO=y
+CONFIG_HAVE_KERNEL_LZ4=y
+CONFIG_HAVE_KERNEL_ZSTD=y
+CONFIG_KERNEL_GZIP=y
+# CONFIG_KERNEL_BZIP2 is not set
+# CONFIG_KERNEL_LZMA is not set
+# CONFIG_KERNEL_XZ is not set
+# CONFIG_KERNEL_LZO is not set
+# CONFIG_KERNEL_LZ4 is not set
+# CONFIG_KERNEL_ZSTD is not set
+CONFIG_DEFAULT_INIT=""
+CONFIG_DEFAULT_HOSTNAME="(none)"
+CONFIG_SWAP=y
+CONFIG_SYSVIPC=y
+CONFIG_SYSVIPC_SYSCTL=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_POSIX_MQUEUE_SYSCTL=y
+# CONFIG_WATCH_QUEUE is not set
+CONFIG_CROSS_MEMORY_ATTACH=y
+CONFIG_USELIB=y
+CONFIG_AUDIT=y
+CONFIG_HAVE_ARCH_AUDITSYSCALL=y
+CONFIG_AUDITSYSCALL=y
+
+#
+# IRQ subsystem
+#
+CONFIG_GENERIC_IRQ_PROBE=y
+CONFIG_GENERIC_IRQ_SHOW=y
+CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
+CONFIG_GENERIC_PENDING_IRQ=y
+CONFIG_GENERIC_IRQ_MIGRATION=y
+CONFIG_HARDIRQS_SW_RESEND=y
+CONFIG_IRQ_DOMAIN=y
+CONFIG_IRQ_DOMAIN_HIERARCHY=y
+CONFIG_GENERIC_MSI_IRQ=y
+CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
+CONFIG_IRQ_MSI_IOMMU=y
+CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
+CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
+CONFIG_IRQ_FORCED_THREADING=y
+CONFIG_SPARSE_IRQ=y
+# CONFIG_GENERIC_IRQ_DEBUGFS is not set
+# end of IRQ subsystem
+
+CONFIG_CLOCKSOURCE_WATCHDOG=y
+CONFIG_ARCH_CLOCKSOURCE_INIT=y
+CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
+CONFIG_GENERIC_TIME_VSYSCALL=y
+CONFIG_GENERIC_CLOCKEVENTS=y
+CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
+CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
+CONFIG_GENERIC_CMOS_UPDATE=y
+CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
+CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y
+
+#
+# Timers subsystem
+#
+CONFIG_TICK_ONESHOT=y
+CONFIG_NO_HZ_COMMON=y
+# CONFIG_HZ_PERIODIC is not set
+CONFIG_NO_HZ_IDLE=y
+# CONFIG_NO_HZ_FULL is not set
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+# end of Timers subsystem
+
+CONFIG_BPF=y
+CONFIG_HAVE_EBPF_JIT=y
+CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
+
+#
+# BPF subsystem
+#
+CONFIG_BPF_SYSCALL=y
+# CONFIG_BPF_JIT is not set
+# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
+# CONFIG_BPF_PRELOAD is not set
+# end of BPF subsystem
+
+# CONFIG_PREEMPT_NONE is not set
+CONFIG_PREEMPT_VOLUNTARY=y
+# CONFIG_PREEMPT is not set
+CONFIG_PREEMPT_COUNT=y
+
+#
+# CPU/Task time and stats accounting
+#
+CONFIG_TICK_CPU_ACCOUNTING=y
+# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
+# CONFIG_IRQ_TIME_ACCOUNTING is not set
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_BSD_PROCESS_ACCT_V3=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+# CONFIG_PSI is not set
+# end of CPU/Task time and stats accounting
+
+# CONFIG_CPU_ISOLATION is not set
+
+#
+# RCU Subsystem
+#
+CONFIG_TREE_RCU=y
+# CONFIG_RCU_EXPERT is not set
+CONFIG_SRCU=y
+CONFIG_TREE_SRCU=y
+CONFIG_TASKS_RCU_GENERIC=y
+CONFIG_TASKS_RUDE_RCU=y
+CONFIG_TASKS_TRACE_RCU=y
+CONFIG_RCU_STALL_COMMON=y
+CONFIG_RCU_NEED_SEGCBLIST=y
+# end of RCU Subsystem
+
+CONFIG_BUILD_BIN2C=y
+# CONFIG_IKCONFIG is not set
+# CONFIG_IKHEADERS is not set
+CONFIG_LOG_BUF_SHIFT=18
+CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
+CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
+CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
+
+#
+# Scheduler features
+#
+# CONFIG_UCLAMP_TASK is not set
+# end of Scheduler features
+
+CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
+CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
+CONFIG_CC_HAS_INT128=y
+CONFIG_ARCH_SUPPORTS_INT128=y
+CONFIG_NUMA_BALANCING=y
+CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
+CONFIG_CGROUPS=y
+CONFIG_PAGE_COUNTER=y
+CONFIG_MEMCG=y
+CONFIG_MEMCG_SWAP=y
+CONFIG_MEMCG_KMEM=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_WRITEBACK=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_FAIR_GROUP_SCHED=y
+CONFIG_CFS_BANDWIDTH=y
+# CONFIG_RT_GROUP_SCHED is not set
+CONFIG_CGROUP_PIDS=y
+# CONFIG_CGROUP_RDMA is not set
+CONFIG_CGROUP_FR

[Patch v4 13/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Checkpoint contents of queue MQD's on CRIU dump and restore them during
CRIU restore.

Signed-off-by: David Yat Sin 

---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |   2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  72 +++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   7 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  67 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  68 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  68 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  69 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 158 --
 11 files changed, 506 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3fb155f756fd..146879cd3f2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 0c50e67e2b51..3a5303ebcabf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL);
+   , , NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a0f5b8533a03..a92274f9f1f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -331,7 +331,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -390,8 +391,14 @@ static int create_queue_nocpsch(struct 
device_queue_manager *dqm,
retval = -ENOMEM;
goto out_deallocate_doorbell;
}
-   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
-   >gart_mqd_addr, >properties);
+
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
+>properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
+   >gart_mqd_addr, >properties);
+
if (q->properties.is_active) {
if (!dqm->sched_running) {
WARN_ONCE(1, "Load non-HWS mqd while stopped\n");
@@ -1339,7 +1346,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1385,8 +1393,12 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
-   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
-   >gart_mqd_addr, >properties);
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
+>properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
+   >gart_mqd_addr, >properties);
 
list_add(>list, >queues_list);
qpd->queue_count++;
@@ -1774,6 +1786,50 @@ static int get_wave_state(struct device_queue_manager 
*dqm,

[Patch v4 06/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-12-22 Thread Rajneesh Bhardwaj

This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to Non-Paged system memory
mapped for GPU and/or CPU access and lays basic foundation for the
userptrs buffer objects which will be added in a separate patch.
This ioctl creates various types of buffer objects such as VRAM,
MMIO, Doorbell, GTT based on the date sent from the userspace plugin.
The data mostly contains the previously checkpointed KFD images from
some KFD processs.

While restoring a criu process, attach old IDR values to newly
created BOs. This also adds the minimal gpu mapping support for a single
gpu checkpoint restore use case.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++-
 1 file changed, 297 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cdbb92972338..c93f74ad073f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2069,11 +2069,307 @@ static int criu_checkpoint(struct file *filep,
return ret;
 }
 
+static int criu_restore_process(struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   int ret = 0;
+   struct kfd_criu_process_priv_data process_priv;
+
+   if (*priv_offset + sizeof(process_priv) > max_priv_data_size)
+   return -EINVAL;
+
+   ret = copy_from_user(_priv,
+   (void __user *)(args->priv_data + *priv_offset),
+   sizeof(process_priv));
+   if (ret) {
+   pr_err("Failed to copy process private information from 
user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_offset += sizeof(process_priv);
+
+   if (process_priv.version != KFD_CRIU_PRIV_VERSION) {
+   pr_err("Invalid CRIU API version (checkpointed:%d 
current:%d)\n",
+   process_priv.version, KFD_CRIU_PRIV_VERSION);
+   return -EINVAL;
+   }
+
+exit:
+   return ret;
+}
+
+static int criu_restore_bos(struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   struct kfd_criu_bo_priv_data *bo_privs;
+   bool flush_tlbs = false;
+   int ret = 0, j = 0;
+   uint32_t i;
+
+   if (*priv_offset + (args->num_bos * sizeof(*bo_privs)) > 
max_priv_data_size)
+   return -EINVAL;
+
+   bo_buckets = kvmalloc_array(args->num_bos, sizeof(*bo_buckets), 
GFP_KERNEL);
+   if (!bo_buckets)
+   return -ENOMEM;
+
+   ret = copy_from_user(bo_buckets, (void __user *)args->bos,
+args->num_bos * sizeof(*bo_buckets));
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+
+   bo_privs = kvmalloc_array(args->num_bos, sizeof(*bo_privs), GFP_KERNEL);
+   if (!bo_privs) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   ret = copy_from_user(bo_privs, (void __user *)args->priv_data + 
*priv_offset,
+args->num_bos * sizeof(*bo_privs));
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_offset += args->num_bos * sizeof(*bo_privs);
+
+   /* Create and map new BOs */
+   for (i = 0; i < args->num_bos; i++) {
+   struct kfd_criu_bo_bucket *bo_bucket;
+   struct kfd_criu_bo_priv_data *bo_priv;
+   struct kfd_dev *dev;
+   struct kfd_process_device *pdd;
+   void *mem;
+   u64 offset;
+   int idr_handle;
+
+   bo_bucket = _buckets[i];
+   bo_priv = _privs[i];
+
+   dev = kfd_device_by_id(bo_bucket->gpu_id);
+   if (!dev) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+   pdd = kfd_get_process_device_data(dev, p);
+   if (!pdd) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+
+   pr_debug("kfd restore ioctl - bo_bucket[%d]:\n", i);
+   pr_debug(&q

[Patch v4 24/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2021-12-22 Thread Rajneesh Bhardwaj

In CRIU resume stage, resume all the shared virtual memory ranges from
the data stored inside the resuming kfd process during CRIU restore
phase. Also setup xnack mode and free up the resources.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  6 +++
 3 files changed, 71 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f7aa15b18f95..6191e37656dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2759,7 +2759,17 @@ static int criu_resume(struct file *filep,
}
 
mutex_lock(>mutex);
+   ret = kfd_criu_resume_svm(target);
+   if (ret) {
+   pr_err("kfd_criu_resume_svm failed for %i\n", args->pid);
+   goto exit;
+   }
+
ret =  amdgpu_amdkfd_criu_resume(target->kgd_process_info);
+   if (ret)
+   pr_err("amdgpu_amdkfd_criu_resume failed for %i\n", args->pid);
+
+exit:
mutex_unlock(>mutex);
 
kfd_unref_process(target);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e9f6c63c2a26..bd2dce37f345 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3427,6 +3427,61 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int kfd_criu_resume_svm(struct kfd_process *p)
+{
+   int nattr_common = 4, nattr_accessibility = 1;
+   struct criu_svm_metadata *criu_svm_md = NULL;
+   struct criu_svm_metadata *next = NULL;
+   struct svm_range_list *svms = >svms;
+   int i, j, num_attrs, ret = 0;
+   struct mm_struct *mm;
+
+   if (list_empty(>criu_svm_metadata_list)) {
+   pr_debug("No SVM data from CRIU restore stage 2\n");
+   return ret;
+   }
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_err("failed to get mm for the target process\n");
+   return -ESRCH;
+   }
+
+   num_attrs = nattr_common + (nattr_accessibility * p->n_pdds);
+
+   i = j = 0;
+   list_for_each_entry(criu_svm_md, >criu_svm_metadata_list, list) {
+   pr_debug("criu_svm_md[%d]\n\tstart: 0x%llx size: 0x%llx 
(npages)\n",
+i, criu_svm_md->start_addr, criu_svm_md->size);
+   for (j = 0; j < num_attrs; j++) {
+   pr_debug("\ncriu_svm_md[%d]->attrs[%d].type : 0x%x 
\ncriu_svm_md[%d]->attrs[%d].value : 0x%x\n",
+i,j, criu_svm_md->attrs[j].type,
+i,j, criu_svm_md->attrs[j].value);
+   }
+
+   ret = svm_range_set_attr(p, mm, criu_svm_md->start_addr,
+criu_svm_md->size, num_attrs,
+criu_svm_md->attrs);
+   if (ret) {
+   pr_err("CRIU: failed to set range attributes\n");
+   goto exit;
+   }
+
+   i++;
+   }
+
+exit:
+   list_for_each_entry_safe(criu_svm_md, next, 
>criu_svm_metadata_list, list) {
+   pr_debug("freeing criu_svm_md[]\n\tstart: 0x%llx\n",
+   criu_svm_md->start_addr);
+   kfree(criu_svm_md);
+   }
+
+   mmput(mm);
+   return ret;
+
+}
+
 int svm_criu_prepare_for_resume(struct kfd_process *p,
struct kfd_criu_svm_range_priv_data *svm_priv)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index e0c0853f085c..3b5bcb52723c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -195,6 +195,7 @@ int kfd_criu_restore_svm(struct kfd_process *p,
 uint8_t __user *user_priv_ptr,
 uint64_t *priv_data_offset,
 uint64_t max_priv_data_size);
+int kfd_criu_resume_svm(struct kfd_process *p);
 struct kfd_process_device *
 svm_range_get_pdd_by_adev(struct svm_range *prange, struct amdgpu_device 
*adev);
 void svm_range_list_lock_and_flush_work(struct svm_range_list *svms, struct 
mm_struct *mm);
@@ -256,6 +257,11 @@ static inline int kfd_criu_restore_svm(struct kfd_process 
*p,
return -EINVAL;
 }
 
+static inline int kfd_criu_resume_svm(struct kfd_process *p)
+{
+   return 0;
+}
+
 #define KFD_IS_SVM_API_SUPPORTED(dev) false
 
 #endif /* IS_ENABLED(CONFIG_HSA_AMD_SVM) */
-- 
2.17.1

[Patch v4 20/24] drm/amdkfd: use user_gpu_id for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj

Currently the SVM ranges use actual_gpu_id but with Checkpoint Restore
support its possible that the SVM ranges can be resumed on another node
where the actual_gpu_id may not be same as the original (user_gpu_id)
gpu id. So modify svm code to use user_gpu_id.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 67e2432098d1..0769dc655e15 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1813,7 +1813,7 @@ int kfd_process_gpuidx_from_gpuid(struct kfd_process *p, 
uint32_t gpu_id)
int i;
 
for (i = 0; i < p->n_pdds; i++)
-   if (p->pdds[i] && gpu_id == p->pdds[i]->dev->id)
+   if (p->pdds[i] && gpu_id == p->pdds[i]->user_gpu_id)
return i;
return -EINVAL;
 }
@@ -1826,7 +1826,7 @@ kfd_process_gpuid_from_adev(struct kfd_process *p, struct 
amdgpu_device *adev,
 
for (i = 0; i < p->n_pdds; i++)
if (p->pdds[i] && p->pdds[i]->dev->adev == adev) {
-   *gpuid = p->pdds[i]->dev->id;
+   *gpuid = p->pdds[i]->user_gpu_id;
*gpuidx = i;
return 0;
}
-- 
2.17.1

[Patch v4 04/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-12-22 Thread Rajneesh Bhardwaj

This IOCTL is expected to be called as a precursor to the actual
Checkpoint operation. This does the basic discovery into the target
process seized by CRIU and relays the information to the userspace that
utilizes it to start the Checkpoint operation via another dedicated
IOCTL.

The process_info IOCTL determines the number of GPUs, buffer objects
that are associated with the target process, its process id in
caller's namespace since /proc/pid/mem interface maybe used to drain
the contents of the discovered buffer objects in userspace and getpid
returns the pid of CRIU dumper process. Also the pid of a process
inside a container might be different than its global pid so return
the ns pid.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 14 ++
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1b863bd84c96..53d7a20e3c06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1857,6 +1857,41 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+uint64_t get_process_num_bos(struct kfd_process *p)
+{
+   uint64_t num_of_bos = 0, i;
+
+   /* Run over all PDDs of the process */
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+   void *mem;
+   int id;
+
+   idr_for_each_entry(>alloc_idr, mem, id) {
+   struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;
+
+   if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
+   num_of_bos++;
+   }
+   }
+   return num_of_bos;
+}
+
+static void criu_get_process_object_info(struct kfd_process *p,
+uint32_t *num_bos,
+uint64_t *objs_priv_size)
+{
+   uint64_t priv_size;
+
+   *num_bos = get_process_num_bos(p);
+
+   if (objs_priv_size) {
+   priv_size = sizeof(struct kfd_criu_process_priv_data);
+   priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
+   *objs_priv_size = priv_size;
+   }
+}
+
 static int criu_checkpoint(struct file *filep,
   struct kfd_process *p,
   struct kfd_ioctl_criu_args *args)
@@ -1889,7 +1924,25 @@ static int criu_process_info(struct file *filep,
struct kfd_process *p,
struct kfd_ioctl_criu_args *args)
 {
-   return 0;
+   int ret = 0;
+
+   mutex_lock(>mutex);
+
+   if (!kfd_has_process_device_data(p)) {
+   pr_err("No pdd for given process\n");
+   ret = -ENODEV;
+   goto err_unlock;
+   }
+
+   args->pid = task_pid_nr_ns(p->lead_thread,
+   task_active_pid_ns(p->lead_thread));
+
+   criu_get_process_object_info(p, >num_bos, >priv_data_size);
+
+   dev_dbg(kfd_device, "Num of bos:%u\n", args->num_bos);
+err_unlock:
+   mutex_unlock(>mutex);
+   return ret;
 }
 
 static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void 
*data)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e68f692362bb..4d9bc7af03af 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -950,6 +950,8 @@ void *kfd_process_device_translate_handle(struct 
kfd_process_device *p,
 void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
int handle);
 
+bool kfd_has_process_device_data(struct kfd_process *p);
+
 /* PASIDs */
 int kfd_pasid_init(void);
 void kfd_pasid_exit(void);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d4c8a6948a9f..f77d556ca0fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1456,6 +1456,20 @@ static int init_doorbell_bitmap(struct 
qcm_process_device *qpd,
return 0;
 }
 
+bool kfd_has_process_device_data(struct kfd_process *p)
+{
+   int i;
+
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (pdd)
+   return true;
+   }
+
+   return false;
+}
+
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
struct kfd_process *p)
 {
-- 
2.17.1

[Patch v4 10/24] drm/amdkfd: CRIU restore queue ids

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 

---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 37 +++
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 9665c8657929..3fb155f756fd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 1e30717b5253..0c50e67e2b51 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL);
+   , , NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7c2679a23aa3..8272bd5c4600 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -461,6 +461,7 @@ enum KFD_QUEUE_PRIORITY {
  * it's user mode or kernel mode queue.
  *
  */
+
 struct queue_properties {
enum kfd_queue_type type;
enum kfd_queue_format format;
@@ -1156,6 +1157,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue_properties(struct process_queue_manager *pqm, unsigned 
int qid,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 480ad794df4e..275aeebc58fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -42,6 +42,20 @@ static inline struct process_queue_node *get_queue_by_qid(
return NULL;
 }
 
+static int assign_queue_slot_by_qid(struct process_queue_manager *pqm,
+   unsigned int qid)
+{
+   if (qid >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+   return -EINVAL;
+
+   if (__test_and_set_bit(qid, pqm->queue_slot_bitmap)) {
+   pr_err("Cannot create new queue because requested qid(%u) is in 
use\n", qid);
+   return -ENOSPC;
+   }
+
+   return 0;
+}
+
 static int find_available_queue_slot(struct process_queue_manager *pqm,
unsigned int *qid)
 {
@@ -194,6 +208,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process)
 {
int retval;
@@ -225,7 +240,12 @@ int pqm_create_queue(struct process_queue_manager *pqm,
if (pdd->qpd.queue_count >= max_queues)
return -ENOSPC;
 
-   retval = find_available_queue_slot(pqm, qid);
+   if (q_data) {
+   retval = assign_queue_slot_by_qid(pqm, q_data->q_id);
+   *qid = q_data->q_id;
+   } else
+   retval = find_available_queue_slot(pqm, qid);
+
if (retval != 0)
return retval;
 
@@ -528,7 +548,7 @@ int kfd_process_get_queue_info(struct kfd_process *p,
return 0;
 }
 
-static void criu_dump_queue(struct kfd_process_device *pdd,
+static void criu_checkpoint_queue(struct kfd_process_device *pdd,
   struct queue *q,
   struct kfd_criu_queue_priv_data *q_data)
 {
@@ -560,7 +580,7 @@ static void criu_dump_queue(struct kfd_process

[Patch v4 07/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-12-22 Thread Rajneesh Bhardwaj

This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs.
When doing CRIU restore MMU notifications can happen anytime after we call
amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of the
restore process i.e. criu_resume ioctl is received, and the process is
ready to be resumed. This ioctl is different from other KFD CRIU ioctls
since its called by CRIU master restore process for all the target
processes being resumed by CRIU.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  6 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 51 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 44 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 35 +++--
 5 files changed, 123 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index fcbc8a9c9e06..5c5fc839f701 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -131,6 +131,7 @@ struct amdkfd_process_info {
atomic_t evicted_bos;
struct delayed_work restore_userptr_work;
struct pid *pid;
+   bool block_mmu_notifications;
 };
 
 int amdgpu_amdkfd_init(void);
@@ -269,7 +270,7 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct amdgpu_device *adev, uint64_t va, uint64_t size,
void *drm_priv, struct kgd_mem **mem,
-   uint64_t *offset, uint32_t flags);
+   uint64_t *offset, uint32_t flags, bool criu_resume);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv,
uint64_t *size);
@@ -297,6 +298,9 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct amdgpu_device 
*adev,
 int amdgpu_amdkfd_get_tile_config(struct amdgpu_device *adev,
struct tile_config *config);
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev);
+void amdgpu_amdkfd_block_mmu_notifications(void *p);
+int amdgpu_amdkfd_criu_resume(void *p);
+
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 90b985436878..5679fb75ec88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -846,7 +846,8 @@ static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem 
*mem,
  *
  * Returns 0 for success, negative errno for errors.
  */
-static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
+static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
+  bool criu_resume)
 {
struct amdkfd_process_info *process_info = mem->process_info;
struct amdgpu_bo *bo = mem->bo;
@@ -868,6 +869,17 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
goto out;
}
 
+   if (criu_resume) {
+   /*
+* During a CRIU restore operation, the userptr buffer objects
+* will be validated in the restore_userptr_work worker at a
+* later stage when it is scheduled by another ioctl called by
+* CRIU master process for the target pid for restore.
+*/
+   atomic_inc(>invalid);
+   mutex_unlock(_info->lock);
+   return 0;
+   }
ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
@@ -1240,6 +1252,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void 
**process_info,
INIT_DELAYED_WORK(>restore_userptr_work,
  amdgpu_amdkfd_restore_userptr_worker);
 
+   info->block_mmu_notifications = false;
*process_info = info;
*ef = dma_fence_get(>eviction_fence->base);
}
@@ -1456,10 +1469,37 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv)
return avm->pd_phys_addr;
 }
 
+void amdgpu_amdkfd_block_mmu_notifications(void *p)
+{
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   pinfo->block_mmu_notifications = true;
+}
+
+int amdgpu_amdkfd_criu_resume(void *p)
+{
+   int ret = 0;
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   mutex_lock(>lock);
+   pr_debug("scheduling work\n");
+   atomic_inc

[Patch v4 05/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2021-12-22 Thread Rajneesh Bhardwaj

This adds support to discover the  buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to user space plugin running under criu master
context which then stores this info to recreate these buffer objects
during a restore operation.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  20 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   3 +-
 4 files changed, 195 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 56c5c4464829..4fd36bd9dcfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1173,6 +1173,26 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device 
*bdev,
return ttm_pool_free(>mman.bdev.pool, ttm);
 }
 
+/**
+ * amdgpu_ttm_tt_get_userptr - Return the userptr GTT ttm_tt for the current
+ * task
+ *
+ * @tbo: The ttm_buffer_object that contains the userptr
+ * @user_addr:  The returned value
+ */
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr)
+{
+   struct amdgpu_ttm_tt *gtt;
+
+   if (!tbo->ttm)
+   return -EINVAL;
+
+   gtt = (void *)tbo->ttm;
+   *user_addr = gtt->userptr;
+   return 0;
+}
+
 /**
  * amdgpu_ttm_tt_set_userptr - Initialize userptr GTT ttm_tt for the current
  * task
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 7346ecff4438..6e6d67ec43f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -177,6 +177,8 @@ static inline bool amdgpu_ttm_tt_get_user_pages_done(struct 
ttm_tt *ttm)
 #endif
 
 void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr);
 int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo,
  uint64_t addr, uint32_t flags);
 bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 53d7a20e3c06..cdbb92972338 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -42,6 +42,7 @@
 #include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "amdgpu_object.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1857,6 +1858,29 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int criu_checkpoint_process(struct kfd_process *p,
+uint8_t __user *user_priv_data,
+uint64_t *priv_offset)
+{
+   struct kfd_criu_process_priv_data process_priv;
+   int ret;
+
+   memset(_priv, 0, sizeof(process_priv));
+
+   process_priv.version = KFD_CRIU_PRIV_VERSION;
+
+   ret = copy_to_user(user_priv_data + *priv_offset,
+   _priv, sizeof(process_priv));
+
+   if (ret) {
+   pr_err("Failed to copy process information to user\n");
+   ret = -EFAULT;
+   }
+
+   *priv_offset += sizeof(process_priv);
+   return ret;
+}
+
 uint64_t get_process_num_bos(struct kfd_process *p)
 {
uint64_t num_of_bos = 0, i;
@@ -1877,6 +1901,111 @@ uint64_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_checkpoint_bos(struct kfd_process *p,
+  uint32_t num_bos,
+  uint8_t __user *user_bos,
+  uint8_t __user *user_priv_data,
+  uint64_t *priv_offset)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   struct kfd_criu_bo_priv_data *bo_privs;
+   int ret = 0, pdd_index, bo_index = 0, id;
+   void *mem;
+
+   bo_buckets = kvzalloc(num_bos * sizeof(*bo_buckets), GFP_KERNEL);
+   if (!bo_buckets) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   bo_privs = kvzalloc(num_bos * sizeof(*bo_privs), GFP_KERNEL);
+   if (!bo_privs) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   for (pdd_index = 0; pdd_index < p->n_pdds; pdd_index++) {
+   struct kfd_process_device *pdd = p->pdds[pdd_index];
+   struct amdgpu_bo *dumper_bo;
+   struct kgd_mem *kgd_mem;
+
+   idr_for_each_entry(>alloc_idr, mem, id) {
+   struct kfd_criu_bo_bucket

[Patch v4 08/24] drm/amdkfd: CRIU Implement KFD unpause operation

2021-12-22 Thread Rajneesh Bhardwaj

From: David Yat Sin 

Introducing UNPAUSE op. After CRIU amdgpu plugin performs a PROCESS_INFO
op the queues will be stay in an evicted state. Once the plugin is done
draining BO contents, it is safe to perform an UNPAUSE op for the queues
to resume.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 37 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  1 +
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 87b9f019e96e..db2bb302a8d4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2040,6 +2040,14 @@ static int criu_checkpoint(struct file *filep,
goto exit_unlock;
}
 
+   /* Confirm all process queues are evicted */
+   if (!p->queues_paused) {
+   pr_err("Cannot dump process when queues are not in evicted 
state\n");
+   /* CRIU plugin did not call op PROCESS_INFO before 
checkpointing */
+   ret = -EINVAL;
+   goto exit_unlock;
+   }
+
criu_get_process_object_info(p, _bos, _size);
 
if (num_bos != args->num_bos ||
@@ -2382,7 +2390,24 @@ static int criu_unpause(struct file *filep,
struct kfd_process *p,
struct kfd_ioctl_criu_args *args)
 {
-   return 0;
+   int ret;
+
+   mutex_lock(>mutex);
+
+   if (!p->queues_paused) {
+   mutex_unlock(>mutex);
+   return -EINVAL;
+   }
+
+   ret = kfd_process_restore_queues(p);
+   if (ret)
+   pr_err("Failed to unpause queues ret:%d\n", ret);
+   else
+   p->queues_paused = false;
+
+   mutex_unlock(>mutex);
+
+   return ret;
 }
 
 static int criu_resume(struct file *filep,
@@ -2434,6 +2459,12 @@ static int criu_process_info(struct file *filep,
goto err_unlock;
}
 
+   ret = kfd_process_evict_queues(p);
+   if (ret)
+   goto err_unlock;
+
+   p->queues_paused = true;
+
args->pid = task_pid_nr_ns(p->lead_thread,
task_active_pid_ns(p->lead_thread));
 
@@ -2441,6 +2472,10 @@ static int criu_process_info(struct file *filep,
 
dev_dbg(kfd_device, "Num of bos:%u\n", args->num_bos);
 err_unlock:
+   if (ret) {
+   kfd_process_restore_queues(p);
+   p->queues_paused = false;
+   }
mutex_unlock(>mutex);
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cd72541a8f4f..f3a9f3de34e4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -875,6 +875,9 @@ struct kfd_process {
struct svm_range_list svms;
 
bool xnack_enabled;
+
+   /* Queues are in paused stated because we are in the process of doing a 
CRIU checkpoint */
+   bool queues_paused;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d2fcdc5e581f..e20fbb7ba9bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1364,6 +1364,7 @@ static struct kfd_process *create_process(const struct 
task_struct *thread)
process->mm = thread->mm;
process->lead_thread = thread->group_leader;
process->n_pdds = 0;
+   process->queues_paused = false;
INIT_DELAYED_WORK(>eviction_work, evict_process_worker);
INIT_DELAYED_WORK(>restore_work, restore_process_worker);
process->last_restore_timestamp = get_jiffies_64();
-- 
2.17.1

[Patch v4 01/24] x86/configs: CRIU update debug rock defconfig

2021-12-22 Thread Rajneesh Bhardwaj

 - Update debug config for Checkpoint-Restore (CR) support
 - Also include necessary options for CR with docker containers.

Signed-off-by: Rajneesh Bhardwaj 
---
 arch/x86/configs/rock-dbg_defconfig | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/arch/x86/configs/rock-dbg_defconfig 
b/arch/x86/configs/rock-dbg_defconfig
index 4877da183599..bc2a34666c1d 100644
--- a/arch/x86/configs/rock-dbg_defconfig
+++ b/arch/x86/configs/rock-dbg_defconfig
@@ -249,6 +249,7 @@ CONFIG_KALLSYMS_ALL=y
 CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
 CONFIG_KALLSYMS_BASE_RELATIVE=y
 # CONFIG_USERFAULTFD is not set
+CONFIG_USERFAULTFD=y
 CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
 CONFIG_KCMP=y
 CONFIG_RSEQ=y
@@ -1015,6 +1016,11 @@ CONFIG_PACKET_DIAG=y
 CONFIG_UNIX=y
 CONFIG_UNIX_SCM=y
 CONFIG_UNIX_DIAG=y
+CONFIG_SMC_DIAG=y
+CONFIG_XDP_SOCKETS_DIAG=y
+CONFIG_INET_MPTCP_DIAG=y
+CONFIG_TIPC_DIAG=y
+CONFIG_VSOCKETS_DIAG=y
 # CONFIG_TLS is not set
 CONFIG_XFRM=y
 CONFIG_XFRM_ALGO=y
@@ -1052,15 +1058,17 @@ CONFIG_SYN_COOKIES=y
 # CONFIG_NET_IPVTI is not set
 # CONFIG_NET_FOU is not set
 # CONFIG_NET_FOU_IP_TUNNELS is not set
-# CONFIG_INET_AH is not set
-# CONFIG_INET_ESP is not set
-# CONFIG_INET_IPCOMP is not set
-CONFIG_INET_TUNNEL=y
-CONFIG_INET_DIAG=y
-CONFIG_INET_TCP_DIAG=y
-# CONFIG_INET_UDP_DIAG is not set
-# CONFIG_INET_RAW_DIAG is not set
-# CONFIG_INET_DIAG_DESTROY is not set
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_ESP_OFFLOAD=m
+CONFIG_INET_TUNNEL=m
+CONFIG_INET_XFRM_TUNNEL=m
+CONFIG_INET_DIAG=m
+CONFIG_INET_TCP_DIAG=m
+CONFIG_INET_UDP_DIAG=m
+CONFIG_INET_RAW_DIAG=m
+CONFIG_INET_DIAG_DESTROY=y
 CONFIG_TCP_CONG_ADVANCED=y
 # CONFIG_TCP_CONG_BIC is not set
 CONFIG_TCP_CONG_CUBIC=y
@@ -1085,12 +1093,14 @@ CONFIG_TCP_MD5SIG=y
 CONFIG_IPV6=y
 # CONFIG_IPV6_ROUTER_PREF is not set
 # CONFIG_IPV6_OPTIMISTIC_DAD is not set
-CONFIG_INET6_AH=y
-CONFIG_INET6_ESP=y
-# CONFIG_INET6_ESP_OFFLOAD is not set
-# CONFIG_INET6_ESPINTCP is not set
-# CONFIG_INET6_IPCOMP is not set
-# CONFIG_IPV6_MIP6 is not set
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_ESP_OFFLOAD=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_INET6_XFRM_TUNNEL=m
+CONFIG_INET_DCCP_DIAG=m
+CONFIG_INET_SCTP_DIAG=m
 # CONFIG_IPV6_ILA is not set
 # CONFIG_IPV6_VTI is not set
 CONFIG_IPV6_SIT=y
@@ -1146,8 +1156,13 @@ CONFIG_NF_CT_PROTO_UDPLITE=y
 # CONFIG_NF_CONNTRACK_SANE is not set
 # CONFIG_NF_CONNTRACK_SIP is not set
 # CONFIG_NF_CONNTRACK_TFTP is not set
-# CONFIG_NF_CT_NETLINK is not set
-# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
+CONFIG_COMPAT_NETLINK_MESSAGES=y
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NF_CT_NETLINK_TIMEOUT=m
+CONFIG_NF_CT_NETLINK_HELPER=m
+CONFIG_NETFILTER_NETLINK_GLUE_CT=y
+CONFIG_SCSI_NETLINK=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
 CONFIG_NF_NAT=m
 CONFIG_NF_NAT_REDIRECT=y
 CONFIG_NF_NAT_MASQUERADE=y
@@ -1992,7 +2007,7 @@ CONFIG_NETCONSOLE_DYNAMIC=y
 CONFIG_NETPOLL=y
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_RIONET is not set
-# CONFIG_TUN is not set
+CONFIG_TUN=y
 # CONFIG_TUN_VNET_CROSS_LE is not set
 CONFIG_VETH=y
 # CONFIG_NLMON is not set
@@ -3990,7 +4005,7 @@ CONFIG_MANDATORY_FILE_LOCKING=y
 CONFIG_FSNOTIFY=y
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY_USER=y
-# CONFIG_FANOTIFY is not set
+CONFIG_FANOTIFY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
-- 
2.17.1

[Patch v4 03/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-12-22 Thread Rajneesh Bhardwaj

Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on same or a remote
machine but expects the processes that have a device file (e.g. GPU)
associated with them, provide necessary driver support to assist CRIU
and its extensible plugin interface. Thus, In order to support the
Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute
Kernel driver, needs to provide a set of new APIs that provide
necessary VRAM metadata and its contents to a userspace component
(CRIU plugin) that can store it in form of image files.

This introduces some new ioctls which will be used to checkpoint-Restore
any KFD bound user process. KFD doesn't allow any arbitrary ioctl call
unless it is called by the group leader process. Since these ioctls are
expected to be called from a KFD criu plugin which has elevated ptrace
attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with
the file descriptors so modify KFD to allow such calls.

(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling 
Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 94 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 65 +++-
 include/uapi/linux/kfd_ioctl.h   | 79 +++-
 3 files changed, 235 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4bfc0c8ab764..1b863bd84c96 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "kfd_priv.h"
@@ -1856,6 +1857,75 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int criu_checkpoint(struct file *filep,
+  struct kfd_process *p,
+  struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_restore(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_unpause(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_resume(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_process_info(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void 
*data)
+{
+   struct kfd_ioctl_criu_args *args = data;
+   int ret;
+
+   dev_dbg(kfd_device, "CRIU operation: %d\n", args->op);
+   switch (args->op) {
+   case KFD_CRIU_OP_PROCESS_INFO:
+   ret = criu_process_info(filep, p, args);
+   break;
+   case KFD_CRIU_OP_CHECKPOINT:
+   ret = criu_checkpoint(filep, p, args);
+   break;
+   case KFD_CRIU_OP_UNPAUSE:
+   ret = criu_unpause(filep, p, args);
+   break;
+   case KFD_CRIU_OP_RESTORE:
+   ret = criu_restore(filep, p, args);
+   break;
+   case KFD_CRIU_OP_RESUME:
+   ret = criu_resume(filep, p, args);
+   break;
+   default:
+   dev_dbg(kfd_device, "Unsupported CRIU operation:%d\n", 
args->op);
+   ret = -EINVAL;
+   break;
+   }
+
+   if (ret)
+   dev_dbg(kfd_device, "CRIU operation:%d err:%d\n", args->op, 
ret);
+
+   return ret;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -1959,6 +2029,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
kfd_ioctl_set_xnack_mode, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_OP,
+   kfd_ioctl_criu, KFD_IOC_FLAG_CHECKPOINT_RESTORE),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls)
@@ -1973,6 +2046,7 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
char *kdata = NULL;
unsigned int usize, asize;
int retcode = -EINVAL;
+   bool ptrace_attached = false;
 
if (nr >= AMDKFD_CORE_IOCTL_COUNT)
goto err_i1;
@@ -1998,7 +2072,15 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
 * processes need to create their own KFD device context.
 */
process

[Patch v4 00/24] CHECKPOINT RESTORE WITH ROCm

2021-12-22 Thread Rajneesh Bhardwaj

CRIU is a user space tool which is very popular for container live
migration in datacentres. It can checkpoint a running application, save
its complete state, memory contents and all system resources to images
on disk which can be migrated to another m achine and restored later.
More information on CRIU can be found at https://criu.org/Main_Page

CRIU currently does not support Checkpoint / Restore with applications
that have devices files open so it cannot perform checkpoint and restore
on GPU devices which are very complex and have their own VRAM managed
privately. CRIU, however can support external devices by using a plugin
architecture. We feel that we are getting close to finalizing our IOCTL
APIs which were again changed since V3 for an improved modular design.

Our changes to CRIU user space  are can be obtained from here:
https://github.com/RadeonOpenCompute/criu/tree/amdgpu_rfc-211222

We have tested the following scenarios:
 - Checkpoint / Restore of a Pytorch (BERT) workload
 - kfdtests with queues and events
 - Gfx9 and Gfx10 based multi GPU test systems 
 - On baremetal and inside a docker container
 - Restoring on a different system

V1: Initial
V2: Addressed review comments
V3: Rebased on latest amd-staging-drm-next (5.15 based)
v4: New API design and basic support for SVM, however there is an
outstanding issue with SVM restore which is currently under debug and
hopefully that won't impact the ioctl APIs as SVMs are treated as
private data hidden from user space like queues and events with the new
approch.


David Yat Sin (9):
  drm/amdkfd: CRIU Implement KFD unpause operation
  drm/amdkfd: CRIU add queues support
  drm/amdkfd: CRIU restore queue ids
  drm/amdkfd: CRIU restore sdma id for queues
  drm/amdkfd: CRIU restore queue doorbell id
  drm/amdkfd: CRIU checkpoint and restore queue mqds
  drm/amdkfd: CRIU checkpoint and restore queue control stack
  drm/amdkfd: CRIU checkpoint and restore events
  drm/amdkfd: CRIU implement gpu_id remapping

Rajneesh Bhardwaj (15):
  x86/configs: CRIU update debug rock defconfig
  x86/configs: Add rock-rel_defconfig for amd-feature-criu branch
  drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  drm/amdkfd: CRIU Implement KFD process_info ioctl
  drm/amdkfd: CRIU Implement KFD checkpoint ioctl
  drm/amdkfd: CRIU Implement KFD restore ioctl
  drm/amdkfd: CRIU Implement KFD resume ioctl
  drm/amdkfd: CRIU export BOs as prime dmabuf objects
  drm/amdkfd: CRIU checkpoint and restore xnack mode
  drm/amdkfd: CRIU allow external mm for svm ranges
  drm/amdkfd: use user_gpu_id for svm ranges
  drm/amdkfd: CRIU Discover svm ranges
  drm/amdkfd: CRIU Save Shared Virtual Memory ranges
  drm/amdkfd: CRIU prepare for svm resume
  drm/amdkfd: CRIU resume shared virtual memory ranges

 arch/x86/configs/rock-dbg_defconfig   |   53 +-
 arch/x86/configs/rock-rel_defconfig   | 4927 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   51 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   20 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 1453 -
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  185 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  313 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   14 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |   72 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |   74 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |   89 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |   81 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  166 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   86 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  377 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  326 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h  |   39 +
 include/uapi/linux/kfd_ioctl.h|   79 +-
 22 files changed, 8099 insertions(+), 334 deletions(-)
 create mode 100644 arch/x86/configs/rock-rel_defconfig

-- 
2.17.1

[Patch v2] drm/amdgpu: Don't inherit GEM object VMAs in child process

2021-12-10 Thread Rajneesh Bhardwaj

When an application having open file access to a node forks, its shared
mappings also get reflected in the address space of child process even
though it cannot access them with the object permissions applied. With the
existing permission checks on the gem objects, it might be reasonable to
also create the VMAs with VM_DONTCOPY flag so a user space application
doesn't need to explicitly call the madvise(addr, len, MADV_DONTFORK)
system call to prevent the pages in the mapped range to appear in the
address space of the child process. It also prevents the memory leaks
due to additional reference counts on the mapped BOs in the child
process that prevented freeing the memory in the parent for which we had
worked around earlier in the user space inside the thunk library.

Additionally, we faced this issue when using CRIU to checkpoint restore
an application that had such inherited mappings in the child which
confuse CRIU when it mmaps on restore. Having this flag set for the
render node VMAs helps. VMAs mapped via KFD already take care of this so
this is needed only for the render nodes.

To limit the impact of the change to user space consumers such as OpenGL
etc, limit it to KFD BOs only.

Cc: Felix Kuehling 

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---

Changes in v2:
 * Addressed Christian's concerns for user space impact
 * Further reduced the scope to KFD BOs only

 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a224b5295edd..64a7931eda8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -263,6 +263,9 @@ static int amdgpu_gem_object_mmap(struct drm_gem_object 
*obj, struct vm_area_str
!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
vma->vm_flags &= ~VM_MAYWRITE;
 
+   if (bo->kfd_bo)
+   vma->vm_flags |= VM_DONTCOPY;
+
return drm_gem_ttm_mmap(obj, vma);
 }
 
-- 
2.17.1

[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-08 Thread Rajneesh Bhardwaj

When an application having open file access to a node forks, its shared
mappings also get reflected in the address space of child process even
though it cannot access them with the object permissions applied. With the
existing permission checks on the gem objects, it might be reasonable to
also create the VMAs with VM_DONTCOPY flag so a user space application
doesn't need to explicitly call the madvise(addr, len, MADV_DONTFORK)
system call to prevent the pages in the mapped range to appear in the
address space of the child process. It also prevents the memory leaks
due to additional reference counts on the mapped BOs in the child
process that prevented freeing the memory in the parent for which we had
worked around earlier in the user space inside the thunk library.

Additionally, we faced this issue when using CRIU to checkpoint restore
an application that had such inherited mappings in the child which
confuse CRIU when it mmaps on restore. Having this flag set for the
render node VMAs helps. VMAs mapped via KFD already take care of this so
this is needed only for the render nodes.

Cc: Felix Kuehling 

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/drm_gem.c   | 3 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 09c820045859..d9c4149f36dd 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1058,7 +1058,8 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned 
long obj_size,
goto err_drm_gem_object_put;
}
 
-   vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | 
VM_DONTDUMP;
+   vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND
+   | VM_DONTDUMP | VM_DONTCOPY;
vma->vm_page_prot = 
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
}
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 33680c94127c..420a4898fdd2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -566,7 +566,7 @@ int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct 
ttm_buffer_object *bo)
 
vma->vm_private_data = bo;
 
-   vma->vm_flags |= VM_PFNMAP;
+   vma->vm_flags |= VM_PFNMAP | VM_DONTCOPY;
vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
return 0;
 }
-- 
2.17.1

61 matches

Mail list logo