[PATCH] drm/amdkfd: Track GPU memory utilization per process

2020-04-19 Thread Mukul Joshi
Track GPU memory usage on a per process basis and report it through sysfs. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 ++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 7 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 51 ++-- 3 files

[PATCH v2] drm/amdkfd: Track GPU memory utilization per process

2020-04-22 Thread Mukul Joshi
for VRAM usage calculations. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7

[PATCH v4] drm/amdkfd: Track GPU memory utilization per process

2020-04-27 Thread Mukul Joshi
for VRAM usage calculations. v3: - Move handling of imported BO size from kfd_ioctl_free_memory_of_gpu() to amdgpu_amdkfd_gpuvm_free_memory_of_gpu(). v4: - Add READ_ONCE() and WRITE_ONCE() around reading and writing vram_usage count. Signed-off-by: Mukul Joshi
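
The v4 note above mentions READ_ONCE() and WRITE_ONCE() around the vram_usage count. The following is a minimal userspace sketch of that idea, not the patch itself: the macros are simplified stand-ins for the kernel ones, and struct proc_dev_stats, account_alloc(), account_free() and report_usage() are hypothetical names used only for illustration. It assumes writers are already serialized by the caller; the macros only keep the compiler from tearing or re-fetching the access for a concurrent reader.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumption:
 * GCC/Clang, which provide __typeof__). The real kernel versions carry
 * additional checks. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical per-process, per-device bookkeeping. */
struct proc_dev_stats {
    uint64_t vram_usage; /* bytes of VRAM currently accounted to the process */
};

/* Called on allocation; the caller is assumed to serialize writers. */
static void account_alloc(struct proc_dev_stats *s, uint64_t size)
{
    WRITE_ONCE(s->vram_usage, s->vram_usage + size);
}

/* Called on free; the caller is assumed to serialize writers. */
static void account_free(struct proc_dev_stats *s, uint64_t size)
{
    WRITE_ONCE(s->vram_usage, s->vram_usage - size);
}

/* Lockless reader, e.g. a reporting path such as a sysfs show handler. */
static uint64_t report_usage(struct proc_dev_stats *s)
{
    return READ_ONCE(s->vram_usage);
}
```

A reader that goes through report_usage() sees either the old or the new total, never a partially written value, which is presumably why the counter can be reported without taking a lock.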

[PATCH v3] drm/amdkfd: Track GPU memory utilization per process

2020-04-26 Thread Mukul Joshi
for VRAM usage calculations. v3: - Move handling of imported BO size from kfd_ioctl_free_memory_of_gpu() to amdgpu_amdkfd_gpuvm_free_memory_of_gpu(). Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 +- .../gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: Track SDMA utilization per process

2020-05-14 Thread Mukul Joshi
Track SDMA usage on a per process basis and report it through sysfs. The value in the sysfs file indicates the amount of time SDMA has been in-use by this process since the creation of the process. This value is in microsecond granularity. Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd

[PATCH v2] drm/amdkfd: Track SDMA utilization per process

2020-05-22 Thread Mukul Joshi
is kfd_procfs_show(). - Make counter part of the kfd_sdma_activity_handler_workarea structure. Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 57 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 16 ++- drivers/gpu

[PATCH] drm/amdkfd: Move process doorbell allocation into kfd device

2020-09-01 Thread Mukul Joshi
to manage. In a system with mix of such devices, KFD would need to request process doorbell space based on the type of device, either from amdgpu or from its own doorbell space. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 +-- drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdgpu: Enable SDMA utilization for Arcturus

2020-09-11 Thread Mukul Joshi
SDMA utilization calculations are enabled/disabled by writing to SDMAx_PUB_DUMMY_REG2 register. Currently, enable this only for Arcturus. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd

[PATCH v2] drm/amdgpu: Enable SDMA utilization for Arcturus

2020-09-11 Thread Mukul Joshi
SDMA utilization calculations are enabled/disabled by writing to SDMAx_PUB_DUMMY_REG2 register. Currently, enable this only for Arcturus. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: Add GPU reset SMI event

2020-08-25 Thread Mukul Joshi
Add support for reporting GPU reset events through SMI. KFD would report both pre and post GPU reset events. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 30

[PATCH] drm/amdkfd: sparse: Fix warning in reading SDMA counters

2020-08-17 Thread Mukul Joshi
Add __user annotation to fix related sparse warning while reading SDMA counters from userland. Reported-by: kernel test robot Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers

[PATCH v2] drm/amdkfd: sparse: Fix warning in reading SDMA counters

2020-08-17 Thread Mukul Joshi
Add __user annotation to fix related sparse warning while reading SDMA counters from userland. Reported-by: kernel test robot Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git

[PATCH v3] drm/amdkfd: sparse: Fix warning in reading SDMA counters

2020-08-18 Thread Mukul Joshi
Add __user annotation to fix related sparse warning while reading SDMA counters from userland. Also, rework the read SDMA counters function by removing redundant checks. Reported-by: kernel test robot Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 28

[PATCH v3] drm/amdkfd: Add GPU reset SMI event

2020-08-28 Thread Mukul Joshi
Add support for reporting GPU reset events through SMI. KFD would report both pre and post GPU reset events. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 35

[PATCH] include/uapi/linux: Fix indentation in kfd_smi_event enum

2020-08-28 Thread Mukul Joshi
Replace spaces with Tabs to fix indentation in kfd_smi_event enum. Signed-off-by: Mukul Joshi --- include/uapi/linux/kfd_ioctl.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 8b7368bfbd84

[PATCH v2] drm/amdkfd: Add GPU reset SMI event

2020-08-26 Thread Mukul Joshi
Add support for reporting GPU reset events through SMI. KFD would report both pre and post GPU reset events. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 30

[PATCH] drm/amdkfd: Initialize SDMA activity counter to 0

2020-08-17 Thread Mukul Joshi
To prevent reporting erroneous SDMA usage, initialize SDMA activity counter to 0 before using. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd

[PATCH v3] drm/amdkfd: Track SDMA utilization per process

2020-05-26 Thread Mukul Joshi
is kfd_procfs_show(). - Make counter part of the kfd_sdma_activity_handler_workarea structure. v3: - Remove READ_ONCE/WRITE_ONCE while updating activity counter. - Update past activity counter under dqm_lock. Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 57

[PATCH] drm/amdkfd: Add thermal throttling SMI event

2020-07-21 Thread Mukul Joshi
Add support for reporting thermal throttling events through SMI. Also, add a counter to count the number of throttling interrupts observed and report the count in the SMI event message. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 4 ++ drivers/gpu/drm/amd

[PATCH] drm/amdkfd: Fix circular locking dependency warning

2020-06-23 Thread Mukul Joshi
) is not called while reading SDMA stats with dqm_lock held as get_user() could cause a page fault which leads to the circular locking scenario. Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 +++--- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +

[PATCH v2] drm/amdkfd: Fix circular locking dependency warning

2020-06-24 Thread Mukul Joshi
) is not called while reading SDMA stats with dqm_lock held as get_user() could cause a page fault which leads to the circular locking scenario. Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 75 + .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +
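
The snippet above is truncated, so the exact change is not visible here. One common way to keep a possibly faulting access such as get_user() out of a locked region is to snapshot what is needed under the lock, drop the lock, and only then touch user memory. The sketch below illustrates that pattern in plain userspace C under stated assumptions: struct toy_lock, struct toy_queue, fake_get_user() and read_counter() are hypothetical stand-ins, not KFD code, and the toy lock only records whether it is held.

```c
#include <assert.h>
#include <stdint.h>

/* Toy lock that only tracks whether it is held (stand-in for dqm_lock). */
struct toy_lock { int held; };

static void lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void lock_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

/* Hypothetical queue state: the lock protects user_counter_addr. */
struct toy_queue {
    struct toy_lock lock;
    const uint64_t *user_counter_addr; /* stands in for a __user pointer */
};

/* Stand-in for get_user(): it may "fault", so it asserts that the lock
 * is not held when it runs. */
static int fake_get_user(const struct toy_queue *q, uint64_t *dst,
                         const uint64_t *src)
{
    assert(!q->lock.held);
    *dst = *src;
    return 0;
}

/* Snapshot the address under the lock, then read it after unlocking. */
static int read_counter(struct toy_queue *q, uint64_t *out)
{
    const uint64_t *addr;

    lock_acquire(&q->lock);
    addr = q->user_counter_addr; /* only the pointer is copied here */
    lock_release(&q->lock);

    return fake_get_user(q, out, addr);
}
```

The trade-off is that the value read may be slightly stale relative to the locked state, which is usually acceptable for statistics.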

[PATCH] drm/amdkfd: Replace bitmask with event idx in SMI event msg

2020-07-26 Thread Mukul Joshi
Event bitmask is a 64-bit mask with only 1 bit set. Sending this event bitmask in KFD SMI event message is both wasteful of memory and potentially limiting to only 64 events. Instead send event index in SMI event message. Signed-off-by: Mukul Joshi Suggested-by: Felix Kuehling --- drivers/gpu
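
To make the bitmask versus index argument concrete, here is a small sketch of converting between the two representations. The helper names event_mask_to_index() and event_index_to_mask() are hypothetical and are not taken from the patch; they only illustrate why a one-hot 64-bit mask carries no more information than a small integer.

```c
#include <stdint.h>

/* Return the index of the single set bit in mask, or -1 if mask does not
 * have exactly one bit set. */
static int event_mask_to_index(uint64_t mask)
{
    int idx = 0;

    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;
    while ((mask & 1) == 0) {
        mask >>= 1;
        idx++;
    }
    return idx;
}

/* Return the one-hot mask for idx, or 0 if idx cannot be represented in
 * 64 bits. This is the 64-event ceiling the patch description refers to. */
static uint64_t event_index_to_mask(unsigned int idx)
{
    return idx < 64 ? (uint64_t)1 << idx : 0;
}
```

An index stored in a single byte can already name 256 distinct events, whereas the 8-byte one-hot mask tops out at 64.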

[PATCH] drm/amdkfd: Add GPU reset SMI event

2020-07-27 Thread Mukul Joshi
Add support for reporting GPU reset events through SMI. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 18 ++ drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h | 1 + include/uapi/linux/kfd_ioctl.h

[PATCH v2] drm/amdkfd: Add thermal throttling SMI event

2020-07-22 Thread Mukul Joshi
Add support for reporting thermal throttling events through SMI. Also, add a counter to count the number of throttling interrupts observed and report the count in the SMI event message. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 4 ++ drivers/gpu/drm/amd

[PATCH v3] drm/amdkfd: Add thermal throttling SMI event

2020-07-23 Thread Mukul Joshi
Add support for reporting thermal throttling events through SMI. Also, add a counter to count the number of throttling interrupts observed and report the count in the SMI event message. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 4 ++ drivers/gpu/drm/amd

[PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-11 Thread Mukul Joshi
On Aldebaran, GPU driver will handle bad page retirement even though UMC is host managed. As a result, register a bad page retirement handler on the mce notifier chain to retire bad pages on Aldebaran. Signed-off-by: Mukul Joshi Reviewed-by: John Clements Acked-by: Felix Kuehling --- drivers

[PATCH] drm/amdgpu: Query correct register for DF hashing on Aldebaran

2021-05-18 Thread Mukul Joshi
For Aldebaran, driver needs to query DramMegaBaseAddress to check if DF hashing is enabled. Signed-off-by: Mukul Joshi Acked-by: Alex Deucher Reviewed-by: Harish Kasiviswanathan --- drivers/gpu/drm/amd/amdgpu/df_v3_6.c| 9 + drivers/gpu/drm/amd/include/asic_reg/df

[PATCH] drm/amdgpu: Enable TCP channel hashing for Aldebaran

2021-05-06 Thread Mukul Joshi
Enable TCP channel hashing to match DF hash settings for Aldebaran. Signed-off-by: Mukul Joshi Signed-off-by: Oak Zeng Reviewed-by: Joseph Greathouse --- drivers/gpu/drm/amd/amdgpu/df_v3_6.c| 17 +++-- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++- .../amd

[PATCH] drm/amdgpu: Correctly clear GCEA error status

2021-05-25 Thread Mukul Joshi
While clearing GCEA error status, do not clear the bits set by RAS TA. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Conditionally reset SDMA RAS error counts

2021-06-29 Thread Mukul Joshi
Reset SDMA RAS error counts during init only if persistent EDC harvesting is not supported. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm

[PATCH] drm/amdkfd: CWSR with sw scheduler on Aldebaran and Arcturus

2021-08-20 Thread Mukul Joshi
Program trap handler settings to enable CWSR with software scheduler on Aldebaran and Arcturus. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 3 ++- drivers/gpu/drm/amd/amdgpu

[PATCHv2 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-12 Thread Mukul Joshi
def CONFIG_X86_MCE_AMD. - Use MCE_PRIORITY_UC instead of MCE_PRIO_ACCEL as we are only handling uncorrectable errors. - Use macros to determine UMC instance and channel instance where the uncorrectable error occurred. - Update the headline. Signed-off-by: Mukul Joshi Link: https://lore.kernel.

[PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-12 Thread Mukul Joshi
off-by: Mukul Joshi --- arch/x86/include/asm/mce.h| 2 +- arch/x86/kernel/cpu/mce/amd.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index fc3d36f1f9d0..d90d3ccb583a 100644 --- a/arch/x86/include/asm/mce.h +++ b/a

[PATCH] drm/amdkfd: CWSR with software scheduler

2021-08-09 Thread Mukul Joshi
This patch adds support to program trap handler settings when loading driver with software scheduler (sched_policy=2). Signed-off-by: Mukul Joshi Suggested-by: Jay Cornwall --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 31 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c

[PATCH] drm/amdgpu: Fix channel_index table layout for Aldebaran

2021-07-29 Thread Mukul Joshi
Fix the channel_index table layout to fetch the correct channel_index when calculating physical address from normalized address during page retirement. Also, fix the number of UMC instances and number of channels within each UMC instance for Aldebaran. Signed-off-by: Mukul Joshi --- drivers/gpu

[PATCH 1/2] drm/amdgpu: Enable RAS error injection after mode2 reset on Aldebaran

2021-10-11 Thread Mukul Joshi
Add the missing call to re-enable RAS error injections on the Aldebaran mode2 reset code path. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: Fix RAS page retirement with mode2 reset on Aldebaran

2021-10-11 Thread Mukul Joshi
occurred on a GPU that supports MCE notifier based page retirement. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu

[PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-22 Thread Mukul Joshi
On Aldebaran, GPU driver will handle bad page retirement even though UMC is host managed. As a result, register a bad page retirement handler on the mce notifier chain to retire bad pages on Aldebaran. Signed-off-by: Mukul Joshi --- v1->v2: - Use smca_get_bank_type() to determine MCA b

[PATCHv4 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Mukul Joshi
On Aldebaran, GPU driver will handle bad page retirement for GPU memory even though UMC is host managed. As a result, register a bad page retirement handler on the mce notifier chain to retire bad pages on Aldebaran. Signed-off-by: Mukul Joshi --- v1->v2: - Use smca_get_bank_type() to determ

[PATCHv2 3/3] drm/amdkfd: Consolidate MQD manager functions

2022-02-07 Thread Mukul Joshi
A few MQD manager functions are duplicated for all versions of MQD manager. Remove this duplication by moving the common functions into kfd_mqd_manager.c file. Signed-off-by: Mukul Joshi --- v1->v2: - Add "kfd_" prefix to functions moved to kfd_mqd_manager.c. - Also, suffix "

[PATCHv2 2/3] drm/amdkfd: Remove unused old debugger implementation

2022-02-07 Thread Mukul Joshi
Cleanup the kfd code by removing the unused old debugger implementation. Only a small piece of resetting wavefronts is kept and is moved to kfd_device_queue_manager.c Signed-off-by: Mukul Joshi --- v1->v2: - Rename AMDKFD_IOC_DBG_* to AMDKFD_IOC_DBG_*_DEPRECATED. - Cleanup address_watch_disa

[PATCHv2 1/3] drm/amdkfd: Fix TLB flushing in KFD SVM with no HWS

2022-02-07 Thread Mukul Joshi
With no HWS, TLB flushing will not work in SVM code. Fix this by calling kfd_flush_tlb() which works for both HWS and no HWS case. Signed-off-by: Mukul Joshi Reviewed-by: Philip Yang --- v1->v2: - Don't pass adev to svm_range_map_to_gpu(). drivers/gpu/drm/amd/amdkfd/kfd_svm.c |

[PATCH 1/3] drm/amdkfd: Fix TLB flushing in KFD SVM with no HWS

2022-02-04 Thread Mukul Joshi
With no HWS, TLB flushing will not work in SVM code. Fix this by calling kfd_flush_tlb() which works for both HWS and no HWS case. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers

[PATCH 3/3] drm/amdkfd: Consolidate MQD manager functions

2022-02-04 Thread Mukul Joshi
A few MQD manager functions are duplicated for all versions of MQD manager. Remove this duplication by moving the common functions into kfd_mqd_manager.c file. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 63 + drivers/gpu/drm/amd/amdkfd

[PATCH 2/3] drm/amdkfd: Remove unused old debugger implementation

2022-02-04 Thread Mukul Joshi
Cleanup the kfd code by removing the unused old debugger implementation. Only a small piece of resetting wavefronts is kept and is moved to kfd_device_queue_manager.c Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/Makefile | 2 - drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

[PATCH] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-07 Thread Mukul Joshi
generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79

[PATCHv2] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Mukul Joshi
generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu Signed-off-by: Mukul Joshi Reviewed-by: Shuotao Xu --- v1->v2: - Remove comments from inside kfd_topology_update_io_links() and add them as kernel-doc comments. drivers/gpu/drm/amd/amd

[PATCH] drm/amdkfd: Fix unaligned 64-bit doorbell warning

2023-08-29 Thread Mukul Joshi
[amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c index

[PATCHv2] drm/amdkfd: Fix unaligned 64-bit doorbell warning

2023-08-30 Thread Mukul Joshi
[amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: cfeaeb3c0ce7 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi --- v1->v2: - Update the logic to make it work with both 32-bit and 64-bit doorbells. - Add the Fixes tag. drivers/gpu/d

[PATCHv3] drm/amdkfd: Fix unaligned 64-bit doorbell warning

2023-09-06 Thread Mukul Joshi
] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: cfeaeb3c0ce7 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi --- v1->v2: - Update the logic to make it work with both 32-bit and 64-bit doorbells. - Add the Fixes tag. v2->v3: - Revert to the original

[PATCHv2 2/4] drm/amdkfd: Update cache info reporting for GFX v9.4.3

2023-09-06 Thread Mukul Joshi
Update cache info reporting in sysfs to report the correct number of CUs and associated cache information based on different spatial partitioning modes. Signed-off-by: Mukul Joshi --- v1->v2: - Revert the change in kfd_crat.c - Add a comment to not change value of CRAT_SIBLINGMAP_SIZE. driv

[PATCHv2 3/4] drm/amdkfd: Update CU masking for GFX 9.4.3

2023-09-06 Thread Mukul Joshi
The CU mask passed from user-space will change based on different spatial partitioning mode. As a result, update CU masking code for GFX9.4.3 to work for all partitioning modes. Signed-off-by: Mukul Joshi --- v1->v2: - Incorporate Felix's review comments. drivers/gpu/drm/amd/amd

[PATCHv2 1/4] drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3

2023-09-06 Thread Mukul Joshi
Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi --- v1->v2: - Incorporate Felix's review comments. drivers/

[PATCH 4/4] drm/amdgpu: Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES

2023-09-06 Thread Mukul Joshi
Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES to conform with the naming convention followed in amdgpu_gfx.h. No functional change. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 4 ++-- drivers

[PATCH] drm/amdkfd: Update cache reporting for GFX 9.4.3

2023-10-26 Thread Mukul Joshi
GFX 9.4.3 uses a new version of the GC info table in IP discovery. This patch adds a new function to parse and fill the cache information based on the new table. Also, update cache reporting based on compute and memory partitioning modes. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd

[PATCHv2 1/2] drm/amdkfd: Populate cache info for GFX 9.4.3

2023-10-27 Thread Mukul Joshi
GFX 9.4.3 uses a new version of the GC info table which contains the cache info. This patch adds a new function to populate the cache info from IP discovery for GFX 9.4.3. Signed-off-by: Mukul Joshi --- v1->v2: - Separate out the original patch into 2 patches. drivers/gpu/drm/amd/amd

[PATCHv2 2/2] drm/amdkfd: Update cache info for GFX 9.4.3

2023-10-27 Thread Mukul Joshi
Update cache info reporting based on compute and memory partitioning modes. Signed-off-by: Mukul Joshi --- v1->v2: - Separate into a separate patch. - Simplify the if condition to reduce indentation and make it logically more clear. drivers/gpu/drm/amd/amdkfd/kfd_topology.c |

[PATCH] drm/amdgpu: Fix typo in IP discovery parsing

2023-10-26 Thread Mukul Joshi
Fix a typo in parsing of the GC info table header when reading the IP discovery table. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu

[PATCHv2] drm/amdgpu: Fix typo in IP discovery parsing

2023-10-26 Thread Mukul Joshi
Fix a typo in parsing of the GC info table header when reading the IP discovery table. Fixes: ecb70926eb86 ("drm/amdgpu: add type conversion for gc info") Signed-off-by: Mukul Joshi --- v1->v2: - Add the Fixes tag. drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 +- 1 fi

[PATCH] drm/amdkfd: Fix reg offset for setting CWSR grace period

2023-08-29 Thread Mukul Joshi
parameter. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 3

[PATCH 1/2] drm/amdkfd: Fix updating IO links during device removal

2022-04-22 Thread Mukul Joshi
: 9be62cbcc62f ("drm/amdkfd: Cleanup IO links during KFD device removal") Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/

[PATCH 2/2] drm/amdkfd: Fix circular lock dependency warning

2022-04-22 Thread Mukul Joshi
led during device init. This cached value can then be used instead of querying the value again and again. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++

[PATCH] drm/amdgpu: Fix page table setup on Arcturus

2022-08-22 Thread Mukul Joshi
When translate_further is enabled, page table depth needs to be updated. This was missing on Arcturus MMHUB init. This was causing address translations to fail for SDMA user-mode queues. Fixes: 2abf2573b1c69 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach") Signed-off-by: M

[PATCH] drm/amdgpu: Update PTE flags with TF enabled

2022-09-13 Thread Mukul Joshi
to translate a retry fault into a no-retry fault, doesn't work with TF enabled. As a result, update invalid PTE flags settings which works for both TF enabled and disabled case. Fixes: 2abf2573b1c69 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach") Signed-off-by: M

[PATCHv2] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-15 Thread Mukul Joshi
There are no backing hardware registers for ih_soft ring. As a result, don't try to access hardware registers for read and write pointers when processing interrupts on the IH soft ring. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 7 ++- drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-12 Thread Mukul Joshi
There are no backing hardware registers for ih_soft ring. As a result, don't try to access hardware registers for read and write pointers when processing interrupts on the IH soft ring. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 7 ++- 1 file changed, 6

[PATCH 1/2] drm/amdgpu: Enable IH retry CAM on GFX9

2022-12-12 Thread Mukul Joshi
This patch enables the IH retry CAM on GFX9 series cards. This retry filter is used to prevent sending lots of retry interrupts in a short span of time and overflowing the IH ring buffer. This will also help reduce CPU interrupt workload. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd

[PATCH 2/2] drm/amdgpu: Rework retry fault removal

2022-12-12 Thread Mukul Joshi
in the sw filter. This helps in avoiding stale faults being added back into the filter and preventing legitimate faults from being handled. Suggested-by: Felix Kuehling Signed-off-by: Mukul Joshi Reviewed-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 36

[PATCH] drm/amdkfd: Fix kernel warning during topology setup

2022-12-20 Thread Mukul Joshi
disabled at (59649): [] irq_exit_rcu+0xd7/0x130 [ +0.004203] ---[ end trace ]--- Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +- 1 file changed, 1

[PATCHv2] drm/amdgpu: Enable IH retry CAM on GFX9

2023-01-19 Thread Mukul Joshi
This patch enables the IH retry CAM on GFX9 series cards. This retry filter is used to prevent sending lots of retry interrupts in a short span of time and overflowing the IH ring buffer. This will also help reduce CPU interrupt workload. Signed-off-by: Mukul Joshi --- v1: - Reviewed by Felix

[PATCH 1/3] drm/ttm: Helper function to get TTM mem limit

2023-04-25 Thread Mukul Joshi
Add a helper function to get TTM memory limit. This is needed by KFD to set its own internal memory limits. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/ttm/ttm_tt.c | 6 ++ include/drm/ttm/ttm_tt.h | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit

2023-04-25 Thread Mukul Joshi
Use the helper function in TTM to get TTM mem limit and set GTT size to be equal to TTM mem limit. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 25 ++--- 1 file changed, 6 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 3/3] drm/amdkfd: Update KFD TTM mem limit

2023-04-25 Thread Mukul Joshi
Use the helper function in TTM to get TTM memory limit and set KFD's internal mem limit. This ensures that KFD's TTM mem limit and actual TTM mem limit are exactly the same. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 ++- drivers

[PATCHv2 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit

2023-04-26 Thread Mukul Joshi
Use the helper function in TTM to get TTM mem limit and set GTT size to be equal to TTM mem limit. Signed-off-by: Mukul Joshi Reviewed-by: Christian König --- v1->v2: - Remove AMDGPU_DEFAULT_GTT_SIZE_MB as well as it is unused. drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - drivers/gpu/

[PATCH] drm/amdgpu: Update invalid PTE flag setting

2023-04-04 Thread Mukul Joshi
Update the invalid PTE flag setting to ensure, in addition to transitioning the retry fault to a no-retry fault, it also causes the wavefront to enter the trap handler. With the current setting, it only transitions to a no-retry fault. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: Remove DUMMY_VRAM_SIZE

2023-06-12 Thread Mukul Joshi
Remove DUMMY_VRAM_SIZE as it is not needed and can result in reporting incorrect memory size. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: Update CWSR grace period for GFX9.4.3

2023-07-10 Thread Mukul Joshi
For GFX9.4.3, setup a reduced default CWSR grace period equal to 1000 cycles instead of 64000 cycles. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 ++- 2

[PATCH] drm/amdkfd: Fix reserved SDMA queues handling

2023-06-07 Thread Mukul Joshi
9.4.3") Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 13 ++--- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 3 files changed, 12 insertions(+), 13 deletions(-)

[PATCH] drm/amdgpu: Rename DRM schedulers in amdgpu TTM

2023-06-07 Thread Mukul Joshi
Rename mman.entity to mman.high_pr to make the distinction clearer that this is a high priority scheduler. Similarly, rename the recently added mman.delayed to mman.low_pr to make it clear it is a low priority scheduler. No functional change in this patch. Signed-off-by: Mukul Joshi --- drivers

[PATCH] drm/amdkfd: Set event interrupt class for GFX 9.4.3

2023-05-23 Thread Mukul Joshi
Fix the warning during driver load because the event interrupt class is not set for GFX9.4.3. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd

[PATCHv2] drm/amdgpu: Update invalid PTE flag setting

2023-06-12 Thread Mukul Joshi
of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi --- v1->v2: - Update handling according to Christian's feedback. drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 7 +++ drivers/gpu/drm/

[PATCH 1/2] drm/amdkfd: Update interrupt handling for GFX 9.4.3

2023-06-22 Thread Mukul Joshi
to the process drain interrupt. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 43 ++- .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 29 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + drivers/gpu/drm/amd

[PATCH 2/2] drm/amdgpu: Correctly setup TMR region size for GFX9.4.3

2023-06-22 Thread Mukul Joshi
A faulty check was causing TMR region size to be setup incorrectly for GFX9.4.3. Remove the check and setup TMR region size as 280MB for GFX9.4.3. Fixes: b6780d70db5e ("drm/amdgpu: bypass bios dependent operations") Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu

[PATCHv2] drm/amdkfd: Use KIQ to unmap HIQ

2023-06-29 Thread Mukul Joshi
Currently, we unmap HIQ by directly writing to HQD registers. This doesn't work for GFX9.4.3. Instead, use KIQ to unmap HIQ, similar to how we use KIQ to map HIQ. Using KIQ to unmap HIQ works for all GFX series post GFXv9. Signed-off-by: Mukul Joshi --- v1->v2: - Use kiq_unmap_queues funct

[PATCHv4] drm/amdgpu: Update invalid PTE flag setting

2023-06-19 Thread Mukul Joshi
of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi --- v1->v2: - Update handling according to Christian's feedback. v2->v3: - Remove ASIC specific callback (Felix). v3->v4: - Add nor

[PATCHv3] drm/amdgpu: Update invalid PTE flag setting

2023-06-13 Thread Mukul Joshi
of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi --- v1->v2: - Update handling according to Christian's feedback. v2->v3: - Remove ASIC specific callback (Felix). drivers/gpu/drm/amd/

[PATCH] drm/amdkfd: Enable GWS on GFX9.4.3

2023-06-16 Thread Mukul Joshi
Enable GWS capable queue creation for forward progress guarantee on GFX 9.4.3. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 + .../amd/amdkfd/kfd_process_queue_manager.c| 31 --- 2 files changed, 20 insertions(+), 12 deletions(-) diff

[PATCH] drm/amdkfd: Use KIQ to unmap HIQ

2023-06-16 Thread Mukul Joshi
Currently, we unmap HIQ by directly writing to HQD registers. This doesn't work for GFX9.4.3. Instead, use KIQ to unmap HIQ, similar to how we use KIQ to map HIQ. Using KIQ to unmap HIQ works for all GFX series post GFXv9. Signed-off-by: Mukul Joshi --- .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3

[PATCHv2] drm/amdkfd: Enable GWS on GFX9.4.3

2023-06-16 Thread Mukul Joshi
Enable GWS capable queue creation for forward progress guarantee on GFX 9.4.3. Signed-off-by: Mukul Joshi --- v1->v2: - Update the condition for setting pqn->q->gws for GFX 9.4.3. drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 + .../amd/amdkfd/kfd_process_queue_manager.

[PATCH] drm/amdgpu: Add a low priority scheduler for VRAM clearing

2023-05-17 Thread Mukul Joshi
Add a low priority DRM scheduler for VRAM clearing instead of using the existing high priority scheduler. Use the high priority scheduler for migrations and evictions. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

[PATCH] drm/amdgpu: Fix module unload hang with RAS enabled

2024-01-23 Thread Mukul Joshi
mdgpu: Prepare for asynchronous processing of umc page retirement") Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index a3

[PATCH] drm/amdkfd: Use common function for IP version check

2023-11-22 Thread Mukul Joshi
KFD_GC_VERSION was recently updated to use a new function for IP version checks. As a result, use KFD_GC_VERSION as the common function for all IP version checks in KFD. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion

[PATCH] drm/amdkfd: Use correct drm device for cgroup permission check

2024-01-26 Thread Mukul Joshi
On GFX 9.4.3, for a given KFD node, fetch the correct drm device from XCP manager when checking for cgroup permissions. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdgpu: Handle duplicate BOs during process restore

2024-03-08 Thread Mukul Joshi
In certain situations, some apps can import a BO multiple times (through IPC for example). To restore such processes successfully, we need to tell drm to ignore duplicate BOs. While at it, also add additional logging to prevent silent failures when process restore fails. Signed-off-by: Mukul

[PATCH 2/2] drm/amdkfd: Check preemption status on all XCDs

2024-03-14 Thread Mukul Joshi
to return a bool instead of uint32_t and pass the MQD manager as an argument. Suggested-by: Jay Cornwall Signed-off-by: Mukul Joshi --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +-- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 18 + drivers/gpu/drm/amd/amdkfd

[PATCH 1/2] drm/amdkfd: Rename read_doorbell_id in MQD functions

2024-03-14 Thread Mukul Joshi
Rename read_doorbell_id function to a more meaningful name, implying what it is used for. No functional change. Suggested-by: Jay Cornwall Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 2

[PATCH] drm/amdkfd: Cleanup workqueue during module unload

2024-03-20 Thread Mukul Joshi
Destroy the high priority workqueue that handles interrupts during KFD node cleanup. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: Check cgroup when returning DMABuf info

2024-03-15 Thread Mukul Joshi
Check cgroup permissions when returning DMA-buf info and, based on the cgroup check, return the ID of the GPU that has access to the BO. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH] drm/amdgpu: Fix leak when GPU memory allocation fails

2024-04-18 Thread Mukul Joshi
Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Fix VRAM memory accounting

2024-04-23 Thread Mukul Joshi
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff
