Re: [PATCH] drm/amdgpu: support gpu recovery tests on compute rings

2019-04-28 Thread Deucher, Alexander
maybe just: amdgpu.lockup_timeout= I don't think we really need separate timeouts for all the different video related engines. Alex From: Quan, Evan Sent: Sunday, April 28, 2019 1:37 AM To: Deucher, Alexander; Michel Dänzer; Koenig, Christian Cc: Xu, Feifei;

[PATCH][next] drm/amd/display: fix incorrect null check on pointer

2019-04-28 Thread Colin King
From: Colin Ian King Currently an allocation is being made but the allocation failure check is being performed on another pointer. Fix this by checking the correct pointer. Also use the normal kernel idiom for null pointer checks. Addresses-Coverity: ("Resource leak") Fixes: 43e3ac8389ef
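
The class of bug described in this message can be illustrated with a small user-space sketch. The struct and function names below are hypothetical and are not taken from the patch. The point is only that the failure check has to test the pointer that was just allocated, written with the usual `if (!ptr)` idiom.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not from the patch. */
struct stream { int id; };
struct state  { struct stream *stream; };

/*
 * Allocate a stream and attach it to the given state.
 * Returns 0 on success, -1 if the allocation failed.
 */
int state_add_stream(struct state *st, int id)
{
	struct stream *s;

	assert(st != NULL);
	s = calloc(1, sizeof(*s));

	/* Check the pointer that was just allocated, not some other one. */
	if (!s)
		return -1;

	s->id = id;
	st->stream = s;
	return 0;
}
```

Testing a different pointer here (for example `st->stream`) would let a failed allocation go unnoticed, or would report a failure where there was none.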

Bug Report: [PowerPlay] MCLK can't be set above 1107MHz on Vega 64

2019-04-28 Thread Yanik Yiannakis
Hello, I am experiencing a bug that prevents me from setting the MCLK of my Vega 64 LC above 1107MHz. I am using Unigine Superposition 1.1 in "Game"-mode to check the performance by watching the FPS. *Behaviour with a single monitor:* First I set the MCLK to a known stable value below

[PATCH 01/12] dma-buf: add dynamic caching of sg_table

2019-04-28 Thread Liam Mark
On Tue, 16 Apr 2019, Christian König wrote: > To allow a smooth transition from pinning buffer objects to dynamic > invalidation we first start to cache the sg_table for an attachment > unless the driver explicitly says to not do so. > > --- > drivers/dma-buf/dma-buf.c | 24

[PATCH] drm/amdgpu: Unmap CSA under SR-IOV in KFD path

2019-04-28 Thread Trigger Huang
In the amdgpu open path, the CSA is mapped into the VM. When KFD is then opened, the call to amdgpu_vm_make_compute fails because it finds that the VM already has mappings and is not a clean VM, so the process VM object cannot be created. The fix is to unmap the CSA, and actually CSA is not needed

[PATCH 20/27] drm/amdkfd: Fix gfx8 MEM_VIOL exception handler

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall When MEM_VIOL is asserted the context save handler rewinds the program counter. This is incorrect for any source of the exception. MEM_VIOL may be raised in normal operation by out-of-bounds access to LDS or GDS and does not require special handling. Remove PC adjustment when

[PATCH 19/27] drm/amdkfd: Fix a circular lock dependency

2019-04-28 Thread Kuehling, Felix
Fix a circular lock dependency exposed under userptr memory pressure. The DQM lock is the only one taken inside the MMU notifier. We need to make sure that no reclaim is done under this lock, and that no other locks are taken under which reclaim is possible. Signed-off-by: Felix Kuehling

[PATCH 13/27] drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue()

2019-04-28 Thread Kuehling, Felix
From: Yong Zhao This avoids duplicated code. Signed-off-by: Yong Zhao Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 29 +++ 1 file changed, 11 insertions(+), 18 deletions(-) diff --git

[PATCH 23/27] drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall ttmp[4:5] is initialized by the SPI with SPI_GDBG_TRAP_DATA* values. These values are more useful to the debugger than ttmp[14:15], which carries dispatch_scratch_base*. There are too few registers to preserve both. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling

[PATCH 26/27] drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration

2019-04-28 Thread Kuehling, Felix
From: shaoyunl There is a bug found in vml2 xgmi logic: mtype is always sent as NC on the VMC to TC interface for a page walk, regardless of whether the request is being sent to local or remote GPU. NC means non-coherent and will cause the VMC return data to be cached in the TCC (versus UC –

[PATCH 16/27] drm/amdkfd: Introduce XGMI SDMA queue type

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Existing QUEUE_TYPE_SDMA means PCIe optimized SDMA queues. Introduce a new QUEUE_TYPE_SDMA_XGMI, which is optimized for non-PCIe transfer such as XGMI. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---

[PATCH 10/27] drm/amdkfd: Allocate MQD trunk for HIQ and SDMA

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng MEC FW for some new ASICs requires all SDMA MQDs to be in a contiguous trunk of memory right after the HIQ MQD. Add a field in the device queue manager to hold the HIQ/SDMA MQD memory object and allocate the MQD trunk on device queue manager initialization. Signed-off-by: Oak Zeng

[PATCH 21/27] drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall If instruction fetch fails the wave cannot be halted and returned to the shader without raising MEM_VIOL again. Currently the wave is terminated if this occurs, but this loses information about the cause of the fault. The debugger would prefer the faulting wave state to be

[PATCH 12/27] drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Instead of allocating HIQ and SDMA MQDs from the sub-allocator, allocate them from an MQD trunk pool. This is done for all ASICs. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 49 +++

[PATCH 03/27] drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng sdma_queue_id is sdma queue index inside one sdma engine. sdma_id is sdma queue index among all sdma engines. Use those two names properly. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---
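
One way to picture the two indices is to split a global sdma_id into an engine number and a per-engine sdma_queue_id. The sketch below assumes queues are spread round-robin across engines; the message does not say so, and the helper name `split_sdma_id` and the struct are hypothetical.

```c
#include <assert.h>

/* Hypothetical result type; field names follow the wording of the message. */
struct sdma_slot {
	unsigned int engine_id;     /* which SDMA engine the queue lives on */
	unsigned int sdma_queue_id; /* queue index inside that one engine */
};

/*
 * Split sdma_id (index among all engines) into engine and per-engine index.
 * The round-robin layout used here is an assumption for illustration.
 */
struct sdma_slot split_sdma_id(unsigned int sdma_id, unsigned int num_engines)
{
	struct sdma_slot slot;

	assert(num_engines > 0);
	slot.engine_id = sdma_id % num_engines;
	slot.sdma_queue_id = sdma_id / num_engines;
	return slot;
}
```

With two engines, sdma_id 5 would land on engine 1 as that engine's queue 2 under this assumed layout.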

[PATCH 01/27] drm/amdkfd: Use 64 bit sdma_bitmap

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Support a maximum of 64 SDMA queues. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- 2 files changed, 6
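
As a rough user-space illustration of tracking up to 64 queues in a single 64-bit bitmap, the sketch below keeps one bit per free queue. The function names are hypothetical and the code is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SDMA_QUEUES 64

/* Build a bitmap with the low n bits set, one bit per free queue. */
uint64_t sdma_bitmap_init(unsigned int n)
{
	assert(n <= MAX_SDMA_QUEUES);
	/* Shifting a 64-bit value by 64 is undefined, so n == 64 is special-cased. */
	return n == MAX_SDMA_QUEUES ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Claim the lowest free queue; returns its index, or -1 if none is free. */
int sdma_bitmap_alloc(uint64_t *bitmap)
{
	for (int bit = 0; bit < MAX_SDMA_QUEUES; bit++) {
		if (*bitmap & (UINT64_C(1) << bit)) {
			*bitmap &= ~(UINT64_C(1) << bit);
			return bit;
		}
	}
	return -1;
}
```

A 32-bit bitmap handled the same way could describe at most 32 queues, which is the limit a 64-bit type lifts.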

[PATCH 07/27] drm/amdkfd: Introduce DIQ type mqd manager

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng With the introduction of a new MQD allocation scheme for HIQ, DIQ and HIQ use different MQD allocation schemes, so DIQ can't reuse the HIQ MQD manager. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c

[PATCH 05/27] drm/amdkfd: Fix a potential memory leak

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Free mqd_mem_obj if GTT buffer allocation for MQD+control stack fails. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git

[PATCH 02/27] drm/amdkfd: Add sdma allocation debug message

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Add debug messages during SDMA queue allocation. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +++ 1 file changed, 3 insertions(+) diff --git

[PATCH 27/27] drm/amdgpu: Fix GTT size calculation

2019-04-28 Thread Kuehling, Felix
From: Kent Russell GTT size is currently limited to the minimum of VRAM size or 3/4 of system memory. This severely limits the quantity of system memory that can be used by ROCm applications. Increase GTT size to the maximum of VRAM size or system memory size. Signed-off-by: Kent Russell
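
The difference between the two sizing rules described in this message can be worked through with a short sketch. It reflects only the wording of the commit message; any other limits the driver applies are not modeled, and the function names are hypothetical.

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Old rule as described: min(VRAM size, 3/4 of system memory). */
uint64_t gtt_size_old(uint64_t vram, uint64_t sysmem)
{
	return min_u64(vram, sysmem / 4 * 3);
}

/* New rule as described: max(VRAM size, system memory size). */
uint64_t gtt_size_new(uint64_t vram, uint64_t sysmem)
{
	return max_u64(vram, sysmem);
}
```

For a card with 8 GiB of VRAM in a machine with 64 GiB of system memory, the old rule gives min(8, 48) = 8 GiB and the new rule gives max(8, 64) = 64 GiB.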

[PATCH 25/27] drm/amdkfd: Add domain number into gpu_id

2019-04-28 Thread Kuehling, Felix
From: Amber Lin A multi-socket server can have multiple PCIe segments, so BDF is not enough to distinguish each GPU. Also take the domain number into account when generating gpu_id. Signed-off-by: Amber Lin Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---
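
To see why bus/device/function alone can collide, the sketch below packs a PCI location into one integer with and without the domain. It uses the conventional bus (8 bits), device (5 bits), function (3 bits) layout; the function name is hypothetical, and this is not how the driver derives gpu_id, only an illustration of the collision.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack domain + bus/device/function into a single key.
 * Hypothetical helper for illustration; not the driver's gpu_id computation.
 */
uint32_t pci_location_key(uint16_t domain, uint8_t bus, uint8_t dev, uint8_t fn)
{
	uint32_t bdf;

	assert(dev < 32 && fn < 8);
	bdf = ((uint32_t)bus << 8) | ((uint32_t)dev << 3) | fn;
	return ((uint32_t)domain << 16) | bdf;
}
```

Two GPUs at 03:00.0 in domains 0 and 1 share the same low 16 bits and differ only in the domain part of the key.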

[PATCH 18/27] drm/amdkfd: Delete alloc_format field from map_queue struct

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Alloc format was never really supported by MEC FW. FW always does one per pipe allocation. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 2 --

[PATCH 11/27] drm/amdkfd: Move non-sdma mqd allocation out of init_mqd

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng This is preparation work to introduce more MQD allocation schemes. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 20 ++-- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 51

[PATCH 00/27] KFD upstreaming

2019-04-28 Thread Kuehling, Felix
Assorted KFD changes that have been accumulating on amd-kfd-staging. New features and fixes included: * Support for VegaM * Support for systems with multiple PCI domains * New SDMA queue type that's optimized for XGMI links * SDMA MQD allocation changes to support future ASICs with more SDMA

[PATCH 04/27] drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng FW of some new ASICs requires sdma mqd size to be not more than 128 dwords. Repurpose the last 2 reserved fields of sdma mqd for driver internal use, so the total mqd size is no bigger than 128 dwords Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix

[PATCH 09/27] drm/amdkfd: Add mqd size in mqd manager struct

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Also initialize mqd size on mqd manager initialization Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 4

[PATCH 24/27] drm/amdkfd: Add VegaM support

2019-04-28 Thread Kuehling, Felix
From: Kent Russell Add the VegaM information to KFD Signed-off-by: Kent Russell Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 +++

[PATCH 06/27] drm/amdkfd: Introduce asic-specific mqd_manager_init function

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Global function mqd_manager_init just calls asic-specific functions and it is not necessary. Delete it and introduce a mqd_manager_init interface in dqm for asic-specific mqd manager init. Call mqd_manager_init interface directly to initialize mqd manager Signed-off-by: Oak Zeng

[PATCH 17/27] drm/amdkfd: Expose sdma engine numbers to topology

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Expose available numbers of both SDMA queue types in the topology. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 7 +++ drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 2 ++ 2 files changed, 9

[PATCH 08/27] drm/amdkfd: Init mqd managers in device queue manager init

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Previously MQD managers were initialized on demand. As there are only a few types of MQD managers, the on-demand initialization doesn't save much memory. Initialize them on device queue initialization instead and delete the get_mqd_manager interface. This makes the code more

[PATCH 22/27] drm/amdkfd: Fix gfx9 XNACK state save/restore

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall SQ_WAVE_IB_STS.RCNT grew from 4 bits to 5 in gfx9. Do not truncate when saving in the high bits of TTMP1. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 12 ++--

[PATCH 15/27] drm/amdkfd: Fix sdma queue map issue

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng The previous code assumed there are two SDMA engines. This is not always true; e.g., Raven only has 1 SDMA engine. Fix the issue by using the SDMA engine number info in device_info. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---

[PATCH 14/27] drm/amdkfd: Fix compute profile switching

2019-04-28 Thread Kuehling, Felix
From: Harish Kasiviswanathan Fix compute profile switching on process termination. Add a dedicated reference counter to keep track of entry/exit to/from compute profile. This enables switching compute profiles for other reasons than process creation or termination. Signed-off-by: Harish
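
The idea of a dedicated reference counter for entering and leaving the compute profile can be sketched as follows. The struct, field, and function names are hypothetical and the code is a user-space illustration, not the patch itself: the profile is switched on by the first user and switched off only when the last user leaves.

```c
#include <assert.h>

/* Hypothetical state for illustration; not taken from the patch. */
struct profile_state {
	int compute_refs;   /* how many users currently need the compute profile */
	int compute_active; /* 1 while the compute profile is selected */
};

/* Enter the compute profile; only the first caller actually switches. */
void compute_profile_enter(struct profile_state *ps)
{
	if (ps->compute_refs++ == 0)
		ps->compute_active = 1;
}

/* Leave the compute profile; only the last caller switches back. */
void compute_profile_exit(struct profile_state *ps)
{
	assert(ps->compute_refs > 0);
	if (--ps->compute_refs == 0)
		ps->compute_active = 0;
}
```

Because the counter is independent of process creation and termination, any code path can pair an enter with an exit without disturbing other users of the profile.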