[PATCH 19/27] drm/amdkfd: Fix a circular lock dependency

2019-04-28 Thread Kuehling, Felix
Fix a circular lock dependency exposed under userptr memory pressure. The DQM lock is the only one taken inside the MMU notifier. We need to make sure that no reclaim is done under this lock, and that no other locks are taken under which reclaim is possible. Signed-off-by: Felix Kuehling

[PATCH 13/27] drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue()

2019-04-28 Thread Kuehling, Felix
From: Yong Zhao This avoids duplicated code. Signed-off-by: Yong Zhao Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 29 +++ 1 file changed, 11 insertions(+), 18 deletions(-) diff --git

[PATCH 23/27] drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall ttmp[4:5] is initialized by the SPI with SPI_GDBG_TRAP_DATA* values. These values are more useful to the debugger than ttmp[14:15], which carries dispatch_scratch_base*. There are too few registers to preserve both. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling

[PATCH 26/27] drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration

2019-04-28 Thread Kuehling, Felix
From: shaoyunl There is a bug found in vml2 xgmi logic: mtype is always sent as NC on the VMC to TC interface for a page walk, regardless of whether the request is being sent to local or remote GPU. NC means non-coherent and will cause the VMC return data to be cached in the TCC (versus UC –

[PATCH 16/27] drm/amdkfd: Introduce XGMI SDMA queue type

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Existing QUEUE_TYPE_SDMA means PCIe optimized SDMA queues. Introduce a new QUEUE_TYPE_SDMA_XGMI, which is optimized for non-PCIe transfer such as XGMI. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---

[PATCH 10/27] drm/amdkfd: Allocate MQD trunk for HIQ and SDMA

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng MEC FW for some new ASICs requires all SDMA MQDs to be in a contiguous trunk of memory right after the HIQ MQD. Add a field in the device queue manager to hold the HIQ/SDMA MQD memory object and allocate the MQD trunk on device queue manager initialization. Signed-off-by: Oak Zeng

[PATCH 21/27] drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall If instruction fetch fails the wave cannot be halted and returned to the shader without raising MEM_VIOL again. Currently the wave is terminated if this occurs, but this loses information about the cause of the fault. The debugger would prefer the faulting wave state to be

[PATCH 12/27] drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Instead of allocating HIQ and SDMA MQDs from the sub-allocator, allocate them from an MQD trunk pool. This is done for all ASICs. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 49 +++

[PATCH 03/27] drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng sdma_queue_id is the SDMA queue index inside one SDMA engine. sdma_id is the SDMA queue index among all SDMA engines. Use those two names properly. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---
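
A minimal sketch of the distinction described above, assuming queues are spread round-robin across engines. The struct, the function name, and the modulo/divide mapping are illustrative assumptions, not taken from the patch:

```c
#include <assert.h>

/* Hypothetical illustration only; these names do not come from the patch. */
struct sdma_slot {
	unsigned int engine_id; /* which SDMA engine the queue lives on */
	unsigned int queue_id;  /* queue index inside that one engine */
};

/* Split a global index (counted across all engines) into a
 * per-engine pair, assuming round-robin distribution. */
static struct sdma_slot sdma_id_to_slot(unsigned int sdma_id,
					unsigned int num_engines)
{
	struct sdma_slot slot;

	assert(num_engines > 0);
	slot.engine_id = sdma_id % num_engines;
	slot.queue_id = sdma_id / num_engines;
	return slot;
}
```

With two engines, global index 5 would land on engine 1 as that engine's queue 2 under this assumed mapping.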

[PATCH 01/27] drm/amdkfd: Use 64 bit sdma_bitmap

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Support a maximum of 64 SDMA queues. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- 2 files changed, 6
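
As a rough illustration of why a 64-bit bitmap needs 64-bit bit operations, here is a sketch of a free-queue allocator. The function name and the "set bit means free" convention are assumptions made for this example, not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocator: a set bit marks a free queue slot.
 * Returns the lowest free index and clears its bit, or -1 if
 * no slot is free. */
static int alloc_sdma_id(uint64_t *bitmap)
{
	int bit;

	assert(bitmap != 0);
	for (bit = 0; bit < 64; bit++) {
		/* The constant must be 64 bits wide: shifting a plain
		 * 32-bit int by 32 or more is undefined behavior. */
		uint64_t mask = (uint64_t)1 << bit;

		if (*bitmap & mask) {
			*bitmap &= ~mask;
			return bit;
		}
	}
	return -1;
}
```

The cast on the shifted constant is the detail that matters once indices above 31 become valid.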

[PATCH 07/27] drm/amdkfd: Introduce DIQ type mqd manager

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng With the introduction of a new MQD allocation scheme for HIQ, DIQ and HIQ use different MQD allocation schemes, so DIQ can't reuse the HIQ MQD manager. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c

[PATCH 05/27] drm/amdkfd: Fix a potential memory leak

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Free mqd_mem_obj if GTT buffer allocation for MQD+control stack fails. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git

[PATCH 02/27] drm/amdkfd: Add sdma allocation debug message

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Add debug messages during SDMA queue allocation. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +++ 1 file changed, 3 insertions(+) diff --git

[PATCH 27/27] drm/amdgpu: Fix GTT size calculation

2019-04-28 Thread Kuehling, Felix
From: Kent Russell GTT size is currently limited to the minimum of VRAM size or 3/4 of system memory. This severely limits the quantity of system memory that can be used by ROCm applications. Increase GTT size to the maximum of VRAM size or system memory size. Signed-off-by: Kent Russell
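
The two sizing rules stated in the commit message can be written down directly. This is only a sketch of the arithmetic; the function names are made up for illustration and sizes are assumed to be in bytes:

```c
#include <stdint.h>

/* Old rule from the message: min(VRAM size, 3/4 of system memory).
 * Dividing before multiplying avoids overflow for large sizes. */
static uint64_t gtt_size_old(uint64_t vram, uint64_t sysmem)
{
	uint64_t cap = sysmem / 4 * 3;

	return vram < cap ? vram : cap;
}

/* New rule from the message: max(VRAM size, system memory size). */
static uint64_t gtt_size_new(uint64_t vram, uint64_t sysmem)
{
	return vram > sysmem ? vram : sysmem;
}
```

For a machine with 8 GiB of VRAM and 64 GiB of system memory, the old rule caps GTT at 8 GiB, while the new rule allows 64 GiB.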

[PATCH 25/27] drm/amdkfd: Add domain number into gpu_id

2019-04-28 Thread Kuehling, Felix
From: Amber Lin A multi-socket server can have multiple PCIe segments, so BDF is not enough to distinguish each GPU. Also take the domain number into account when generating gpu_id. Signed-off-by: Amber Lin Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---
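
To see why bus/device/function alone can collide, consider a sketch that packs the PCI location into one key. The helper name and the 16-bit domain width are assumptions for this example; the patch's actual gpu_id generation is not shown in the snippet and may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key: domain in the upper 16 bits, conventional
 * BDF encoding (bus:8, device:5, function:3) in the lower 16. */
static uint32_t pci_location_key(uint16_t domain, uint8_t bus,
				 uint8_t dev, uint8_t fn)
{
	uint16_t bdf;

	assert(dev < 32 && fn < 8);
	bdf = (uint16_t)((bus << 8) | (dev << 3) | fn);
	return ((uint32_t)domain << 16) | bdf;
}
```

Two GPUs at 03:00.0 in different domains share the same BDF but get different keys once the domain is included.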

[PATCH 18/27] drm/amdkfd: Delete alloc_format field from map_queue struct

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Alloc format was never really supported by MEC FW. FW always does one per pipe allocation. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 2 --

[PATCH 11/27] drm/amdkfd: Move non-sdma mqd allocation out of init_mqd

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng This is preparation work to introduce more MQD allocation schemes. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 20 ++-- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 51

[PATCH 00/27] KFD upstreaming

2019-04-28 Thread Kuehling, Felix
Assorted KFD changes that have been accumulating on amd-kfd-staging. New features and fixes included: * Support for VegaM * Support for systems with multiple PCI domains * New SDMA queue type that's optimized for XGMI links * SDMA MQD allocation changes to support future ASICs with more SDMA

[PATCH 04/27] drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng FW of some new ASICs requires sdma mqd size to be not more than 128 dwords. Repurpose the last 2 reserved fields of sdma mqd for driver internal use, so the total mqd size is no bigger than 128 dwords Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix

[PATCH 09/27] drm/amdkfd: Add mqd size in mqd manager struct

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Also initialize mqd size on mqd manager initialization Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 4

[PATCH 24/27] drm/amdkfd: Add VegaM support

2019-04-28 Thread Kuehling, Felix
From: Kent Russell Add the VegaM information to KFD Signed-off-by: Kent Russell Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 +++

[PATCH 06/27] drm/amdkfd: Introduce asic-specific mqd_manager_init function

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Global function mqd_manager_init just calls asic-specific functions and it is not necessary. Delete it and introduce a mqd_manager_init interface in dqm for asic-specific mqd manager init. Call mqd_manager_init interface directly to initialize mqd manager Signed-off-by: Oak Zeng

[PATCH 17/27] drm/amdkfd: Expose sdma engine numbers to topology

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Expose available numbers of both SDMA queue types in the topology. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 7 +++ drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 2 ++ 2 files changed, 9

[PATCH 08/27] drm/amdkfd: Init mqd managers in device queue manager init

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Previously MQD managers were initialized on demand. As there are only a few types of MQD managers, the on-demand initialization doesn't save much memory. Initialize them on device queue initialization instead and delete the get_mqd_manager interface. This makes the code more

[PATCH 22/27] drm/amdkfd: Fix gfx9 XNACK state save/restore

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall SQ_WAVE_IB_STS.RCNT grew from 4 bits to 5 in gfx9. Do not truncate when saving in the high bits of TTMP1. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 12 ++--
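
The truncation described above is the usual effect of a field mask that is one bit too narrow. A generic sketch follows; the helper and the bit position used in the usage note are hypothetical and do not reflect the real SQ_WAVE_IB_STS or TTMP1 layout:

```c
#include <assert.h>
#include <stdint.h>

/* Extract a `width`-bit field starting at bit `shift`. */
static uint32_t extract_field(uint32_t reg, unsigned int shift,
			      unsigned int width)
{
	assert(width > 0 && width < 32 && shift + width <= 32);
	return (reg >> shift) & (((uint32_t)1 << width) - 1);
}
```

If a 5-bit counter holds 0x1F, reading it with a 4-bit width yields 0xF and silently drops the top bit; reading it with a 5-bit width preserves the value.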

[PATCH 15/27] drm/amdkfd: Fix sdma queue map issue

2019-04-28 Thread Kuehling, Felix
From: Oak Zeng Previous code assumed there are two SDMA engines. This is not true; e.g., Raven has only one SDMA engine. Fix the issue by using the SDMA engine number info in device_info. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling ---
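
One way to avoid hard-coding the engine count is to derive the queue bitmap from per-device values. The sketch below assumes a bitmap with one set bit per available queue; the helper name and parameters are illustrative, not the driver's interface:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a bitmap with one set bit per
 * available SDMA queue from the device's engine count. */
static uint64_t sdma_bitmap_init(unsigned int num_engines,
				 unsigned int queues_per_engine)
{
	unsigned int total = num_engines * queues_per_engine;

	assert(total <= 64);
	if (total == 64)
		return UINT64_MAX; /* a shift by 64 would be undefined */
	return ((uint64_t)1 << total) - 1;
}
```

A single-engine part with two queues per engine gets a two-bit bitmap here, instead of the four bits that a fixed two-engine assumption would produce.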

[PATCH 14/27] drm/amdkfd: Fix compute profile switching

2019-04-28 Thread Kuehling, Felix
From: Harish Kasiviswanathan Fix compute profile switching on process termination. Add a dedicated reference counter to keep track of entry/exit to/from compute profile. This enables switching compute profiles for other reasons than process creation or termination. Signed-off-by: Harish
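
The dedicated reference counter mentioned above can be pictured as follows. This is a single-threaded sketch with invented names; a real driver would need atomics or a lock, and the actual profile switch call is replaced by a flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: counts users of the compute profile. */
struct profile_state {
	int refcount;
	bool compute_active; /* stands in for the real profile switch */
};

/* Enter the compute profile; switch only on the 0 -> 1 transition. */
static void compute_profile_get(struct profile_state *s)
{
	if (s->refcount++ == 0)
		s->compute_active = true;
}

/* Leave the compute profile; switch back only on the 1 -> 0 transition. */
static void compute_profile_put(struct profile_state *s)
{
	assert(s->refcount > 0);
	if (--s->refcount == 0)
		s->compute_active = false;
}
```

Because the switch is tied to counter transitions rather than to process creation or termination, any code path can take or drop a reference.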

Re: [PATCH 1/2] drm/amdgpu: Remap hdp coherency registers

2019-04-23 Thread Kuehling, Felix
One more nit-pick inline. On 2019-04-23 4:59 p.m., Zeng, Oak wrote: > Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL > to an empty page in mmio space. We will later map this page to process > space so application can flush hdp. This can't be done properly at > those

Re: [PATCH 1/2] drm/amdgpu: Implement get num of hops between two xgmi device

2019-04-23 Thread Kuehling, Felix
It seems to me that amdgpu_hive_info is a driver-internal structure, but the psp_xgmi_topology structures are an interface with the PSP that may change in future ASIC generations. So on second thought, adding the psp_xgmi_topology structures to the psp_xgmi_context (or amdgpu_hive_info) like

Re: [PATCH 1/2] drm/amdgpu: Remap hdp coherency registers

2019-04-23 Thread Kuehling, Felix
See inline. On 2019-04-23 3:23 p.m., Zeng, Oak wrote: > Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL > to an empty page in mmio space. We will later map this page to process > space so application can flush hdp. This can't be done properly at > those registers' original

Re: [PATCH 2/2] drm/amdkfd: Adjust weight to represent num_hops info when report xgmi iolink

2019-04-23 Thread Kuehling, Felix
On 2019-04-17 2:59 p.m., Liu, Shaoyun wrote: > Upper level runtime needs the xgmi hops info to determine the data path > > Change-Id: I969b419eab125157e223e9b03980ca229c1e6af4 > Signed-off-by: shaoyunl > --- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 8 ++-- >

Re: [PATCH 1/2] drm/amdgpu: Implement get num of hops between two xgmi device

2019-04-23 Thread Kuehling, Felix
See inline. On 2019-04-17 2:58 p.m., Liu, Shaoyun wrote: > KFD needs to provide the info for upper level to determine the data path > > Change-Id: Idc809e8f3381b9222dd7be96539522d440f3ee7d > Signed-off-by: shaoyunl > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +++ >

Re: [PATCH] drm: increase drm mmap_range size to 1TB

2019-04-17 Thread Kuehling, Felix
Adding dri-devel On 2019-04-17 6:15 p.m., Yang, Philip wrote: > After patch "drm: Use the same mmap-range offset and size for GEM and > TTM", application failed to create bo of system memory because drm > mmap_range size decrease to 64GB from original 1TB. This is not big > enough for

Re: [PATCH] drm/amdkfd: Disable Packet Manager in non HWS mode except Hawaii

2019-04-17 Thread Kuehling, Felix
> > On 2019-04-17 5:06 p.m., Kuehling, Felix wrote: >> On 2019-04-17 4:54 p.m., Zhao, Yong wrote: >>> The packet manager is only needed for HWS mode, as well as Hawaii in non >>> HWS mode. So only initialize it under those scenarios. This is useful >>> especially

Re: [PATCH 1/2] drm/amdgpu: Remap hdp coherency registers

2019-04-17 Thread Kuehling, Felix
On 2019-04-17 10:20 a.m., Zeng, Oak wrote: > Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL > to an empty page in mmio space. We will later map this page to process > space so application can flush hdp. This can't be done properly at > those registers' original location

Re: [PATCH 2/2] drm/amdkfd: Expose HDP registers to user space

2019-04-17 Thread Kuehling, Felix
On 2019-04-17 12:20 p.m., Deucher, Alexander wrote: >> -Original Message- >> From: amd-gfx On Behalf Of >> Zeng, Oak >> Sent: Wednesday, April 17, 2019 10:21 AM >> To: amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander ; Kuehling, Felix >

Re: [PATCH] drm/amdkfd: Disable Packet Manager in non HWS mode except Hawaii

2019-04-17 Thread Kuehling, Felix
On 2019-04-17 4:54 p.m., Zhao, Yong wrote: > The packet manager is only needed for HWS mode, as well as Hawaii in non > HWS mode. So only initialize it under those scenarios. This is useful > especially for emulation environment when things are slow. I never thought of packet manager

Re: [RFC] drm/amdkfd: Use logical cpu id for building vcrat

2019-04-16 Thread Kuehling, Felix
nel code to remove here. All I was saying was, that it's not a high priority to add the kernel code to populate CPU cache information in kernel mode. Regards,   Felix > > Regards, > Christian. > > On 16.04.19 at 05:24, Kuehling, Felix wrote: >> >> On x86

Re: [PATCH] drm/amdgpu: get_fw_version isn't ASIC specific

2019-04-16 Thread Kuehling, Felix
This is a nice cleanup. With this change, kfd2kgd_calls.get_fw_version is no longer used. You should remove it from kgd_kfd_interface.h. Also move the enum kgd_engine_type to amdgpu_amdkfd.h at the same time. With that fixed, this patch is Reviewed-by: Felix Kuehling On 2019-04-12 4:10

Re: [RFC] drm/amdkfd: Use logical cpu id for building vcrat

2019-04-15 Thread Kuehling, Felix
On x86 we use the apicid to associate caches with CPU cores. See the Thunk code in libhsakmt/src/topology.c (static void find_cpu_cache_siblings()). If we used a different way to identify CPU cores, I think that would break. This code in the Thunk is x86-specific as it uses the CPUID

RE: [PATCH] drm/amdgpu: support dpm level modification under virtualization

2019-04-10 Thread Kuehling, Felix
How does forcing DPM levels work in SRIOV? Can clocks switch fast enough to allow different VFs have different clocks? If not, can one VF override the clocks used by another VF? In that case, wouldn't that violate the isolation between VFs? Regards, Felix -Original Message- From:

Re: [PATCH 1/8] drm/amdgpu: fix ATC handling for Ryzen

2019-04-03 Thread Kuehling, Felix
On 2019-04-03 1:24 p.m., Koenig, Christian wrote: > On 01.04.19 at 20:58, Kuehling, Felix wrote: >> On 2019-04-01 2:03 p.m., Christian König wrote: >>> On 01.04.19 at 19:59, Kuehling, Felix wrote: >>>> On 2019-04-01 7:23 a.m., Christian König wrote: >>>>

Re: [PATCH v13 14/20] drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages

2019-04-02 Thread Kuehling, Felix
On 2019-04-02 10:37 a.m., Andrey Konovalov wrote: > On Mon, Mar 25, 2019 at 11:21 PM Kuehling, Felix > wrote: >> On 2019-03-20 10:51 a.m., Andrey Konovalov wrote: >>> This patch is a part of a series that extends arm64 kernel ABI to allow to >>> pass tagged user p

Re: [PATCH RFC tip/core/rcu 3/4] drivers/gpu/drm/amd: Dynamically allocate kfd_processes_srcu

2019-04-02 Thread Kuehling, Felix
On 2019-04-02 10:29 a.m., Paul E. McKenney wrote: > Having DEFINE_SRCU() or DEFINE_STATIC_SRCU() in a loadable module > requires that the size of the reserved region be increased, which is > not something we really want to be doing. This commit therefore removes > the DEFINE_STATIC_SRCU() from

Re: [PATCH 1/8] drm/amdgpu: fix ATC handling for Ryzen

2019-04-01 Thread Kuehling, Felix
On 2019-04-01 2:03 p.m., Christian König wrote: > On 01.04.19 at 19:59, Kuehling, Felix wrote: >> On 2019-04-01 7:23 a.m., Christian König wrote: >>> On 30.03.19 at 01:41, Kuehling, Felix wrote: >>>> Patches 1-3 are Reviewed-by: Felix Kuehling >>>

Re: [PATCH 1/8] drm/amdgpu: fix ATC handling for Ryzen

2019-04-01 Thread Kuehling, Felix
On 2019-04-01 7:23 a.m., Christian König wrote: > On 30.03.19 at 01:41, Kuehling, Felix wrote: >> Patches 1-3 are Reviewed-by: Felix Kuehling > > Thanks. > >> >> About the direct mode, that removes a bunch of synchronization, so it >> must make some assumption

Re: [PATCH 1/8] drm/amdgpu: fix ATC handling for Ryzen

2019-03-29 Thread Kuehling, Felix
Patches 1-3 are Reviewed-by: Felix Kuehling About the direct mode, that removes a bunch of synchronization, so it must make some assumptions about the state of the page tables. What makes that safe? Is it safe to use direct-mode on a per-page-table-update basis? Or do all page table updates

Re: [PATCH] drm/amdgpu: Add preferred_domain check when determine XGMI state

2019-03-28 Thread Kuehling, Felix
On 2019-03-28 4:38 p.m., Liu, Shaoyun wrote: > Avoid unnecessary XGMI high pstate trigger when mapping non-VRAM memory for > peer device > > Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2 > Signed-off-by: shaoyunl This patch is Reviewed-by: Felix Kuehling Please also give Christian a

Re: [PATCH] drm/amdgpu: Add preferred_domain check when determine XGMI state

2019-03-28 Thread Kuehling, Felix
nly makes sense inside the loop. The amdgpu_vm_bo_base should tell you the device that's mapping and potentially accessing the memory over XGMI. You could get it like this:     mapping_adev = base->vm->root.base.bo->tbo.bdev; Regards,   Felix > > Regards > > shaoyun.liu

Re: [PATCH] drm/amdgpu: don't put the root PD into the relocated list

2019-03-28 Thread Kuehling, Felix
The change looks reasonable to me. Acked-by: Felix Kuehling I just don't understand why the root PD is special and handled differently from other PDs and PTs. Regards,   Felix On 2019-03-27 6:39 a.m., Christian König wrote: > Instead of skipping the root PD while processing the relocated

Re: [PATCH] drm/amdgpu: Add preferred_domain check when determine XGMI state

2019-03-28 Thread Kuehling, Felix
On 2019-03-28 1:55 p.m., Liu, Shaoyun wrote: > Avoid unnecessary XGMI high pstate trigger when mapping non-VRAM memory for > peer device > > Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2 > Signed-off-by: shaoyunl > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 13 + >

Re: [PATCH] drm/amdgpu: Add preferred_domain check when determine XGMI state

2019-03-27 Thread Kuehling, Felix
On 2019-03-26 4:35 p.m., Liu, Shaoyun wrote: > Avoid unnecessary XGMI high pstate trigger when mapping non-VRAM memory for > peer device > > Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2 > Signed-off-by: shaoyunl > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +++ >

Re: [PATCH] drm/amdgpu: Add preferred_domain check when determine XGMI state

2019-03-26 Thread Kuehling, Felix
On 2019-03-26 2:54 p.m., Liu, Shaoyun wrote: > Avoid unnecessary XGMI high pstate trigger when mapping non-VRAM memory for > peer device > > Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2 > Signed-off-by: shaoyunl > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 + >

Re: [PATCH] drm/amdgpu: XGMI pstate switch initial support

2019-03-26 Thread Kuehling, Felix
- > *From:* amd-gfx on behalf of > Kuehling, Felix > *Sent:* March 25, 2019 6:28:32 PM > *To:* Liu, Shaoyun; amd-gfx@lists.freedesktop.org > *Subject:* Re: [PATCH] drm/amdgpu: XGMI pstate switch initial support > I don't see any check for the memory type. As far as I can tell you'll >

Re: [PATCH 1/2] drm/amdgpu: move VM table mapping into the backend as well

2019-03-25 Thread Kuehling, Felix
The series is Reviewed-by: Felix Kuehling On 2019-03-25 8:22 a.m., Christian König wrote: > Clean that up further and also fix another case where the BO > wasn't kmapped for CPU based updates. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31

Re: Kernel panic while “ modprobe amdkfd ; modprobe -r amdkfd ; 4.14.35 kernel

2019-03-25 Thread Kuehling, Felix
On 2019-03-22 12:58 p.m., John Donnelly wrote: > Hello, > > I am investigating an issue reported by a test group concerning this driver. > Their test loads and unloads every kernel module included in the 4.14.35 > kernel release. You don’t even need an AMD platform. It occurs on any Intel, >

Re: [PATCH] drm/amdgpu: XGMI pstate switch initial support

2019-03-25 Thread Kuehling, Felix
I don't see any check for the memory type. As far as I can tell you'll power up XGMI even for system memory mappings. See inline. On 2019-03-22 3:28 p.m., Liu, Shaoyun wrote: > Driver vote low to high pstate switch whenever there is an outstanding > XGMI mapping request. Driver vote high to low

Re: [PATCH v13 14/20] drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages

2019-03-25 Thread Kuehling, Felix
On 2019-03-20 10:51 a.m., Andrey Konovalov wrote: > This patch is a part of a series that extends arm64 kernel ABI to allow to > pass tagged user pointers (with the top byte set to something else other > than 0x00) as syscall arguments. > > amdgpu_ttm_tt_get_user_pages() uses provided user

Re: [PATCH 5/8] drm/amdgpu: new VM update backends

2019-03-25 Thread Kuehling, Felix
On 2019-03-25 7:38 a.m., Christian König wrote: > On 20.03.19 at 12:57, Kuehling, Felix wrote: >> As far as I can tell, the whole series is a small cleanup and big >> refactor to enable CPU clearing of PTs without a lot of ugliness or code >> duplication. > > It's a bi

Re: [PATCH 5/8] drm/amdgpu: new VM update backends

2019-03-20 Thread Kuehling, Felix
As far as I can tell, the whole series is a small cleanup and big refactor to enable CPU clearing of PTs without a lot of ugliness or code duplication. It looks good to me. I haven't reviewed all the moved SDMA update code to make sure it all works correctly, but at least the prepare and

Re: [PATCH] drm/amdgpu: revert "XGMI pstate switch initial support"

2019-03-19 Thread Kuehling, Felix
cause > you don't have a lock protecting the hw update itself. E.g. while > powering down you can add a mapping which needs to power it up again > and so powering down and powering up race with each other. That's a good point. Regards,   Felix > > Regards, > Christian. > >

Re: [PATCH] drm/amdgpu: revert "XGMI pstate switch initial support"

2019-03-19 Thread Kuehling, Felix
We discussed a few different approaches before settling on this one. Maybe it needs some more background. XGMI links are quite power hungry. Being able to power them down improves performance for power-limited workloads that don't need XGMI. In machine learning, pretty much all workloads are

Re: [PATCH] drm/amdkfd: Fix unchecked return value

2019-03-18 Thread Kuehling, Felix
Alex already applied an equivalent patch by Colin King (attached for reference). Regards,   Felix On 3/18/2019 2:05 PM, Gustavo A. R. Silva wrote: > Assign return value of function amdgpu_bo_sync_wait() to variable ret > for its further check. > > Addresses-Coverity-ID: 1443914 ("Logically

Re: Slow memory access when using OpenCL without X11

2019-03-15 Thread Kuehling, Felix
> echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' >> | sudo tee /etc/apt/sources.list.d/rocm.list >> sudo apt install rocm-opencl-dev >> >> Also exactly the same issue happens with this board: >> https://www.gigabyte.com/Motherboard/GA-AB350-Gami

Re: Slow memory access when using OpenCL without X11

2019-03-13 Thread Kuehling, Felix
-- > *From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Lauri > Ehrenpreis <lauri...@gmail.com> > *Sent:* Tuesday, March 12, 2019 5:31 PM > *To:* Kuehling, Felix > *Cc:* Tom St Denis; amd-gfx@lists.freedesktop.

Re: [PATCH 1/3] drm/amdgpu: re-enable retry faults

2019-03-13 Thread Kuehling, Felix
The series is Reviewed-by: Felix Kuehling On 2019-03-13 9:44 a.m., Christian König wrote: > Now that we have re-routed faults to the other IH > ring we can enable retries again. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 2 +- >

Re: [PATCH 2/3] drm/amdgpu: support userptr cross VMAs case with HMM v3

2019-03-12 Thread Kuehling, Felix
This patch is Reviewed-by: Felix Kuehling Regards,   Felix On 3/12/2019 9:17 PM, Yang, Philip wrote: > userptr may cross two VMAs if the forked child process (not call exec > after fork) malloc buffer, then free it, and then malloc larger size > buf, kernel will create new VMA adjacent to old

Re: [PATCH 1/1] drm/amdgpu: Wait for newly allocated PTs to be idle

2019-03-12 Thread Kuehling, Felix
a dedicate SDMA engine > for PTE update including clear? ).  But if we didn't use the  same > engine , it may explain why the  test failed occasionally. > > Regards > > shaoyun.liu > > > > On 2019-03-12 5:20 p.m., Kuehling, Felix wrote: >> When page table are upd

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
t;> Peculiar, I hit it immediately when I ran it . Can you try use >>> --gtest_filter=KFDCWSRTest.BasicTest . That one hung every time for me. >>> >>>    Kent >>> >>>> -Original Message- >>>> From: Christian König >>>>

[PATCH 1/1] drm/amdgpu: Wait for newly allocated PTs to be idle

2019-03-12 Thread Kuehling, Felix
When page tables are updated by the CPU, synchronize with the allocation and initialization of newly allocated page tables. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
ery time for me. >> >>Kent >> >>> -Original Message- >>> From: Christian König >>> Sent: Tuesday, March 12, 2019 11:09 AM >>> To: Russell, Kent ; Koenig, Christian >>> ; Kuehling, Felix ; >>> amd-gfx@lists.freedesktop.org >>>

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
>>  From what I've been able to dig through, the VM Fault seems to >>> occur right after a doorbell mmap, but that's as far as I got. I can >>> try to revert it in today's merge and see how things go. >>> >>>   Kent >>> >>>> -Origin

Re: [PATCH 1/3] drm/amdkfd: support concurrent userptr update for HMM v2

2019-03-12 Thread Kuehling, Felix
On 2019-03-06 9:42 p.m., Yang, Philip wrote: > Userptr restore may have concurrent userptr invalidation after > hmm_vma_fault adds the range to the hmm->ranges list, needs call > hmm_vma_range_done to remove the range from hmm->ranges list first, > then reschedule the restore worker. Otherwise

Re: [PATCH 2/3] drm/amdgpu: support userptr cross VMAs case with HMM v2

2019-03-12 Thread Kuehling, Felix
See one comment inline. There are still some potential problems that you're not catching. On 2019-03-06 9:42 p.m., Yang, Philip wrote: > userptr may cross two VMAs if the forked child process (not call exec > after fork) malloc buffer, then free it, and then malloc larger size > buf, kernel will

Re: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-12 Thread Kuehling, Felix
49, Russell, Kent wrote: >>  From what I've been able to dig through, the VM Fault seems to occur >> right after a doorbell mmap, but that's as far as I got. I can try to >> revert it in today's merge and see how things go. >> >>   Kent >> >>> -Origi

Re: Slow memory access when using OpenCL without X11

2019-03-12 Thread Kuehling, Felix
[adding the list back] I'd suspect a problem related to memory clock. This is an APU where system memory is shared with the CPU, so if the SMU changes memory clocks that would affect CPU memory access performance. If the problem only occurs when OpenCL is running, then the compute power

RE: [PATCH 2/3] drm/amdgpu: free up the first paging queue

2019-03-12 Thread Kuehling, Felix
I think this would break Raven, which only has one SDMA engine. Regards, Felix -Original Message- From: amd-gfx On Behalf Of Christian König Sent: Tuesday, March 12, 2019 8:38 AM To: amd-gfx@lists.freedesktop.org Subject: [PATCH 2/3] drm/amdgpu: free up the first paging queue We

RE: [PATCH 3/6] drm/amdgpu: allocate VM PDs/PTs on demand

2019-03-08 Thread Kuehling, Felix
My concerns were related to eviction fence handling. It would manifest as unnecessary eviction callbacks into KFD that aren't caused by real evictions. I addressed that with a previous patch series that removed the need to remove eviction fences and add them back around page table updates in

Re: [PATCH 1/2] drm/amdgpu: use ring/hash for fault handling on GMC9 v2

2019-03-07 Thread Kuehling, Felix
Hmm, that's a clever (and elegant) little data structure. The series is Reviewed-by: Felix Kuehling Regards,   Felix On 3/7/2019 8:28 AM, Christian König wrote: > Further testing showed that the idea with the chash doesn't work as expected. > Especially we can't predict when we can remove the

Re: [PATCH 2/3] drm/amdgpu: support userptr cross VMAs case with HMM

2019-03-06 Thread Kuehling, Felix
Some comments inline ... On 3/5/2019 1:09 PM, Yang, Philip wrote: > userptr may cross two VMAs if the forked child process (not call exec > after fork) malloc buffer, then free it, and then malloc larger size > buf, kernel will create new VMA adjacent to old VMA which was cloned > from parent

Re: [PATCH 1/3] drm/amdkfd: support concurrent userptr update for HMM

2019-03-06 Thread Kuehling, Felix
Hmm, I'm not sure. This change probably fixes this issue, but there may be other similar corner cases in other situations where the restore worker fails and needs to retry. The better place to call untrack in  amdgpu_amdkfd_restore_userptr_worker would be at the very end. Anything that's left

Re: [PATCH] drm/amdkfd: Add curly braces around idr_for_each_entry_continue loop

2019-03-05 Thread Kuehling, Felix
On 2019-03-05 6:20 a.m., Michel Dänzer wrote: > From: Michel Dänzer > > The compiler pointed out that one if block unintentionally wasn't part > of the loop: > > In file included from ./include/linux/kernfs.h:14, > from ./include/linux/sysfs.h:16, > from

Re: [PATCH 1/2] drm/amdgpu: rework shadow handling during PD clear v3

2019-03-04 Thread Kuehling, Felix
One not so obvious change here: The fence on the page table after clear_bo now waits for clearing both the page table and the shadow. That may make clearing of page tables appear a bit slower. On the other hand, if you're clearing a bunch of page tables at once, then the difference will be minimal

RE: [PATCH] drm/amdgpu: handle userptr corner cases with HMM path

2019-03-01 Thread Kuehling, Felix
Since you're addressing two distinct bugs, please split this into two patches. For the multiple VMAs, should we generalize that to handle any number of VMAs? It's not a typical case, but you could easily construct situations with mprotect where different parts of the same buffer have different

Re: [PATCH] drm/amdgpu: Add sysfs files for returning VRAM/GTT info

2019-02-28 Thread Kuehling, Felix
On 2/28/2019 9:56 AM, Christian König wrote: > On 28.02.19 at 16:32, Russell, Kent wrote: >> Add 3 files that return: >> The total amount of VRAM and the current total used VRAM >> The total amount of VRAM and the current total used visible VRAM >> The total GTT size and the current total of used

Re: [PATCH 1/1] drm/ttm: Account for kernel allocations in kernel zone only

2019-02-25 Thread Kuehling, Felix
On 2/25/2019 2:58 PM, Thomas Hellstrom wrote: > On Mon, 2019-02-25 at 14:20 +0000, Koenig, Christian wrote: >> Am 23.02.19 um 00:19 schrieb Kuehling, Felix: >>> Don't account for them in other zones such as dma32. The kernel >>> page >>> allocator has its own he

[PATCH 1/1] drm/ttm: Account for kernel allocations in kernel zone only

2019-02-22 Thread Kuehling, Felix
Don't account for them in other zones such as dma32. The kernel page allocator has its own heuristics to avoid exhausting special zones for regular kernel allocations. Signed-off-by: Felix Kuehling CC: thellst...@vmware.com CC: christian.koe...@amd.com --- drivers/gpu/drm/ttm/ttm_memory.c | 6

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-22 Thread Kuehling, Felix
On 2019-02-22 8:45 a.m., Thomas Hellstrom wrote: > On Fri, 2019-02-22 at 07:10 +0000, Koenig, Christian wrote: >> Am 21.02.19 um 22:02 schrieb Thomas Hellstrom: >>> Hi, >>> >>> On Thu, 2019-02-21 at 20:24 +0000, Kuehling, Felix wrote: >>>>

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-21 Thread Kuehling, Felix
On 2019-02-21 12:34 p.m., Thomas Hellstrom wrote: > On Thu, 2019-02-21 at 16:57 +0000, Kuehling, Felix wrote: >> On 2019-02-21 2:59 a.m., Koenig, Christian wrote: >>> On x86 with HIGHMEM there is no dma32 zone. Why do we need one on >>>>> x86_64? Can we make

Re: [PATCH] drm/amdgpu: fix HMM config dependency issue

2019-02-21 Thread Kuehling, Felix
On 2019-02-21 12:48 p.m., Yang, Philip wrote: > Only select HMM_MIRROR will get kernel config dependency warnings > if CONFIG_HMM is missing in the config. Add depends on HMM will > solve the issue. > > Add conditional compilation to fix compilation errors if HMM_MIRROR > is not enabled as HMM

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-21 Thread Kuehling, Felix
On 2019-02-21 2:59 a.m., Koenig, Christian wrote: > On x86 with HIGHMEM there is no dma32 zone. Why do we need one on >>> x86_64? Can we make x86_64 more like HIGHMEM instead? >>> >>> Regards, >>> Felix >>> >> IIRC with x86, the kernel zone is always smaller than any dma32 zone, >> so we'd

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Kuehling, Felix
On 2019-02-20 6:34 p.m., Jerome Glisse wrote: > On Wed, Feb 20, 2019 at 10:39:49PM +0000, Kuehling, Felix wrote: >> On 2019-02-20 5:12 p.m., Jerome Glisse wrote: >>> On Wed, Feb 20, 2019 at 07:18:17PM +0000, Kuehling, Felix wrote: >>>> [+Jerome] >>>> >

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Kuehling, Felix
On 2019-02-20 5:12 p.m., Jerome Glisse wrote: > On Wed, Feb 20, 2019 at 07:18:17PM +0000, Kuehling, Felix wrote: >> [+Jerome] >> >> Why do we need ZONE_DEVICE. I didn't think this was needed for mirroring >> CPU page tables to device page tables. >> >> ARCH_H

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Kuehling, Felix
On 2019-02-20 1:41 a.m., Thomas Hellstrom wrote: > On Tue, 2019-02-19 at 17:06 +0000, Kuehling, Felix wrote: >> On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote: >>> On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote: >>>> Am 18.02.19 um 10:47 schrieb Thomas He

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Kuehling, Felix
[+Jerome] Why do we need ZONE_DEVICE. I didn't think this was needed for mirroring CPU page tables to device page tables. ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative for ARM support? Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the CPU

Re: [PATCH] drm/amdgpu: disable userptr if swiotlb is active

2019-02-20 Thread Kuehling, Felix
I guess we'll need something similar for KFD? I don't think we've ever intentionally tested KFD with swiotlb. But I've seen some backtraces with swiotlb in them before. I wonder how badly broken it is ... Regards, Felix On 2019-02-20 8:46 a.m., Christian König wrote: > Otherwise we can't be

Re: [PATCH 1/7] drm/amdgpu: clear PDs/PTs only after initializing them

2019-02-19 Thread Kuehling, Felix
I commented on patches 2 and 3 in separate emails. The rest of the series is Reviewed-by: Felix Kuehling On 2019-02-19 8:40 a.m., Christian König wrote: > Clear the VM PDs/PTs only after initializing all the structures. > > Signed-off-by: Christian König > --- >

Re: [PATCH 3/7] drm/amdgpu: let amdgpu_vm_clear_bo figure out ats status

2019-02-19 Thread Kuehling, Felix
On 2019-02-19 8:40 a.m., Christian König wrote: > Instead of providing it from outside figure out the ats status in the > function itself from the data structures. > > Signed-off-by: Christian König One suggestion inline. Other than that this patch is Reviewed-by: Felix Kuehling Regards,  

Re: [PATCH 2/7] drm/amdgpu: rework shadow handling during PD clear

2019-02-19 Thread Kuehling, Felix
Comments inline. On 2019-02-19 8:40 a.m., Christian König wrote: > This way we only deal with the real BO in here. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 64 -- > 1 file changed, 39 insertions(+), 25 deletions(-) > > diff

Re: [PATCH] drm/powerplay: print current clock level when dpm is disabled on vg20

2019-02-19 Thread Kuehling, Felix
On 2019-02-19 4:09 p.m., Liu, Shaoyun wrote: > When DPM for the specific clock is disabled, driver should still print out > current clock info for rocm-smi support on vega20 > > Change-Id: I8669c77bf153caa2cd63a575802eb58747151239 > Signed-off-by: shaoyunl > --- >
