[PATCH 2/4] drm/amdgpu: Changed CU reservation golden settings

2020-04-29 Thread Felix Kuehling
and CU1. Signed-off-by: Oak Zeng Acked-by: Alex Deucher Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0

[PATCH 0/4] KFD upstreaming

2020-04-29 Thread Felix Kuehling
A few small patches found during DKMS branch rebasing that were missing from amd-staging-drm-next for no good reason. Felix Kuehling (2): drm/amdkfd: Fix comment formatting drm/amdgpu: Add missing parameter description in comments Oak Zeng (1): drm/amdgpu: Changed CU reservation golden

Re: [PATCH hmm 5/5] mm/hmm: remove the customizable pfn format from hmm_range_fault

2020-04-22 Thread Felix Kuehling
t sweeps over its pfns array a couple of times anyhow. > > Signed-off-by: Jason Gunthorpe > Signed-off-by: Christoph Hellwig Hi Jason, I pointed out a typo in the documentation inline. Other than that, the series is Acked-by: Felix Kuehling I'll try to build it and run so

Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

2020-04-15 Thread Felix Kuehling
ioctl/botching-up-ioctls.html >> >> Alex >> >> ---- >> *From:* amd-gfx >> <mailto:amd-gfx-boun...@lists.freedesktop.org> on behalf of Felix >> Kuehling <mailto:felix.kuehl...@amd.

Re: [PATCH 1/1] drm/amdgpu: Take a reference to an exported BO

2020-05-05 Thread Felix Kuehling
Am 2020-05-05 um 11:19 a.m. schrieb Christian König: > Am 05.05.20 um 16:58 schrieb Felix Kuehling: >> Am 2020-05-05 um 3:47 a.m. schrieb Christian König: >>> Just to reply here once more, this patch is a clear NAK. >> >> Agreed. But see below. I don't think all is w

Re: [PATCH 1/1] drm/amdgpu: Take a reference to an exported BO

2020-05-05 Thread Felix Kuehling
BO would hold one token reference to the TTM BO, which it can drop when the GEM BO refcount drops to 0. Finally, the amdgpu BO should only be freed once the TTM BO refcount also becomes 0. Regards,   Felix > > Regards, > Christian. > > Am 01.05.20 um 16:44 schrieb Felix Kuehling:
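The two-level reference counting described in this message can be sketched in plain C. This is only a toy model under stated assumptions: `struct model_bo` and the `model_*` helpers are hypothetical names, not amdgpu, GEM, or TTM API, and real kernel code would use `kref` or similar rather than bare integers.

```c
#include <stdbool.h>

/* Hypothetical model: one buffer object carrying two reference counts. */
struct model_bo {
	int gem_refs;   /* references held through the GEM object */
	int ttm_refs;   /* references on the TTM BO, including the GEM token */
	bool freed;     /* set once the last TTM reference is dropped */
};

static void model_bo_init(struct model_bo *bo)
{
	bo->gem_refs = 1;
	bo->ttm_refs = 1;   /* the single token reference owned by the GEM side */
	bo->freed = false;
}

static void model_ttm_get(struct model_bo *bo)
{
	bo->ttm_refs++;
}

static void model_ttm_put(struct model_bo *bo)
{
	/* The object is only freed when the TTM count reaches zero. */
	if (--bo->ttm_refs == 0)
		bo->freed = true;
}

static void model_gem_get(struct model_bo *bo)
{
	bo->gem_refs++;
}

static void model_gem_put(struct model_bo *bo)
{
	/* When the GEM count reaches zero, drop the token TTM reference. */
	if (--bo->gem_refs == 0)
		model_ttm_put(bo);
}
```

In this model, dropping the last GEM reference does not free the object while some other party still holds a TTM reference; the free happens only on the final `model_ttm_put()`.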

[PATCH 1/1] drm/amdgpu: Use GEM obj reference for KFD BOs

2020-05-05 Thread Felix Kuehling
-off-by: Felix Kuehling Tested-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 1247938b1ec1

Re: [PATCH 1/1] drm/amdgpu: Take a reference to an exported BO

2020-05-05 Thread Felix Kuehling
Am 2020-05-05 um 1:29 p.m. schrieb Felix Kuehling: > Am 2020-05-05 um 11:19 a.m. schrieb Christian König: >> Am 05.05.20 um 16:58 schrieb Felix Kuehling: >>> Am 2020-05-05 um 3:47 a.m. schrieb Christian König: >>>> Just to reply here once more, this patch is a clea

Re: [PATCH] drm/amdkfd: Provide SMI events watch

2020-05-13 Thread Felix Kuehling
ts enablement from ioctl to fd write > > Signed-off-by: Amber Lin Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/Makefile | 1 + > drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 2 + > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 18 ++

Re: [PATCH 6/8] drm/amdgpu: remove Arcturus references from vega10 ih

2020-03-18 Thread Felix Kuehling
I believe this should be squashed into Patch #8 or applied after patch #8. Otherwise it creates a broken intermediate state where Arcturus doesn't have any valid IH support. That said, it's probably less critical because it only affects the case of direct (backdoor) firmware loading.

Re: [PATCH 2/8] drm/amdgpu: create new files for arcturus ih

2020-03-18 Thread Felix Kuehling
How much overlap is there between arcturus_ih and navi10_ih? Given that they both use the same register map, could they share the same driver code with only minor differences? If they're almost the same, maybe you could rename navi10_ih.[ch] to osssys_v5_0.[ch] and use it for both navi10 and

Re: [PATCH 1/2] drm/amdgpu: cleanup amdgpu_ttm_copy_mem_to_mem and amdgpu_map_buffer

2020-03-19 Thread Felix Kuehling
That looks like a nice cleanup. Some nit-picks inline ... On 2020-03-19 9:41, Christian König wrote: Cleanup amdgpu_ttm_copy_mem_to_mem by using fewer variables for the same value. Rename amdgpu_map_buffer to amdgpu_ttm_map_buffer, move it to avoid the forward declaration, cleanup by moving

Re: [PATCH 1/6] drm/amdgpu: ih doorbell size of range changed for nbio v7.4

2020-03-20 Thread Felix Kuehling
is the subject "PATCH 1/6"? It makes me wonder, what are the other 5 patches. Anyway, this patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/g

Re: [PATCH] drm/amdgpu: infinite retries fix from UTLC1 RB SDMA

2020-03-20 Thread Felix Kuehling
this description because the patch is no longer limited to Arcturus. One more comment inline. With those fixed, the patch is Reviewed-by: Felix Kuehling Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 21 + 1 file changed, 17 insertions(+), 4 deletions

Re: [PATCH 1/4] drm/amdgpu: add stride to calculate oss ring offsets

2020-03-20 Thread Felix Kuehling
On 2020-03-20 10:06, Deucher, Alexander wrote: [AMD Public Use] This seems kind of complicated and error prone.  I didn't realize the extent to the changes required.  I think it would be better to either add arcturus specific versions of these functions or just go with your original

Re: [PATCH 2/4] drm/amdgpu: add macro to get proper ih ring register offset

2020-03-20 Thread Felix Kuehling
On 2020-03-19 20:22, Alex Sierra wrote: This macro calculates the IH ring register offset based on the three ring numbers and asic type. The parameters needed are the register's name without the prefix mmIH and the ring number taken from RING0, RING1 or RING2 macros. Signed-off-by: Alex Sierra

Re: [PATCH 1/4] drm/amdgpu: add stride to calculate oss ring offsets

2020-03-20 Thread Felix Kuehling
On 2020-03-20 10:39, Deucher, Alexander wrote: [AMD Public Use] I'm worried we'll miss a register by accident.  We went with per IP sub drivers to avoid handling complexities around IP differences if possible.  Also the scheme seems like kind of a one off compared to what we do for other

Re: [PATCH 1/2] drm/amdgpu: cleanup amdgpu_ttm_copy_mem_to_mem and amdgpu_map_buffer v2

2020-03-23 Thread Felix Kuehling
documentation. No functional change. v2: add some more cleanup suggested by Felix Signed-off-by: Christian König Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 269 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 +- 2 files changed, 135

Re: [PATCH] drm/amd/amdkfd: Fix large framesize for kfd_smi_ev_read()

2020-05-20 Thread Felix Kuehling
ee one comment inline. With that fixed, the patch is Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 26 +++-- > 1 file changed, 19 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c > b/driv

[PATCH 1/1] drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode

2020-05-20 Thread Felix Kuehling
This fixes an intermittent bug where a root PD clear operation still in progress could overwrite a PDE update done by the CPU, resulting in a VM fault. Fixes: 108b4d928c03 ("drm/amd/amdgpu: Update VM function pointer") Reported-by: Jay Cornwall Tested-by: Jay Cornwall Signed-off

Re: [PATCH] drm/amdkfd: report the real PCI bus number

2020-05-20 Thread Felix Kuehling
Am 2020-05-20 um 11:34 p.m. schrieb Evan Quan: > Since the PCI bus number retrieved by PCI_BUS_NUM(pdev->devfn) > is wrong. > > Change-Id: I882a8531a65cdf91be20e34a034aca1f43f658b4 > Signed-off-by: Evan Quan Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/am

Re: [PATCH 1/1] drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode

2020-05-21 Thread Felix Kuehling
Am 2020-05-21 um 9:50 a.m. schrieb Christian König: > Am 21.05.20 um 00:51 schrieb Felix Kuehling: >> This fixes an intermittent bug where a root PD clear operation still in >> progress could overwrite a PDE update done by the CPU, resulting in a >> VM fault. >

Re: [PATCH] drm/amdkfd: Track SDMA utilization per process

2020-05-21 Thread Felix Kuehling
Hi Mukul, This looks pretty good. See some suggestions inline. Am 2020-05-14 um 4:33 p.m. schrieb Mukul Joshi: > Track SDMA usage on a per process basis and report it through sysfs. > The value in the sysfs file indicates the amount of time SDMA has > been in-use by this process since the

Re: [PATCH] drm/amdkfd: fix restore worker race condition

2020-05-21 Thread Felix Kuehling
O user pages if MMU interval notifier is gone. > > Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgp

Re: [PATCH] drm/kfd: fix a system crash issue during GPU recovery

2020-09-01 Thread Felix Kuehling
On 2020-09-01 11:21 a.m., Li, Dennis wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Felix, If GPU hang, execute_queues_cpsch will fail to unmap or map queues and then create_queue_cpsch will return error. If pqm_create_queue find create_queue_cpsch failed, it will call

Re: [PATCH 2/2] drm/amdgpu/gmc10: print client id string for gfxhub

2020-09-01 Thread Felix Kuehling
Should there be a corresponding change in mmhub_v2_0.c? Other than that, the series is Reviewed-by: Felix Kuehling On 2020-09-01 5:51 p.m., Alex Deucher wrote: Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Signed-off

Re: [PATCH] drm/amdgpu: enable ih1 ih2 for Arcturus only

2020-09-02 Thread Felix Kuehling
adev->irq.ih1.doorbell_index = (adev->doorbell_index.ih + 1) << 1; > + if (adev->asic_type == CHIP_ARCTURUS) { This may apply to the Arcturus successor as well. I'd use asic_type < NAVI10 instead, to be future-proof. With these two issues fixed, the patch is Reviewed-by: Felix K

Re: [PATCH] drm/amdgpu: enable ih1 ih2 for Arcturus only

2020-09-02 Thread Felix Kuehling
Am 2020-09-02 um 2:13 p.m. schrieb Alex Deucher: > On Wed, Sep 2, 2020 at 2:08 PM Alex Deucher wrote: >> On Wed, Sep 2, 2020 at 1:01 PM Alex Sierra wrote: >>> Enable multi-ring ih1 and ih2 for Arcturus only. >>> For Navi10 family multi-ring has been disabled. >>> Apparently, having multi-ring

Re: [PATCH] drm/amdgpu: enable ih1 ih2 for Arcturus only

2020-09-02 Thread Felix Kuehling
Am 2020-09-02 um 2:08 p.m. schrieb Alex Deucher: > On Wed, Sep 2, 2020 at 1:01 PM Alex Sierra wrote: >> Enable multi-ring ih1 and ih2 for Arcturus only. >> For Navi10 family multi-ring has been disabled. >> Apparently, having multi-ring enabled in Navi was causing >> continuous page fault

[PATCH 1/1] drm/amdkfd: Use a new capability bit for SRAM ECC

2020-09-10 Thread Felix Kuehling
"drm/amdkfd: fix set kfd node ras properties value") Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.

Re: [PATCH 3/3] drm/amdkfd: Reduce eviction/restore message levels

2020-09-10 Thread Felix Kuehling
Am 2020-09-10 um 2:54 p.m. schrieb Philip Cox: > Reduce the eviction and restore messages from INFO level to DEBUG level. > > Signed-off-by: Philip Cox This patch is Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 >

Re: [PATCH 2/3] drm/amdkfd: Add process eviction counters to sysfs

2020-09-10 Thread Felix Kuehling
Am 2020-09-10 um 2:54 p.m. schrieb Philip Cox: > Add per-process eviction counters to sysfs to keep track of > how many eviction events have happened for each process. > > Signed-off-by: Philip Cox > --- > drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 15 ++- >

Re: [PATCH 1/3] drm/amdkfd: Add some eveiction debugging code

2020-09-10 Thread Felix Kuehling
Am 2020-09-10 um 2:54 p.m. schrieb Philip Cox: > Extending the module parameter debug_evictions to also print a stack > trace when the eviction code path is called. > > Signed-off-by: Philip Cox This patch is Reviewed-by: Felix Kuehling > --- > drivers/

Re: [PATCH] drm/amdgpu: Enable SDMA utilization for Arcturus

2020-09-11 Thread Felix Kuehling
Am 2020-09-11 um 12:27 p.m. schrieb Mukul Joshi: > SDMA utilization calculations are enabled/disabled by > writing to SDMAx_PUB_DUMMY_REG2 register. Currently, > enable this only for Arcturus. > > Signed-off-by: Mukul Joshi > --- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 10 ++ > 1 file

Re: [PATCH v2] drm/kfd: fix a system crash issue during GPU recovery

2020-09-07 Thread Felix Kuehling
ccess. And then kfd_process_evict_queues will > access a freed memory, which cause a system crash. > > v2: > The failure to execute_queues should probably not be reported to > the caller of create_queue, because the queue was already created. ... and the failure affects all processes in t

Re: [PATCH] drm/amdgpu: enable ih1 ih2 for Arcturus only

2020-09-07 Thread Felix Kuehling
is is probably expected, and imperceptible to the standard user, > I thought I'd at least mention it in an effort to keep contributing. > > > > Thanks for the continued open source work. You all make my life. > > > > Cheers, > > Matt > > On 9/3/20 2:05 AM, Christian Kö

Re: [PATCH] drm/amdkfd: fix a memory leak issue

2020-09-07 Thread Felix Kuehling
he suspend > stage of GPU recovery. > > Signed-off-by: Dennis Li Reviewed-by: Felix Kuehling > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 069ba4be1e8f..20ef048d6a03 10

Re: [PATCH] drm/amdkfd: Move process doorbell allocation into kfd device

2020-09-14 Thread Felix Kuehling
l Joshi Two nit-picks inline. With those fixed, the patch is Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 +-- > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++ > .../drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +- > drive

Re: [PATCH] drm/amdgpu: prevent double kfree ttm->sg

2020-09-15 Thread Felix Kuehling
_amdkfd_gpuvm_alloc_memory_of_gpu+0x9ca/0xb10 > [amdgpu] > [ 420.990666] kfd_ioctl_alloc_memory_of_gpu+0xef/0x2c0 [amdgpu] > > Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 + > 1 file changed, 1 insertion(+)

Re: [PATCH] drm/amdgpu/gmc9: remove mmhub client duplicated case

2020-09-14 Thread Felix Kuehling
Am 2020-09-14 um 11:42 a.m. schrieb Alex Deucher: > Copy paste typo. > > Reported-by: kernel test robot > Signed-off-by: Alex Deucher Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a

Re: [PATCH] drm/kfd: fix a system crash issue during GPU recovery

2020-09-01 Thread Felix Kuehling
I'm not sure how the bug you're fixing is caused, but your fix is clearly in the wrong place. A queue being disabled is not the same thing as a queue being destroyed. Queues can be disabled for legitimate reasons, but they still should exist and be in the qpd->queues_list. If a destroyed queue

Re: [PATCH v2 -next] drm/amdkfd: Fix -Wunused-const-variable warning

2020-09-10 Thread Felix Kuehling
ven_device_info = { > ^ > > As Huang Rui suggested, Raven already has the fallback path, > so it should be out of IOMMU v2 flag. > > Suggested-by: Huang Rui > Signed-off-by: YueHaibing Reviewed-by: Felix Kuehling I applied yo

Re: [PATCH v2 2/3] drm/amdkfd: Add process eviction counters to sysfs

2020-09-11 Thread Felix Kuehling
Am 2020-09-11 um 4:10 p.m. schrieb Philip Cox: > Add per-process eviction counters to sysfs to keep track of > how many eviction events have happened for each process. > > v2: rename the stats dir, and track all evictions per process, per device. > > Signed-off-by: Philip Cox Some more comments

Re: [PATCH v2] drm/amdgpu: Enable SDMA utilization for Arcturus

2020-09-11 Thread Felix Kuehling
Am 2020-09-11 um 5:33 p.m. schrieb Mukul Joshi: > SDMA utilization calculations are enabled/disabled by > writing to SDMAx_PUB_DUMMY_REG2 register. Currently, > enable this only for Arcturus. > > Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling > --- > drive

Re: [RFC PATCH radeon-alex] drm/amdgpu: kfd_initialized can be static

2020-10-08 Thread Felix Kuehling
FYI, I applied this patch to amd-staging-drm-next. Sorry for the delay, I finally caught up with my vacation backlog. Regards,   Felix Am 2020-09-22 um 10:28 p.m. schrieb kernel test robot: > Fixes: 0b54e1e30e9f ("drm/amdgpu: Fix handling of KFD initialization > failures") > Signed-off-by:

Re: [PATCH v2] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

2020-10-13 Thread Felix Kuehling
Do you have more details about those test failures? In theory that test should pass with noretry=0. If it fails, I'd rather look into the problem than hide it with a workaround. Regards,   Felix Am 2020-10-13 um 11:13 a.m. schrieb Chengming Gui: > noretry = 0 cause some dGPU's kfd page fault

Re: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

2020-10-15 Thread Felix Kuehling
Am 2020-10-14 um 11:35 p.m. schrieb Chengming Gui: > noretry = 0 cause some dGPU's kfd page fault tests fail, > so set noretry to 1 for these special ASICs: > vega20/navi10/navi14/ARCTURUS > > v2: merge raven and default case due to the same setting > v3: remove ARCTURUS > > Signed-off-by:

Re: [PATCH 1/1] drm/amdgpu: fix compute queue priority if num_kcq is less than 4

2020-10-16 Thread Felix Kuehling
Am 2020-10-16 um 11:34 a.m. schrieb Nirmoy Das: > Compute queues are configurable with module param, num_kcq. > amdgpu_gfx_is_high_priority_compute_queue was setting 1st 4 queues to > high priority queue leaving a null drm scheduler in > adev->gpu_sched[hw_ip]["normal_prio"].sched if num_kcq < 5

Re: [PATCH] drm/amdgpu: move amdgpu_num_kcq handling to a helper

2020-10-16 Thread Felix Kuehling
consistency. > > Signed-off-by: Alex Deucher Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c| 11 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 1 + > drivers/gpu/dr

Re: [PATCH] drm/amdkfd: Use same SQ prefetch setting as amdgpu

2020-10-19 Thread Felix Kuehling
Am 2020-10-19 um 10:02 a.m. schrieb Jay Cornwall: > 0 causes instruction fetch stall at cache line boundary under some > conditions on Navi10. A non-zero prefetch is the preferred default > in any case. > > Fixes soft hang in Luxmark. > > Signed-off-by: Jay Cornwall Review

Re: [PATCH 2/2] drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2

2020-10-19 Thread Felix Kuehling
Signed-off-by: Christian König > Reviewed-by: Madhav Chauhan (v1) I guess the speedup comes from the locking/prepare/commit overhead in amdgpu_vm_update_mapping that is now getting called less frequently and does more work in a single call. Reviewed-by: Felix Kuehling > --- > drive

Re: [PATCH v6] drm/amdkfd: implement the dGPU fallback path for apu (v6)

2020-08-23 Thread Felix Kuehling
nction if CRAT is broken. > v5: refine acpi crat good but no iommu support case, and rename the > title. > v6: fix the issue of dGPU initialized firstly, just modify the report > value in the node_show(). > > Signed-off-by: Huang Rui Reviewed-by: Felix Kuehling > --- >

Re: [PATCH] drm/amdkfd: sparse: Fix warning in reading SDMA counters

2020-08-17 Thread Felix Kuehling
Am 2020-08-17 um 4:45 p.m. schrieb Mukul Joshi: > Add __user annotation to fix related sparse warning while reading > SDMA counters from userland. > > Reported-by: kernel test robot > Signed-off-by: Mukul Joshi > --- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +--- > 1

Re: [PATCH v2] drm/amdkfd: sparse: Fix warning in reading SDMA counters

2020-08-17 Thread Felix Kuehling
Sorry, more bike-shedding. Am 2020-08-17 um 7:58 p.m. schrieb Mukul Joshi: > Add __user annotation to fix related sparse warning while reading > SDMA counters from userland. > > Reported-by: kernel test robot > Signed-off-by: Mukul Joshi > --- >

Re: [PATCH v2 1/3] drm/amdkfd: force raven as "dgpu" path (v2)

2020-08-18 Thread Felix Kuehling
Am 2020-08-18 um 9:09 a.m. schrieb Huang Rui: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled > or ACPI CRAT table not correct. > > v2: use ignore_crat parameter

Re: [PATCH v2 2/3] drm/amdkfd: abstract the iommu device checking with ignore_crat

2020-08-18 Thread Felix Kuehling
I'd recommend making this the first change in the series. Make 'drm/amdkfd: force raven as "dgpu" path' the second patch. That way it only needs to change one place. A few more comments inline. Am 2020-08-18 um 9:09 a.m. schrieb Huang Rui: > It's better to use inline function to wrap the iommu

Re: [PATCH v2 3/3] drm/amdkfd: remove iommu v2 for old apu series

2020-08-18 Thread Felix Kuehling
Interesting. Does this actually work on Carrizo or Kaveri? I'd like to see any Thunk changes needed to support this before giving my R-b. For now this patch is Acked-by: Felix Kuehling Am 2020-08-18 um 9:09 a.m. schrieb Huang Rui: > We already support the fallback path, so it doesn't n

Re: [PATCH v3] drm/amdkfd: sparse: Fix warning in reading SDMA counters

2020-08-18 Thread Felix Kuehling
d-off-by: Mukul Joshi Reviewed-by: Felix Kuehling > --- > .../drm/amd/amdkfd/kfd_device_queue_manager.c | 28 ++- > .../drm/amd/amdkfd/kfd_device_queue_manager.h | 8 +- > drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 ++-- > 3 files changed, 12 insertion

Re: [PATCH] drm/amdkfd: Initialize SDMA activity counter to 0

2020-08-17 Thread Felix Kuehling
Am 2020-08-17 um 1:05 p.m. schrieb Mukul Joshi: > To prevent reporting erroneous SDMA usage, initialize SDMA > activity counter to 0 before using. > > Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 + > 1 file

Re: [PATCH v3] drm/amdkfd: Add GPU reset SMI event

2020-08-28 Thread Felix Kuehling
Am 2020-08-28 um 1:53 p.m. schrieb Mukul Joshi: > Add support for reporting GPU reset events through SMI. KFD > would report both pre and post GPU reset events. > > Signed-off-by: Mukul Joshi Minor coding-style nit-picks inline. With those fixed, this patch is Reviewed-by: Fe

Re: [PATCH] include/uapi/linux: Fix indentation in kfd_smi_event enum

2020-08-28 Thread Felix Kuehling
Am 2020-08-28 um 8:38 p.m. schrieb Mukul Joshi: > Replace spaces with Tabs to fix indentation in kfd_smi_event > enum. > > Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling > --- > include/uapi/linux/kfd_ioctl.h | 6 +++--- > 1 file changed, 3 insertions(+), 3

Re: [PATCH] drm/amdgpu: remap hdp coherency registers for vi on carrizo

2020-08-19 Thread Felix Kuehling
Just for Carrizo, HDP flushing doesn't make a lot of sense because we don't use HDP to access the framebuffer. The code you're changing doesn't look Carrizo-specific, but VI-specific. So it would affect Fiji and Polaris as well. We already support Fiji and Polaris dGPUs with KFD, apparently

Re: [PATCH v5] drm/amdkfd: implement the dGPU fallback path for apu (v5)

2020-08-21 Thread Felix Kuehling
Am 2020-08-21 um 8:50 a.m. schrieb Huang Rui: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled > or ACPI CRAT table not correct. > > v2: Use ignore_crat parameter

Re: [PATCH v5] drm/amdkfd: implement the dGPU fallback path for apu (v5)

2020-08-22 Thread Felix Kuehling
Am 2020-08-21 um 9:42 p.m. schrieb Huang Rui: > On Sat, Aug 22, 2020 at 12:48:00AM +0800, Kuehling, Felix wrote: >> Am 2020-08-21 um 8:50 a.m. schrieb Huang Rui: >>> We still have a few iommu issues which need to address, so force raven >>> as "dgpu" path for the moment. >>> >>> This is to add

Re: [PATCH v4 2/2] drm/amdkfd: force raven as "dgpu" path (v4)

2020-08-20 Thread Felix Kuehling
Am 2020-08-20 um 11:53 p.m. schrieb Huang Rui: > On Fri, Aug 21, 2020 at 10:41:17AM +0800, Kuehling, Felix wrote: >> Am 2020-08-20 um 4:40 a.m. schrieb Huang Rui: >>> We still have a few iommu issues which need to address, so force raven >>> as "dgpu" path for the moment. >>> >>> This is to add

Re: [PATCH] drm/amdgpu: remap hdp coherency registers for vi on carrizo

2020-08-19 Thread Felix Kuehling
Am 2020-08-19 um 11:01 a.m. schrieb Huang Rui: > On Wed, Aug 19, 2020 at 10:36:05PM +0800, Kuehling, Felix wrote: >> Just for Carrizo, HDP flushing doesn't make a lot of sense because we >> don't use HDP to access the framebuffer. > OK, so soc15 and later need use HDP to access the framebuffer

Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

2020-08-19 Thread Felix Kuehling
Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled > or ACPI CRAT table not correct. > > v2: Use ignore_crat parameter

Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

2020-08-19 Thread Felix Kuehling
On 2020-08-19 7:56 p.m., Huang Rui wrote: On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote: Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui: We still have a few iommu issues which need to address, so force raven as "dgpu" path for the moment. This is to add the fallback path to

Re: [PATCH v6] drm/amd: Add Stream Performance Counter Monitors Driver -v6

2020-08-19 Thread Felix Kuehling
[+amd-gfx] Am 2020-08-19 um 12:15 p.m. schrieb Ba, Gang: > [AMD Official Use Only - Internal Distribution Only] > > Hi Felix, > > For write point update, what is the best way to do: Do you mean read pointer update? The hardware updates the write pointer, the driver updates the read pointer. >
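The division of labour mentioned in this reply (the producer advances the write pointer, the consumer advances the read pointer) can be illustrated with a small ring buffer. This is a generic sketch, not the SPM driver's code: `struct ring`, `RING_SIZE`, and both helpers are hypothetical, and `ring_produce()` merely stands in for what the hardware would do.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical size; a power of two */

struct ring {
	uint32_t data[RING_SIZE];
	uint32_t wptr;   /* advanced only by the producer (hardware in the thread) */
	uint32_t rptr;   /* advanced only by the consumer (the driver) */
};

/* Producer side: stands in for the hardware writing one entry. */
static int ring_produce(struct ring *r, uint32_t value)
{
	if (r->wptr - r->rptr == RING_SIZE)
		return -1;   /* ring is full, consumer has not caught up */
	r->data[r->wptr % RING_SIZE] = value;
	r->wptr++;
	return 0;
}

/* Consumer side: copy out up to max entries and advance the read pointer. */
static size_t ring_consume(struct ring *r, uint32_t *out, size_t max)
{
	size_t n = 0;

	while (r->rptr != r->wptr && n < max) {
		out[n++] = r->data[r->rptr % RING_SIZE];
		r->rptr++;
	}
	return n;
}
```

Because each pointer has exactly one writer, the consumer never needs to touch `wptr`; it only reads it to learn how much data is available.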

Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

2020-08-19 Thread Felix Kuehling
Am 2020-08-19 um 11:09 p.m. schrieb Huang Rui: > On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote: >> On 2020-08-19 7:56 p.m., Huang Rui wrote: >>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote: Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui: > We still

Re: [PATCH v4 2/2] drm/amdkfd: force raven as "dgpu" path (v4)

2020-08-20 Thread Felix Kuehling
Am 2020-08-20 um 4:40 a.m. schrieb Huang Rui: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled > or ACPI CRAT table not correct. > > v2: Use ignore_crat parameter

Re: [PATCH] drm/amdkfd: ignore userptr NUMA auto balancing event

2020-08-20 Thread Felix Kuehling
MMU_NOTIFY_PROTECTION_VMA is not specific to NUMA auto-balancing. It can also be the result of an mprotect system call which actually makes the VMA read-only. I don't think it's OK to ignore that notifier in the general case. Regards,   Felix Am 2020-08-19 um 2:00 p.m. schrieb Philip Yang: >

Re: [PATCH v2] drm/amdkfd: Add GPU reset SMI event

2020-08-26 Thread Felix Kuehling
Am 2020-08-26 um 4:01 p.m. schrieb Mukul Joshi: > Add support for reporting GPU reset events through SMI. KFD > would report both pre and post GPU reset events. > > Signed-off-by: Mukul Joshi > --- > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 +++ > drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH 2/2] drm/amdkfd: call amdgpu_amdkfd_get_hive_id directly

2020-08-24 Thread Felix Kuehling
No need to use a function pointer because the implementation is not ASIC-specific. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c | 1

[PATCH 1/2] drm/amdkfd: call amdgpu_amdkfd_get_unique_id directly

2020-08-24 Thread Felix Kuehling
No need to use a function pointer because the implementation is not ASIC-specific. This fixes missing support due to a missing function pointer on Arcturus. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 1 - drivers/gpu/drm/amd/amdgpu

Re: [PATCH V3 1/1] drm/amdkfd: fix set kfd node ras properties value

2020-08-24 Thread Felix Kuehling
The KFD part looks good to me, other than the SDMA comment that Guchun pointed out. With that fixed this patch is Acked-by: Felix Kuehling Thanks,   Felix Am 2020-08-24 um 6:33 a.m. schrieb Stanley.Yang: > The ctx->features are new RAS implementation which > is only available f

Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

2020-08-20 Thread Felix Kuehling
Am 2020-08-20 um 5:38 a.m. schrieb Huang Rui: > On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote: >> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote: >>> On 2020-08-19 7:56 p.m., Huang Rui wrote: On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote: >

Re: [PATCH] drm/amdgpu: Use SKU instead of DID for FRU check

2020-09-29 Thread Felix Kuehling
Am 2020-09-29 um 7:31 a.m. schrieb Kent Russell: > The VG20 DIDs 66a0, 66a1 and 66a4 are used for various SKUs that may or may > not have the FRU EEPROM on it. Parse the VBIOS to check for server SKU > variants (D131 or D134) until a more general solution can be determined. > > Signed-off-by:

Re: [PATCH] drm/amdgpu: Use SKU instead of DID for FRU check v2

2020-09-29 Thread Felix Kuehling
: Remove string-based logic, correct the VBIOS string comment > > Signed-off-by: Kent Russell Reviewed-by: Felix Kuehling > --- > .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 34 +-- > 1 file changed, 23 insertions(+), 11 deletions(-) > > dif

Re: [PATCH 1/4] drm/amdkfd: Remove legacy code from trap handler

2020-10-01 Thread Felix Kuehling
The series is Acked-by: Felix Kuehling I'm hoping Laurent can give it a more thorough and informed R-b. Thanks,   Felix Am 2020-10-01 um 2:24 p.m. schrieb Jay Cornwall: > ATC and MTYPE fields do not exist in gfx9 or later. > > Signed-off-by: Jay Cornwall > Cc: Lauren

Re: [PATCH] drm/amdkfd: Calculate CPU VCRAT size dynamically

2020-09-18 Thread Felix Kuehling
lso need kvfree to free the memory. For consistency, use kvmalloc for GPU CRAT allocation, too, and replace the kmemdup in kfd_create_crat_image_acpi with kvmalloc+memcpy. Let's make that kmalloc->kvmalloc change a second commit. This patch is Reviewed-by: Felix Kuehling Regards

Re: CONFIG_AMDGPU triggers full rebuild

2020-09-28 Thread Felix Kuehling
If I had to guess, I'd say something HMM-related. There has been some back-and-forth between kernel releases. So I won't say anything more specific without knowing exactly which branch or release you're on. Regards,   Felix Am 2020-09-25 um 10:29 a.m. schrieb Thomas Zimmermann: > Hi, > >

Re: [PATCH] amdgpu/drm: cleanup navi10 ih logic about older ASIC

2020-09-28 Thread Felix Kuehling
[+Alex] I think this was added for Arcturus, which shares the same IH IP as Navi10 and needs to support virtualization. Regards,   Felix Am 2020-09-25 um 7:30 a.m. schrieb Zhang, Hawking: > [AMD Public Use] > > Hi Likun, > > Let's take a step back to check with Alex S why he add the ASIC type

Re: [PATCH v3 2/3] drm/amdkfd: Add process eviction counters to sysfs

2020-09-17 Thread Felix Kuehling
hen > they can be added to this structure, but I don't think the eviction stats > should be. Thanks. Makes sense. I expect that all the stats will need the PDD. They are all per-process, per-device stats. With the small nit-picks fixed, the patch is Reviewed-by: Felix Kuehling Regards,  

[PATCH 1/1] drm/amdkfd: Fix GCC 10 compiler warning

2020-05-25 Thread Felix Kuehling
| snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__) | ^ This patch fixes the warnings and makes the sysfs code more efficient by remembering the offset in the buffer between append operations. Signed-off-by: Feli
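The idea of remembering the offset between append operations can be sketched in userspace C. The quoted macro passes `buffer` as both destination and source of `snprintf`, which the C standard leaves undefined for overlapping objects; tracking an offset avoids that and also avoids re-copying the existing text on every append. `append_fmt()` below is a hypothetical helper written for illustration, not the function used in the actual patch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format into buf starting at offset offs and
 * return the new offset. Never writes past size bytes in total.
 */
static size_t append_fmt(char *buf, size_t size, size_t offs,
			 const char *fmt, ...)
{
	va_list ap;
	int n;

	if (offs >= size)
		return offs;   /* no room left, nothing written */

	va_start(ap, fmt);
	n = vsnprintf(buf + offs, size - offs, fmt, ap);
	va_end(ap);

	if (n < 0)
		return offs;   /* formatting error, offset unchanged */

	/* vsnprintf returns the untruncated length; clamp to what fit. */
	if ((size_t)n >= size - offs)
		return size - 1;

	return offs + (size_t)n;
}
```

A caller would start with an offset of 0 and feed each returned offset into the next call, so every append writes only its own new text.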

Re: [PATCH][next] drm/amdkfd: fix a dereference of pdd before it is null checked

2020-05-28 Thread Felix Kuehling
check") Fixes: 522b89c63370 ("drm/amdkfd: Track SDMA utilization per process") Signed-off-by: Colin Ian King Reviewed-by: Felix Kuehling I applied the patch to our internal amd-staging-drm-next. Regards,   Felix --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++--

Re: [PATCH v3] drm/amdkfd: Track SDMA utilization per process

2020-05-26 Thread Felix Kuehling
ion of past acitivt counter under dqm_lock. Typo: activity Other than that, the patch is Reviewed-by: Felix Kuehling > > Signed-off-by: Mukul Joshi > --- > .../drm/amd/amdkfd/kfd_device_queue_manager.c | 57 > .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2

Re: [PATCH] drm/amdgpu: fix leftover drm_gem_object_put_unlocked call

2020-05-22 Thread Felix Kuehling
Am 2020-05-22 um 3:38 p.m. schrieb Simon Ser: > drm_gem_object_put_unlocked has been renamed to drm_gem_object_put. Alex, I guess you'll need to apply this patch when you include e07ddb0ce7cd in a pull request to Dave Airlie. I don't think it makes sense to apply this on amd-kfd-staging until the

Re: drm/amdkfd: Change pasid's type to unsigned int

2020-05-22 Thread Felix Kuehling
Hi Fenghua, The PASID width in KFD is currently limited to 16 bits. I believe this reflects what our hardware can handle. KFD will never allocate a PASID bigger than 16 bits. That said, I'm OK with changing this field in the kfd_process structure to unsigned int. Generally, I find uint16_t in

[PATCH 1/1] drm/amdgpu: Fix handling of KFD initialization failures

2020-09-16 Thread Felix Kuehling
ven when KFD failed. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 11 ++- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_module.c| 1 + 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/

Re: [PATCH v3 2/3] drm/amdkfd: Add process eviction counters to sysfs

2020-09-16 Thread Felix Kuehling
Some nit-picks and one more possible simplification inline. I want to make adding more stats later as painless as possible. Looks good otherwise. Am 2020-09-16 um 2:42 p.m. schrieb Philip Cox: > Add per-process eviction counters to sysfs to keep track of > how many eviction events have happened

Re: [PATCH 2/2] drm/amdkfd: Change unique_id to print hex format

2020-10-28 Thread Felix Kuehling
So rocm-smi reads the decimal and converts it to hex? Then changing KFD will break rocm-smi. If you want to fix rocminfo, you'll need to fix it in the rocminfo code to do the conversion to hex. Regards,   Felix Am 2020-10-28 um 12:02 p.m. schrieb Russell, Kent: > [AMD Public Use] > > rocminfo
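For illustration, the user-space conversion suggested above might look like the following C sketch. It assumes the value is read from sysfs as a decimal string, possibly followed by a newline; the helper name decimal_to_hex is hypothetical and does not come from rocminfo or rocm-smi.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an unsigned decimal string (as a tool might
 * read it from sysfs) and format it as lowercase hex without a prefix.
 * Returns 0 on success, -1 on a parse error or if out is too small.
 */
static int decimal_to_hex(const char *dec, char *out, size_t out_size)
{
    char *end;
    unsigned long long value;
    int n;

    errno = 0;
    value = strtoull(dec, &end, 10);
    if (errno != 0 || end == dec)
        return -1;                  /* out of range or no digits found */
    if (*end != '\0' && *end != '\n')
        return -1;                  /* unexpected trailing characters */

    n = snprintf(out, out_size, "%llx", value);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                  /* formatting failed or was truncated */
    return 0;
}
```

Doing the conversion in the tool leaves the sysfs format untouched, so other readers of the decimal value keep working.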

Re: [PATCH 1/2] drm/amdkfd: Fix getting unique_id in topology

2020-10-28 Thread Felix Kuehling
Please also remove the broken code that initializes gpu->unique_id and remove the unique_id field from the structure. Regards,   Felix Am 2020-10-28 um 11:22 a.m. schrieb Kent Russell: > Since the unique_id is now obtained in amdgpu in smu_late_init, > topology's device addition is now happening

Re: [PATCH 2/2] drm/amdkfd: Change unique_id to print hex format

2020-10-28 Thread Felix Kuehling
This is an ABI-breaking change. Is any user mode code using this already? Regards,   Felix Am 2020-10-28 um 11:22 a.m. schrieb Kent Russell: > amdgpu's unique_id prints in hex format, so change topology's printout > to hex by adding a new sysfs_print macro specifically for hex output, > and use

Re: [PATCH] drm/amdkfd: Add thermal throttling SMI event

2020-07-21 Thread Felix Kuehling
Am 2020-07-21 um 5:01 p.m. schrieb Mukul Joshi: > Add support for reporting thermal throttling events through SMI. > Also, add a counter to count the number of throttling interrupts > observed and report the count in the SMI event message. > > Signed-off-by: Mukul Joshi > --- >

Re: [PATCH v2] drm/amdkfd: option to disable system mem limit

2020-08-04 Thread Felix Kuehling
Am 2020-08-04 um 4:42 p.m. schrieb Philip Yang: > If multiple process share system memory through /dev/shm, KFD allocate > memory should not fail if it reaches the system memory limit because > one copy of physical system memory are shared by multiple process. > > Add module parameter

Re: [PATCH v3] drm/amdkfd: option to disable system mem limit

2020-08-04 Thread Felix Kuehling
> because system memory reaches limit. > > Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking

2020-08-07 Thread Felix Kuehling
Am 2020-08-07 um 2:57 a.m. schrieb Christian König: [snip] > That's a really good argument, but I still hesitate to merge this > patch. How severe is the lockdep splat? I argued before that any lockdep splat is bad, because it disables further lockdep checking and can hide other lockdep problems

Re: [PATCH] drm/amdgpu: annotate a false positive locking dependency

2020-08-05 Thread Felix Kuehling
The commit headline is misleading. An annotation would be something like replacing mutex_lock with mutex_lock_nested. You're not annotating anything, you're actually changing the locking. Am 2020-08-05 um 9:24 p.m. schrieb Dennis Li: > [ 264.483189]

Re: [PATCH] drm/amdkfd: force raven as "dgpu" path

2020-08-07 Thread Felix Kuehling
Am 2020-08-07 um 4:25 a.m. schrieb Huang Rui: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > Will enable IOMMUv2 since the issues are fixed. Do you mean "_when_ the issues are fixed"? The current iommuv2 troubles aside, I think this
