[PATCH] drm/amd/pm: fix gpu reset failure by MP1 state setting

2021-03-22 Thread Guchun Chen
Instead of blocking varied unsupported MP1 state in upper level, defer and skip such MP1 state handling in specific ASIC. Signed-off-by: Lijo Lazar Signed-off-by: Guchun Chen --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c| 3 ---

Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Joshua Ashton
Typo in the title: s/dispaly/display - Joshie ✨ On 3/22/21 8:11 AM, Lang Yu wrote: In amdggpu reset, while dm.dc_lock is held by dm_suspend, handle_hpd_rx_irq tries to acquire it. Deadlock occurred! Deadlock log: [ 104.528304] amdgpu :03:00.0: amdgpu: GPU reset begin! [ 104.640084]

RE: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Yu, Lang
[AMD Official Use Only - Internal Distribution Only] -Original Message- From: Grodzovsky, Andrey Sent: Monday, March 22, 2021 11:01 PM To: Yu, Lang ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, Ray Subject: Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu

[PATCH] drm/amd/display: Use DRM_DEBUG_DP

2021-03-22 Thread Luben Tuikov
Convert IRQ-based prints from DRM_DEBUG_DRIVER to DRM_DEBUG_DP, as the latter is not used in drm/amd prior to this patch and since IRQ-based prints drown out the rest of the driver's DRM_DEBUG_DRIVER messages. Cc: Harry Wentland Cc: Alex Deucher Signed-off-by: Luben Tuikov ---

Re: [PATCH] drm/amd/display: Allow idle optimization based on vblank.

2021-03-22 Thread R, Bindu
[AMD Official Use Only - Internal Distribution Only] Hi, The updated patch has been merged and is available with commit ID "ef5c594461650de0a18aa0bfd240189991790d7e". Somehow missed to mail the updated version, attached is the updated patch, please review and let me know if any changes

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

2021-03-22 Thread Michal Hocko
On Mon 22-03-21 14:05:48, Matthew Wilcox wrote: > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote: > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote: > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter: > > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König > > > >

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

2021-03-22 Thread Christian König
Am 22.03.21 um 18:02 schrieb Daniel Vetter: On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko wrote: On Mon 22-03-21 14:05:48, Matthew Wilcox wrote: On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote: On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote: Am 20.03.21 um

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-22 Thread Alex Deucher
On Thu, Mar 18, 2021 at 8:19 AM Harvey wrote: > > Alex, > > I waited for kernel 5.11.7 to hit our repos yesterday evening and tested > again: > > 1. The suspend issue is gone - suspend and resume now work as expected. > > 2. System hibernation seems to be a different beast - still freezing You

Re: [PATCH 00/44] Add HMM-based SVM memory manager to KFD v2

2021-03-22 Thread Daniel Vetter
On Mon, Mar 22, 2021 at 5:07 PM Felix Kuehling wrote: > > Am 2021-03-22 um 10:15 a.m. schrieb Daniel Vetter: > > On Mon, Mar 22, 2021 at 06:58:16AM -0400, Felix Kuehling wrote: > >> Since the last patch series I sent on Jan 6 a lot has changed. Patches 1-33 > >> are the cleaned up, rebased on

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

2021-03-22 Thread Daniel Vetter
On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko wrote: > > On Mon 22-03-21 14:05:48, Matthew Wilcox wrote: > > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote: > > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote: > > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter: > >

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-22 Thread Harvey
Still freezing on 5.11.8 and 5.12-rc4. Log on 5.12-rc4 looks a little different: Mär 22 17:40:26 obelix systemd[1]: Reached target Sleep. Mär 22 17:40:26 obelix systemd[1]: Starting Hibernate... Mär 22 17:40:26 obelix kernel: PM: hibernation: hibernation entry Mär 22 17:40:26 obelix

Re: [PATCH 00/44] Add HMM-based SVM memory manager to KFD v2

2021-03-22 Thread Felix Kuehling
Am 2021-03-22 um 10:15 a.m. schrieb Daniel Vetter: > On Mon, Mar 22, 2021 at 06:58:16AM -0400, Felix Kuehling wrote: >> Since the last patch series I sent on Jan 6 a lot has changed. Patches 1-33 >> are the cleaned up, rebased on amd-staging-drm-next 5.11 version from about >> a week ago. The

Re: [PATCH] drm/amdgpu: Ensure that the modifier requested is supported by plane.

2021-03-22 Thread Mark Yacoub
"friendly ping" On Wed, Mar 10, 2021 at 11:14 AM Mark Yacoub wrote: > From: Mark Yacoub > > On initializing the framebuffer, call drm_any_plane_has_format to do a > check if the modifier is supported. drm_any_plane_has_format calls > dm_plane_format_mod_supported which is extended to validate

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

2021-03-22 Thread Matthew Wilcox
On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote: > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote: > > Am 20.03.21 um 14:17 schrieb Daniel Vetter: > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König > > > wrote: > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:

Re: [PATCH] amdgpu: avoid incorrect %hu format string

2021-03-22 Thread Tom Rix
On 3/22/21 4:54 AM, Arnd Bergmann wrote: > From: Arnd Bergmann > > clang points out that the %hu format string does not match the type > of the variables here: > > drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type > 'unsigned short' but the argument has type

Re: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

2021-03-22 Thread Gustavo A. R. Silva
On 3/22/21 09:04, Chen, Guchun wrote: > [AMD Public Use] > > Thanks for your patch, Silva. The issue has been fixed by " a5c6007e20e1 > drm/amd/display: fix modprobe failure on vega series". Great. :) Good to know this is already fixed. Thanks! -- Gustavo

Re: 16 bpc fixed point (RGBA16) framebuffer support for core and AMD.

2021-03-22 Thread Ville Syrjälä
On Fri, Mar 19, 2021 at 10:03:12PM +0100, Mario Kleiner wrote: > Hi, > > this patch series adds the fourcc's for 16 bit fixed point unorm > framebuffers to the core, and then an implementation for AMD gpu's > with DisplayCore. > > This is intended to allow for pageflipping to, and direct scanout

Re: [PATCH] drm/amd/pm: drop redundant and unneeded BACO APIs V2

2021-03-22 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Alex Deucher From: amd-gfx on behalf of Evan Quan Sent: Monday, March 22, 2021 2:11 AM To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan Subject: [PATCH] drm/amd/pm: drop redundant and unneeded

Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-22 Thread Steven Price
On 15/03/2021 05:23, Zhang, Jack (Jian) wrote: [AMD Public Use] Hi, Rob/Tomeu/Steven, Would you please help to review this patch for panfrost driver? Thanks, Jack Zhang -Original Message- From: Jack Zhang Sent: Monday, March 15, 2021 1:21 PM To: dri-de...@lists.freedesktop.org;

Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Andrey Grodzovsky
On 2021-03-22 4:11 a.m., Lang Yu wrote: In amdggpu reset, while dm.dc_lock is held by dm_suspend, handle_hpd_rx_irq tries to acquire it. Deadlock occurred! Deadlock log: [ 104.528304] amdgpu :03:00.0: amdgpu: GPU reset begin! [ 104.640084]

Re: [PATCH] drm/amd/display: Set AMDGPU_DM_DEFAULT_MIN_BACKLIGHT to 0

2021-03-22 Thread Alex Deucher
On Sun, Mar 21, 2021 at 8:12 PM Evan Benn wrote: > > On Sat, Mar 20, 2021 at 8:36 AM Alex Deucher wrote: > > > > On Fri, Mar 19, 2021 at 5:31 PM Evan Benn wrote: > > > > > > On Sat, 20 Mar 2021 at 02:10, Harry Wentland > > > wrote: > > > > On 2021-03-19 10:22 a.m., Alex Deucher wrote: > > > >

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

2021-03-22 Thread Daniel Vetter
On Mon, Mar 22, 2021 at 02:05:48PM +, Matthew Wilcox wrote: > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote: > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote: > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter: > > > > On Sat, Mar 20, 2021 at 10:04 AM Christian

Re: [PATCH 00/44] Add HMM-based SVM memory manager to KFD v2

2021-03-22 Thread Daniel Vetter
On Mon, Mar 22, 2021 at 06:58:16AM -0400, Felix Kuehling wrote: > Since the last patch series I sent on Jan 6 a lot has changed. Patches 1-33 > are the cleaned up, rebased on amd-staging-drm-next 5.11 version from about > a week ago. The remaining 11 patches are current work-in-progress with >

RE: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

2021-03-22 Thread Chen, Guchun
[AMD Public Use] Thanks for your patch, Silva. The issue has been fixed by " a5c6007e20e1 drm/amd/display: fix modprobe failure on vega series". Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Gustavo A. R. Silva Sent: Monday, March 22, 2021 8:51 PM To: Lee Jones ;

Re: [PATCH 2/2] drm/amdgpu: Introduce new SETUP_TMR interface

2021-03-22 Thread Zeng, Oak
[AMD Official Use Only - Internal Distribution Only] Hello all, Can someone help to review below patches? We verified with firmware team and want to check-in together with psp firmware Regards, Oak On 2021-03-12, 4:24 PM, "Zeng, Oak" wrote: This new interface passes both virtual and

Re: [PATCH V2] drm/amdgpu: Fix a typo

2021-03-22 Thread Alex Deucher
On Sat, Mar 20, 2021 at 3:52 AM Randy Dunlap wrote: > > > > On Fri, 19 Mar 2021, Bhaskar Chowdhury wrote: > > > s/traing/training/ > > > > ...Plus the entire sentence construction for better readability. > > > > Signed-off-by: Bhaskar Chowdhury > > --- > > Changes from V1: > > Alex and Randy's

[PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

2021-03-22 Thread Gustavo A. R. Silva
The wrong sizeof values are currently being used as arguments to kzalloc(). Fix this by using the right arguments *dceip and *vbios, correspondingly. Addresses-Coverity-ID: 1502901 ("Wrong sizeof argument") Fixes: fca1e079055e ("drm/amd/display/dc/calcs/dce_calcs: Remove some large variables
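
The allocation pattern this patch describes can be illustrated in userspace C. The following is only a sketch under stated assumptions, not the patch itself: calloc() stands in for kzalloc(), and the struct and function names are hypothetical. It shows why sizing an allocation with sizeof(*ptr) tends to be safer than sizeof(ptr).

```c
#include <stdlib.h>

/* Hypothetical stand-in for a large parameter structure. */
struct dceip_sketch {
    int values[64];
};

/*
 * Allocate one zeroed structure. sizeof(*dceip) follows the pointer's
 * type, so the size stays correct even if the declaration changes.
 * Writing sizeof(dceip) instead would request only pointer-size bytes.
 */
struct dceip_sketch *alloc_dceip_sketch(void)
{
    struct dceip_sketch *dceip = calloc(1, sizeof(*dceip));

    return dceip; /* NULL on allocation failure; the caller must free() it */
}
```

With sizeof(*dceip), every field up to values[63] lies inside the allocation. With sizeof(dceip), only the first few bytes would.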

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

2021-03-22 Thread Daniel Vetter
On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote: > Am 20.03.21 um 14:17 schrieb Daniel Vetter: > > On Sat, Mar 20, 2021 at 10:04 AM Christian König > > wrote: > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter: > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:

Re: [PATCH] drm/ttm: stop warning on TT shrinker failure v2

2021-03-22 Thread Daniel Vetter
On Mon, Mar 22, 2021 at 12:22 PM Christian König wrote: > > Don't print a warning when we fail to allocate a page for swapping things out. > > v2: only stop the warning > > Signed-off-by: Christian König Reviewed-by: Daniel Vetter It is kinda surprising that page allocator warns here even

Re: [PATCH 29/44] drm/amdgpu: reserve fence slot to update page table

2021-03-22 Thread Christian König
Am 22.03.21 um 11:58 schrieb Felix Kuehling: From: Philip Yang Forgot to reserve a fence slot to use sdma to update page table, cause below kernel BUG backtrace to handle vm retry fault while application is exiting. [ 133.048143] kernel BUG at

RE: [PATCH 00/14] DC Patches March 22, 2021

2021-03-22 Thread Wheeler, Daniel
[AMD Public Use] Hi all, This week this patchset was tested on a HP Envy 360, with Ryzen 5 4500U, on the following display types (via usb-c to dp/dvi/hdmi/vga): 4k 60hz, 1440p 144hz, 1680*1050 60hz, internal eDP 1080p 60hz Tested on a Sapphire Pulse RX5700XT on the following display types (via

Re: [RESEND 00/53] Rid GPU from W=1 warnings

2021-03-22 Thread Lee Jones
On Fri, 19 Mar 2021, Daniel Vetter wrote: > On Fri, Mar 19, 2021 at 08:24:07AM +, Lee Jones wrote: > > On Thu, 18 Mar 2021, Daniel Vetter wrote: > > > > > On Wed, Mar 17, 2021 at 9:32 PM Daniel Vetter wrote: > > > > > > > > On Wed, Mar 17, 2021 at 9:17 AM Lee Jones wrote: > > > > > > > > >

Re: [PATCH] drm/amd/display: fix modprobe failure on vega series

2021-03-22 Thread Lee Jones
On Mon, 22 Mar 2021, Guchun Chen wrote: > Fixes: d88b34caee83 ("Remove some large variables from the stack") > > [ 41.232097] Call Trace: > [ 41.232105] kvasprintf+0x66/0xd0 > [ 41.232122] kasprintf+0x49/0x70 > [ 41.232136] __drm_crtc_init_with_planes+0x2e1/0x340 [drm] > [

[PATCH] drivers: gpu: Remove duplicate include of amdgpu_hdp.h

2021-03-22 Thread Wan Jiabing
amdgpu_hdp.h has been included at line 91, so remove the duplicate include. Signed-off-by: Wan Jiabing --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index

[PATCH] drm/amd/pm/powerplay/smumgr/smu7_smumgr: Fix some typo error

2021-03-22 Thread samirweng1979
From: wengjianfeng change 'addres' to 'address' Signed-off-by: wengjianfeng --- drivers/gpu/drm/amd/pm/powerplay/smumgr/smu7_smumgr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu7_smumgr.c

RE: [PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""

2021-03-22 Thread Chen, Guchun
[AMD Public Use] Hi Christian, I will conduct one stress test for this tomorrow. Would you mind waiting for my ack before submitting? Regards, Guchun -Original Message- From: Christian König Sent: Monday, March 22, 2021 8:41 PM To: amd-gfx@lists.freedesktop.org Cc: Chen, Guchun ;

[PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""

2021-03-22 Thread Christian König
Now that we found the underlying problem we can re-apply this patch. This reverts commit 867fee7f8821ff42e7308088cf0c3450ac49c17c. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 55 +- 1 file changed, 18 insertions(+), 37 deletions(-) diff

Re: [PATCH] drivers: gpu: Remove duplicate include of amdgpu_hdp.h

2021-03-22 Thread Christian König
Am 22.03.21 um 13:02 schrieb Wan Jiabing: amdgpu_hdp.h has been included at line 91, so remove the duplicate include. Signed-off-by: Wan Jiabing Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - 1 file changed, 1 deletion(-) diff --git

Re: [PATCH] amdgpu: avoid incorrect %hu format string

2021-03-22 Thread Christian König
Am 22.03.21 um 12:54 schrieb Arnd Bergmann: From: Arnd Bergmann clang points out that the %hu format string does not match the type of the variables here: drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int'

[PATCH] amdgpu: avoid incorrect %hu format string

2021-03-22 Thread Arnd Bergmann
From: Arnd Bergmann clang points out that the %hu format string does not match the type of the variables here: drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
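
The warning quoted above comes from a mismatch between a printf-style conversion and its argument type. A minimal sketch of the matching form follows. The function name and the "major.minor" layout are assumptions for illustration and are not taken from amdgpu_uvd.c.

```c
#include <stdio.h>

/*
 * Format a "major.minor" version string. Both arguments are unsigned int,
 * so %u is the matching conversion. %hu would declare an unsigned short
 * argument, which is what clang's -Wformat warning objects to.
 */
int format_version_sketch(char *buf, size_t len,
                          unsigned int major, unsigned int minor)
{
    return snprintf(buf, len, "%u.%u", major, minor);
}
```

Building this with -Wformat and changing %u to %hu should reproduce a warning of the same kind as the one in the message.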

[PATCH] drm/ttm: stop warning on TT shrinker failure v2

2021-03-22 Thread Christian König
Don't print a warning when we fail to allocate a page for swapping things out. v2: only stop the warning Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_tt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c

[PATCH 38/44] drm/amdkfd: Simplify split_by_granularity

2021-03-22 Thread Felix Kuehling
svm_range_split_by_granularity always added the parent range and only the parent range to the update list for the caller to add it to the deferred work list. So just do that in the caller unconditionally and eliminate the update_list parameter. Split the range so that the original prange is

[PATCH 40/44] drm/amdkfd: Return pdd from kfd_process_device_from_gduid

2021-03-22 Thread Felix Kuehling
This saves callers from looking up the pdd with a linear search later. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 8 +++- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 - drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 51 +++- 3 files

[PATCH 42/44] drm/amdkfd: Allow invalid pages in migration.src

2021-03-22 Thread Felix Kuehling
This can happen when system memory pages were never allocated. Skip them during the migration. 0-initialize the BO. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 50 ++-- 1 file changed, 38 insertions(+), 12 deletions(-) diff --git

[PATCH 41/44] drm/amdkfd: Remove broken deferred mapping

2021-03-22 Thread Felix Kuehling
Mapping without validation is broken. Also removed saving the pages from the last migration. They may be invalidated without an MMU notifier to catch it, so let the next proper validation take care of it. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 14

[PATCH 43/44] drm/amdkfd: Correct locking during migration and mapping

2021-03-22 Thread Felix Kuehling
This fixes potential race conditions between any code that validates and maps SVM ranges and MMU notifiers. The whole sequence is encapsulated in svm_range_validate_and_map. The page_addr and hmm_range structures are not useful outside that function, so they were removed from struct svm_range.

[PATCH 34/44] drm/amdkfd: Fix dma unmapping

2021-03-22 Thread Felix Kuehling
Don't dma_unmap in unmap_from_gpu. The dma_addr arrays are protected by the migrate_mutex, which we cannot hold when unmapping in MMU notifiers. Instead dma_unmap and free dma_addr arrays whenever the pages_array is invalidated: when migrating to VRAM and when re-validating RAM. Freeing dma_addr

[PATCH 44/44] drm/amdkfd: Nested locking and invalidation of child ranges

2021-03-22 Thread Felix Kuehling
This allows validation of child ranges, so the GPU page fault handler can be more light-weight. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 8 + drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 40 +--- 2 files changed, 37 insertions(+), 11

[PATCH 32/44] drm/amdkfd: multiple gpu migrate vram to vram

2021-03-22 Thread Felix Kuehling
If prefetch range to gpu with actual location is another gpu, or GPU retry fault restore pages to migrate the range with actual location is gpu, then migrate from one gpu to another gpu. Use system memory as bridge because sdma engine may not be able to access another gpu vram, use sdma of source

[PATCH 37/44] drm/amdkfd: Fix svm_bo_list locking in eviction worker

2021-03-22 Thread Felix Kuehling
Take the svm_bo_list spin lock when iterating of the range list during eviction. Change-Id: I979d959e06c32e114cea8d151933b8ee7455627e Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git

[PATCH 33/44] drm/amdkfd: Add SVM API support capability bits

2021-03-22 Thread Felix Kuehling
From: Philip Yang SVMAPISupported property added to HSA_CAPABILITY, the value match HSA_CAPABILITY defined in Thunk spec: SVMAPISupported: it will not be supported on older kernels that don't have HMM or on systems with GFXv8 or older GPUs without support for 48-bit virtual addresses.

[PATCH 35/44] drm/amdkfd: Call mutex_destroy

2021-03-22 Thread Felix Kuehling
Destroy SVM-related mutexes correctly. Change-Id: I85da30b1b0dce72433e6d3b507cb0b55b83b433c Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

[PATCH 36/44] drm/amdkfd: Fix spurious restore failures

2021-03-22 Thread Felix Kuehling
Restore can appear to fail if the svms->evicted counter changes before the function can acquire the necessary locks. Re-read the counter after acquiring the lock to minimize the chances of having to reschedule the worker. Change-Id: I236b912bddf106583be264abde2f6bd1a5d5a083 Signed-off-by: Felix
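
The "re-read after locking" idea in this message can be sketched in portable C11. This is an illustration under assumptions, not the kfd_svm.c code: the struct, the spin lock built from atomic_flag, and all function names are hypothetical, and the actual restore work is reduced to a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-process SVM state. */
struct svms_sketch {
    atomic_int evicted;   /* number of pending evictions */
    atomic_flag lock;     /* toy spin lock standing in for the real locks */
};

void svms_sketch_init(struct svms_sketch *s, int evicted)
{
    atomic_init(&s->evicted, evicted);
    atomic_flag_clear(&s->lock);
}

/* Returns true when the restore is complete, false if it must be rescheduled. */
bool restore_work_sketch(struct svms_sketch *s)
{
    int evicted;
    bool done;

    if (atomic_load(&s->evicted) == 0)
        return true;                 /* nothing to restore */

    while (atomic_flag_test_and_set(&s->lock))
        ;                            /* acquire the lock */

    /* Re-read under the lock: the counter may have changed while waiting. */
    evicted = atomic_load(&s->evicted);

    /* ... revalidate and remap the evicted ranges here ... */

    /* Clear the counter only if no new eviction arrived in the meantime. */
    done = atomic_compare_exchange_strong(&s->evicted, &evicted, 0);

    atomic_flag_clear(&s->lock);     /* release the lock */
    return done;
}
```

Reading the counter only before taking the lock would compare against a stale value whenever an eviction races with lock acquisition, which makes the final compare-and-swap fail more often than necessary.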

[PATCH 31/44] drm/amdkfd: add svm range validate timestamp

2021-03-22 Thread Felix Kuehling
With xnack on, add validate timestamp in order to handle GPU vm fault from multiple GPUs. If GPU retry fault need migrate the range to the best restore location, use range validate timestamp to record system timestamp after range is restored to update GPU page table. Because multiple pages of

[PATCH 39/44] drm/amdkfd: Point out several race conditions

2021-03-22 Thread Felix Kuehling
There are several race conditions with XNACK enabled. For now just some FIXME comments with ideas how to fix it. Change-Id: If0abab6dcb8f4e95c9d8820f6c569263eda29a89 Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_svm.c |

[PATCH 30/44] drm/amdkfd: refine migration policy with xnack on

2021-03-22 Thread Felix Kuehling
With xnack on, GPU vm fault handler decide the best restore location, then migrate range to the best restore location and update GPU mapping to recover the GPU vm fault. Signed-off-by: Philip Yang Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling ---

[PATCH 12/44] drm/amdkfd: add xnack enabled flag to kfd_process

2021-03-22 Thread Felix Kuehling
From: Alex Sierra This flag is useful at cpu invalidation page table decision. Between select queue eviction or page fault. Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 +++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 36

[PATCH 24/44] drm/amdkfd: add svm_bo reference for eviction fence

2021-03-22 Thread Felix Kuehling
From: Alex Sierra [why] As part of the SVM functionality, the eviction mechanism used for SVM_BOs is different. This mechanism uses one eviction fence per prange, instead of one fence per kfd_process. [how] A svm_bo reference to amdgpu_amdkfd_fence to allow differentiate between SVM_BO or

[PATCH 13/44] drm/amdkfd: add ioctl to configure and query xnack retries

2021-03-22 Thread Felix Kuehling
From: Alex Sierra Xnack retries are used for page fault recovery. Some AMD chip families support continuously retry while page table entries are invalid. The driver must handle the page fault interrupt and fill in a valid entry for the GPU to continue. This ioctl allows to enable/disable XNACK

[PATCH 15/44] drm/amdkfd: validate vram svm range from TTM

2021-03-22 Thread Felix Kuehling
If svm range prefetch location is not zero, use TTM to alloc amdgpu_bo vram nodes to validate svm range, then map vram nodes to GPUs. Use offset to sub allocate from the same amdgpu_bo to handle overlap vram range while adding new range or unmapping range. svm_bo has ref count to trace the

[PATCH 25/44] drm/amdgpu: add param bit flag to create SVM BOs

2021-03-22 Thread Felix Kuehling
From: Alex Sierra Add CREATE_SVM_BO define bit for SVM BOs. Another define flag was moved to concentrate these KFD type flags in one include file. Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 ++-

[PATCH 27/44] drm/amdgpu: svm bo enable_signal call condition

2021-03-22 Thread Felix Kuehling
From: Alex Sierra [why] To support svm bo eviction mechanism. [how] If the BO created has AMDGPU_AMDKFD_CREATE_SVM_BO flag set, enable_signal callback will be called inside amdgpu_evict_flags. This also causes gutting of the BO by removing all placements, so that TTM won't actually do an

[PATCH 28/44] drm/amdgpu: add svm_bo eviction to enable_signal cb

2021-03-22 Thread Felix Kuehling
From: Alex Sierra Add to amdgpu_amdkfd_fence.enable_signal callback, support for svm_bo fence eviction. Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git

[PATCH 20/44] drm/amdkfd: invalidate tables on page retry fault

2021-03-22 Thread Felix Kuehling
GPU page tables are invalidated by unmapping prange directly at the mmu notifier, when page fault retry is enabled through amdgpu_noretry global parameter. The restore page table is performed at the page fault handler. If xnack is on, we update GPU mappings after migration to avoid unnecessary

[PATCH 18/44] drm/amdkfd: HMM migrate ram to vram

2021-03-22 Thread Felix Kuehling
Register svm range with same address and size but perferred_location is changed from CPU to GPU or from GPU to CPU, trigger migration the svm range from ram to vram or from vram to ram. If svm range prefetch location is GPU with flags KFD_IOCTL_SVM_FLAG_HOST_ACCESS, validate the svm range on ram

[PATCH 03/44] drm/amdkfd: add svm ioctl API

2021-03-22 Thread Felix Kuehling
From: Philip Yang Add svm (shared virtual memory) ioctl data structure and API definition. The svm ioctl API is designed to be extensible in the future. All operations are provided by a single IOCTL to preserve ioctl number space. The arguments structure ends with a variable size array of

[PATCH 19/44] drm/amdkfd: HMM migrate vram to ram

2021-03-22 Thread Felix Kuehling
If CPU page fault happens, HMM pgmap_ops callback migrate_to_ram start migrate memory from vram to ram in steps: 1. migrate_vma_pages get vram pages, and notify HMM to invalidate the pages, HMM interval notifier callback evict process queues 2. Allocate system memory pages 3. Use svm copy memory

[PATCH 22/44] drm/amdkfd: page table restore through svm API

2021-03-22 Thread Felix Kuehling
Page table restore implementation in SVM API. This is called from the fault handler at amdgpu_vm. To update page tables through the page fault retry IH. Signed-off-by: Alex Sierra Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 69

[PATCH 21/44] drm/amdgpu: enable 48-bit IH timestamp counter

2021-03-22 Thread Felix Kuehling
From: Alex Sierra By default this timestamp is 32 bit counter. It gets overflowed in around 10 minutes. Change-Id: I7c46604b0272dcfd1ce24351437c16fe53dca0ab Signed-off-by: Alex Sierra Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 1 + 1 file changed, 1 insertion(+)
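
The effect of widening the counter can be checked with a little arithmetic. The helper below is a sketch with a hypothetical name. The tick rate is left as a parameter because the message above does not state it.

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter of the given width wraps,
 * assuming it ticks at tick_hz. Valid for bits < 64.
 */
double counter_wrap_seconds(unsigned int bits, double tick_hz)
{
    /* 2^bits ticks elapse before the counter returns to zero. */
    return (double)((uint64_t)1 << bits) / tick_hz;
}
```

At any fixed tick rate, a 48-bit counter lasts 2^16 = 65536 times longer than a 32-bit one. If the 32-bit counter wraps in around 10 minutes, as the message says, the 48-bit counter would wrap after roughly 455 days.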

[PATCH 23/44] drm/amdkfd: SVM API call to restore page tables

2021-03-22 Thread Felix Kuehling
From: Alex Sierra Use SVM API to restore page tables when retry fault and compute context are enabled. Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git

[PATCH 06/44] drm/amdgpu: add common HMM get pages function

2021-03-22 Thread Felix Kuehling
From: Philip Yang Move the HMM get pages function from amdgpu_ttm and to amdgpu_mn. This common function will be used by new svm APIs. Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 83 +

[PATCH 29/44] drm/amdgpu: reserve fence slot to update page table

2021-03-22 Thread Felix Kuehling
From: Philip Yang Forgot to reserve a fence slot to use sdma to update page table, cause below kernel BUG backtrace to handle vm retry fault while application is exiting. [ 133.048143] kernel BUG at /home/yangp/git/compute_staging/kernel/drivers/dma-buf/dma-resv.c:281! [ 133.048487]

[PATCH 26/44] drm/amdkfd: add svm_bo eviction mechanism support

2021-03-22 Thread Felix Kuehling
svm_bo eviction mechanism is different from regular BOs. Every SVM_BO created contains one eviction fence and one worker item for eviction process. SVM_BOs can be attached to one or more pranges. For SVM_BO eviction mechanism, TTM will start to call enable_signal callback for every SVM_BO until

[PATCH 16/44] drm/amdkfd: support xgmi same hive mapping

2021-03-22 Thread Felix Kuehling
From: Philip Yang amdgpu_gmc_get_vm_pte use bo_va->is_xgmi same hive information to set pte flags to update GPU mapping. Add local structure variable bo_va, and update bo_va.is_xgmi, pass it to mapping->bo_va while mapping to GPU. Assuming xgmi pstate is hi after boot. Signed-off-by: Philip

[PATCH 00/44] Add HMM-based SVM memory manager to KFD v2

2021-03-22 Thread Felix Kuehling
Since the last patch series I sent on Jan 6 a lot has changed. Patches 1-33 are the cleaned up, rebased on amd-staging-drm-next 5.11 version from about a week ago. The remaining 11 patches are current work-in-progress with further cleanup and fixes. MMU notifiers and CPU page faults now can split

[PATCH 05/44] drm/amdkfd: add svm ioctl GET_ATTR op

2021-03-22 Thread Felix Kuehling
From: Philip Yang Get the intersection of attributes over all memory in the given range Signed-off-by: Philip Yang Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 164 +++ 1 file changed, 164 insertions(+) diff

[PATCH 08/44] drm/amdkfd: deregister svm range

2021-03-22 Thread Felix Kuehling
From: Philip Yang When application explicitly call unmap or unmap from mmput when application exit, driver will receive MMU_NOTIFY_UNMAP event to remove svm range from process svms object tree and list first, unmap from GPUs (in the following patch). Split the svm ranges to handle partial

[PATCH 09/44] drm/amdgpu: export vm update mapping interface

2021-03-22 Thread Felix Kuehling
From: Philip Yang It will be used by kfd to map svm range to GPU, because svm range does not have amdgpu_bo and bo_va, cannot use amdgpu_bo_update interface, use amdgpu vm update interface directly. Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling ---

[PATCH 11/44] drm/amdkfd: svm range eviction and restore

2021-03-22 Thread Felix Kuehling
HMM interval notifier callback notify CPU page table will be updated, stop process queues if the updated address belongs to svm range registered in process svms objects tree. Scheduled restore work to update GPU page table using new pages address in the updated svm range. The restore worker

[PATCH 07/44] drm/amdkfd: validate svm range system memory

2021-03-22 Thread Felix Kuehling
From: Philip Yang Use HMM to get system memory pages address, which will be used to map to GPUs or migrate to vram. Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 103 ++- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 4

[PATCH 10/44] drm/amdkfd: map svm range to GPUs

2021-03-22 Thread Felix Kuehling
Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap svm range system memory pages address to GPUs. Signed-off-by: Philip Yang Signed-off-by: Alex Sierra Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 395 +--

[PATCH 14/44] drm/amdkfd: register HMM device private zone

2021-03-22 Thread Felix Kuehling
From: Philip Yang Register vram memory as MEMORY_DEVICE_PRIVATE type resource, to allocate vram backing pages for page migration. Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 4 + drivers/gpu/drm/amd/amdkfd/Kconfig | 1

[PATCH 04/44] drm/amdkfd: register svm range

2021-03-22 Thread Felix Kuehling
From: Philip Yang svm range structure stores the range start address, size, attributes, flags, prefetch location and gpu bitmap which indicates which GPU this range maps to. Same virtual address is shared by CPU and GPUs. Process has svm range list which uses both interval tree and list to

[PATCH 02/44] drm/amdkfd: helper to convert gpu id and idx

2021-03-22 Thread Felix Kuehling
From: Alex Sierra svm range uses gpu bitmap to store which GPU svm range maps to. Application pass driver gpu id to specify GPU, the helper is needed to convert gpu id to gpu bitmap idx. Access through kfd_process_device pointers array from kfd_process. Signed-off-by: Alex Sierra
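
The conversion this patch describes, from a driver gpu id to an index into a per-range GPU bitmap, might look roughly like the sketch below. All type and function names are hypothetical stand-ins, and the fixed array size is an assumption. The real helper operates on the kfd_process_device pointer array in kfd_process.

```c
#include <stdint.h>

#define MAX_GPU_SKETCH 64   /* hypothetical upper bound on GPUs per process */

/* Hypothetical stand-in for a per-process, per-device record. */
struct pdd_sketch {
    uint32_t gpu_id;
};

/* Hypothetical stand-in for the process, holding an array of device pointers. */
struct process_sketch {
    struct pdd_sketch *pdds[MAX_GPU_SKETCH];
    unsigned int n_pdds;
};

/* Return the bitmap index for gpu_id, or -1 if the process has no such GPU. */
int gpu_idx_from_id_sketch(const struct process_sketch *p, uint32_t gpu_id)
{
    for (unsigned int i = 0; i < p->n_pdds; i++)
        if (p->pdds[i] && p->pdds[i]->gpu_id == gpu_id)
            return (int)i;
    return -1;
}

/* Set the bit for gpu_id in a range's GPU bitmap; returns the index or -1. */
int mark_gpu_sketch(const struct process_sketch *p, uint64_t *bitmap,
                    uint32_t gpu_id)
{
    int idx = gpu_idx_from_id_sketch(p, gpu_id);

    if (idx < 0)
        return -1;
    *bitmap |= (uint64_t)1 << idx;
    return idx;
}
```

Keeping the devices in an array means the array position doubles as the bit position, so no separate mapping table is needed.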

[PATCH 01/44] drm/amdgpu: replace per_device_list by array

2021-03-22 Thread Felix Kuehling
From: Alex Sierra Remove per_device_list from kfd_process and replace it with a kfd_process_device pointers array of MAX_GPU_INSTANCES size. This helps to manage the kfd_process_devices binded to a specific kfd_process. Also, functions used by kfd_chardev to iterate over the list were removed,

Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-22 Thread Daniel Gomez
On Mon, 22 Mar 2021 at 11:34, Christian König wrote: > > Hi Daniel, > > On 22.03.21 at 10:38, Daniel Gomez wrote: > > On Fri, 19 Mar 2021 at 21:29, Felix Kuehling wrote: > >> This caused a regression in kfdtest in a large-buffer stress test after > >> memory allocation for user pages fails: > >

Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-22 Thread Christian König
Hi Daniel, On 22.03.21 at 10:38, Daniel Gomez wrote: On Fri, 19 Mar 2021 at 21:29, Felix Kuehling wrote: This caused a regression in kfdtest in a large-buffer stress test after memory allocation for user pages fails: I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and not

Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-22 Thread Daniel Gomez
On Fri, 19 Mar 2021 at 21:29, Felix Kuehling wrote: > > This caused a regression in kfdtest in a large-buffer stress test after > memory allocation for user pages fails: I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and not this one. Just some background for the mem leak patch

Re: [PATCH] drm/amd/display: Allow idle optimization based on vblank.

2021-03-22 Thread Michel Dänzer
On 2021-03-20 1:31 a.m., R, Bindu wrote: > > The Update patch has been submitted. Submitted where? Still can't see it. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer

[PATCH] drm/amd/display: fix modprobe failure on vega series

2021-03-22 Thread Guchun Chen
Fixes: d88b34caee83 ("Remove some large variables from the stack") [ 41.232097] Call Trace: [ 41.232105] kvasprintf+0x66/0xd0 [ 41.232122] kasprintf+0x49/0x70 [ 41.232136] __drm_crtc_init_with_planes+0x2e1/0x340 [drm] [ 41.232219] ? create_object+0x263/0x3b0 [ 41.232231]

[PATCH] drm/amdgpu/swsmu: fix typo (memlk -> memclk)

2021-03-22 Thread Tobias Jakobi
- no functional changes Signed-off-by: Tobias Jakobi --- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 4 ++-- drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git

Re: [PATCH] drm/radeon: don't evict if not initialized

2021-03-22 Thread Tong Zhang
Thanks, Fixed as suggested and sent as v2. - Tong On Sun, Mar 21, 2021 at 9:26 AM Christian König wrote: > > > > On 20.03.21 at 21:10, Tong Zhang wrote: > > TTM_PL_VRAM may not be initialized at all when calling > > radeon_bo_evict_vram(). We need to check before doing eviction. > > > > [

[PATCH] drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug

2021-03-22 Thread Qu Huang
Here is the system crash log: [ 1272.884438] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1272.88] IP: [< (null)>] (null) [ 1272.884447] PGD 825b09067 PUD 8267c8067 PMD 0 [ 1272.884452] Oops: 0010 [#1] SMP [ 1272.884509] CPU: 13 PID: 3485 Comm: cat

[PATCH v2] drm/radeon: don't evict if not initialized

2021-03-22 Thread Tong Zhang
TTM_PL_VRAM may not be initialized at all when calling radeon_bo_evict_vram(). We need to check before doing eviction. [2.160837] BUG: kernel NULL pointer dereference, address: 0020 [2.161212] #PF: supervisor read access in kernel mode [2.161490] #PF: error_code(0x) -

Re: [PATCH V2] drm/amdgpu: Fix a typo

2021-03-22 Thread Randy Dunlap
On Fri, 19 Mar 2021, Bhaskar Chowdhury wrote: s/traing/training/ ...Plus the entire sentence construction for better readability. Signed-off-by: Bhaskar Chowdhury --- Changes from V1: Alex and Randy's suggestions incorporated. drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 8 1 file

[PATCH] drm/radeon: don't evict if not initialized

2021-03-22 Thread Tong Zhang
TTM_PL_VRAM may not be initialized at all when calling radeon_bo_evict_vram(). We need to check before doing eviction. [2.160837] BUG: kernel NULL pointer dereference, address: 0020 [2.161212] #PF: supervisor read access in kernel mode [2.161490] #PF: error_code(0x) -

[PATCH] drm/amd/display: Remove unnecessary conversion to bool

2021-03-22 Thread Jiapeng Chong
Fix the following coccicheck warnings: ./drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c:875:62-67: WARNING: conversion to bool not needed here. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c | 2 +- 1 file changed, 1 insertion(+),

Re: [PATCH] drm/amd/display: Set AMDGPU_DM_DEFAULT_MIN_BACKLIGHT to 0

2021-03-22 Thread Evan Benn
On Sat, Mar 20, 2021 at 8:36 AM Alex Deucher wrote: > > On Fri, Mar 19, 2021 at 5:31 PM Evan Benn wrote: > > > > On Sat, 20 Mar 2021 at 02:10, Harry Wentland wrote: > > > On 2021-03-19 10:22 a.m., Alex Deucher wrote: > > > > On Fri, Mar 19, 2021 at 3:23 AM Evan Benn wrote: > > > >> > > > >>

[PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Lang Yu
In amdgpu reset, while dm.dc_lock is held by dm_suspend, handle_hpd_rx_irq tries to acquire it. Deadlock occurred! Deadlock log: [ 104.528304] amdgpu :03:00.0: amdgpu: GPU reset begin! [ 104.640084] == [ 104.640092] WARNING: possible

Re: [PATCH] drm/amdgpu: Use correct size when access vram

2021-03-22 Thread Christian König
On 22.03.21 at 01:53, xinhui pan wrote: To make the size 4-byte aligned, use &~0x3ULL instead of &3ULL. Signed-off-by: xinhui pan Good catch. Patch is Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff
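The difference between the two masks in this patch is easy to see in isolation. The sketch below uses hypothetical helper names (`align_down_4`, `low_bits_4`); it is not the code from amdgpu_ttm.c, only an illustration of the bit arithmetic.

```c
#include <stdint.h>

/* Clear the two low bits: rounds size down to a multiple of 4. */
static uint64_t align_down_4(uint64_t size)
{
    return size & ~0x3ULL;
}

/* Keep only the two low bits: yields size % 4, not an aligned size. */
static uint64_t low_bits_4(uint64_t size)
{
    return size & 0x3ULL;
}
```

For a size of 10, `align_down_4` gives 8 while `low_bits_4` gives 2, which is why masking with 3ULL does not produce an aligned size.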
