[PATCH 2/2] drm/amd/pm: label these APIs used internally as static

2021-03-18 Thread Evan Quan
Also drop unnecessary header file and declarations. Change-Id: I877b48c32c599534798e14e271c3e700b0d6ebf6 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 1 - drivers/gpu/drm/amd/amdgpu/nv.c | 1 - drivers/g

[PATCH 1/2] drm/amd/pm: make DAL communicate with SMU through unified interfaces

2021-03-18 Thread Evan Quan
No need to have special handling for swSMU-supported ASICs. Change-Id: I1ec552c6a2a4283cf6ab3acfe6c0753bfcca57a9 Signed-off-by: Evan Quan --- .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c | 134 +++--- .../gpu/drm/amd/include/kgd_pp_interface.h| 14 ++ drivers/gpu/drm/amd/pm/in

Re: [PATCH] drm/amdgpu: Fix a typo

2021-03-18 Thread Alex Deucher
Applied both patches. Thanks! Alex On Thu, Mar 18, 2021 at 7:20 PM Bhaskar Chowdhury wrote: > > > s/proces/process/ > > Signed-off-by: Bhaskar Chowdhury > --- > drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/

Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-18 Thread Alex Deucher
Applied. Thanks! Alex On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian wrote: > > Reviewed-by: Christian König > > From: Daniel Gomez > Sent: Thursday, March 18, 2021 09:32 > Cc: dag...@gmail.com ; Daniel Gomez ; > Deucher, Alexander ; Koenig, Christian

Re: [PATCH] drm/amdgpu/ttm: Fix memory leak userptr pages

2021-03-18 Thread Alex Deucher
Applied. Thanks! Alex On Wed, Mar 17, 2021 at 12:09 PM Daniel Gomez wrote: > > If userptr pages have been pinned but not bound, > they remain uncleared. > > Signed-off-by: Daniel Gomez > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions

[PATCH] drm/amdgpu: Fix a typo

2021-03-18 Thread Bhaskar Chowdhury
s/proces/process/ Signed-off-by: Bhaskar Chowdhury --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c index bf3857867f51..c1d5a3085bae 100644 --- a/drive

RE: [PATCH] drm/amd/pm: fix Navi1x runtime resume failure V2

2021-03-18 Thread Quan, Evan
[AMD Public Use] Thanks Guchun & Jiansong. Yes, I had same concern as Jiansong. BR Evan -Original Message- From: Chen, Jiansong (Simon) Sent: Thursday, March 18, 2021 5:36 PM To: Chen, Guchun ; Quan, Evan ; amd-gfx@lists.freedesktop.org Cc: Lazar, Lijo ; Quan, Evan Subject: RE: [PATCH

[PATCH] drm/amdgpu/display: properly guard dc_dsc_stream_bandwidth_in_kbps

2021-03-18 Thread Alex Deucher
Move the function prototype to the right header and guard the call with CONFIG_DRM_AMD_DC_DCN as DSC is only available with DCN. Fixes: a03f6c0e26b2 ("drm/amd/display: Add changes for dsc bpp in 16ths and unify bw calculations") Signed-off-by: Alex Deucher Cc: Dillon Varone Cc: Stephen Rothwell
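The guarding pattern described in this commit message can be sketched roughly as follows. This is a stand-alone illustration and not the patch itself: CONFIG_DRM_AMD_DC_DCN is the Kconfig symbol named above, but dsc_bandwidth_kbps() and stream_bandwidth_kbps() are hypothetical stand-ins for the real functions, and the arithmetic is only a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a DSC-only helper. It is compiled only when
 * the DCN option is enabled, mirroring a prototype that sits in a header
 * section guarded by the same symbol. */
#if defined(CONFIG_DRM_AMD_DC_DCN)
static uint32_t dsc_bandwidth_kbps(uint32_t pix_clk_khz, uint32_t bpp_x16)
{
	/* bpp_x16 is expressed in 16ths of a bit per pixel. */
	return (uint32_t)(((uint64_t)pix_clk_khz * bpp_x16) / 16);
}
#endif

/* Hypothetical caller: the DSC path is reachable only when DCN is built. */
static uint32_t stream_bandwidth_kbps(uint32_t pix_clk_khz, uint32_t bpp,
				      int dsc_enabled, uint32_t dsc_bpp_x16)
{
	assert(bpp != 0);
#if defined(CONFIG_DRM_AMD_DC_DCN)
	if (dsc_enabled)
		return dsc_bandwidth_kbps(pix_clk_khz, dsc_bpp_x16);
#else
	/* Silence unused-parameter warnings in the non-DCN build. */
	(void)dsc_enabled;
	(void)dsc_bpp_x16;
#endif
	return pix_clk_khz * bpp;
}
```

With the symbol undefined, the DSC helper is neither declared nor called, so a non-DCN build cannot hit an undefined reference.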

RE: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-18 Thread Quan, Evan
[AMD Public Use] Hi Harvey, Resuming after mode1 reset failed according to the error logs below. Also according to the lspci output of last email, it happened for a Navi14 ASIC. However, I cannot reproduce that on my desktop platform with 2 x Navi14 ASICs. Mär 18 13:00:43 obelix kernel: amdgpu 0

RE: [PATCH] drm/amdgpu: Add additional Sienna Cichlid PCI ID

2021-03-18 Thread Chen, Guchun
[AMD Public Use] Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Friday, March 19, 2021 4:45 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu: Add additional Sienna Cichlid PCI ID Add new

RE: [PATCH] drm/amdgpu: Fix the page fault issue in amdgpu_irq_fini

2021-03-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only] >-Original Message- >From: Christian König >Sent: Thursday, March 18, 2021 7:52 PM >To: Deng, Emily ; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the page fault issue in amdgpu_irq_fini > >Am 18.03.21 um 12:48 sc

Re: [PATCH] drm/amd/display: Allow idle optimization based on vblank.

2021-03-18 Thread R, Bindu
[AMD Official Use Only - Internal Distribution Only] Hi All, Thanks for the inputs, have updated the patch to include these changes. Regards, Bindu From: Lakha, Bhawanpreet Sent: Wednesday, March 17, 2021 1:02 PM To: Michel Dänzer ; R, Bindu ; amd-gfx@lists

Re: [PATCH] drm/dp_mst: Enhance DP MST topology logging

2021-03-18 Thread Lyude Paul
(going to try to take a look at this tomorrow JFYI) On Thu, 2021-03-18 at 11:55 -0400, Eryk Brol wrote: > [why] > MST topology print was missing fec logging and pdt printed > as an int wasn't clear. vcpi and payload info were also logged as an > arbitrary series of ints which require the user to k

[PATCH] drm/amdgpu: Add additional Sienna Cichlid PCI ID

2021-03-18 Thread Alex Deucher
Add new DID. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 5438a4d3d517..6c78107db789 100644 --- a/drivers/gpu/drm/amd/amdgpu/amd

[PATCH V2] drm/amdgpu: Fix a typo

2021-03-18 Thread Bhaskar Chowdhury
s/traing/training/ ...Plus the entire sentence construction for better readability. Signed-off-by: Bhaskar Chowdhury --- Changes from V1: Alex and Randy's suggestions incorporated. drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --g

Re: [PATCH] drm/amdgpu: Fix a typo

2021-03-18 Thread Bhaskar Chowdhury
On 14:12 Thu 18 Mar 2021, Alex Deucher wrote: On Thu, Mar 18, 2021 at 2:08 PM Randy Dunlap wrote: On 3/18/21 4:33 AM, Bhaskar Chowdhury wrote: > > s/traing/training/ > > Signed-off-by: Bhaskar Chowdhury > --- > drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- > 1 file changed, 1 insertion(+),

Re: [RFC PATCH 8/9] drm/gem: Associate GEM objects with drm cgroup

2021-03-18 Thread Brian Welty
On 3/18/2021 3:16 AM, Daniel Vetter wrote: > On Sat, Mar 6, 2021 at 1:44 AM Brian Welty wrote: >> >> >> On 2/11/2021 7:34 AM, Daniel Vetter wrote: >>> On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote: On 2/9/2021 2:54 AM, Daniel Vetter wrote: > On Tue, Jan 26, 2021 at 01

Re: [PATCH] PCI: quirks: Quirk PCI d3hot delay for AMD xhci

2021-03-18 Thread Bjorn Helgaas
On Tue, Mar 16, 2021 at 03:28:51PM -0400, Alex Deucher wrote: > From: Marcin Bachry > > Renoir needs a similar delay. See https://lore.kernel.org/linux-pci/20210311125322.GA216@bjorn-Precision-5520/ This is becoming a problem. We shouldn't have to merge a quirk for every new device. Eith

Re: [PATCH] drm/amdgpu: Fix a typo

2021-03-18 Thread Alex Deucher
On Thu, Mar 18, 2021 at 2:08 PM Randy Dunlap wrote: > > On 3/18/21 4:33 AM, Bhaskar Chowdhury wrote: > > > > s/traing/training/ > > > > Signed-off-by: Bhaskar Chowdhury > > --- > > drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git

Re: [PATCH] drm/amdgpu: Fix a typo

2021-03-18 Thread Randy Dunlap
On 3/18/21 4:33 AM, Bhaskar Chowdhury wrote: > > s/traing/training/ > > Signed-off-by: Bhaskar Chowdhury > --- > drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c > b/drivers/gpu/drm/amd/amdgp

RE: [PATCH 1/1] drm/amdgpu: Mark Albebaran HW support as experimental

2021-03-18 Thread Russell, Kent
[AMD Public Use] Sorry, just realized a typo in the headline. Albebaran->Aldebaran. With that fixed, Reviewed-by: Kent Russell > -Original Message- > From: amd-gfx On Behalf Of Russell, > Kent > Sent: Thursday, March 18, 2021 12:32 PM > To: Kuehling, Felix ; amd-gfx@lists.freedeskt

RE: [PATCH 1/1] drm/amdgpu: Mark Albebaran HW support as experimental

2021-03-18 Thread Russell, Kent
[AMD Public Use] Reviewed-by: Kent Russell > -Original Message- > From: amd-gfx On Behalf Of Felix > Kuehling > Sent: Thursday, March 18, 2021 12:06 PM > To: amd-gfx@lists.freedesktop.org > Subject: [PATCH 1/1] drm/amdgpu: Mark Albebaran HW support as experimental > > The HW is not

Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-18 Thread Andrey Grodzovsky
On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Andrey Let me summarize the background of this patch: In the TDR resubmit step "amdgpu_device_recheck_guilty_jobs", it will submit the first job of each ring and do a guilty job re-check. At

[PATCH 1/1] drm/amdgpu: Mark Albebaran HW support as experimental

2021-03-18 Thread Felix Kuehling
The HW is not in production yet. Driver support is still in development. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgp

[PATCH] drm/dp_mst: Enhance DP MST topology logging

2021-03-18 Thread Eryk Brol
[why] MST topology print was missing fec logging and pdt printed as an int wasn't clear. vcpi and payload info were also logged as an arbitrary series of ints which require the user to know the ordering of the prints, making the logs difficult to use. [how] -add fec logging -add pdt parsing into s

[PATCH] drm/amdgpu: Fix a typo

2021-03-18 Thread Bhaskar Chowdhury
s/traing/training/ Signed-off-by: Bhaskar Chowdhury --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c index c325d6f53a71..db18e4f6cf5f 100644 --- a/driv

RE: [PATCH] drm/amdgpu: revert "reserve backup pages for bad page retirment"

2021-03-18 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Christian König Sent: Thursday, March 18, 2021 21:09 To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Li, Dennis ; Deucher, Alexander Subject: [PATCH] drm/amdg

Re: [RESEND 00/53] Rid GPU from W=1 warnings

2021-03-18 Thread Daniel Vetter
On Wed, Mar 17, 2021 at 9:32 PM Daniel Vetter wrote: > > On Wed, Mar 17, 2021 at 9:17 AM Lee Jones wrote: > > > > On Thu, 11 Mar 2021, Lee Jones wrote: > > > > > On Thu, 11 Mar 2021, Daniel Vetter wrote: > > > > > > > On Mon, Mar 08, 2021 at 09:19:32AM +, Lee Jones wrote: > > > > > On Fri, 05

RE: [PATCH] drm/amdgpu: revert "use the new cursor in the VM code"

2021-03-18 Thread Chen, Guchun
[AMD Public Use] Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: Christian König Sent: Thursday, March 18, 2021 7:53 PM To: amd-gfx@lists.freedesktop.org Cc: Chen, Guchun Subject: [PATCH] drm/amdgpu: revert "use the new cursor in the VM code" We are seeing VM page

[PATCH] drm/amdgpu: revert "reserve backup pages for bad page retirment"

2021-03-18 Thread Christian König
As noted during the review this approach doesn't make sense at all. We should not apply any limitation on the VRAM applications can use inside the kernel. If an application or end user wants to reserve a certain amount of VRAM for bad pages handling we should do this in the upper layer. This r

Re: [PATCH 2/2 V2] platform/x86: force LPS0 functions for AMD

2021-03-18 Thread Alex Deucher
Let's hold off on these patches for the time being. At least one of them seems to cause problems on another laptop. Thanks, Alex On Wed, Mar 17, 2021 at 10:39 AM Alex Deucher wrote: > > ACPI_LPS0_ENTRY_AMD/ACPI_LPS0_EXIT_AMD are supposedly not > required for AMD platforms, and on some platform

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-18 Thread Harvey
Alex, I waited for kernel 5.11.7 to hit our repos yesterday evening and tested again: 1. The suspend issue is gone - suspend and resume now work as expected. 2. System hibernation seems to be a different beast - still freezing When invoking 'systemctl hibernate' the system does not power off

[PATCH] drm/amdgpu: revert "use the new cursor in the VM code"

2021-03-18 Thread Christian König
We are seeing VM page faults with this. Revert the change until the bugs are fixed. This reverts commit e71af7b9807cc9ab2b40a7c02ff93c622786bd2a. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 55 +- 1 file changed, 37 insertions(+), 18 delet

Re: [PATCH] drm/amdgpu: Fix the page fault issue in amdgpu_irq_fini

2021-03-18 Thread Christian König
On 18.03.21 at 12:48, Emily Deng wrote: Some sources are shared by several client IDs and source IDs. To fix the page fault issue, set all of those to NULL. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 16 +--- 1 file changed, 13 insertions(+), 3 de

[PATCH] drm/amdgpu: Fix the page fault issue in amdgpu_irq_fini

2021-03-18 Thread Emily Deng
Some sources are shared by several client IDs and source IDs. To fix the page fault issue, set all of those to NULL. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/
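The idea in this commit message (one source object referenced from several client ID / source ID slots, all of which must be cleared before the object is freed) can be sketched in user-space C as below. This is only an illustration under assumptions: struct irq_src, struct irq_table, the table sizes and irq_table_fini() are hypothetical names, not the driver's data structures or the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_CLIENTS 4	/* hypothetical table sizes, not the driver's */
#define N_SOURCES 8

struct irq_src { int dummy; };

struct irq_table {
	struct irq_src *slot[N_CLIENTS][N_SOURCES];
};

/* Free every source exactly once. Before freeing a source, clear every
 * slot that aliases the same pointer, so that a later iteration never
 * touches (or frees again) memory that is already gone.
 * Returns the number of distinct sources freed. */
static int irq_table_fini(struct irq_table *t)
{
	int freed = 0;

	assert(t != NULL);
	for (int i = 0; i < N_CLIENTS; i++) {
		for (int j = 0; j < N_SOURCES; j++) {
			struct irq_src *src = t->slot[i][j];

			if (!src)
				continue;

			/* Clear all aliases of src, including slot[i][j]. */
			for (int k = 0; k < N_CLIENTS; k++)
				for (int l = 0; l < N_SOURCES; l++)
					if (t->slot[k][l] == src)
						t->slot[k][l] = NULL;

			free(src);
			freed++;
		}
	}
	return freed;
}
```

If only slot[i][j] were cleared, a second slot holding the same pointer would be visited later and the freed object would be dereferenced or freed twice.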

RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-18 Thread Zhang, Jack (Jian)
[AMD Official Use Only - Internal Distribution Only] Hi, Andrey Let me summarize the background of this patch: In the TDR resubmit step "amdgpu_device_recheck_guilty_jobs", it will submit the first job of each ring and do a guilty job re-check. At that point, we had to make sure each job is in the mirror

Re: [RFC PATCH 8/9] drm/gem: Associate GEM objects with drm cgroup

2021-03-18 Thread Daniel Vetter
On Sat, Mar 6, 2021 at 1:44 AM Brian Welty wrote: > > > On 2/11/2021 7:34 AM, Daniel Vetter wrote: > > On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote: > >> > >> On 2/9/2021 2:54 AM, Daniel Vetter wrote: > >>> On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote: > This pat

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Christian König
On 18.03.21 at 10:30, Li, Dennis wrote: >>> The GPU reset doesn't complete the fences we wait for. It only completes the hardware fences as part of the reset. >>> So waiting for a fence while holding the reset lock is illegal and needs to be avoided. I understood your concern. It is more

RE: [PATCH] drm/amd/pm: fix Navi1x runtime resume failure V2

2021-03-18 Thread Chen, Jiansong (Simon)
We still need to keep "return 0"; otherwise it may trigger the warning "not all control paths return a value". Regards, Jiansong -Original Message- From: amd-gfx On Behalf Of Chen, Guchun Sent: Thursday, March 18, 2021 5:28 PM To: Quan, Evan ; amd-gfx@lists.freedesktop.org Cc: Lazar, Lijo ; Qua
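The point about the final "return 0" can be illustrated with a small stand-alone C function. The enum and function names here are hypothetical and unrelated to the actual patch under review; the sketch only shows why the trailing return matters.

```c
#include <assert.h>
#include <stddef.h>

enum msg_type { MSG_A, MSG_B };	/* hypothetical message kinds */

/* Hypothetical handler: returns 0 on success, -1 for an unknown type. */
static int handle_msg(enum msg_type type, int *out)
{
	assert(out != NULL);

	switch (type) {
	case MSG_A:
		*out = 1;
		break;
	case MSG_B:
		*out = 2;
		break;
	default:
		return -1;
	}

	/* Without this statement, the paths that leave the switch through
	 * "break" would fall off the end of a non-void function, which is
	 * what compilers flag with warnings like the one quoted above. */
	return 0;
}
```

Dropping the last line still compiles with most toolchains, but the value seen by the caller on the MSG_A and MSG_B paths would be undefined.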

RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Li, Dennis
>>> The GPU reset doesn't complete the fences we wait for. It only completes >>> the hardware fences as part of the reset. >>> So waiting for a fence while holding the reset lock is illegal and needs to >>> be avoided. I understood your concern. It is more complex for DRM GFX, therefore I abandon

RE: [PATCH] drm/amd/pm: fix Navi1x runtime resume failure V2

2021-03-18 Thread Chen, Guchun
[AMD Public Use] One comment inline. Other than this, the patch is: Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Thursday, March 18, 2021 5:21 PM To: amd-gfx@lists.freedesktop.org Cc: Lazar, Lijo ; Quan, Evan Subject: [PATCH]

[PATCH] drm/amd/pm: fix Navi1x runtime resume failure V2

2021-03-18 Thread Evan Quan
The RLC was put into a wrong state on runtime suspend. Thus the RLC autoload will fail on the succeeding runtime resume. By adding an intermediate PPSMC_MSG_PrepareMp1ForUnload(some GC hard reset involved, designed for PnP), we can bring RLC back into the desired state. V2: integrate INTERRUPTS_EN

RE: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-18 Thread Koenig, Christian
Reviewed-by: Christian König From: Daniel Gomez Sent: Thursday, March 18, 2021 09:32 Cc: dag...@gmail.com ; Daniel Gomez ; Deucher, Alexander ; Koenig, Christian ; David Airlie ; Daniel Vetter ; amd-gfx@lists.freedesktop.org ; dri-de...@lists.freedesktop.o

RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Koenig, Christian
Exactly that's what you don't seem to understand. The GPU reset doesn't complete the fences we wait for. It only completes the hardware fences as part of the reset. So waiting for a fence while holding the reset lock is illegal and needs to be avoided. Lockdep also complains about this when it

[PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-18 Thread Daniel Gomez
If userptr pages have been pinned but not bound, they remain uncleared. Signed-off-by: Daniel Gomez --- drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index

RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Li, Dennis
>>> Those two steps need to be exchanged or otherwise it is possible that new >>> delayed work items etc are started before the lock is taken. What about adding a check for adev->in_gpu_reset in the work item? If the two steps are exchanged, it may introduce a deadlock. For example, the user thread hol

Re: [PATCH] drm/amdgpu/ttm: Fix memory leak userptr pages

2021-03-18 Thread Daniel Gomez
On Thu, 18 Mar 2021 at 08:49, Christian König wrote: > > On 17.03.21 at 17:08, Daniel Gomez wrote: > > If userptr pages have been pinned but not bound, > > they remain uncleared. > > > > Signed-off-by: Daniel Gomez > > Good catch, not sure if that can ever happen in practice but better safe >

Re: [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence

2021-03-18 Thread Christian König
On 18.03.21 at 08:23, Dennis Li wrote: Changed to set in_gpu_reset to 1 only when the recovery thread begins, and to delay taking reset_sem until after pre-reset but before reset. This makes sure that other threads have exited or been blocked before the GPU reset. Compared with the old code, it could make s

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Christian König
On 18.03.21 at 08:23, Dennis Li wrote: We have defined two variables, in_gpu_reset and reset_sem, in the adev object. The atomic variable in_gpu_reset is used to keep the recovery thread from being re-entered and to make lower-level functions return earlier when recovery starts, but it cannot block the recovery thread w

Re: [PATCH] drm/amdgpu/ttm: Fix memory leak userptr pages

2021-03-18 Thread Christian König
On 17.03.21 at 17:08, Daniel Gomez wrote: If userptr pages have been pinned but not bound, they remain uncleared. Signed-off-by: Daniel Gomez Good catch, not sure if that can ever happen in practice but better safe than sorry. Reviewed-by: Christian König --- drivers/gpu/drm/amd/am

RE: [PATCH 13/13] drm/amdgpu: skip kfd suspend/resume for S0ix

2021-03-18 Thread Quan, Evan
[AMD Public Use] Patches 1-7 are Reviewed-by: Evan Quan Patches 8-13 are Acked-by: Evan Quan -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, March 18, 2021 12:33 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH 13/13] drm/amdgpu: sk

Re: [PATCH] drm/amdgpu: enable mode-2 gpu reset for vangogh

2021-03-18 Thread Huang Rui
On Thu, Mar 18, 2021 at 03:21:20PM +0800, Du, Xiaojian wrote: > From: Xiaojian Du > > From: Xiaojian Du > > This patch is to enable mode-2 gpu reset for vangogh. > > Signed-off-by: Xiaojian Du Please add the fixed PSP firmware version to the commit message to let us know the good version for reset. O

[PATCH 4/4] drm/amdkfd: add reset lock protection for kfd entry functions

2021-03-18 Thread Dennis Li
When doing a GPU reset, try to block all kfd functions that may access hardware, including the kfd ioctls and the file close function. v2: fix a potential recursive locking issue. kfd_ioctl_dbg_register may call into pqm_create_queue, which will cause recursive locking. So remove locking read_lo

[PATCH 3/4] drm/amdgpu: instead of using down/up_read directly

2021-03-18 Thread Dennis Li
Change to use amdgpu_read_lock/unlock, which can handle more cases. Signed-off-by: Dennis Li diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c index bcaf271b39bf..66dec0f49c4a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/dri

[PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence

2021-03-18 Thread Dennis Li
Changed to set in_gpu_reset to 1 only when the recovery thread begins, and to delay taking reset_sem until after pre-reset but before reset. This makes sure that other threads have exited or been blocked before the GPU reset. Compared with the old code, it could make some threads exit earlier without waitin

[PATCH 1/4] drm/amdgpu: remove reset lock from low level functions

2021-03-18 Thread Dennis Li
Using locks in low-level functions can easily cause performance drops. Signed-off-by: Dennis Li diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 0b1e0127056f..24ff5992cb02 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device

[PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Dennis Li
We have defined two variables, in_gpu_reset and reset_sem, in the adev object. The atomic variable in_gpu_reset is used to keep the recovery thread from being re-entered and to make lower-level functions return earlier when recovery starts, but it cannot block the recovery thread when it accesses hardware. The r/w semaphore

[PATCH] drm/amdgpu: enable mode-2 gpu reset for vangogh

2021-03-18 Thread Xiaojian Du
From: Xiaojian Du This patch is to enable mode-2 gpu reset for vangogh. Signed-off-by: Xiaojian Du --- drivers/gpu/drm/amd/amdgpu/nv.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index 5846eac292c3