[PATCH v2] drm/kfd: fix a system crash issue during GPU recovery

2020-09-01 Thread Dennis Li
The crash log is as below: [Thu Aug 20 23:18:14 2020] general protection fault: [#1] SMP NOPTI [Thu Aug 20 23:18:14 2020] CPU: 152 PID: 1837 Comm: kworker/152:1 Tainted: G OE 5.4.0-42-generic #46~18.04.1-Ubuntu [Thu Aug 20 23:18:14 2020] Hardware name: GIGABYTE

Re: [PATCH V2] drm/amdgpu: Do not move root PT bo to relocated list

2020-09-01 Thread Pan, Xinhui
> On Sep 1, 2020, at 21:54, Christian König wrote: > > Agreed, that change doesn't seem to make sense and your backtrace is mangled > so barely readable. It is the reply that messed up the logs. And this patch was sent on 10th Feb. > > Christian. > > On 01.09.20 at 14:59, Liu, Monk wrote: >> [AMD

Re: [PATCH 0/3] Use implicit kref infra

2020-09-01 Thread Pan, Xinhui
> On Sep 2, 2020, at 11:46, Tuikov, Luben wrote: > > On 2020-09-01 21:42, Pan, Xinhui wrote: >> If you take a look at the below function, you should not use driver's >> release to free adev. As dev is embedded in adev. > > Do you mean "look at the function below", using "below" as an adverb? > "below" is

Re: [PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Luben Tuikov
On 2020-09-01 5:58 p.m., Pan, Xinhui wrote: > [AMD Official Use Only - Internal Distribution Only] > > > > The correct way to do this is to > _leave the amdgpu_driver_release()_ alone, > remove "drmm_add_final_kfree()" and qualify > the WARN_ON() in drm_dev_register() by > the existence of

Re: [PATCH 0/3] Use implicit kref infra

2020-09-01 Thread Luben Tuikov
On 2020-09-01 21:42, Pan, Xinhui wrote: > If you take a look at the below function, you should not use driver's release > to free adev. As dev is embedded in adev. Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective. I know dev is embedded in

[PATCH] drm/amdgpu: Revert "drm/amdgpu: stop allocating dummy GTT nodes"

2020-09-01 Thread xinhui pan
This reverts commit 1e691e2444871d1fde11b611653b5da9010dcec8. mem->mm_node can now be NULL with the commit above. That makes amdgpu_vm_bo_split_mapping touch memory outside of the page table, as max_entries is set to S64_MAX; until we fix that issue, revert the commit above. [ 978.955925] BUG: unable to

Re: [PATCH] drm/kfd: fix a system crash issue during GPU recovery

2020-09-01 Thread Felix Kuehling
On 2020-09-01 11:21 a.m., Li, Dennis wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Felix, If the GPU hangs, execute_queues_cpsch will fail to unmap or map queues and then create_queue_cpsch will return an error. If pqm_create_queue finds that create_queue_cpsch failed, it will call

RE: [PATCH] drm/amd/display: Fix a list corruption

2020-09-01 Thread Xu, Feifei
[AMD Official Use Only - Internal Distribution Only] Acked-by: Feifei Xu -----Original Message----- From: amd-gfx On Behalf Of Pan, Xinhui Sent: Tuesday, September 1, 2020 3:58 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amd/display: Fix a list corruption

Re: [PATCH 2/2] drm/amdgpu/gmc10: print client id string for gfxhub

2020-09-01 Thread Felix Kuehling
Should there be a corresponding change in mmhub_v2_0.c? Other than that, the series is Reviewed-by: Felix Kuehling On 2020-09-01 5:51 p.m., Alex Deucher wrote: Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Signed-off-by:

[PATCH] drm/amdgpu: add ta firmware load in psp_v12_0 for renoir

2020-09-01 Thread Changfeng.Zhu
From: changzhu From: Changfeng It needs to load renoir_ta firmware because hdcp is enabled by default for renoir now. This avoids the error: DTM TA is not initialized Change-Id: Ib2f03a531013e4b432c2e9d4ec3dc021b4f8da7d Signed-off-by: Changfeng --- drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 54

Re: [PATCH 0/3] Use implicit kref infra

2020-09-01 Thread Pan, Xinhui
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev. static void drm_dev_release(struct kref *ref) { struct drm_device *dev = container_of(ref, struct drm_device, ref); if

[PATCH 0/3] Use implicit kref infra

2020-09-01 Thread Luben Tuikov
Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device. First, in drm_dev_register(), do not indiscriminately warn when a DRM driver hasn't opted for managed.final_kfree, but instead check if the driver has provided its own "release"

[PATCH 2/3] drm/amdgpu: Remove drmm final free

2020-09-01 Thread Luben Tuikov
The amdgpu driver implements its own DRM driver release function which naturally frees the container struct amdgpu_device of the DRM device, on a "final" kref-put, i.e. when the kref transitions from non-zero to 0. Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 --

[PATCH 3/3] drm/amdgpu: Remove superfluous NULL check

2020-09-01 Thread Luben Tuikov
The DRM device is a static member of the amdgpu device structure and as such always exists, so long as the PCI and thus the amdgpu device exist. Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- 1 file changed, 3 deletions(-) diff --git

[PATCH 1/3] drm: No warn for drivers who provide release

2020-09-01 Thread Luben Tuikov
Drivers usually allocate their container struct at PCI probe time, then call drm_dev_init(), which initializes the contained DRM dev kref to 1. A DRM driver may provide its own kref release method, which frees the container object, the container of the DRM device, on the last "put" which
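The embedding scheme this patch describes can be sketched in user space. This is only an illustration under stated assumptions, not the DRM core or amdgpu code: every toy_* name below is hypothetical, struct toy_kref is a plain non-atomic counter standing in for struct kref, and container_of() is re-implemented with offsetof(). The point it shows is that the release callback recovers the container from the embedded member and frees the container, so nothing else should also free the embedded device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-in for struct kref (not atomic). */
struct toy_kref {
	int count;
};

/* Hypothetical stand-in for struct drm_device with a driver release hook. */
struct toy_drm_device {
	struct toy_kref ref;
	void (*release)(struct toy_drm_device *dev);
};

/* Hypothetical container, playing the role of struct amdgpu_device. */
struct toy_adev {
	int id;
	struct toy_drm_device ddev;	/* embedded, not a pointer */
};

static int toy_freed;	/* number of containers freed so far */

/* Driver-provided release: frees the container, not the embedded member. */
static void toy_adev_release(struct toy_drm_device *dev)
{
	struct toy_adev *adev = container_of(dev, struct toy_adev, ddev);

	free(adev);
	toy_freed++;
}

/* Allocate the container and start the embedded refcount at 1. */
static struct toy_adev *toy_adev_create(int id)
{
	struct toy_adev *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return NULL;
	adev->id = id;
	adev->ddev.ref.count = 1;
	adev->ddev.release = toy_adev_release;
	return adev;
}

static void toy_dev_get(struct toy_drm_device *dev)
{
	dev->ref.count++;
}

/* Returns 1 if this put dropped the last reference and ran release. */
static int toy_dev_put(struct toy_drm_device *dev)
{
	assert(dev->ref.count > 0);
	if (--dev->ref.count == 0) {
		dev->release(dev);
		return 1;
	}
	return 0;
}
```

In this sketch the container is freed exactly once, on the put that takes the count from 1 to 0; registering a second "final free" of the embedded member on top of that would be the kind of double free the thread is discussing.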

[PATCH] drm/amdkfd: Move process doorbell allocation into kfd device

2020-09-01 Thread Mukul Joshi
Move doorbell allocation for a process into kfd device and allocate doorbell space in each PDD during process creation. Currently, KFD manages its own doorbell space but for some devices, amdgpu would allocate the complete doorbell space instead of leaving a chunk of doorbell space for KFD to

Re: [PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] The correct way to do this is to _leave the amdgpu_driver_release()_ alone, remove "drmm_add_final_kfree()" and qualify the WARN_ON() in drm_dev_register() by the existence of drm_driver.release() (i.e. non-NULL). Re: this drm driver

[PATCH 1/2] drm/amdgpu/gmc9: print client id string for gfxhub

2020-09-01 Thread Alex Deucher
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 30 +++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git

[PATCH 2/2] drm/amdgpu/gmc10: print client id string for gfxhub

2020-09-01 Thread Alex Deucher
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 30 +--- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 30 +--- 2

Re: [PATCH 3/3] drm/amdgpu: Embed drm_device into amdgpu_device (v2)

2020-09-01 Thread Luben Tuikov
On 2020-09-01 9:49 a.m., Alex Deucher wrote: > On Tue, Sep 1, 2020 at 3:44 AM Daniel Vetter wrote: >> >> On Wed, Aug 19, 2020 at 01:00:42AM -0400, Luben Tuikov wrote: >>> a) Embed struct drm_device into struct amdgpu_device. >>> b) Modify the inline-f drm_to_adev() accordingly. >>> c) Modify the

Re: [PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Luben Tuikov
On 2020-09-01 10:12 a.m., Alex Deucher wrote: > On Tue, Sep 1, 2020 at 3:46 AM Pan, Xinhui wrote: >> >> [AMD Official Use Only - Internal Distribution Only] >> >> drm_dev_alloc() allocates *dev* and sets managed.final_kfree to dev so that it frees >> itself. >> Now from commit 5cdd68498918 ("drm/amdgpu: Embed

[PATCH 2/2] drm/amdgpu: disable gpu-sched load balance for uvd

2020-09-01 Thread Nirmoy Das
On hardware with multiple uvd instances, dependent uvd jobs may get scheduled to different uvd instances. Because uvd jobs retain hw context, dependent jobs should always run on the same uvd instance. This patch disables the gpu scheduler's load balancer for a context that binds jobs from the same

[PATCH 1/2] Revert "drm/amdgpu: disable gpu-sched load balance for uvd"

2020-09-01 Thread Nirmoy Das
This reverts commit e0300ed8820d19fe108006cf1b69fa26f0b4e3fc. We should also disable load balance for AMDGPU_HW_IP_UVD_ENC jobs. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git

Re: [PATCH 4/5] drm_dp_cec: add plumbing in preparation for MST support

2020-09-01 Thread Lyude Paul
Super minor nitpicks: On Tue, 2020-09-01 at 16:22 +1000, Sam McNally wrote: > From: Hans Verkuil > > Signed-off-by: Hans Verkuil > [sa...@chromium.org: > - rebased > - removed polling-related changes > - moved the calls to drm_dp_cec_(un)set_edid() into the next patch > ] > Signed-off-by:

Re: [PATCH] drm/radeon: Reset ASIC if suspend is not managed by platform firmware

2020-09-01 Thread Alex Deucher
On Tue, Sep 1, 2020 at 12:21 PM Kai-Heng Feng wrote: > > > > > On Sep 1, 2020, at 22:19, Alex Deucher wrote: > > > > On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng > > wrote: > >> > >> Suspending with s2idle or via the following steps causes the screen to freeze: > >> # echo devices > /sys/power/pm_test > >>

Re: [PATCH] drm/radeon: Reset ASIC if suspend is not managed by platform firmware

2020-09-01 Thread Kai-Heng Feng
> On Sep 1, 2020, at 22:19, Alex Deucher wrote: > > On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng > wrote: >> >> Suspending with s2idle or via the following steps causes the screen to freeze: >> # echo devices > /sys/power/pm_test >> # echo freeze > /sys/power/mem >> >> [ 289.625461]

RE: [PATCH] drm/kfd: fix a system crash issue during GPU recovery

2020-09-01 Thread Li, Dennis
[AMD Official Use Only - Internal Distribution Only] Hi, Felix, If the GPU hangs, execute_queues_cpsch will fail to unmap or map queues and then create_queue_cpsch will return an error. If pqm_create_queue finds that create_queue_cpsch failed, it will call uninit_queue to free the queue object. However

Re: [PATCH] drm/amdgpu: block ring buffer access during GPU recovery

2020-09-01 Thread Andrey Grodzovsky
Now I get it, I missed the 'else' part. Acked-by: Andrey Grodzovsky Andrey On 8/31/20 10:45 PM, Li, Dennis wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Andrey, RE- Isn't adev->reset_sem non-recursive? How does this work when you try to access registers from within GPU

Re: [PATCH] drm/radeon: Reset ASIC if suspend is not managed by platform firmware

2020-09-01 Thread Alex Deucher
On Tue, Sep 1, 2020 at 3:32 AM Kai-Heng Feng wrote: > > Suspending with s2idle or via the following steps causes the screen to freeze: > # echo devices > /sys/power/pm_test > # echo freeze > /sys/power/mem > > [ 289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait > timed out. > [

Re: [PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Alex Deucher
On Tue, Sep 1, 2020 at 3:46 AM Pan, Xinhui wrote: > > [AMD Official Use Only - Internal Distribution Only] > > drm_dev_alloc() allocates *dev* and sets managed.final_kfree to dev so that it frees > itself. > Now from commit 5cdd68498918 ("drm/amdgpu: Embed drm_device into > amdgpu_device (v3)") we allocate *adev*

Re: [PATCH] drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is

2020-09-01 Thread Harry Wentland
On 2020-09-01 3:54 a.m., Daniel Vetter wrote: > On Wed, Aug 26, 2020 at 11:24:23AM +0300, Pekka Paalanen wrote: >> On Tue, 25 Aug 2020 12:58:19 -0400 >> "Kazlauskas, Nicholas" wrote: >> >>> On 2020-08-22 5:59 a.m., Michel Dänzer wrote: On 2020-08-21 8:07 p.m., Kazlauskas, Nicholas wrote:

Re: RE: [PATCH V2] drm/amdgpu: Do not move root PT bo to relocated list

2020-09-01 Thread Christian König
Agreed, that change doesn't seem to make sense and your backtrace is mangled so barely readable. Christian. On 01.09.20 at 14:59, Liu, Monk wrote: [AMD Official Use Only - Internal Distribution Only] See that we already have such logic: static void amdgpu_vm_bo_relocated(struct

Re: [PATCH 3/3] drm/amdgpu: Embed drm_device into amdgpu_device (v2)

2020-09-01 Thread Alex Deucher
On Tue, Sep 1, 2020 at 3:44 AM Daniel Vetter wrote: > > On Wed, Aug 19, 2020 at 01:00:42AM -0400, Luben Tuikov wrote: > > a) Embed struct drm_device into struct amdgpu_device. > > b) Modify the inline-f drm_to_adev() accordingly. > > c) Modify the inline-f adev_to_drm() accordingly. > > d)

Re: [PATCH] drm/kfd: fix a system crash issue during GPU recovery

2020-09-01 Thread Felix Kuehling
I'm not sure how the bug you're fixing is caused, but your fix is clearly in the wrong place. A queue being disabled is not the same thing as a queue being destroyed. Queues can be disabled for legitimate reasons, but they still should exist and be in the qpd->queues_list. If a destroyed queue

RE: [PATCH V2] drm/amdgpu: Do not move root PT bo to relocated list

2020-09-01 Thread Liu, Monk
[AMD Official Use Only - Internal Distribution Only] See that we already have such logic: static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base *vm_bo) { if (vm_bo->bo->parent) list_move(&vm_bo->vm_status, &vm_bo->vm->relocated); else

Re: [PATCH] drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is

2020-09-01 Thread Daniel Vetter
On Tue, Sep 01, 2020 at 10:56:42AM +0200, Michel Dänzer wrote: > On 2020-09-01 9:57 a.m., Daniel Vetter wrote: > > On Tue, Aug 25, 2020 at 04:55:28PM +0200, Michel Dänzer wrote: > >> On 2020-08-24 9:43 a.m., Pekka Paalanen wrote: > >> > >>> Sounds like the helpers you refer to are inadequate for

Re: [PATCH] drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is

2020-09-01 Thread Michel Dänzer
On 2020-09-01 9:57 a.m., Daniel Vetter wrote: > On Tue, Aug 25, 2020 at 04:55:28PM +0200, Michel Dänzer wrote: >> On 2020-08-24 9:43 a.m., Pekka Paalanen wrote: >> >>> Sounds like the helpers you refer to are inadequate for your case. >>> Can't you fix the helpers in the long run and land this

Re: [PATCH v2] drm/amdgpu: block ring buffer access during GPU recovery

2020-09-01 Thread Christian König
On 01.09.20 at 09:50, Dennis Li wrote: When the GPU is in reset, its state isn't stable, and the ring buffer also needs to be reset when resuming. Therefore the driver should protect the GPU recovery thread from ring buffer accesses by other threads. Otherwise the GPU will randomly hang during recovery. v2: correct

RE: [PATCH v2] drm/amdgpu: block ring buffer access during GPU recovery

2020-09-01 Thread Zhang, Hawking
[AMD Public Use] Reviewed-by: Hawking Zhang Regards, Hawking -----Original Message----- From: Dennis Li Sent: Tuesday, September 1, 2020 15:50 To: amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking ; Koenig, Christian Cc: Li, Dennis Subject: [PATCH v2]

Re: [PATCH] drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is

2020-09-01 Thread Daniel Vetter
On Tue, Aug 25, 2020 at 04:55:28PM +0200, Michel Dänzer wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2020-08-24 9:43 a.m., Pekka Paalanen wrote: > > On Sat, 22 Aug 2020 11:59:26 +0200 Michel Dänzer > > wrote: > >> On 2020-08-21 8:07 p.m., Kazlauskas, Nicholas wrote: > >>> On

[PATCH] drm/amd/display: Fix a list corruption

2020-09-01 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] Remove the private obj from the internal list before we free aconnector. [ 56.925828] BUG: unable to handle page fault for address: 8f84a870a560 [ 56.933272] #PF: supervisor read access in kernel mode [ 56.938801] #PF:

Re: [PATCH] drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is

2020-09-01 Thread Daniel Vetter
On Wed, Aug 26, 2020 at 11:24:23AM +0300, Pekka Paalanen wrote: > On Tue, 25 Aug 2020 12:58:19 -0400 > "Kazlauskas, Nicholas" wrote: > > > On 2020-08-22 5:59 a.m., Michel Dänzer wrote: > > > On 2020-08-21 8:07 p.m., Kazlauskas, Nicholas wrote: > > >> On 2020-08-21 12:57 p.m., Michel Dänzer

[PATCH v2] drm/amdgpu: block ring buffer access during GPU recovery

2020-09-01 Thread Dennis Li
When the GPU is in reset, its state isn't stable, and the ring buffer also needs to be reset when resuming. Therefore the driver should protect the GPU recovery thread from ring buffer accesses by other threads. Otherwise the GPU will randomly hang during recovery. v2: correct indent Signed-off-by: Dennis Li diff

Re: [PATCH 1/1] drm/amdgpu: disable gpu-sched load balance for uvd

2020-09-01 Thread Nirmoy
On 9/1/20 9:07 AM, Paul Menzel wrote: Dear Nirmoy, On 31.08.20 at 12:45, Nirmoy Das wrote: UVD dependent jobs should run on the same UVD instance. Why? Datasheet? Performance reasons? What happens if they do not run on the UVD instance? Are there bug reports? Sorry about that, I

[PATCH] drm/amdgpu: Fix a redundant kfree

2020-09-01 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only] drm_dev_alloc() allocates *dev* and sets managed.final_kfree to dev so that it frees itself. Now from commit 5cdd68498918 ("drm/amdgpu: Embed drm_device into amdgpu_device (v3)") we allocate *adev* and ddev is just a member of it. So drm_dev_release tries to free

Re: [PATCH 3/3] drm/amdgpu: Embed drm_device into amdgpu_device (v2)

2020-09-01 Thread Daniel Vetter
On Wed, Aug 19, 2020 at 01:00:42AM -0400, Luben Tuikov wrote: > a) Embed struct drm_device into struct amdgpu_device. > b) Modify the inline-f drm_to_adev() accordingly. > c) Modify the inline-f adev_to_drm() accordingly. > d) Eliminate the use of drm_device.dev_private, >in amdgpu. > e)

Re: [PATCH v2 1/2] drm: allow limiting the scatter list size.

2020-09-01 Thread Daniel Vetter
On Tue, Aug 18, 2020 at 11:20:16AM +0200, Gerd Hoffmann wrote: > Add max_segment argument to drm_prime_pages_to_sg(). When set, pass it > through to the __sg_alloc_table_from_pages() call, otherwise use > SCATTERLIST_MAX_SEGMENT. > > Also add max_segment field to drm driver and pass it to >

[PATCH] drm/radeon: Reset ASIC if suspend is not managed by platform firmware

2020-09-01 Thread Kai-Heng Feng
Suspending with s2idle or via the following steps causes the screen to freeze: # echo devices > /sys/power/pm_test # echo freeze > /sys/power/mem [ 289.625461] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait timed out. [ 289.625494] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed

Re: [PATCH 3/7] drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes

2020-09-01 Thread Daniel Vetter
On Mon, Aug 17, 2020 at 02:23:47AM -0400, Marek Olšák wrote: > On Wed, Aug 12, 2020 at 9:54 AM Daniel Vetter wrote: > > > On Tue, Aug 11, 2020 at 09:42:11AM -0400, Marek Olšák wrote: > > > There are a few cases when the flags can change, for example DCC can be > > > disabled due to a hw

Re: [PATCH] drm/amdgpu: block ring buffer access during GPU recovery

2020-09-01 Thread Christian König
On 01.09.20 at 03:17, Dennis Li wrote: When the GPU is in reset, its state isn't stable, and the ring buffer also needs to be reset when resuming. Therefore the driver should protect the GPU recovery thread from ring buffer accesses by other threads. Otherwise the GPU will randomly hang during recovery. One style

Re: [PATCH] drm/amdgpu: block ring buffer access during GPU recovery

2020-09-01 Thread Christian König
Yeah, correct. What we maybe should do is to add a WARN_ON() which tests if the current thread is the one which has locked the semaphore to catch this case. Regards, Christian. On 01.09.20 at 04:45, Li, Dennis wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Andrey,

Re: [PATCH 1/1] drm/amdgpu: disable gpu-sched load balance for uvd

2020-09-01 Thread Paul Menzel
Dear Nirmoy, On 31.08.20 at 12:45, Nirmoy Das wrote: UVD dependent jobs should run on the same UVD instance. Why? Datasheet? Performance reasons? What happens if they do not run on the UVD instance? Are there bug reports? It’d be great if you extended the commit message. This patch

[PATCH 4/5] drm_dp_cec: add plumbing in preparation for MST support

2020-09-01 Thread Sam McNally
From: Hans Verkuil Signed-off-by: Hans Verkuil [sa...@chromium.org: - rebased - removed polling-related changes - moved the calls to drm_dp_cec_(un)set_edid() into the next patch ] Signed-off-by: Sam McNally --- .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +-