Re: [PATCH v1] drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov

2024-04-01 Thread JingWen Chen
Acked-by: Jingwen Chen On 2024/3/27 11:52, chongli2 wrote: > support MES command SET_HW_RESOURCE1 in sriov > > Signed-off-by: chongli2 > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 6 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++ > drive

Re: [PATCH] drm/amd/pm set pp_dpm_*clk as read only for SRIOV one VF mode

2024-03-19 Thread JingWen Chen
Acked-by: Jingwen Chen On 2024/3/15 14:31, Lin.Cao wrote: > pp_dpm_*clk should be set as read only for SRIOV one VF mode, remove > S_IWUGO flag and _store function of these debugfs in one VF mode. > > Signed-off-by: Lin.Cao > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 10 ++

Re: [PATCH] drm/amdgpu: release gpu full access after "amdgpu_device_ip_late_init"

2023-04-17 Thread JingWen Chen
Reviewed-by: jingwen.ch...@amd.com On 4/14/23 4:41 PM, Chong Li wrote: > [WHY] > Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is > an atomic context. > We shouldn't access registers through KIQ since "msleep()" may be called in > "amdgpu_kiq_rreg()". > > [HOW] >

Re: [PATCH] drm/ttm: update bulk move object of ghost BO

2022-09-01 Thread JingWen Chen
Acked-by: Jingwen Chen still need confirmation from Christian On 9/1/22 5:29 PM, ZhenGuo Yin wrote: > [Why] > Ghost BO is released with non-empty bulk move object. There is a > warning trace: > WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 > [amdtt

Re: [PATCH] drm/amdgpu: Call trace info was found in dmesg when loading amdgpu

2022-07-13 Thread JingWen Chen
Feel free to add Reviewed-by: Jingwen Chen On 7/14/22 10:31 AM, lin cao wrote: > In the case of SRIOV, the register smnMp1_PMI_3_FIFO will get an invalid > value which will cause the "shift out of bound". In Ubuntu22.04, this > issue will be checked and related call tra

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-03-02 Thread JingWen Chen
Hi Andrey, Most of the patches are OK, but the code will introduce an IB test failure on the disabled VCN of sienna_cichlid. In the SRIOV use case we disable one VCN on sienna_cichlid. I have attached a patch to fix this issue, please check the attachment. Best Regards, Jingwen Chen On 2

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-24 Thread JingWen Chen
istian >> ; dan...@ffwll.ch >> *Subject:* Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI >> is ready >> No because all the patch-set including this patch was landed into >> drm-misc-next and will reach amd-staging-drm-next on the next upstream >> r

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-23 Thread JingWen Chen
Hi Andrey, Will you port this patch into amd-staging-drm-next? On 2/10/22 2:06 AM, Andrey Grodzovsky wrote: > All comments are fixed and code pushed. Thanks for everyone > who helped reviewing. > > Andrey > > On 2022-02-09 02:53, Christian König wrote: >> On 09.02.22 at 01:23, Andrey

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-08 Thread JingWen Chen
Hi Andrey, I have been testing your patch and it seems fine so far. Best Regards, Jingwen Chen On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote: > Just another ping, with Shyun's help I was able to do some smoke testing on > XGMI SRIOV system (booting and triggering hive reset) > an

Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-02-06 Thread JingWen Chen
Hi Andrey, I don't have any XGMI machines here, maybe you can reach out to shaoyun for help. On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote: > Just a gentle ping. > > Andrey >

[PATCH v4] drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV

2022-01-13 Thread Jingwen Chen
: call amdgpu_virt_init_data_exchange after gmc sw_init to make data exchange workqueue run v3: clean up the code logic v4: add some comment and make the code more readable Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c

[PATCH v3] drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV

2022-01-13 Thread Jingwen Chen
: call amdgpu_virt_init_data_exchange after gmc sw_init to make data exchange workqueue run v3: clean up the code logic Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 2 files changed, 5 insertions(+), 9

[PATCH] drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV

2022-01-13 Thread Jingwen Chen
: call amdgpu_virt_init_data_exchange after gmc sw_init to make data exchange workqueue run Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 +++--- 2 files changed, 4 insertions(+), 8 deletions(-) diff --git

[PATCH] drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV

2022-01-13 Thread Jingwen Chen
-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 89ab0032..0b887a49b604 100644 --- a/drivers/gpu/drm/amd

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-11 Thread JingWen Chen
Hi Andrey, Please go ahead and push your change. I will prepare the RFC later. On 2022/1/8 12:02 AM, Andrey Grodzovsky wrote: > > On 2022-01-07 12:46 a.m., JingWen Chen wrote: >> On 2022/1/7 11:57 AM, JingWen Chen wrote: >>> On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote: >>

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-06 Thread JingWen Chen
On 2022/1/7 11:57 AM, JingWen Chen wrote: > On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote: >> On 2022-01-06 12:18 a.m., JingWen Chen wrote: >>> On 2022/1/6 12:59 PM, JingWen Chen wrote: >>>> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote: >>>>> On 2022-0

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-06 Thread JingWen Chen
On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote: > > On 2022-01-06 12:18 a.m., JingWen Chen wrote: >> On 2022/1/6 12:59 PM, JingWen Chen wrote: >>> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote: >>>> On 2022-01-05 2:59 a.m., Christian König wrote: >>>

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-05 Thread JingWen Chen
On 2022/1/6 12:59 PM, JingWen Chen wrote: > On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote: >> On 2022-01-05 2:59 a.m., Christian König wrote: >>> On 05.01.22 at 08:34, JingWen Chen wrote: >>>> On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote: >>>>> O

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-05 Thread JingWen Chen
On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote: > > On 2022-01-05 2:59 a.m., Christian König wrote: >> On 05.01.22 at 08:34, JingWen Chen wrote: >>> On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote: >>>> On 2022-01-04 6:36 a.m., Christian König wrote: >>>

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
implementation in amdgpu to >>> actually match the requirements. >>> >>> Could be that the reset sequence is questionable in general, but I doubt so >>> at least for now. >>> >>> See the FLR request from the hypervisor is just another source of s

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
>> See the FLR request from the hypervisor is just another source of signaling >> the need for a reset, similar to each job timeout on each queue. Otherwise >> you have a race condition between the hypervisor and the scheduler. >> >> Properly setting in_gpu_reset

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
e_unlock_adev in flr_work instead of try_lock since no one will conflict with this thread with reset_domain introduced. But we do need the reset_sem and adev->in_gpu_reset to keep device untouched via user space. Best Regards, Jingwen Chen On 2022/1/3 6:17 PM, Christian König wrote

Re: [PATCH] drm/amdgpu: add dummy event6 for vega10

2021-12-30 Thread JingWen Chen
Reviewed-by: Jingwen Chen On 2021/12/29 6:38 PM, James Yao wrote: > [why] > Malicious mailbox event1 fails driver loading on vega10. > A dummy event6 prevents the driver from taking the response from malicious event1 as > its own. > > [how] > On vega10, send a mailbox event6

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2021-12-24 Thread JingWen Chen
I do agree with shaoyun: if the host finds that the GPU engine hangs first and does the FLR, a guest-side thread may not know this and may still try to access HW (e.g. kfd uses amdgpu_in_reset and reset_sem in many places to identify the reset status). This may lead to a very bad result. On 2021/12/24

[PATCH v2 2/2] drm/amd/amdgpu: fix gmc bo pin count leak in SRIOV

2021-12-13 Thread Jingwen Chen
[Why] gmc bo will be pinned during loading amdgpu and reset in SRIOV while only unpinned in unload amdgpu [How] add amdgpu_in_reset and sriov judgement to skip pin bo v2: fix wrong judgement Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 4 drivers/gpu/drm/amd

[PATCH v2 1/2] drm/amd/amdgpu: fix psp tmr bo pin count leak in SRIOV

2021-12-13 Thread Jingwen Chen
[Why] psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while only unpinned in unload amdgpu [How] add amdgpu_in_reset and sriov judgement to skip pin bo v2: fix wrong judgement Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 1 file changed
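The pin-count leak described in the [Why]/[How] above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: `BufferObject`, `hw_init`, `load_reset_unload`, and the `skip_pin_on_reset` switch are hypothetical names, and the model assumes load and reset share one init path that pins the BO while unload unpins it once.

```python
class BufferObject:
    """Toy stand-in for a pinned buffer object; it only tracks a pin count."""

    def __init__(self):
        self.pin_count = 0

    def pin(self):
        self.pin_count += 1

    def unpin(self):
        if self.pin_count == 0:
            raise RuntimeError("unpin without matching pin")
        self.pin_count -= 1


def hw_init(bo, is_sriov_vf, in_reset, skip_pin_on_reset):
    """Hypothetical init path shared by driver load and GPU reset."""
    if skip_pin_on_reset and is_sriov_vf and in_reset:
        return  # BO is still pinned from load, so do not pin it again
    bo.pin()


def load_reset_unload(resets, skip_pin_on_reset):
    """Return the pin count left over after load, `resets` resets, and unload."""
    bo = BufferObject()
    hw_init(bo, is_sriov_vf=True, in_reset=False,
            skip_pin_on_reset=skip_pin_on_reset)
    for _ in range(resets):
        hw_init(bo, is_sriov_vf=True, in_reset=True,
                skip_pin_on_reset=skip_pin_on_reset)
    bo.unpin()  # unload unpins exactly once
    return bo.pin_count
```

In this model, every reset without the skip leaves one extra pin behind, while skipping the pin during an SRIOV reset leaves the count balanced at unload.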

[PATCH 2/2] drm/amd/amdgpu: fix gmc bo pin count leak in SRIOV

2021-12-13 Thread Jingwen Chen
[Why] gmc bo will be pinned during loading amdgpu and reset in SRIOV while only unpinned in unload amdgpu [How] add amdgpu_in_reset and sriov judgement to skip pin bo Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 4 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4

[PATCH 1/2] drm/amd/amdgpu: fix psp tmr bo pin count leak in SRIOV

2021-12-13 Thread Jingwen Chen
[Why] psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while only unpinned in unload amdgpu [How] add amdgpu_in_reset and sriov judgement to skip pin bo Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 1 file changed, 4 insertions(+) diff

Re: [PATCH 2/2] drm/amd/amdgpu: fix gmc bo pin count leak in SRIOV

2021-12-13 Thread JingWen Chen
patch abandoned On 2021/12/14 11:52 AM, Jingwen Chen wrote: > [Why] > gmc bo will be pinned during loading amdgpu and reset in SRIOV while > only unpinned in unload amdgpu > > [How] > add amdgpu_in_reset and sriov judgement for pin bo in gart_enable > > Sig

[PATCH 2/2] drm/amd/amdgpu: fix gmc bo pin count leak in SRIOV

2021-12-13 Thread Jingwen Chen
[Why] gmc bo will be pinned during loading amdgpu and reset in SRIOV while only unpinned in unload amdgpu [How] add amdgpu_in_reset and sriov judgement for pin bo in gart_enable Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 8 +--- drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amd/amdgpu: fix psp tmr bo pin count leak in SRIOV

2021-12-13 Thread Jingwen Chen
[Why] psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while only unpinned in unload amdgpu [How] add amdgpu_in_reset and sriov judgement for psp_tmr_init Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 11 ++- 1 file changed, 6 insertions

Re: [PATCH] drm/amd/amdgpu: use advanced TDR mode by default

2021-11-29 Thread JingWen Chen
Hi Bokun, please remove the change-id in your commit message when submitting this patch. Acked-by: Jingwen Chen On 2021/11/27 8:57 AM, Bokun Zhang wrote: > From: Bokun Zhang > > In the patch about advanced TDR mode, we force to always set > amdgpu_gpu_recovery=2 under SRIOV. This

Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-27 Thread JingWen Chen
On 2021/10/28 3:43 AM, Andrey Grodzovsky wrote: > > On 2021-10-25 10:57 p.m., JingWen Chen wrote: >> On 2021/10/25 11:18 PM, Andrey Grodzovsky wrote: >>> On 2021-10-24 10:56 p.m., JingWen Chen wrote: >>>> On 2021/10/23 4:41 AM, Andrey Grodzovsky wrote:

Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-25 Thread JingWen Chen
On 2021/10/25 11:18 PM, Andrey Grodzovsky wrote: > > On 2021-10-24 10:56 p.m., JingWen Chen wrote: >> On 2021/10/23 4:41 AM, Andrey Grodzovsky wrote: >>> What do you mean by underflow in this case? You mean use after free >>> because of extra dma_fence_put()?

Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-24 Thread JingWen Chen
On 2021/10/23 4:41 AM, Andrey Grodzovsky wrote: > > What do you mean by underflow in this case? You mean use after free because > of extra dma_fence_put()? yes > > On 2021-10-22 4:14 a.m., JingWen Chen wrote: >> ping >> >> On 2021/10/22 11:33 AM, Jingwen Chen wr

Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-22 Thread JingWen Chen
ping On 2021/10/22 11:33 AM, Jingwen Chen wrote: > [Why] > In advanced TDR mode, the real bad job will be resubmitted twice, while > in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job > is put one more time than other jobs. > > [How] > Adding dma_fence_ge

[PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-21 Thread Jingwen Chen
for normal jobs Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41ce86244144..975f069f6fe8 100644 --- a/drivers/gpu/drm/amd

[PATCH v2] drm/amd/amdgpu: add dummy_page_addr to sriov msg

2021-10-21 Thread Jingwen Chen
Add dummy_page_addr to sriov msg for host driver to set GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly. v2: should update vf2pf msg instead Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 3 ++- 2 files

[PATCH] drm/amd/amdgpu: add dummy_page_addr to sriov msg

2021-10-21 Thread Jingwen Chen
Add dummy_page_addr to sriov msg for host driver to set GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++- 2 files changed, 4 insertions(+), 1 deletion

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-09-06 Thread Jingwen Chen
deleted from pending list. While if we use the ordered workqueue for timedout in the driver, there will be no bailing job. Do you have any suggestions? Best Regards, JingWen Chen On Mon Sep 06, 2021 at 02:36:52PM +0800, Liu, Monk wrote: > [AMD Official Use Only] > > > I'm fearing that ju

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-08-31 Thread Jingwen Chen
On Wed Sep 01, 2021 at 12:28:59AM -0400, Andrey Grodzovsky wrote: > > On 2021-09-01 12:25 a.m., Jingwen Chen wrote: > > On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote: > > > I will answer everything here - > > > > > > O

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-08-31 Thread Jingwen Chen
On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote: > I will answer everything here - > > On 2021-08-31 9:58 p.m., Liu, Monk wrote: > > > [AMD Official Use Only] > > > > In the previous discussion, you guys stated that we should drop the > “kthread_should_park”

Re: [PATCH] drm/amd/amdgpu: Add ready_to_reset resp for vega10

2021-08-27 Thread Jingwen Chen
Reviewed-by: Jingwen Chen On Fri Aug 27, 2021 at 02:56:51PM +0800, YuBiao Wang wrote: > Send response to host after received the flr notification from host. > Port NV change to vega10. > > Signed-off-by: YuBiao Wang > --- > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 ++ >

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Jingwen Chen
-Original Message- > > From: Daniel Vetter > > Sent: Thursday, August 19, 2021 5:31 PM > > To: Grodzovsky, Andrey > > Cc: Daniel Vetter ; Alex Deucher ; > > Chen, JingWen ; Maling list - DRI developers > > ; amd-gfx list > > ; Liu, Monk ; Koenig, >

[PATCH v3] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Jingwen Chen
revert this commit. This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90. v2: add dma_fence_get/put() around timedout_job to avoid concurrent delete during processing timedout_job v3: park sched->thread instead during timedout_job. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/schedu

Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-18 Thread Jingwen Chen
Sorry, I just got what you mean, will submit a v2 patch. On Wed Aug 18, 2021 at 04:08:37PM +0800, Jingwen Chen wrote: > On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote: > > > > > > On 17.08.21 at 15:37, Andrey Grodzovsky wrote: > > > On 2021-08-17

Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-18 Thread Jingwen Chen
On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote: > > > On 17.08.21 at 15:37, Andrey Grodzovsky wrote: > > On 2021-08-17 12:28 a.m., Jingwen Chen wrote: > > > [Why] > > > for bailing job, this commit will delete it from pending list thus the

[PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-18 Thread Jingwen Chen
revert this commit. This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90. v2: add dma_fence_get/put() around timedout_job to avoid concurrent delete during processing timedout_job Signed-off-by: Jingwen Chen --- drivers/gpu/drm/scheduler/sched_main.c | 23 +-- 1 file

[PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-16 Thread Jingwen Chen
revert this commit. This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/scheduler/sched_main.c | 27 -- 1 file changed, 27 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler

Re: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-11 Thread Jingwen Chen
st 11, 2021 12:41 AM > To: Chen, JingWen ; amd-gfx@lists.freedesktop.org > Cc: Liu, Monk ; Koenig, Christian > ; Jack Zhang ; Jack Zhang > > Subject: Re: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job > > Reviewed-by: Andrey Grodzovsky > > Andrey >

[PATCH v5] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-11 Thread Jingwen Chen
: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang Reviewed-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu

Re: [PATCHv2 2/2] drm/amd/amdgpu: add tdr support for embeded hw_fence

2021-08-10 Thread Jingwen Chen
Hi Andrey, The latest patch [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job has been sent to amd-gfx. can you help review this patch? Best Regards, Jingwen On Tue Aug 10, 2021 at 10:51:17AM +0800, Jingwen Chen wrote: > On Mon Aug 09, 2021 at 12:24:37PM -0400, Andrey Grodzovsky wr

Re: [PATCHv2 2/2] drm/amd/amdgpu: add tdr support for embeded hw_fence

2021-08-10 Thread Jingwen Chen
On Mon Aug 09, 2021 at 12:24:37PM -0400, Andrey Grodzovsky wrote: > > On 2021-08-05 4:31 a.m., Jingwen Chen wrote: > > [Why] > > After embeded hw_fence to amdgpu_job, we need to add tdr support > > for this feature. > > > > [How] > > 1. Add a resubmi

[PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-09 Thread Jingwen Chen
: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd

Re: [PATCHv2 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-09 Thread Jingwen Chen
On Mon Aug 09, 2021 at 10:18:37AM +0800, Jingwen Chen wrote: > On Fri Aug 06, 2021 at 11:48:04AM +0200, Christian König wrote: > > > > > > On 06.08.21 at 07:52, Jingwen Chen wrote: > > > On Thu Aug 05, 2021 at 05:13:22PM -0400, Andrey Grodzovsky wrote: > >

Re: [PATCHv2 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-09 Thread Jingwen Chen
On Fri Aug 06, 2021 at 11:48:04AM +0200, Christian König wrote: > > > On 06.08.21 at 07:52, Jingwen Chen wrote: > > On Thu Aug 05, 2021 at 05:13:22PM -0400, Andrey Grodzovsky wrote: > > > On 2021-08-05 4:31 a.m., Jingwen Chen wrote: > > > > From: Jack Zhang

[PATCHv3 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-08 Thread Jingwen Chen
Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 62 - drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 +- drivers/gpu

Re: [PATCHv2 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-06 Thread Jingwen Chen
On Thu Aug 05, 2021 at 05:13:22PM -0400, Andrey Grodzovsky wrote: > > On 2021-08-05 4:31 a.m., Jingwen Chen wrote: > > From: Jack Zhang > > > > Why: Previously hw fence is alloced separately with job. > > It caused historical lifetime issues and corner cases. >

[PATCHv2 2/2] drm/amd/amdgpu: add tdr support for embeded hw_fence

2021-08-05 Thread Jingwen Chen
for guilty jobs. v2: use a job_run_counter in amdgpu_job to replace the resubmit_flag in drm_sched_job. When the job_run_counter >= 1, it means this job is a resubmit job. Signed-off-by: Jack Zhang Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- drivers/

[PATCHv2 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-05 Thread Jingwen Chen
into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embeded in a job. Signed-off-by: Jingwen Chen Signed-off-by: Jack

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-23 Thread Jingwen Chen
On Fri Jul 23, 2021 at 10:45:49AM +0200, Christian König wrote: > On 23.07.21 at 09:07, Jingwen Chen wrote: > > [SNIP] > > Hi Christian, > > > > The thing is vm flush fence has no job passed to amdgpu_fence_emit, so > > go through the jobs cannot help

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-23 Thread Jingwen Chen
On Fri Jul 23, 2021 at 08:33:02AM +0200, Christian König wrote: > On 22.07.21 at 18:47, Jingwen Chen wrote: > > On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote: > > > On 22.07.21 at 16:45, Andrey Grodzovsky wrote: > > > > On 2021-07-22

[PATCH v2] drm: add tdr support for embeded hw_fence

2021-07-23 Thread Jingwen Chen
for guilty jobs. Signed-off-by: Jack Zhang Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++- drivers/gpu/drm/scheduler/sched_main.c | 1

[PATCH v2 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-07-23 Thread Jingwen Chen
into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu

[PATCH] drm: add tdr support for embeded hw_fence

2021-07-23 Thread Jingwen Chen
for guilty jobs. Signed-off-by: Jack Zhang Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++- drivers/gpu/drm/scheduler/sched_main.c | 1

[PATCH 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-07-23 Thread Jingwen Chen
into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-23 Thread Jingwen Chen
On Fri Jul 23, 2021 at 12:06:32AM -0400, Andrey Grodzovsky wrote: > > On 2021-07-22 8:20 p.m., Jingwen Chen wrote: > > On Thu Jul 22, 2021 at 01:50:09PM -0400, Andrey Grodzovsky wrote: > > > On 2021-07-22 1:27 p.m., Jingwen Chen wrote: > > > > On Thu Jul 22,

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Jingwen Chen
On Thu Jul 22, 2021 at 01:50:09PM -0400, Andrey Grodzovsky wrote: > > On 2021-07-22 1:27 p.m., Jingwen Chen wrote: > > On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote: > > > On 2021-07-22 12:47 p.m., Jingwen Chen wrote: > > > > On Thu Jul 22, 20

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Jingwen Chen
On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote: > > On 2021-07-22 12:47 p.m., Jingwen Chen wrote: > > On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote: > > > On 22.07.21 at 16:45, Andrey Grodzovsky wrote: > > > > On 2021-07

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Jingwen Chen
On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote: > On 22.07.21 at 16:45, Andrey Grodzovsky wrote: > > > > On 2021-07-22 6:45 a.m., Jingwen Chen wrote: > > > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: > > > > On 2021-

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Jingwen Chen
On Thu Jul 22, 2021 at 10:45:40AM -0400, Andrey Grodzovsky wrote: > > On 2021-07-22 6:45 a.m., Jingwen Chen wrote: > > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: > > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote: > > > > [Why] > >

Re: [PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-22 Thread Jingwen Chen
On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote: > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote: > > [Why] > > After embeded hw_fence to amdgpu_job, we need to add tdr support > > for this feature. > > > > [How] > > 1. Add a resubmi

[PATCH 1/2] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-07-21 Thread Jingwen Chen
into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm: add tdr support for embeded hw_fence

2021-07-21 Thread Jingwen Chen
for guilty jobs. Signed-off-by: Jack Zhang Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++- drivers/gpu/drm/scheduler/sched_main.c | 1

[PATCH v2] drm/amd/amdgpu: consider kernel job always not guilty

2021-07-21 Thread Jingwen Chen
, then the innocent sdma job will be set to guilty. This will lead to a page fault after resubmitting job. [How] If the job is a kernel job, we will always consider it not guilty Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3

[PATCH] drm/amd/amdgpu: consider paging job always not guilty

2021-07-20 Thread Jingwen Chen
, then the innocent sdma job will be set to guilty. This will lead to a page fault after resubmitting job. [How] If the job is a paging job, we will always consider it not guilty Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3

[PATCH] drm/amd/amdgpu: vm entities should have kernel priority

2021-07-19 Thread Jingwen Chen
job will be set to guilty as it only has NORMAL priority. This will lead to a page fault after resubmitting job. [How] sdma should always have KERNEL priority. The kernel job will always be resubmitted. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++-- 1 file

[PATCH] drm/amdgpu: SRIOV flr_work should take write_lock

2021-07-01 Thread Jingwen Chen
[Why] If flr_work takes read_lock, then other threads who takes read_lock can access hardware when host is doing vf flr. [How] flr_work should take write_lock to avoid this case. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 4 ++-- drivers/gpu/drm/amd/amdgpu

[PATCHv2] drm/amd/amdgpu:save psp ring wptr to avoid attack

2021-05-26 Thread Jingwen Chen
: Idee78e8c1c781463048f2f6311fdc70488ef05b2 Signed-off-by: Victor Zhao Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 + drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 3 ++- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu

[PATCH] drm/amd/amdgpu:save psp ring wptr in SRIOV to avoid attack

2021-05-26 Thread Jingwen Chen
From: Victor Zhao save psp ring wptr in SRIOV to avoid attack to avoid extra changes to MP0_SMN_C2PMSG_102 reg Change-Id: Idee78e8c1c781463048f2f6311fdc70488ef05b2 Signed-off-by: Victor Zhao Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 + drivers/gpu/drm/amd

[PATCH] drm/amd/amdgpu: fix refcount leak

2021-05-17 Thread Jingwen Chen
[Why] the gem object rfb->base.obj[0] is get according to num_planes in amdgpufb_create, but is not put according to num_planes [How] put rfb->base.obj[0] in amdgpu_fbdev_destroy according to num_planes Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 3 +++

[PATCH] drm/amd/amdgpu: destroy pinned gem obj according to refcount

2021-05-17 Thread Jingwen Chen
092ae120 R08: 7ffdfa3d6551 R09: [324584.592188] R10: 7fea6f660c40 R11: 0206 R12: 55b9092ae188 [324584.592189] R13: 0001 R14: 55b9092ae188 R15: 7ffdfa3d8990 [324584.592190] ---[ end trace 4ea03bb6309ad6c3 ]--- Signed-off-by:

[PATCH] drm/amd/amdgpu: add fini virt data exchange to ip_suspend

2021-03-04 Thread Jingwen Chen
[Why] when try to shutdown guest vm in sriov mode, virt data exchange is not fini. After vram lost, trying to write vram could hang cpu. [How] add fini virt data exchange in ip_suspend Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- 1 file changed, 3

[PATCH] drm/amd/amdgpu: fini data exchange when req_gpu_fini in SRIOV

2021-03-04 Thread Jingwen Chen
Do fini data exchange everytime req_gpu_fini in SRIOV Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 3 +++ 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amd/amdgpu: move inc gpu_reset_counter after drm_sched_stop

2021-02-25 Thread Jingwen Chen
Move gpu_reset_counter after drm_sched_stop to avoid race condition caused by job submitted between reset_count +1 and drm_sched_stop. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH 2/2] drm/amd/amdgpu: force flush resubmit job

2021-02-24 Thread Jingwen Chen
[Why] when a job is scheduled during TDR(after device reset count increase and before drm_sched_stop), this job won't do vm_flush when resubmit itself after GPU reset done. This can lead to a page fault. [How] Always do vm_flush for resubmit job. Signed-off-by: Jingwen Chen --- drivers/gpu/drm
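The window described above (a job scheduled after the reset count increases but before drm_sched_stop) can be sketched with a deterministic userspace model. Everything here is an assumption made for illustration: the model supposes the flush decision compares a counter snapshot taken at submission with the device's current counter, and `Device`, `Job`, `needs_vm_flush`, and `racing_job_flushes` are hypothetical names rather than driver symbols. The two keyword switches stand in for the two fixes posted in this listing (always flushing a resubmitted job, and moving the counter increment after drm_sched_stop).

```python
class Device:
    """Toy device that only carries a reset counter."""

    def __init__(self):
        self.gpu_reset_counter = 0


class Job:
    """Toy job that snapshots the reset counter at submission time."""

    def __init__(self, dev):
        self.seen_reset_counter = dev.gpu_reset_counter
        self.resubmitted = False


def needs_vm_flush(dev, job, force_flush_on_resubmit):
    """Assumed rule: flush when the snapshot is stale, or always on resubmit."""
    if force_flush_on_resubmit and job.resubmitted:
        return True
    return job.seen_reset_counter != dev.gpu_reset_counter


def racing_job_flushes(increment_after_stop=False, force_flush_on_resubmit=False):
    """Replay one TDR where a job slips in before the scheduler is stopped."""
    dev = Device()
    if not increment_after_stop:
        dev.gpu_reset_counter += 1  # counter bumped before the scheduler stops
    job = Job(dev)                  # job submitted inside the window
    # The scheduler stop would happen here.
    if increment_after_stop:
        dev.gpu_reset_counter += 1  # counter bumped once no job can slip in
    # The GPU reset would happen here, then pending jobs are resubmitted.
    job.resubmitted = True
    return needs_vm_flush(dev, job, force_flush_on_resubmit)
```

With both switches off, the racing job's snapshot already matches the new counter, so the model skips the flush. Either switch on its own makes the resubmitted job flush.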

[PATCH 1/2] drm: add a flag to indicate job is resubmitted

2021-02-24 Thread Jingwen Chen
Add a flag in drm_sched_job to indicate that the job is resubmitted. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/scheduler/sched_main.c | 2 ++ include/drm/gpu_scheduler.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm

[PATCH] drm/amd/amdgpu: add error handling to amdgpu_virt_read_pf2vf_data

2021-01-19 Thread Jingwen Chen
[Why] When VRAM loss happens in the guest, trying to write VRAM can cause the kernel to get stuck. [How] When the readback data is invalid, skip the write and directly reschedule a new work item. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 +- 1 file changed, 5 insertions

[PATCH] drm/amd/amdgpu: remove redundant flush_delayed_work

2021-01-17 Thread Jingwen Chen
When using cancel_delayed_work_sync, there's no need to flush_delayed_work first. This sequence can lead to a redundant loop of work executing. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: skip power profile switch in sriov

2020-11-26 Thread Jingwen Chen
The power profile switch in VCN needs to send a SetWorkLoad message to the SMU, which is not supported in SRIOV. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c

[PATCH] drm/amdgpu: skip power profile switch in sriov

2020-11-23 Thread Jingwen Chen
The power profile switch in VCN needs to send a SetWorkLoad message to the SMU, which is not supported in SRIOV. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c

[PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

2020-09-23 Thread Jingwen Chen
smc, sdma, sos, ta and asd fw are not used in SRIOV. Skip them to accelerate sw_init for navi12. v2: skip the above fw in SRIOV for vega10 and sienna_cichlid v3: directly skip psp fw loading in SRIOV Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 10

[PATCH 1/2] drm/amd/pm: Skip use smc fw data in SRIOV

2020-09-23 Thread Jingwen Chen
smc fw is not needed in SRIOV, so the driver should not try to get smc fw data. Signed-off-by: Jingwen Chen --- .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 61 ++- 1 file changed, 32 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b

[PATCH 1/2] drm/amd/pm: Skip use smc fw data in SRIOV

2020-09-22 Thread Jingwen Chen
smc fw is not needed in SRIOV, so the driver should not try to get smc fw data. Signed-off-by: Jingwen Chen --- .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 61 ++- 1 file changed, 32 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b

[PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

2020-09-22 Thread Jingwen Chen
smc, sdma, sos, ta and asd fw are not used in SRIOV. Skip them to accelerate sw_init for navi12. v2: skip the above fw in SRIOV for vega10 and sienna_cichlid Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 9 + drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c

[PATCH 1/2] drm/amd/pm: Skip use smc fw data in SRIOV

2020-09-17 Thread Jingwen Chen
smc fw is not needed in SRIOV, so the driver should not try to get smc fw data. Signed-off-by: Jingwen Chen --- .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 61 ++- 1 file changed, 32 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b

[PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

2020-09-17 Thread Jingwen Chen
smc, sdma, sos and asd fw are not used in SRIOV. Skip them to accelerate sw_init. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 16 +--- drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 3 +++ drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 3

[PATCH] drm/amd/pm: Skip smu_post_init in SRIOV

2020-09-17 Thread Jingwen Chen
smu_post_init needs to enable an SMU feature, which requires virtualization to be off. Skip it since this feature is not used in SRIOV. v2: move the check to the early stage of smu_post_init. v3: fix typo Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 3 +++ 1

[PATCH] drm/amd/pm: Skip smu_post_init in SRIOV

2020-09-17 Thread Jingwen Chen
smu_post_init needs to enable an SMU feature, which requires virtualization to be off. Skip it since this feature is not used in SRIOV. v2: move the check to the early stage of smu_post_init. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 3 +++ 1 file changed, 3

[PATCH] drm/amd/pm: Skip smu_post_init in SRIOV

2020-09-17 Thread Jingwen Chen
smu_post_init needs to enable an SMU feature, which requires virtualization to be off. Skip it since this feature is not used in SRIOV. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu
