RE: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Chen, Guchun
+ msleep((HZ / 100) < 1) ? 1 : HZ / 100); A "(" is missing. With that fixed, this patch is: Acked-by: Guchun Chen Regards, Guchun -----Original Message----- From: amd-gfx On Behalf Of xinhui pan Sent: Wednesday, April 13, 2022 11:09 AM To: amd-gfx@lists.freedesktop.org Cc:
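
The grouping at issue can be checked in isolation. The following is a userspace sketch only, not the kernel patch: `hz` is a hypothetical parameter standing in for the kernel's compile-time HZ constant, and `delay_arg()` is an illustrative name. It shows the expression with the missing "(" restored, so that the whole conditional is the single argument and HZ / 100 is clamped to at least 1.

```c
/* Userspace sketch; hz stands in for the kernel's HZ constant (assumption). */
unsigned int delay_arg(unsigned int hz)
{
	/* Corrected grouping: the outer parentheses wrap the whole conditional. */
	return ((hz / 100) < 1) ? 1 : hz / 100;
}
```

With hz = 1000 the sketch yields 10; with hz = 50 the integer division gives 0, so the result is clamped to 1.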

[PATCH v2] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread xinhui pan
ttm_device_delayed_workqueue would reschedule itself if there are pending BOs to be destroyed. So just one flush + cancel_sync is not enough. We still see the lru_list not empty warning. Fix it by waiting for all BOs to be destroyed. Acked-by: Guchun Chen Signed-off-by: xinhui pan ---
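
Why a single flush may not be enough for self-rescheduling work can be illustrated with a toy model. This is a userspace simulation under stated assumptions, not TTM code: `struct fake_dev`, `fake_delayed_work()`, `fake_flush()` and `fake_drain()` are all hypothetical names, and the worker is simplified to destroy exactly one BO per run.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device with BOs awaiting destruction. */
struct fake_dev {
	int pending_bos;   /* BOs still waiting to be destroyed */
	bool work_queued;  /* true while the delayed work is scheduled */
};

/* One run of the worker: destroys one BO, re-queues itself if more remain. */
void fake_delayed_work(struct fake_dev *dev)
{
	dev->work_queued = false;
	if (dev->pending_bos > 0)
		dev->pending_bos--;
	if (dev->pending_bos > 0)
		dev->work_queued = true;  /* reschedule itself */
}

/* Models a flush: runs the queued work once; returns true if it ran. */
bool fake_flush(struct fake_dev *dev)
{
	if (!dev->work_queued)
		return false;
	fake_delayed_work(dev);
	return true;
}

/* Keeps flushing until the worker stops rescheduling; returns flush count. */
int fake_drain(struct fake_dev *dev)
{
	int flushes = 0;

	while (fake_flush(dev))
		flushes++;
	return flushes;
}
```

In this model, one `fake_flush()` on a device with three pending BOs leaves two pending and the work queued again; only the loop in `fake_drain()` ends with nothing pending.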

Re: Reply: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Christian König
That warning is a bit more than a little annoying. Before we stop the delayed delete worker we *must* absolutely make sure that nothing is going on on the hardware any more. Otherwise we could easily run into use-after-free issues. Somewhere there should be an amdgpu_fence_wait_empty()

Re: [PATCH 1/2] drm/amd/amdgpu: Update PF2VF header

2022-04-13 Thread Paul Menzel
[Removed unintended paste in second line] On 13.04.22 at 09:03, Paul Menzel wrote: Dear Bokun, Thank you for rerolling the patch. Please add the iteration/version in the subject next time `[PATCH v2 1/2]` or so. On 12.04.22 at 23:31, Bokun Zhang wrote: - Add proper indentation in the

[PATCH] drm/amdkfd: potential NULL dereference in kfd_set/reset_event()

2022-04-13 Thread Dan Carpenter
If lookup_event_by_id() returns a NULL "ev" pointer then the spin_lock(&ev->lock) will crash. This was detected by Smatch: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event() error: we previously assumed 'ev' could be null (see line 639) Fixes: 5273e82c5f47 ("drm/amdkfd:
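
The pattern being reported can be sketched in userspace C. This is an illustration only, not the kernel patch: `struct fake_event`, `fake_lookup()` and `fake_set_event()` are hypothetical names, and the `locked` field is a toy stand-in for a spinlock. The point is simply that the result of the lookup is checked before anything dereferences it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical event record; "locked" is a toy stand-in for a spinlock. */
struct fake_event {
	int id;
	int locked;
	int signaled;
};

/* Returns NULL when no event in the table has the requested id. */
struct fake_event *fake_lookup(struct fake_event *tbl, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

int fake_set_event(struct fake_event *tbl, size_t n, int id)
{
	struct fake_event *ev = fake_lookup(tbl, n, id);

	if (!ev)                /* bail out before touching ev->locked */
		return -EINVAL;

	ev->locked = 1;         /* "lock" */
	ev->signaled = 1;
	ev->locked = 0;         /* "unlock" */
	return 0;
}
```

Calling `fake_set_event()` with an id that is not in the table returns `-EINVAL` instead of dereferencing a NULL pointer.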

Reply: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Pan, Xinhui
[AMD Official Use Only] The log from the tester says it is the drm framebuffer BO that is busy. I suspect its fence has simply not had enough time to signal, since a delay also works in my test. But the warning is a little annoying. From: Koenig, Christian Sent:

Re: Reply: [PATCH] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Christian König
I think for now we should just have the following code in amdgpu_vm_fini: dma_fence_wait(vm->last_tlb_flush, false); /* Make sure that all fence callbacks have completed */ spin_lock(vm->last_tlb_flush->lock); spin_unlock(vm->last_tlb_flush->lock); dma_fence_put(vm->last_tlb_flush); Cleaning

Re: [PATCH 1/2] drm/amd/amdgpu: Update PF2VF header

2022-04-13 Thread Paul Menzel
Dear Bokun, drm/amd/amdgpu: Update PF2VF header Thank you for rerolling the patch. Please add the iteration/version in the subject next time `[PATCH v2 1/2]` or so. On 12.04.22 at 23:31, Bokun Zhang wrote: - Add proper indentation in the header file Please use that as the commit message

Re: Vega 56 failing to process EDID from VR Headset

2022-04-13 Thread Paul Menzel
Dear James, On 13.04.22 at 00:13, James Dutton wrote: On Tue, 12 Apr 2022 at 07:13, Paul Menzel wrote: On 11.04.22 at 23:39, James Dutton wrote: So, did you make any changes to Linux? Why do you think the EDID is at fault? […] I suggest analyzing why `No DP link bandwidth` is logged. The

Re: [PATCH v2] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Paul Menzel
Dear xinhui, On 13.04.22 at 08:46, xinhui pan wrote: ttm_device_delayed_workqueue would reschedule itself if there is pending BO to be destroyed. So just one flush + cancel_sync is not enough. We still see lru_list not empty warnging. warning (`scripts/checkpatch.pl --codespell` should

Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Paul Menzel
Dear Richard, Thank you for sending out v4. On 12.04.22 at 23:50, Richard Gong wrote: Active State Power Management (ASPM) feature is enabled since kernel 5.14. There are some AMD GFX cards (such as WX3200 and RX640) that won't work with ASPM-enabled Intel Alder Lake based systems. Using

RE: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Koenig, Christian
We don't need that. TTM only reschedules when the BOs are still busy. And if the BOs are still busy when you unload the driver we have much bigger problems than this TTM worker :) Regards, Christian From: Pan, Xinhui Sent: Wednesday, 13 April 2022 05:08

Re: [PATCH] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Daniel Vetter
On Tue, 12 Apr 2022 at 14:11, Christian König wrote: > > On 12.04.22 at 14:03, xinhui pan wrote: > > VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. > > We see the calltrace below. > > > > Fix it by keeping the last flush fence around and wait for it to signal > > > > BUG

Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Alex Deucher
On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote: > > Dear Richard, > > > Thank you for sending out v4. > > On 12.04.22 at 23:50, Richard Gong wrote: > > Active State Power Management (ASPM) feature is enabled since kernel 5.14. > > There are some AMD GFX cards (such as WX3200 and RX640) that

RE: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Limonciello, Mario
[Public] > On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel > wrote: > > > > Dear Richard, > > > > > > Thank you for sending out v4. > > > > On 12.04.22 at 23:50, Richard Gong wrote: > > > Active State Power Management (ASPM) feature is enabled since kernel > 5.14. > > > There are some AMD GFX

Re: [PATCH v3 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-13 Thread Luben Tuikov
Looks good! Thanks. Reviewed-by: Luben Tuikov On 2022-04-13 01:08, Darren Powell wrote: > Clarify the smu_cmn_send_smc_msg_with_param documentation to mention two > cases exist where messages are silently dropped with no error returned. > These cases occur in unusual situations where either: >

Re: gcc inserts __builtin_popcount, causes 'modpost: "__popcountdi2" ... amdgpu.ko] undefined'

2022-04-13 Thread Sergei Trofimovich
On Mon, Apr 11, 2022 at 10:08:15PM +0100, Sergei Trofimovich wrote: > Current linux-5.17.1 on fresh gcc-12 fails to build with errors like: > > ERROR: modpost: "__popcountdi2" > [drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko] undefined! > ERROR: modpost: "__popcountdi2"

Re: [PATCH v2] Revert "drm/amd/display: Pass HostVM enable flag into DCN3.1 DML"

2022-04-13 Thread Alex Deucher
On Tue, Apr 12, 2022 at 5:03 PM Rodrigo Siqueira wrote: > > This reverts commit 367b3e934f578f6c0d5d8ca5987dc6ac4cd6831d. > > While we were testing DCN3.1 with a hub, we noticed that only one of 2 > connected displays lights up when using some specific display > resolution. In summary, this was

Re: [PATCH] drm/amdkfd: potential NULL dereference in kfd_set/reset_event()

2022-04-13 Thread Felix Kuehling
On 2022-04-13 at 03:36, Dan Carpenter wrote: If lookup_event_by_id() returns a NULL "ev" pointer then the spin_lock(&ev->lock) will crash. This was detected by Smatch: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event() error: we previously assumed 'ev' could be null

Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Nathan Chancellor
Hi Richard, On Tue, Apr 12, 2022 at 04:50:00PM -0500, Richard Gong wrote: > Active State Power Management (ASPM) feature is enabled since kernel 5.14. > There are some AMD GFX cards (such as WX3200 and RX640) that won't work > with ASPM-enabled Intel Alder Lake based systems. Using these GFX

[PATCH 1/2] drm/amdkfd: Fix GWS queue count

2022-04-13 Thread David Yat Sin
A queue can be inactive during process termination. This would cause dqm->gws_queue_count to not be decremented. There can only be 1 GWS queue per device process, so move the logic out of the loop. Signed-off-by: David Yat Sin --- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c| 12

[PATCH 2/2] drm/amdkfd: CRIU add support for GWS queues

2022-04-13 Thread David Yat Sin
Add support for checkpoint/restore of GWS (Global Wave Sync) queues. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 ++-- .../amd/amdkfd/kfd_process_queue_manager.c| 22 ++- 3 files

[PATCH] drm/amd/amdgpu: Remove static from variable in RLCG Reg RW.

2022-04-13 Thread Gavin Wan
[why] These static variables save the RLC scratch register addresses. When multiple GPUs are installed (for example, in an XGMI setup) and call the function at the same time, they overwrite each other's RLC scratch register addresses. That then caused reading/writing to
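
The hazard with function-local statics shared across devices can be shown with a small userspace sketch. Everything here is hypothetical and not taken from the patch: `struct fake_gpu`, `FAKE_SCRATCH_OFFSET`, `scratch_shared()` and `scratch_local()` are illustrative names. The first function keeps the address in a `static`, so every GPU writes to the same slot; the second uses an ordinary local, as the subject line ("Remove static from variable") suggests.

```c
#include <stdint.h>

#define FAKE_SCRATCH_OFFSET 0x10u  /* hypothetical register offset */

/* Hypothetical per-GPU state: only a register base address. */
struct fake_gpu {
	uint32_t base;
};

/* Problematic pattern: one static slot is shared by every GPU. */
uint32_t *scratch_shared(const struct fake_gpu *gpu)
{
	static uint32_t addr;  /* single storage location for all callers */

	addr = gpu->base + FAKE_SCRATCH_OFFSET;
	return &addr;
}

/* Without static: each call computes its own value in a fresh local. */
uint32_t scratch_local(const struct fake_gpu *gpu)
{
	uint32_t addr = gpu->base + FAKE_SCRATCH_OFFSET;

	return addr;
}
```

With two devices, both calls to `scratch_shared()` return the same pointer, so the second call overwrites what the first caller was about to use; `scratch_local()` has no shared storage to clobber.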

Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-04-13 Thread Shuotao Xu
On Apr 11, 2022, at 11:52 PM, Andrey Grodzovsky <andrey.grodzov...@amd.com> wrote: [Some people who received this message don't often get email from andrey.grodzov...@amd.com. Learn why this is important at

[PATCH] drm/radeon: Add build directory to include path

2022-04-13 Thread Michel Dänzer
From: Michel Dänzer Fixes compile errors with out-of-tree builds, e.g. ../drivers/gpu/drm/radeon/r420.c:38:10: fatal error: r420_reg_safe.h: No such file or directory 38 | #include "r420_reg_safe.h" | ^ Signed-off-by: Michel Dänzer ---

Re: AMDGPU: regression on 5.17.1

2022-04-13 Thread Michele Ballabio
On Mon, 11 Apr 2022 14:34:37 -0400 Alex Deucher wrote: > On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio > wrote: > > > > On Tue, 5 Apr 2022 10:23:16 -0400 > > Alex Deucher wrote: > > > > > On Mon, Apr 4, 2022 at 3:39 PM Michele Ballabio > > > wrote: > > > > > > > > On Mon, 4 Apr 2022

RE: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10

2022-04-13 Thread Michael Kelley (LINUX)
From: Alex Deucher Sent: Tuesday, April 12, 2022 7:13 AM > > On Tue, Apr 12, 2022 at 4:01 AM Paul Menzel wrote: > > > > [Cc: +x86 folks] > > > > Dear Alex, dear x86 folks, > > > > > > x86 folks, can you think of alternatives to access `X86_HYPER_MS_HYPERV` > > from

Re: AMDGPU: regression on 5.17.1

2022-04-13 Thread Michele Ballabio
On Wed, 13 Apr 2022 14:14:42 -0400 Alex Deucher wrote: > On Wed, Apr 13, 2022 at 1:33 PM Michele Ballabio > wrote: > > > > On Mon, 11 Apr 2022 14:34:37 -0400 > > Alex Deucher wrote: > > > > > On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio > > > wrote: > > > > > > > > On Tue, 5 Apr 2022

[PATCH] drm/amdgpu: Move reset domain locking in DPC handler

2022-04-13 Thread Andrey Grodzovsky
Lock the reset domain unconditionally because on resume we unlock it unconditionally. This solves a mutex deadlock when handling both FATAL and non-FATAL PCI errors one after another. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++--- 1 file changed,

Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-04-13 Thread Andrey Grodzovsky
On 2022-04-13 12:03, Shuotao Xu wrote: On Apr 11, 2022, at 11:52 PM, Andrey Grodzovsky wrote: [Some people who received this message don't often get email from andrey.grodzov...@amd.com. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] On 2022-04-08 21:28,

Re: AMDGPU: regression on 5.17.1

2022-04-13 Thread Alex Deucher
On Wed, Apr 13, 2022 at 1:33 PM Michele Ballabio wrote: > > On Mon, 11 Apr 2022 14:34:37 -0400 > Alex Deucher wrote: > > > On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio > > wrote: > > > > > > On Tue, 5 Apr 2022 10:23:16 -0400 > > > Alex Deucher wrote: > > > > > > > On Mon, Apr 4, 2022 at

[PATCH] drm/amdgpu: don't runtime suspend if there are displays attached (v2)

2022-04-13 Thread Alex Deucher
We normally runtime suspend when there are displays attached if they are in the DPMS off state, however, if something wakes the GPU we send a hotplug event on resume (in case any displays were connected while the GPU was in suspend) which can cause userspace to light up the displays again soon

Re: [PATCH v2] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Paul Menzel
Dear Xinhui, Thank you for rerolling the patch. On 14.04.22 at 07:03, xinhui pan wrote: VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. We see the calltrace below. Fix it by keeping the last flush fence around and wait for it to signal Nit: Please add a dot/period to the

[pull] amdgpu drm-fixes-5.18

2022-04-13 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 5.18. The following changes since commit 88711fa9a14f6f473f4a7645155ca51386e36c21: Merge tag 'drm-misc-fixes-2022-04-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes (2022-04-08 09:22:16 +1000) are available in the Git repository at:

[PATCH v2] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread xinhui pan
VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. We see the calltrace below. Fix it by keeping the last flush fence around and wait for it to signal BUG kmalloc-4k (Not tainted): Poison overwritten 0x9c88630414e8-0x9c88630414e8 @offset=5352. First byte 0x6c instead of