[PATCH] drm/amdkfd: Add fw version for 10.3.6

2022-06-06 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - General] An error occurs when loading firmware, so we need to add firmware version information. [ 309.650118] [drm] kiq ring mec 2 pipe 1 q 0 [ 309.652595] [drm] VCN decode and encode initialized successfully(under DPG Mode). [ 309.653402] kfd kfd: amdgpu: skipped

RE: [PATCH] drm/amdgpu: simplify amdgpu_ucode_get_load_type()

2022-06-06 Thread Chen, Guchun
Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Tuesday, June 7, 2022 2:40 AM To: Deucher, Alexander Cc: amd-gfx list Subject: Re: [PATCH] drm/amdgpu: simplify amdgpu_ucode_get_load_type() Ping? Alex On Tue, May 24, 2022 at

[PATCH] drm/amdgpu: Add MODE register to wave debug info in gfx11

2022-06-06 Thread Joseph Greathouse
All other chips, from gfx6-gfx10, now include the MODE register at the end of the wave debug state. This appears to have been missed in gfx11, so this patch adds in MODE to the debug state for gfx11. Signed-off-by: Joseph Greathouse --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 1 + 1 file

Re: [PATCH 3/3] drm/amdkfd: Extend KFD device topology to surface peer-to-peer links

2022-06-06 Thread Felix Kuehling
On 2022-06-06 at 14:08, Ramesh Errabolu wrote: Extend KFD device topology to surface peer-to-peer links among GPU devices connected over PCIe or xGMI. Enabling HSA_AMD_P2P is REQUIRED to surface peer-to-peer links. Prior to this KFD did not expose to user mode any P2P links or indirect links

[PATCH] umr: print MODE register as part of wave state

2022-06-06 Thread Joseph Greathouse
The MODE register contains detailed per-wave information, but UMR skipped printing it. This patch adds the ability to print each wave's MODE register as part of the wave scan operation, and prints the MODE register's sub-fields as part of the deeper print option. Signed-off-by: Joseph Greathouse

Re: [PATCH 2/3] drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs

2022-06-06 Thread Felix Kuehling
On 2022-06-06 at 16:04, Felix Kuehling wrote: On 2022-06-06 at 14:07, Ramesh Errabolu wrote: Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu Sorry, one more nit-pick inline. With

Re: [PATCH 2/3] drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs

2022-06-06 Thread Felix Kuehling
On 2022-06-06 at 14:07, Ramesh Errabolu wrote: Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu Sorry, one more nit-pick inline. With that fixed, the patch is Reviewed-by: Felix

Re: [PATCH] drm/amdgpu: simplify amdgpu_ucode_get_load_type()

2022-06-06 Thread Alex Deucher
Ping? Alex On Tue, May 24, 2022 at 10:09 PM Alex Deucher wrote: > > This is the same as the default case, so drop the extra > logic. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 20 > 1 file changed, 20 deletions(-) > > diff --git

RE: [PATCH 2/3] drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs

2022-06-06 Thread Errabolu, Ramesh
[AMD Official Use Only - General] Just posted updated patch addressing the comment -Original Message- From: Kuehling, Felix Sent: Monday, June 6, 2022 7:57 PM To: Errabolu, Ramesh ; amd-gfx list Subject: Re: [PATCH 2/3] drm/amdgpu: Add peer-to-peer support among PCIe connected AMD

[PATCH 3/3] drm/amdkfd: Extend KFD device topology to surface peer-to-peer links

2022-06-06 Thread Ramesh Errabolu
Extend KFD device topology to surface peer-to-peer links among GPU devices connected over PCIe or xGMI. Enabling HSA_AMD_P2P is REQUIRED to surface peer-to-peer links. Prior to this KFD did not expose to user mode any P2P links or indirect links that go over two or more direct hops. Old versions

[PATCH 2/3] drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs

2022-06-06 Thread Ramesh Errabolu
Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 300 ++

Re: [PATCH 6/6] Revert "drm/amd/display: Drop unnecessary guard from DC resource"

2022-06-06 Thread Alex Deucher
On Mon, Jun 6, 2022 at 2:02 PM Harry Wentland wrote: > > > > On 2022-06-06 12:17, Alex Deucher wrote: > > On Mon, Jun 6, 2022 at 10:16 AM Harry Wentland > > wrote: > >> > >> On 2022-06-03 14:50, Rodrigo Siqueira wrote: > >>> This reverts commit 78baa3c4dfff4375b109bc5e19663a2f7fad1190. > >>> >

Re: [PATCH 6/6] Revert "drm/amd/display: Drop unnecessary guard from DC resource"

2022-06-06 Thread Harry Wentland
On 2022-06-06 12:17, Alex Deucher wrote: > On Mon, Jun 6, 2022 at 10:16 AM Harry Wentland wrote: >> >> On 2022-06-03 14:50, Rodrigo Siqueira wrote: >>> This reverts commit 78baa3c4dfff4375b109bc5e19663a2f7fad1190. >>> >>> This commit introduced the below compilation error when using >>>

Re: [PATCH] drm/amdgpu: fix refcount underflow in device reset

2022-06-06 Thread Andrey Grodzovsky
On 2022-06-06 03:43, Yiqing Yao wrote: [why] A gfx job may be processed but not finished when a reset begins from a compute job timeout. drm_sched_resubmit_jobs_ext in sched_main assumes a submitted job is unsignaled and always puts the parent fence. Resubmission of that job causes an underflow. This fix is done

Re: [PATCH] drm/radeon: fix potential buffer overflow in ni_set_mc_special_registers()

2022-06-06 Thread Alex Deucher
Applied. Thanks! Alex On Mon, Jun 6, 2022 at 9:51 AM Alexey Kodanev wrote: > > The last case label can write two buffers 'mc_reg_address[j]' and > 'mc_data[j]' with 'j' offset equal to SMC_NISLANDS_MC_REGISTER_ARRAY_SIZE > since there are no checks for this value in both case labels after the

Re: [PATCH 6/6] Revert "drm/amd/display: Drop unnecessary guard from DC resource"

2022-06-06 Thread Alex Deucher
On Mon, Jun 6, 2022 at 10:16 AM Harry Wentland wrote: > > On 2022-06-03 14:50, Rodrigo Siqueira wrote: > > This reverts commit 78baa3c4dfff4375b109bc5e19663a2f7fad1190. > > > > This commit introduced the below compilation error when using > > allmodconfig: > > > > error: implicit declaration of

Re: [PATCH 2/2] drm/amdgpu/display: fix DCN3.2 Makefiles for non-x86

2022-06-06 Thread Harry Wentland
On 2022-06-06 12:00, Alex Deucher wrote: > On Mon, Jun 6, 2022 at 11:54 AM Harry Wentland wrote: >> >> On 2022-06-06 11:42, Alex Deucher wrote: >>> Add proper handling for PPC64. >>> >>> Reported-by: kernel test robot >>> Signed-off-by: Alex Deucher >>> --- >>>

Re: [PATCH 2/2] drm/amdgpu/display: fix DCN3.2 Makefiles for non-x86

2022-06-06 Thread Alex Deucher
On Mon, Jun 6, 2022 at 11:54 AM Harry Wentland wrote: > > On 2022-06-06 11:42, Alex Deucher wrote: > > Add proper handling for PPC64. > > > > Reported-by: kernel test robot > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/display/dc/dcn32/Makefile | 9 - > >

Re: [PATCH 2/2] drm/amdgpu/display: fix DCN3.2 Makefiles for non-x86

2022-06-06 Thread Harry Wentland
On 2022-06-06 11:42, Alex Deucher wrote: > Add proper handling for PPC64. > > Reported-by: kernel test robot > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/dc/dcn32/Makefile | 9 - > drivers/gpu/drm/amd/display/dc/dcn321/Makefile | 9 - > 2 files changed, 16

Re: [PATCH 1/2] drm/amdgpu/display: make some functions static

2022-06-06 Thread Harry Wentland
On 2022-06-06 11:42, Alex Deucher wrote: > Fixes "no previous prototype" warnings. > > Reported-by: kernel test robot > Signed-off-by: Alex Deucher Reviewed-by: Harry Wentland Harry > --- > .../gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c | 2 +- >

[PATCH 2/2] drm/amdgpu/display: fix DCN3.2 Makefiles for non-x86

2022-06-06 Thread Alex Deucher
Add proper handling for PPC64. Reported-by: kernel test robot Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/dcn32/Makefile | 9 - drivers/gpu/drm/amd/display/dc/dcn321/Makefile | 9 - 2 files changed, 16 insertions(+), 2 deletions(-) diff --git

[PATCH 1/2] drm/amdgpu/display: make some functions static

2022-06-06 Thread Alex Deucher
Fixes "no previous prototype" warnings. Reported-by: kernel test robot Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_dccg.c | 8 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dccg.c

[PATCH] drm/amdkfd: surface xgmi physical node id in the kfd topology

2022-06-06 Thread Jonathan Kim
Add the xgmi node's physical node id to the kfd topology node properties. The physical node id is a 0-indexed, monotonically increasing id that is statically assigned based on the node's frame buffer position within the hive. This is useful because the id is also used by the DF for xGMI peer to

Re: [PATCH V2 6/6] drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.0/7

2022-06-06 Thread Alex Deucher
Patches 2-6 are: Reviewed-by: Alex Deucher On Sun, Jun 5, 2022 at 11:13 PM Evan Quan wrote: > > PMFW will handle that properly. Driver involvement may cause some > unexpected issues. > > Signed-off-by: Evan Quan > Change-Id: I77da7d894485a3ac6a1a956e4d2605d0bc730c25 > --- >

Re: [RFC PATCH v2 00/27] DRM.debug on DYNAMIC_DEBUG, add trace events

2022-06-06 Thread jim . cromie
On Wed, May 25, 2022 at 9:02 AM Daniel Vetter wrote: > On Mon, May 16, 2022 at 04:56:13PM -0600, Jim Cromie wrote: > > DRM.debug API is 23 macros, issuing 10 exclusive categories of debug > > messages. By rough count, they are used 5140 times in the kernel. > > These all call drm_dbg or

Re: [PATCH] drm/amdkfd: Document and fix GTT BO kmap API

2022-06-06 Thread philip yang
On 2022-06-03 14:49, Felix Kuehling wrote: Removed an unused parameter from two functions and added kernel-doc comments. Reviewed-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 7

Re: [PATCH 2/3] drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs

2022-06-06 Thread Felix Kuehling
On 2022-06-04 at 06:23, Errabolu, Ramesh wrote: +bool amdgpu_device_is_peer_accessible(struct amdgpu_device *adev, + struct amdgpu_device *peer_adev) +{ +#ifdef CONFIG_HSA_AMD_P2P + bool p2p_access = false; + uint64_t address_mask =

Re: [PATCH 6/6] Revert "drm/amd/display: Drop unnecessary guard from DC resource"

2022-06-06 Thread Harry Wentland
On 2022-06-03 14:50, Rodrigo Siqueira wrote: > This reverts commit 78baa3c4dfff4375b109bc5e19663a2f7fad1190. > > This commit introduced the below compilation error when using > allmodconfig: > > error: implicit declaration of function ‘remove_hpo_dp_link_enc_from_ctx’; > did you mean

Re: [PATCH 5/6] drm/amd/display: Reduce frame size in the bouding box for DCN21

2022-06-06 Thread Harry Wentland
On 2022-06-03 14:50, Rodrigo Siqueira wrote: > GCC throws warnings for the functions dcn21_update_bw_bounding_box and > dcn316_update_bw_bounding_box due to their frame size that look like > this: > > error: the frame size of 1936 bytes is larger than 1024 bytes > [-Werror=frame-larger-than=] > >

Re: [PATCH 4/6] drm/amd/display: Reduce frame size in the bouding box for DCN31/316

2022-06-06 Thread Harry Wentland
On 2022-06-03 14:50, Rodrigo Siqueira wrote: > GCC throws warnings for the functions dcn31_update_bw_bounding_box and > dcn316_update_bw_bounding_box due to their frame size that look like > this: > > error: the frame size of 1936 bytes is larger than 1024 bytes > [-Werror=frame-larger-than=] > >

Re: [PATCH 3/6] drm/amd/display: Reduce frame size in the bouding box for DCN301

2022-06-06 Thread Harry Wentland
On 2022-06-03 14:50, Rodrigo Siqueira wrote: > GCC throws warnings for the function dcn301_fpu_update_bw_bounding_box > due to its frame size that look like this: > > error: the frame size of 1936 bytes is larger than 1024 bytes > [-Werror=frame-larger-than=] > > To fix this issue, I

Re: [PATCH 2/6] drm/amd/display: Reduce frame size in the bouding box for DCN20

2022-06-06 Thread Harry Wentland
On 2022-06-03 14:50, Rodrigo Siqueira wrote: > GCC throws warnings for the function dcn20_update_bounding_box due to its > frame size that look like this: > > error: the frame size of 1936 bytes is larger than 1024 bytes > [-Werror=frame-larger-than=] > > This commit fixes this issue by

Re: [PATCH 1/6] drm/amd/display: Remove duplicated macro

2022-06-06 Thread Harry Wentland
On 2022-06-03 14:50, Rodrigo Siqueira wrote: > Signed-off-by: Rodrigo Siqueira Reviewed-by: Harry Wentland Harry > --- > drivers/gpu/drm/amd/display/include/dal_asic_id.h | 6 -- > 1 file changed, 6 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h >

[PATCH] drm/radeon: fix potential buffer overflow in ni_set_mc_special_registers()

2022-06-06 Thread Alexey Kodanev
The last case label can write two buffers 'mc_reg_address[j]' and 'mc_data[j]' with 'j' offset equal to SMC_NISLANDS_MC_REGISTER_ARRAY_SIZE since there are no checks for this value in both case labels after the last 'j++'. Instead of changing '>' to '>=' there, add the bounds check at the start
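The bounds-check-first pattern this patch describes can be sketched in isolation. This is only an illustration under assumptions, not the radeon code: `MC_REG_ARRAY_SIZE`, `struct mc_reg_table`, and `mc_reg_table_append()` are hypothetical names, and the array size is arbitrary. The point is that the index is tested before any store, so no path can write at an offset equal to the array size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SMC_NISLANDS_MC_REGISTER_ARRAY_SIZE. */
#define MC_REG_ARRAY_SIZE 4

/* Hypothetical table with two parallel arrays, as in the description. */
struct mc_reg_table {
    uint16_t mc_reg_address[MC_REG_ARRAY_SIZE];
    uint32_t mc_data[MC_REG_ARRAY_SIZE];
    size_t last; /* number of valid entries */
};

/*
 * Append one entry. The bounds check comes first, before either array
 * is written, so a full table is rejected instead of overflowing.
 */
static bool mc_reg_table_append(struct mc_reg_table *t,
                                uint16_t addr, uint32_t data)
{
    if (t->last >= MC_REG_ARRAY_SIZE)
        return false;

    t->mc_reg_address[t->last] = addr;
    t->mc_data[t->last] = data;
    t->last++;
    return true;
}
```

Checking at the start of each append, rather than after the increment, means every write site is covered by the same test regardless of how many entries a given case label adds.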

RE: [PATCH 00/16] DC Patches May 30, 2022

2022-06-06 Thread Wheeler, Daniel
[Public] Hi all, This week this patchset was tested on the following systems: HP Envy 360, with Ryzen 5 4500U; Lenovo Thinkpad T14s Gen2, with AMD Ryzen 5 5650U; Sapphire Pulse RX5700XT; Reference AMD RX6800; Engineering board with Ryzen 9 5900H. These systems were tested on the following

Re: [PATCH] Revert "drm/amdgpu: Ensure the DMA engine is deactivated during set ups"

2022-06-06 Thread Alex Deucher
On Sun, Jun 5, 2022 at 10:14 PM Guchun Chen wrote: > > This reverts commit da38a66ac46e334f198afcd1b4d4554b4ddca0df. > > This causes regression in GPU reset related test. > > Cc: Alexander Deucher > Cc: ricet...@gmail.com > Signed-off-by: Guchun Chen Reviewed-by: Alex Deucher > --- >

Re: (REGRESSION bisected) Re: amdgpu errors (VM fault / GPU fault detected) with 5.19 merge window snapshots

2022-06-06 Thread Michal Kubecek
On Fri, Jun 03, 2022 at 11:49:31AM -0400, Alex Deucher wrote: > On Thu, Jun 2, 2022 at 10:22 AM Michal Kubecek wrote: > > > > On Thu, Jun 02, 2022 at 09:58:22AM -0400, Alex Deucher wrote: > > > On Fri, May 27, 2022 at 8:58 AM Michal Kubecek wrote: > > > > On Fri, May 27, 2022 at 11:00:39AM

Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout

2022-06-06 Thread Felix Kuehling
On 2022-06-03 at 06:46, Christian König wrote: Resources about to be destructed are not tied to BOs any more. Signed-off-by: Christian König Reviewed-by: Felix Kuehling --- drivers/gpu/drm/ttm/ttm_device.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git

Re: [PATCH V2 1/6] drm/amdgpu: enable ASPM support for PCIE 7.4.0/7.6.0

2022-06-06 Thread Lazar, Lijo
On 6/6/2022 8:41 AM, Evan Quan wrote: Enable ASPM support for PCIE 7.4.0 and 7.6.0. Signed-off-by: Evan Quan Change-Id: Ib3b0e106ff43ad49f0f815e6eeb5c756b6bf4550 --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +

Re: [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-06 Thread Christian König
On 05.06.22 at 18:47, Daniel Vetter wrote: On Fri, 27 May 2022 at 01:55, Dmitry Osipenko wrote: Introduce a common DRM SHMEM shrinker framework that reduces code duplication among DRM drivers by replacing their custom shrinker implementations with the generic shrinker. In order to

[PATCH] drm/amdgpu/mes: only invalid/prime icache after finish loading both pipe MES FWs.

2022-06-06 Thread Yifan Zhang
The invalidate/prime icache operation takes effect on both pipes concurrently, so CP_MES_IC_BASE_LO/HI and CP_MES_MDBASE_LO/HI must both be set before priming the icache. Otherwise the MES hardware gets garbage data in the above registers and causes a page fault [ 470.873200] amdgpu :33:00.0: amdgpu:

Re: (REGRESSION bisected) Re: amdgpu errors (VM fault / GPU fault detected) with 5.19 merge window snapshots

2022-06-06 Thread Christian König
On 06.06.22 at 00:00, Michal Kubecek wrote: [SNIP] This patch should help: https://patchwork.freedesktop.org/patch/488258/ After ~48 hours with this patch, still no apparent issues. Tested-by: Michal Kubecek Thanks, this could be optimized for gfx8 a bit if anybody is interested in a

Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout

2022-06-06 Thread Christian König
On 04.06.22 at 00:44, Felix Kuehling wrote: [+amd-gfx] On 2022-06-03 15:37, Felix Kuehling wrote: On 2022-06-03 06:46, Christian König wrote: Resources about to be destructed are not tied to BOs any more. I've been seeing a backtrace in that area with a patch series I'm working on, but

RE: [PATCH] drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.4

2022-06-06 Thread Huang, Tim
[AMD Official Use Only - General] Reviewed-by: Tim Huang -Original Message- From: Deucher, Alexander Sent: Friday, May 27, 2022 1:58 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, Tim Subject: [PATCH] drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.4

[PATCH] drm/amdgpu: fix refcount underflow in device reset

2022-06-06 Thread Yiqing Yao
[why] A gfx job may be processed but not finished when a reset begins from a compute job timeout. drm_sched_resubmit_jobs_ext in sched_main assumes a submitted job is unsignaled and always puts the parent fence. Resubmission of that job causes an underflow. This fix is done in device reset to avoid changing drm
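The failure mode described here, a put with no matching get, can be shown with a toy reference counter. This is a generic sketch under assumptions and not the DRM scheduler or the actual patch: `struct toy_fence`, `toy_fence_get()`, `toy_fence_put()`, and `toy_resubmit()` are hypothetical names, and the real kernel refcount API behaves differently on underflow (it warns rather than returning a status).

```c
#include <stdbool.h>

/* Hypothetical fence with a plain integer reference count. */
struct toy_fence {
    int refcount;
};

static void toy_fence_get(struct toy_fence *f)
{
    f->refcount++;
}

/* Drop one reference; report false instead of going below zero. */
static bool toy_fence_put(struct toy_fence *f)
{
    if (f->refcount <= 0)
        return false; /* underflow: more puts than gets */
    f->refcount--;
    return true;
}

/*
 * Hypothetical resubmit path that unconditionally drops one reference
 * on the parent fence, whether or not the caller still owns one.
 */
static bool toy_resubmit(struct toy_fence *parent)
{
    return toy_fence_put(parent);
}
```

If the only remaining reference belongs to another holder, the unconditional put in `toy_resubmit()` consumes it and that holder's later put underflows. Taking an extra reference with `toy_fence_get()` before resubmitting keeps the count balanced in this toy model.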