[pull] amdgpu, amdkfd drm-next-5.19

2022-05-11 Thread Alex Deucher
Hi Dave, Daniel,

New stuff for 5.19.  Mostly new IP block support.

The following changes since commit 3da2c38231a4c62dafdbd762a199cfacaccd0533:

  drm/amdgpu: Free user pages if amdgpu_cs_parser_bos failed (2022-04-28 17:49:04 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.19-2022-05-11

for you to fetch changes up to 81c5495910e81c2cadcb9118ca0c8803ab3bde61:

  drm/amdgpu: Remove duplicated argument in vcn_v4_0 (2022-05-10 17:53:13 -0400)


amd-drm-next-5.19-2022-05-11:

amdgpu:
- OPTC updates
- NBIO 4.3 support
- IH 6.0 support
- GPUVM TLB flush fix
- HDP 6.0 support
- LTTPR fixes
- HDP 5.2 support
- NBIO 7.7 support
- SMUIO 13.x updates
- DP2 fixes
- GMC 11.0 support
- PSP 13.x updates
- SMU 13.x updates
- VCN RAS support
- GC 11.0 support
- SDMA 6.0 support
- VCN 4.0 support
- Misc code cleanups
- DCN CONFIG cleanup
- RAS fixes

amdkfd:
- GC 11.0 support


Alan Liu (1):
  drm/amd/display: do not disable an invalid irq source in hdp finish

Alex Deucher (15):
  drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init
  drm/amdgpu/psp: drop load/unload/init_shared_buf wrappers
  drm/amdgpu/psp: fix memory leak in terminate functions
  drm/amdgpu/psp: move shared buffer frees into single function
  drm/amdgpu/discovery: handle AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO in SMU
  drm/amdkfd: add helper to generate cache info from gfx config
  drm/amdgpu/discovery: add MES11 support
  drm/amdgpu/gfx11: remove some register fields that no longer exist
  Revert "drm/amdgpu: disable runpm if we are the primary adapter"
  Revert "fbdev: fbmem: add a helper to determine if an aperture is used by 
a fw fb"
  drm/amdgpu/discovery: set flag for GC 11.0.1
  drm/amdgpu: simplify nv and soc21 read_register functions
  drm/amdgpu: make smu_v13_0_7_check_fw_status() static
  drm/amdgpu/mes: fix format specifier for size_t
  Revert "drm/amd/pm: keep the BACO feature enabled for suspend"

Alex Hung (7):
  drm/amd/display: remove redundant CONFIG_DRM_AMD_DC_DCN in dc
  drm/amd/display: remove redundant CONFIG_DRM_AMD_DC_DCN in dce
  drm/amd/display: remove redundant CONFIG_DRM_AMD_DC_DCN in gpio
  drm/amd/display: remove redundant CONFIG_DRM_AMD_DC_DCN in irq
  drm/amd/display: remove redundant CONFIG_DRM_AMD_DC_DCN for z10
  drm/amd/display: remove redundant CONFIG_DRM_AMD_DC_DCN in amdgpu_dm
  drm/amd/display: remove unnecessary else by CONFIG_DRM_AMD_DC_DCN

Alice Wong (3):
  drm/amdgpu/psp: deallocate memory when psp_load_fw failed
  drm/amdgpu/ucode: Remove firmware load type check in amdgpu_ucode_free_bo
  drm/amdgpu/psp: Return failure when firmware failed to load in SRIOV

Andrey Grodzovsky (2):
  drm/amd/psp: Add C2P registers to mp_13_0_2 header
  drm/amdgpu/psp: Add VBIOS flash handler

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.115.0

Aric Cyr (3):
  drm/amd/display: Clean up pixel format types
  drm/amd/display: 3.2.184
  drm/amd/display: 3.2.185

Chengming Gui (8):
  drm/amd/amdgpu: adjust the fw load type list
  drm/amd/amdgpu: add more fw load type to fit new ASICs
  drm/amdgpu/discovery: add SMUIO_13_0_8 func support
  drm/amdgpu/psp13: add support for MP0 13.0.7
  drm/amdgpu/discovery: add psp13 support for PSP 13.0.7
  drm/amd/pm: add SMU_13_0_7 PMFW headers
  drm/amdgpu/swsmu: add smu 13.0.7 firmware
  drm/amd/pm: add SMU_13_0_7 ppt_funcs for SMU_13_0_7

Christian König (1):
  drm/amdgpu: nuke dynamic gfx scratch reg allocation

Dan Carpenter (1):
  drm/amdgpu/gfx11: unlock on error in gfx_v11_0_kiq_resume()

Danijel Slivka (1):
  amdgpu/pm: Disallow managing power profiles on SRIOV for Sienna Cichlid

Elena Sakhnovitch (2):
  drm/amdgpu: Remove trailing space
  drm/amd/pm: Disable fan control if not supported

Eric Bernstein (1):
  drm/amd/display: Add new DSC interface to disconnect from pipe

Eric Huang (2):
  drm/amdkfd: add asic support for SDMA 6.0.2
  drm/amdkfd: add asic support for GC 11.0.2

Evan Quan (38):
  drm/amd/pm: enable pp_dpm_vclk/dclk sysfs interface support for SMU 13.0.0
  drm/amd/pm: move bootup values retrieving to ->sw_init
  drm/amd/pm: correct the way for retrieving bootup clocks
  drm/amd/pm: update the hw initialization sequence around pptable setup
  drm/amdgpu: enable pptable ucode loading
  drm/amd/pm: enable SCPM support for SMU
  drm/amd/pm: correct SMU OverridePcieParameters related settings
  drm/amd/pm: enable the support for retrieving combo pptable
  drm/amd/smu: Update SMU13 support for SMU 13.0.0
  drm/amdgpu/soc21: enable ATHUB and MMHUB PG
  drm/amdgpu: add FGCG support
  drm/amdgpu: enable GFX CGCG/CGLS for 

Re: [PATCH v1 01/15] mm: add zone device coherent type memory support

2022-05-11 Thread Alistair Popple


Alex Sierra  writes:

[...]

> diff --git a/mm/rmap.c b/mm/rmap.c
> index fedb82371efe..d57102cd4b43 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1995,7 +1995,8 @@ void try_to_migrate(struct folio *folio, enum ttu_flags 
> flags)
>   TTU_SYNC)))
>   return;
>
> - if (folio_is_zone_device(folio) && !folio_is_device_private(folio))
> + if (folio_is_zone_device(folio) &&
> + (!folio_is_device_private(folio) && 
> !folio_is_device_coherent(folio)))
>   return;
>
>   /*

I vaguely recall commenting on this previously, or at least intending
to. In try_to_migrate_one() we have this:

	if (folio_is_zone_device(folio)) {
		unsigned long pfn = folio_pfn(folio);
		swp_entry_t entry;
		pte_t swp_pte;

		/*
		 * Store the pfn of the page in a special migration
		 * pte. do_swap_page() will wait until the migration
		 * pte is removed and then restart fault handling.
		 */
		entry = pte_to_swp_entry(pteval);
		if (is_writable_device_private_entry(entry))
			entry = make_writable_migration_entry(pfn);
		else
			entry = make_readable_migration_entry(pfn);
		swp_pte = swp_entry_to_pte(entry);

The check in try_to_migrate() guarantees that if folio_is_zone_device()
is true this must be a DEVICE_PRIVATE page and it treats it as such by
assuming there is a special device private swap entry there.

Relying on that assumption seems bad, and I have no idea why I didn't
just use is_device_private_page() originally, but I think the fix is
just to change this to:

if (folio_is_device_private(folio))

And let DEVICE_COHERENT pages fall through to normal page processing.
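
For illustration, the zone-device branch quoted above would then start
like this (an untested sketch of the suggested one-line change; the
rest of the branch body is unchanged):

	if (folio_is_device_private(folio)) {
		unsigned long pfn = folio_pfn(folio);
		swp_entry_t entry;
		pte_t swp_pte;

		/*
		 * Only DEVICE_PRIVATE pages carry device private swap
		 * entries; DEVICE_COHERENT pages no longer enter this
		 * branch and get the normal migration entry handling.
		 */
		entry = pte_to_swp_entry(pteval);
		...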

 - Alistair


[PATCH 5/8] drm/amdgpu/pm: enable swsmu for SMU IP v13.0.4

2022-05-11 Thread Alex Deucher
From: Tim Huang 

Add the entry to set the ppt functions for SMU IP v13.0.4.

Signed-off-by: Tim Huang 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 956062496202..ab32277f9108 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -38,6 +38,7 @@
 #include "yellow_carp_ppt.h"
 #include "cyan_skillfish_ppt.h"
 #include "smu_v13_0_0_ppt.h"
+#include "smu_v13_0_4_ppt.h"
 #include "smu_v13_0_5_ppt.h"
 #include "smu_v13_0_7_ppt.h"
 #include "amd_pcie.h"
@@ -550,6 +551,9 @@ static int smu_set_funcs(struct amdgpu_device *adev)
case IP_VERSION(13, 0, 8):
yellow_carp_set_ppt_funcs(smu);
break;
+   case IP_VERSION(13, 0, 4):
+   smu_v13_0_4_set_ppt_funcs(smu);
+   break;
case IP_VERSION(13, 0, 5):
smu_v13_0_5_set_ppt_funcs(smu);
break;
-- 
2.35.3



[PATCH 7/8] drm/amdgpu/discovery: add SMU v13.0.4 into the IP discovery list

2022-05-11 Thread Alex Deucher
From: Xiaojian Du 

This patch will add SMU v13.0.4 into the IP discovery list.

Signed-off-by: Xiaojian Du 
Reviewed-by: Huang Rui 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index d7b0fd1cad62..881570dced41 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1656,6 +1656,7 @@ static int amdgpu_discovery_set_smu_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(13, 0, 1):
case IP_VERSION(13, 0, 2):
case IP_VERSION(13, 0, 3):
+   case IP_VERSION(13, 0, 4):
case IP_VERSION(13, 0, 5):
case IP_VERSION(13, 0, 7):
case IP_VERSION(13, 0, 8):
-- 
2.35.3



[PATCH 4/8] drm/amdgpu/pm: add swsmu ppt implementation for SMU IP v13.0.4

2022-05-11 Thread Alex Deucher
From: Tim Huang 

Add swsmu ppt files for SMU IP v13.0.4.

Signed-off-by: Tim Huang 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/Makefile   |3 +-
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c  | 1044 +
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.h  |   28 +
 3 files changed, 1074 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.h

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/Makefile 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/Makefile
index 4711fb53dd19..9043f6ef1aee 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/Makefile
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/Makefile
@@ -23,7 +23,8 @@
 # Makefile for the 'smu manager' sub-component of powerplay.
 # It provides the smu management services for the driver.
 
-SMU13_MGR = smu_v13_0.o aldebaran_ppt.o yellow_carp_ppt.o smu_v13_0_0_ppt.o smu_v13_0_5_ppt.o smu_v13_0_7_ppt.o
+SMU13_MGR = smu_v13_0.o aldebaran_ppt.o yellow_carp_ppt.o smu_v13_0_0_ppt.o smu_v13_0_4_ppt.o \
+	smu_v13_0_5_ppt.o smu_v13_0_7_ppt.o
 
 AMD_SWSMU_SMU13MGR = $(addprefix $(AMD_SWSMU_PATH)/smu13/,$(SMU13_MGR))
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
new file mode 100644
index ..7d6ff141b43f
--- /dev/null
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
@@ -0,0 +1,1044 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "smu_types.h"
+#define SWSMU_CODE_LAYER_L2
+
+#include "amdgpu.h"
+#include "amdgpu_smu.h"
+#include "smu_v13_0.h"
+#include "smu13_driver_if_v13_0_4.h"
+#include "smu_v13_0_4_ppt.h"
+#include "smu_v13_0_4_ppsmc.h"
+#include "smu_v13_0_4_pmfw.h"
+#include "smu_cmn.h"
+
+/*
+ * DO NOT use these for err/warn/info/debug messages.
+ * Use dev_err, dev_warn, dev_info and dev_dbg instead.
+ * They are more MGPU friendly.
+ */
+#undef pr_err
+#undef pr_warn
+#undef pr_info
+#undef pr_debug
+
+#define FEATURE_MASK(feature) (1ULL << feature)
+
+#define SMC_DPM_FEATURE ( \
+   FEATURE_MASK(FEATURE_CCLK_DPM_BIT) | \
+   FEATURE_MASK(FEATURE_VCN_DPM_BIT)| \
+   FEATURE_MASK(FEATURE_FCLK_DPM_BIT)   | \
+   FEATURE_MASK(FEATURE_SOCCLK_DPM_BIT) | \
+   FEATURE_MASK(FEATURE_MP0CLK_DPM_BIT) | \
+   FEATURE_MASK(FEATURE_LCLK_DPM_BIT)   | \
+   FEATURE_MASK(FEATURE_SHUBCLK_DPM_BIT)| \
+   FEATURE_MASK(FEATURE_DCFCLK_DPM_BIT) | \
+   FEATURE_MASK(FEATURE_ISP_DPM_BIT)| \
+   FEATURE_MASK(FEATURE_IPU_DPM_BIT)| \
+   FEATURE_MASK(FEATURE_GFX_DPM_BIT))
+
+static struct cmn2asic_msg_mapping smu_v13_0_4_message_map[SMU_MSG_MAX_COUNT] = {
+	MSG_MAP(TestMessage,			PPSMC_MSG_TestMessage,			1),
+	MSG_MAP(GetSmuVersion,			PPSMC_MSG_GetPmfwVersion,		1),
+	MSG_MAP(GetDriverIfVersion,		PPSMC_MSG_GetDriverIfVersion,		1),
+	MSG_MAP(EnableGfxOff,			PPSMC_MSG_EnableGfxOff,			1),
+	MSG_MAP(AllowGfxOff,			PPSMC_MSG_AllowGfxOff,			1),
+	MSG_MAP(DisallowGfxOff,			PPSMC_MSG_DisallowGfxOff,		1),
+	MSG_MAP(PowerDownVcn,			PPSMC_MSG_PowerDownVcn,			1),
+	MSG_MAP(PowerUpVcn,			PPSMC_MSG_PowerUpVcn,			1),
+	MSG_MAP(SetHardMinVcn,			PPSMC_MSG_SetHardMinVcn,		1),
+	MSG_MAP(PrepareMp1ForUnload,		PPSMC_MSG_PrepareMp1ForUnload,		1),
+	MSG_MAP(SetDriverDramAddrHigh,		PPSMC_MSG_SetDriverDramAddrHigh,	1),
+	MSG_MAP(SetDriverDramAddrLow,

[PATCH 8/8] drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.4

2022-05-11 Thread Alex Deucher
Set the default reset method to mode2 for SMU IP v13.0.4

Signed-off-by: Tim Huang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index c6a8520053bb..a4e23a7b3d63 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -309,6 +309,7 @@ static enum amd_reset_method
 soc21_asic_reset_method(struct amdgpu_device *adev)
 {
if (amdgpu_reset_method == AMD_RESET_METHOD_MODE1 ||
+   amdgpu_reset_method == AMD_RESET_METHOD_MODE2 ||
amdgpu_reset_method == AMD_RESET_METHOD_BACO)
return amdgpu_reset_method;
 
@@ -319,6 +320,8 @@ soc21_asic_reset_method(struct amdgpu_device *adev)
switch (adev->ip_versions[MP1_HWIP][0]) {
case IP_VERSION(13, 0, 0):
return AMD_RESET_METHOD_MODE1;
+   case IP_VERSION(13, 0, 4):
+   return AMD_RESET_METHOD_MODE2;
default:
if (amdgpu_dpm_is_baco_supported(adev))
return AMD_RESET_METHOD_BACO;
@@ -340,6 +343,10 @@ static int soc21_asic_reset(struct amdgpu_device *adev)
dev_info(adev->dev, "BACO reset\n");
ret = amdgpu_dpm_baco_reset(adev);
break;
+   case AMD_RESET_METHOD_MODE2:
+   dev_info(adev->dev, "MODE2 reset\n");
+   ret = amdgpu_dpm_mode2_reset(adev);
+   break;
default:
dev_info(adev->dev, "MODE1 reset\n");
ret = amdgpu_device_mode1_reset(adev);
-- 
2.35.3



[PATCH 6/8] drm/amdgpu/pm: add GFXOFF control IP version check for SMU IP v13.0.4

2022-05-11 Thread Alex Deucher
From: Tim Huang 

Enable the SMU IP v13.0.4 GFXOFF control

Signed-off-by: Tim Huang 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 14a7eef0b15f..ae6321af9d88 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -836,6 +836,7 @@ int smu_v13_0_gfx_off_control(struct smu_context *smu, bool 
enable)
case IP_VERSION(13, 0, 0):
case IP_VERSION(13, 0, 1):
case IP_VERSION(13, 0, 3):
+   case IP_VERSION(13, 0, 4):
case IP_VERSION(13, 0, 5):
case IP_VERSION(13, 0, 7):
case IP_VERSION(13, 0, 8):
-- 
2.35.3



[PATCH 2/8] drm/amdgpu/pm: add EnableGfxImu message dummy map for SMU IP v13.0.4

2022-05-11 Thread Alex Deucher
From: Tim Huang 

The SMU needs this message to trigger IMU initialization.

Signed-off-by: Tim Huang 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 799050ea7515..a1cb8e73e171 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -233,7 +233,8 @@
__SMU_DUMMY_MAP(Spare0),  \
__SMU_DUMMY_MAP(UnforceGfxVid),   \
__SMU_DUMMY_MAP(HeavySBR),  \
-   __SMU_DUMMY_MAP(SetBadHBMPagesRetiredFlagsPerChannel),
+   __SMU_DUMMY_MAP(SetBadHBMPagesRetiredFlagsPerChannel), \
+   __SMU_DUMMY_MAP(EnableGfxImu),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
-- 
2.35.3



[PATCH 3/8] drm/amdgpu/pm: add some common ppt functions for SMU IP v13.0.x

2022-05-11 Thread Alex Deucher
From: Tim Huang 

Add some common ppt functions that will be used by SMU IP v13.0.x
and drop the not used function smu_v13_0_mode2_reset.

Signed-off-by: Tim Huang 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h  |   9 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 106 +++---
 2 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
index 013be82db1f3..2b44d41a5157 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h
@@ -28,6 +28,7 @@
 #define SMU13_DRIVER_IF_VERSION_INV 0xFFFFFFFF
 #define SMU13_DRIVER_IF_VERSION_YELLOW_CARP 0x04
 #define SMU13_DRIVER_IF_VERSION_ALDE 0x08
+#define SMU13_DRIVER_IF_VERSION_SMU_V13_0_4 0x04
 #define SMU13_DRIVER_IF_VERSION_SMU_V13_0_5 0x04
 #define SMU13_DRIVER_IF_VERSION_SMU_V13_0_0 0x27
 #define SMU13_DRIVER_IF_VERSION_SMU_V13_0_7 0x28
@@ -224,8 +225,6 @@ int smu_v13_0_baco_set_state(struct smu_context *smu, enum 
smu_baco_state state)
 int smu_v13_0_baco_enter(struct smu_context *smu);
 int smu_v13_0_baco_exit(struct smu_context *smu);
 
-int smu_v13_0_mode2_reset(struct smu_context *smu);
-
 int smu_v13_0_get_dpm_ultimate_freq(struct smu_context *smu, enum smu_clk_type 
clk_type,
uint32_t *min, uint32_t *max);
 
@@ -293,5 +292,11 @@ int smu_v13_0_baco_enter(struct smu_context *smu);
 
 int smu_v13_0_baco_exit(struct smu_context *smu);
 
+int smu_v13_0_od_edit_dpm_table(struct smu_context *smu,
+   enum PP_OD_DPM_TABLE_COMMAND type,
+   long input[],
+   uint32_t size);
+
+int smu_v13_0_set_default_dpm_tables(struct smu_context *smu);
 #endif
 #endif
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index aee1741d98e9..14a7eef0b15f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -295,6 +295,9 @@ int smu_v13_0_check_fw_version(struct smu_context *smu)
case IP_VERSION(13, 0, 8):
smu->smc_driver_if_version = 
SMU13_DRIVER_IF_VERSION_YELLOW_CARP;
break;
+   case IP_VERSION(13, 0, 4):
+   smu->smc_driver_if_version = 
SMU13_DRIVER_IF_VERSION_SMU_V13_0_4;
+   break;
case IP_VERSION(13, 0, 5):
smu->smc_driver_if_version = 
SMU13_DRIVER_IF_VERSION_SMU_V13_0_5;
break;
@@ -1516,19 +1519,6 @@ int smu_v13_0_wait_for_event(struct smu_context *smu, 
enum smu_event_type event,
return ret;
 }
 
-int smu_v13_0_mode2_reset(struct smu_context *smu)
-{
-   int ret;
-
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset,
-   SMU_RESET_MODE_2, NULL);
-   /*TODO: mode2 reset wait time should be shorter, add ASIC specific func 
if required */
-   if (!ret)
-   msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
-
-   return ret;
-}
-
 int smu_v13_0_get_dpm_ultimate_freq(struct smu_context *smu, enum smu_clk_type 
clk_type,
uint32_t *min, uint32_t *max)
 {
@@ -2282,3 +2272,93 @@ int smu_v13_0_baco_exit(struct smu_context *smu)
return smu_v13_0_baco_set_state(smu,
SMU_BACO_STATE_EXIT);
 }
+
+int smu_v13_0_od_edit_dpm_table(struct smu_context *smu,
+   enum PP_OD_DPM_TABLE_COMMAND type,
+   long input[], uint32_t size)
+{
+   struct smu_dpm_context *smu_dpm = &(smu->smu_dpm);
+   int ret = 0;
+
+   /* Only allowed in manual mode */
+   if (smu_dpm->dpm_level != AMD_DPM_FORCED_LEVEL_MANUAL)
+   return -EINVAL;
+
+   switch (type) {
+   case PP_OD_EDIT_SCLK_VDDC_TABLE:
+   if (size != 2) {
+   dev_err(smu->adev->dev, "Input parameter number not 
correct\n");
+   return -EINVAL;
+   }
+
+   if (input[0] == 0) {
+   if (input[1] < smu->gfx_default_hard_min_freq) {
+   dev_warn(smu->adev->dev,
+"Fine grain setting minimum sclk (%ld) 
MHz is less than the minimum allowed (%d) MHz\n",
+input[1], 
smu->gfx_default_hard_min_freq);
+   return -EINVAL;
+   }
+   smu->gfx_actual_hard_min_freq = input[1];
+   } else if (input[0] == 1) {
+   if (input[1] > smu->gfx_default_soft_max_freq) {
+   dev_warn(smu->adev->dev,
+"Fine grain setting maximum sclk (%ld) 
MHz is greater than the maximum allowed (%d) MHz\n",
+   

[PATCH 0/8] Add support for SMU v13.0.4

2022-05-11 Thread Alex Deucher
Add initial support for SMU (System Management Unit)
version 13.0.4.  The SMU handles power management and
other tasks on the GPU.  Patch 1 adds large new header
files so I didn't send them to the list.

Alex Deucher (1):
  drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.4

Huang Rui (1):
  drm/amdgpu/pm: add smu v13.0.4 driver SMU if headers

Tim Huang (5):
  drm/amdgpu/pm: add EnableGfxImu message dummy map for SMU IP v13.0.4
  drm/amdgpu/pm: add some common ppt functions for SMU IP v13.0.x
  drm/amdgpu/pm: add swsmu ppt implementation for SMU IP v13.0.4
  drm/amdgpu/pm: enable swsmu for SMU IP v13.0.4
  drm/amdgpu/pm: add GFXOFF control IP version check for SMU IP v13.0.4

Xiaojian Du (1):
  drm/amdgpu/discovery: add SMU v13.0.4 into the IP discovery list

 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |1 +
 drivers/gpu/drm/amd/amdgpu/soc21.c|7 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |4 +
 .../inc/pmfw_if/smu13_driver_if_v13_0_4.h |  267 +
 .../pm/swsmu/inc/pmfw_if/smu_v13_0_4_pmfw.h   |  137 +++
 .../pm/swsmu/inc/pmfw_if/smu_v13_0_4_ppsmc.h  |  138 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |3 +-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h  |9 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/Makefile   |3 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  107 +-
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c  | 1044 +
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.h  |   28 +
 12 files changed, 1731 insertions(+), 17 deletions(-)
 create mode 100644 
drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu13_driver_if_v13_0_4.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_4_pmfw.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_4_ppsmc.h
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
 create mode 100644 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.h

-- 
2.35.3



[PATCH] drm/amdgpu/gmc11: avoid cpu accessing registers to flush VM

2022-05-11 Thread Alex Deucher
From: Jack Xiao 

With gfxoff enabled, the CPU is not expected to access GC registers
directly, so flush the GPU TLB through an SDMA job instead.

v2: remove bug-on, fix the vmhub check

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 49 +-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 477f67d9b07c..63f3fc0a1e7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -258,6 +258,12 @@ static void gmc_v11_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
 static void gmc_v11_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
uint32_t vmhub, uint32_t flush_type)
 {
+   struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+   struct dma_fence *fence;
+   struct amdgpu_job *job;
+
+   int r;
+
if ((vmhub == AMDGPU_GFXHUB_0) && !adev->gfx.is_poweron)
return;
 
@@ -281,8 +287,49 @@ static void gmc_v11_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
}
 
 	mutex_lock(&adev->mman.gtt_window_lock);
-	gmc_v11_0_flush_vm_hub(adev, vmid, vmhub, 0);
+
+	if (vmhub != AMDGPU_GFXHUB_0) {
+		gmc_v11_0_flush_vm_hub(adev, vmid, vmhub, 0);
+		mutex_unlock(&adev->mman.gtt_window_lock);
+		return;
+	}
+
+	if (!adev->mman.buffer_funcs_enabled ||
+	    !adev->ib_pool_ready ||
+	    amdgpu_in_reset(adev) ||
+	    ring->sched.ready == false) {
+		gmc_v11_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
+		mutex_unlock(&adev->mman.gtt_window_lock);
+		return;
+	}
+
+	r = amdgpu_job_alloc_with_ib(adev, 16 * 4, AMDGPU_IB_POOL_IMMEDIATE,
+				     &job);
+	if (r)
+		goto error_alloc;
+
+	job->vm_pd_addr = amdgpu_gmc_pd_addr(adev->gart.bo);
+	job->vm_needs_flush = true;
+	job->ibs->ptr[job->ibs->length_dw++] = ring->funcs->nop;
+	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
+	r = amdgpu_job_submit(job, &adev->mman.entity,
+			      AMDGPU_FENCE_OWNER_UNDEFINED, &fence);
+	if (r)
+		goto error_submit;
+
+	mutex_unlock(&adev->mman.gtt_window_lock);
+
+	dma_fence_wait(fence, false);
+	dma_fence_put(fence);
+
+	return;
+
+error_submit:
+	amdgpu_job_free(job);
+
+error_alloc:
 	mutex_unlock(&adev->mman.gtt_window_lock);
+	DRM_ERROR("Error flushing GPU TLB using the SDMA (%d)!\n", r);
return;
 }
 
-- 
2.35.3



[PATCH] drm/amdgpu/gfx11: fix mes mqd settings

2022-05-11 Thread Alex Deucher
From: Jack Xiao 

Use the correct Memory Queue Descriptor (MQD)
structure for GC 11.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 5d4d54cabf70..fcf51947bb18 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -29,7 +29,7 @@
 #include "gc/gc_11_0_0_offset.h"
 #include "gc/gc_11_0_0_sh_mask.h"
 #include "gc/gc_11_0_0_default.h"
-#include "v10_structs.h"
+#include "v11_structs.h"
 #include "mes_v11_api_def.h"
 
 MODULE_FIRMWARE("amdgpu/gc_11_0_0_mes.bin");
@@ -637,7 +637,7 @@ static int mes_v11_0_allocate_eop_buf(struct amdgpu_device 
*adev,
 
 static int mes_v11_0_mqd_init(struct amdgpu_ring *ring)
 {
-   struct v10_compute_mqd *mqd = ring->mqd_ptr;
+   struct v11_compute_mqd *mqd = ring->mqd_ptr;
uint64_t hqd_gpu_addr, wb_gpu_addr, eop_base_addr;
uint32_t tmp;
 
@@ -724,22 +724,22 @@ static int mes_v11_0_mqd_init(struct amdgpu_ring *ring)
mqd->cp_hqd_vmid = 0;
/* activate the queue */
mqd->cp_hqd_active = 1;
-   mqd->cp_hqd_persistent_state = regCP_HQD_PERSISTENT_STATE_DEFAULT;
+
+   tmp = regCP_HQD_PERSISTENT_STATE_DEFAULT;
+   tmp = REG_SET_FIELD(tmp, CP_HQD_PERSISTENT_STATE,
+   PRELOAD_SIZE, 0x55);
+   mqd->cp_hqd_persistent_state = tmp;
+
mqd->cp_hqd_ib_control = regCP_HQD_IB_CONTROL_DEFAULT;
mqd->cp_hqd_iq_timer = regCP_HQD_IQ_TIMER_DEFAULT;
mqd->cp_hqd_quantum = regCP_HQD_QUANTUM_DEFAULT;
 
-   tmp = regCP_HQD_GFX_CONTROL_DEFAULT;
-   tmp = REG_SET_FIELD(tmp, CP_HQD_GFX_CONTROL, DB_UPDATED_MSG_EN, 1);
-   /* offset: 184 - this is used for CP_HQD_GFX_CONTROL */
-   mqd->cp_hqd_suspend_cntl_stack_offset = tmp;
-
return 0;
 }
 
 static void mes_v11_0_queue_init_register(struct amdgpu_ring *ring)
 {
-   struct v10_compute_mqd *mqd = ring->mqd_ptr;
+   struct v11_compute_mqd *mqd = ring->mqd_ptr;
struct amdgpu_device *adev = ring->adev;
uint32_t data = 0;
 
@@ -910,7 +910,7 @@ static int mes_v11_0_kiq_ring_init(struct amdgpu_device 
*adev)
 static int mes_v11_0_mqd_sw_init(struct amdgpu_device *adev,
 enum admgpu_mes_pipe pipe)
 {
-   int r, mqd_size = sizeof(struct v10_compute_mqd);
+   int r, mqd_size = sizeof(struct v11_compute_mqd);
struct amdgpu_ring *ring;
 
if (pipe == AMDGPU_MES_KIQ_PIPE)
-- 
2.35.3



[PATCH] drm/amdgpu/gfx11: fix me field handling in map_queue packet

2022-05-11 Thread Alex Deucher
From: Jack Xiao 

Select the correct microengine (me) when using the
map_queue packet.  There are different me's for GFX,
compute, and scheduling.

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 7614f38ff381..8a1bec70c719 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -145,16 +145,19 @@ static void gfx11_kiq_map_queues(struct amdgpu_ring 
*kiq_ring,
 {
uint64_t mqd_addr = amdgpu_bo_gpu_offset(ring->mqd_obj);
uint64_t wptr_addr = ring->wptr_gpu_addr;
-   uint32_t eng_sel = 0;
+   uint32_t me = 0, eng_sel = 0;
 
switch (ring->funcs->type) {
case AMDGPU_RING_TYPE_COMPUTE:
+   me = 1;
eng_sel = 0;
break;
case AMDGPU_RING_TYPE_GFX:
+   me = 0;
eng_sel = 4;
break;
case AMDGPU_RING_TYPE_MES:
+   me = 2;
eng_sel = 5;
break;
default:
@@ -168,7 +171,7 @@ static void gfx11_kiq_map_queues(struct amdgpu_ring 
*kiq_ring,
  PACKET3_MAP_QUEUES_VMID(0) | /* VMID */
  PACKET3_MAP_QUEUES_QUEUE(ring->queue) |
  PACKET3_MAP_QUEUES_PIPE(ring->pipe) |
- PACKET3_MAP_QUEUES_ME((ring->me == 1 ? 0 : 1)) |
+ PACKET3_MAP_QUEUES_ME((me)) |
  PACKET3_MAP_QUEUES_QUEUE_TYPE(0) | /*queue_type: 
normal compute queue */
  PACKET3_MAP_QUEUES_ALLOC_FORMAT(0) | /* alloc format: 
all_on_one_pipe */
  PACKET3_MAP_QUEUES_ENGINE_SEL(eng_sel) |
-- 
2.35.3



Re: [PATCH v1 04/15] mm: add device coherent checker to remove migration pte

2022-05-11 Thread Alistair Popple


"Sierra Guiza, Alejandro (Alex)"  writes:

> @apop...@nvidia.com Could you please check this patch? It's somehow related 
> to migrate_device_page() for long term device coherent pages.
>
> Regards,
> Alex Sierra
>> -Original Message-
>> From: amd-gfx  On Behalf Of Alex
>> Sierra
>> Sent: Thursday, May 5, 2022 4:34 PM
>> To: j...@nvidia.com
>> Cc: rcampb...@nvidia.com; wi...@infradead.org; da...@redhat.com;
>> Kuehling, Felix ; apop...@nvidia.com; amd-
>> g...@lists.freedesktop.org; linux-...@vger.kernel.org; linux...@kvack.org;
>> jgli...@redhat.com; dri-de...@lists.freedesktop.org; akpm@linux-
>> foundation.org; linux-e...@vger.kernel.org; h...@lst.de
>> Subject: [PATCH v1 04/15] mm: add device coherent checker to remove
>> migration pte
>>
>> During remove_migration_pte(), entries for device coherent type pages that
>> were not created through special migration ptes, ignore _PAGE_RW flag. This
>> path can be found at migrate_device_page(), where valid vma is not
>> required. In this case, migrate_vma_collect_pmd() is not called and special
>> migration ptes are not set.

It's true that we don't call migrate_vma_collect_pmd() for
migrate_device_page(), but this doesn't imply migration entries are not
created. We still call migrate_vma_unmap() which calls try_to_migrate()
to install migration entries.
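
Spelling out the call chain under discussion here:

	migrate_device_page()
	    migrate_vma_unmap()
	        try_to_migrate()            /* installs migration entries */
	            try_to_migrate_one()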

When we have a vma migrate_vma_collect_pmd() is a fast path for the
common case a page is only mapped once. So migrate_vma_collect_pmd()
should fairly closely match try_to_migrate_one(). I did experiment
locally with removing the fast path to simplify the code, but it does
provide a meaningful performance improvement so I abandoned it.

I think you're running into the problem addressed by
https://lkml.kernel.org/r/20211018045247.3128058-1-apop...@nvidia.com
but for DEVICE_COHERENT pages.

Based on that I think the approach below is wrong. You should update
try_to_migrate_one() to deal with DEVICE_COHERENT pages. It would make
sense to do that as part of patch 1 in this series.

The problem is that try_to_migrate_one() assumes folio_is_zone_device()
implies it is a DEVICE_PRIVATE page due to the check in
try_to_migrate().

>> Signed-off-by: Alex Sierra 
>> ---
>>  mm/migrate.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c index
>> 6c31ee1e1c9b..e18ddee56f37 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -206,7 +206,8 @@ static bool remove_migration_pte(struct folio *folio,
>>   * Recheck VMA as permissions can change since migration
>> started
>>   */
>>  entry = pte_to_swp_entry(*pvmw.pte);
>> -if (is_writable_migration_entry(entry))
>> +if (is_writable_migration_entry(entry) ||
>> +is_device_coherent_page(pfn_to_page(pvmw.pfn)))
>>  pte = maybe_mkwrite(pte, vma);
>>  else if (pte_swp_uffd_wp(*pvmw.pte))
>>  pte = pte_mkuffd_wp(pte);
>> --
>> 2.32.0


Re: [PATCH 3/3] drm/amdgpu: bump minor version number

2022-05-11 Thread Marek Olšák
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16466

Marek

On Fri, May 6, 2022 at 9:35 AM Alex Deucher  wrote:

> On Fri, May 6, 2022 at 7:23 AM Christian König
>  wrote:
> >
> > Increase the minor version number to indicate that the new flags are
> > avaiable.
>
> typo: available.  Other than that the series is:
> Reviewed-by: Alex Deucher 
> Once we get the Mesa patches.
>
> Alex
>
>
> >
> > Signed-off-by: Christian König 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 16871baee784..3dbf406b4194 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -99,10 +99,11 @@
> >   * - 3.43.0 - Add device hot plug/unplug support
> >   * - 3.44.0 - DCN3 supports DCC independent block settings: !64B &&
> 128B, 64B && 128B
> >   * - 3.45.0 - Add context ioctl stable pstate interface
> > - * * 3.46.0 - To enable hot plug amdgpu tests in libdrm
> > + * - 3.46.0 - To enable hot plug amdgpu tests in libdrm
> > + * * 3.47.0 - Add AMDGPU_GEM_CREATE_DISCARDABLE and AMDGPU_VM_NOALLOC
> flags
> >   */
> >  #define KMS_DRIVER_MAJOR   3
> > -#define KMS_DRIVER_MINOR   46
> > +#define KMS_DRIVER_MINOR   47
> >  #define KMS_DRIVER_PATCHLEVEL  0
> >
> >  int amdgpu_vram_limit;
> > --
> > 2.25.1
> >
>


Re: [PATCH] drm/amdkfd: allocate MMIO/DOORBELL BOs with AMDGPU_GEM_CREATE_PREEMPTIBLE

2022-05-11 Thread Felix Kuehling

On 2022-05-11 05:44, Lang Yu wrote:

MMIO/DOORBELL BOs' backing resources (bus address resources that are
used to talk to the GPU) are not managed by the GTT manager, but they
are counted by the GTT manager. That makes no sense.

With the AMDGPU_GEM_CREATE_PREEMPTIBLE flag, such BOs will be managed
by the PREEMPT manager (for preemptible contexts, e.g., KFD). Then they
won't be evicted and don't need to be pinned either.

But we still leave these BOs pinned to indicate that the underlying
resource never moves.

Signed-off-by: Lang Yu 


Reviewed-by: Felix Kuehling 



---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 32 +--
  1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 8b14c55a0793..fada3b149361 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1518,26 +1518,26 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
} else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_flags = 0;
-   } else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+   } else {
domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
alloc_flags = AMDGPU_GEM_CREATE_PREEMPTIBLE;
-   if (!offset || !*offset)
-   return -EINVAL;
-   user_addr = untagged_addr(*offset);
-   } else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
-   KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
-   domain = AMDGPU_GEM_DOMAIN_GTT;
-   alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
-   bo_type = ttm_bo_type_sg;
-   alloc_flags = 0;
-   if (size > UINT_MAX)
+
+   if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+   if (!offset || !*offset)
+   return -EINVAL;
+   user_addr = untagged_addr(*offset);
+   } else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+   KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
+   bo_type = ttm_bo_type_sg;
+   if (size > UINT_MAX)
+   return -EINVAL;
+   sg = create_doorbell_sg(*offset, size);
+   if (!sg)
+   return -ENOMEM;
+   } else {
return -EINVAL;
-   sg = create_doorbell_sg(*offset, size);
-   if (!sg)
-   return -ENOMEM;
-   } else {
-   return -EINVAL;
+   }
}
  
  	*mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);


Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE

2022-05-11 Thread Marek Olšák
3rd question: Is it worth using this on APUs?

Thanks,
Marek

On Wed, May 11, 2022 at 5:58 PM Marek Olšák  wrote:

> Will the kernel keep all discardable buffers in VRAM if VRAM is not
> overcommitted by discardable buffers, or will other buffers also affect the
> placement of discardable buffers?
>
> Do evictions deallocate the buffer, or do they keep an allocation in GTT
> and only the copy is skipped?
>
> Thanks,
> Marek
>
> On Wed, May 11, 2022 at 3:08 AM Marek Olšák  wrote:
>
>> OK that sounds good.
>>
>> Marek
>>
>> On Wed, May 11, 2022 at 2:04 AM Christian König <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>> Hi Marek,
>>>
>>> Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>>
>>> A better flag name would be:
>>> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>>>
>>>
>>> A bit long for my taste and I think the best placement is just a side
>>> effect.
>>>
>>>
>>> Marek
>>>
>>> On Tue, May 10, 2022 at 4:13 PM Marek Olšák  wrote:
>>>
 Does this really guarantee VRAM placement? The code doesn't say
 anything about that.

>>>
>>> Yes, see the code here:
>>>
>>>
 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 8b7ee1142d9a..1944ef37a61e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> bp->domain;
> bo->allowed_domains = bo->preferred_domains;
> if (bp->type != ttm_bo_type_kernel &&
> +   !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
> bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
> bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>

>>> The only case where this could be circumvented is when you try to
>>> allocate more than physically available on an APU.
>>>
>>> E.g. you only have something like 32 MiB VRAM and request 64 MiB, then
>>> the GEM code will catch the error and fallback to GTT (IIRC).
>>>
>>> Regards,
>>> Christian.
>>>
>>


Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 11:39, Christian König wrote:

Am 11.05.22 um 17:35 schrieb Andrey Grodzovsky:

On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

Am 11.05.22 um 15:43 schrieb Andrey Grodzovsky:

On 2022-05-11 03:38, Christian König wrote:

Am 10.05.22 um 20:53 schrieb Andrey Grodzovsky:

[SNIP]
E.g. in the reset code (either before or after the reset, 
that's debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a 
fix to only insert reset work into the pending reset list if it's 
not already there); the point of using a list (or array) to which 
you add and from which you remove is that the logic of this is 
encapsulated within the reset domain. In your way we need to be 
aware of who exactly schedules reset work and explicitly cancel 
them; this also means that for any new source added in the future 
you will need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason. Another is, as we said before, that if 
you share a 'reset source' (meaning a delayed work) with another 
client (i.e. RAS and KFD), you make the assumption that the other 
client always proceeds with the reset exactly the same way as you 
expect. Today we have this only in scheduler vs non-scheduler 
resets: non-scheduler reset clients assume the reset is always 
fully executed in HW, while scheduler-based reset takes shortcuts 
and does not always do a HW reset, hence they cannot share a 
'reset source' (delayed work). Yes, we can always add this in the 
future if and when such a problem arises, but no one will remember 
this then and a new bug will be introduced and will take time to 
find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
	...
	union {
		struct {
			struct work_item debugfs;
			struct work_item ras;
			...
		};
		struct work_item array[];
	} reset_sources;
};



If it's only about static array,

enum amdgpu_reset_source {
	AMDGPU_RESET_SRC_RAS,
	AMDGPU_RESET_SRC_ABC,
	...
	AMDGPU_RESET_SRC_XYZ,
	AMDGPU_RESET_SRC_MAX
};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => an index for each work item
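
Fleshing that out as a rough, untested sketch (the helper name and the
concrete source list are invented for illustration; only RAS and the
debugfs reset were named in this thread):

enum amdgpu_reset_source {
	AMDGPU_RESET_SRC_RAS,
	AMDGPU_RESET_SRC_DEBUGFS,
	/* new sources only need a new enum entry */
	AMDGPU_RESET_SRC_MAX
};

struct amdgpu_reset_domain {
	...
	struct work_struct reset_work[AMDGPU_RESET_SRC_MAX];
};

/* no per-source bookkeeping: bluntly cancel every possible source */
static void amdgpu_reset_domain_cancel_sources(struct amdgpu_reset_domain *domain)
{
	int i;

	for (i = 0; i < AMDGPU_RESET_SRC_MAX; i++)
		cancel_work_sync(&domain->reset_work[i]);
}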



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain 
later for other drivers.
But still one caveat: look at amdgpu_recover_work_struct and its 
usage in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return back the result of the GPU 
reset, and so I cannot store actual work items in the array mentioned 
above but rather pointers to work_item, because I need a way to get 
back the return value from gpu_recover like I do now in 
amdgpu_device_gpu_recover.


You should try to avoid that as well.

See when the debugfs reset is canceled because of a scheduler reset 
you won't get a useful return code either.


What we should do instead is to cache the status of the last reset in 
the reset domain.
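
A minimal sketch of that idea (the field name is invented here):

struct amdgpu_reset_domain {
	...
	/* status of the most recent reset, for debugfs and friends */
	atomic_t last_reset_result;
};

/* the reset worker stores its return code ... */
atomic_set(&domain->last_reset_result, ret);

/* ... and e.g. the debugfs handler reads the cached status back
 * instead of waiting on one particular work item's return value */
r = atomic_read(&domain->last_reset_result);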


Regards,
Christian.



Another problem with this approach - to execute the actual GPU reset I 
need access to a concrete amdgpu_device pointer from the work struct 
(see xgpu_ai_mailbox_flr_work as an example). If I store all work items 
in an array in amdgpu_reset_domain, the most I can retrieve is the 
reset_domain struct itself, which won't help since it's dynamically 
allocated, not embedded in the hive or adev, and can be one per device 
or per entire hive in the case of XGMI, so there is no way for me to 
reach back to amdgpu_device. A back pointer to adev* from 
amdgpu_reset_domain will only work for a single device, but not for an 
XGMI hive where there are multiple devices.


Andrey






Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array 
or other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In 

Re: [PATCH v1 13/15] mm: handling Non-LRU pages returned by vm_normal_pages

2022-05-11 Thread Jason Gunthorpe
On Thu, May 05, 2022 at 04:34:36PM -0500, Alex Sierra wrote:

> diff --git a/mm/memory.c b/mm/memory.c
> index 76e3af9639d9..892c4cc54dc2 100644
> +++ b/mm/memory.c
> @@ -621,6 +621,13 @@ struct page *vm_normal_page(struct vm_area_struct *vma, 
> unsigned long addr,
>   if (is_zero_pfn(pfn))
>   return NULL;
>   if (pte_devmap(pte))
> +/*
> + * NOTE: Technically this should goto check_pfn label. However, 
> page->_mapcount
> + * is never incremented for device pages that are mmap through DAX mechanism
> + * using pmem driver mounted into ext4 filesystem. When these pages are 
> unmap,
> + * zap_pte_range is called and vm_normal_page return a valid page with
> + * page_mapcount() = 0, before page_remove_rmap is called.
> + */
>   return NULL;

? Where does this series cause device coherent to be returned?

Wasn't the plan to not set pte_devmap() ?

Jason


Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Marek Olšák
Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> It really *is* a NOALLOC feature. In other words there is no latency
> improvement on reads because the cache is always checked, even with the
> noalloc flag set.
>
> The only thing it affects is that misses not enter the cache and so don't
> cause any additional pressure on evicting cache lines.
>
> You might want to double check with the hardware guys, but I'm something
> like 95% sure that it works this way.
>
> Christian.
>
> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>
> Bypass means that the contents of the cache are ignored, which decreases
> latency at the cost of no coherency between bypassed and normal memory
> requests. NOA (noalloc) means that the cache is checked and can give you
> cache hits, but misses are not cached and the overall latency is higher. I
> don't know what the hw does, but I hope it was misnamed and it really means
> bypass because there is no point in doing cache lookups on every memory
> request if the driver wants to disable caching to *decrease* latency in the
> situations when the cache isn't helping.
>
> Marek
>
> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:
>
>>
>>
>> On 5/11/2022 11:36 AM, Christian König wrote:
>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>> > entries on write.
>> >
>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>
>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>> of attribute which decides LLC behaviour]
>>
>> Thanks,
>> Lijo
>>
>> >
>> > Christian.
>> >
>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> >> A better name would be:
>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>> >>
>> >> Marek
>> >>
>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>> >>  wrote:
>> >>
>> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>> >> allocation.
>> >>
>> >> Only compile tested!
>> >>
>> >> Signed-off-by: Christian König 
>> >> ---
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
>> >>  4 files changed, 10 insertions(+)
>> >>
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> index bf97d8f07f57..d8129626581f 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>> >> amdgpu_device *adev, uint32_t flags)
>> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
>> >> if (flags & AMDGPU_VM_PAGE_PRT)
>> >> pte_flag |= AMDGPU_PTE_PRT;
>> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
>> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
>> >>
>> >> if (adev->gmc.gmc_funcs->map_mtype)
>> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> index b8c79789e1e4..9077dfccaf3c 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>> >> amdgpu_device *adev,
>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>> >>
>> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
>> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>> >> +
>> >> if (mapping->flags & AMDGPU_PTE_PRT) {
>> >> *flags |= AMDGPU_PTE_PRT;
>> >> *flags |= AMDGPU_PTE_SNOOPED;
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> index 8d733eeac556..32ee56adb602 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>> >> amdgpu_device *adev,
>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>> >>
>> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
>> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>> >> +
>> >> if (mapping->flags & AMDGPU_PTE_PRT) {
>> >> *flags |= AMDGPU_PTE_PRT;
>> >> *flags |= AMDGPU_PTE_SNOOPED;
>> >> diff --git a/include/uapi/drm/amdgpu_drm.h
>> >> b/include/uapi/drm/amdgpu_drm.h
>> >> index 57b9d8f0133a..9d71d6330687 100644
>> >> --- a/include/uapi/drm/amdgpu_drm.h
>> >> +++ b/include/uapi/drm/amdgpu_drm.h
>> >> @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>> >>

Re: [PATCH] drm/amdgpu: Add 'modeset' module parameter

2022-05-11 Thread Alex Deucher
On Wed, May 11, 2022 at 2:20 PM Lyude Paul  wrote:
>
> Many DRM drivers feature a 'modeset' argument, which can be used to
> enable/disable the entire driver (as opposed to passing nomodeset to the
> kernel, which would disable modesetting globally and make it difficult to
> load amdgpu afterwards). Apparently amdgpu is actually missing this
> however, so let's add it!

You can already do that by passing modprobe.blacklist=amdgpu on the
kernel command line.  I don't think we need another option to do that.


>
> Keep in mind that this currently just lets one enable or disable amdgpu, I
> haven't bothered adding a headless mode like nouveau has - however I'm sure
> someone else can add this if needed.

You can do this as well by passing an IP block mask which disables the
display IP, or media IP, etc.

Alex

>
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index ebd37fb19cdb..24e6fb4517cc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -872,6 +872,15 @@ MODULE_PARM_DESC(smu_pptable_id,
> "specify pptable id to be used (-1 = auto(default) value, 0 = use 
> pptable from vbios, > 0 = soft pptable id)");
>  module_param_named(smu_pptable_id, amdgpu_smu_pptable_id, int, 0444);
>
> +/**
> + * DOC: modeset (int)
> + * Used to enable/disable modesetting for amdgpu exclusively.
> + */
> +bool amdgpu_enable_modeset = true;
> +MODULE_PARM_DESC(modeset,
> +"Enable or disable display driver (1 = on (default), 0 = 
> off");
> +module_param_named(modeset, amdgpu_enable_modeset, bool, 0444);
> +
>  /* These devices are not supported by amdgpu.
>   * They are supported by the mach64, r128, radeon drivers
>   */
> @@ -2003,6 +2012,11 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> bool is_fw_fb;
> resource_size_t base, size;
>
> +   if (!amdgpu_enable_modeset) {
> +   DRM_INFO("modeset=0 passed to amdgpu, driver will not be 
> enabled\n");
> +   return -ENODEV;
> +   }
> +
> /* skip devices which are owned by radeon */
> for (i = 0; i < ARRAY_SIZE(amdgpu_unsupported_pciidlist); i++) {
> if (amdgpu_unsupported_pciidlist[i] == pdev->device)
> --
> 2.35.1
>


[pull] amdgpu drm-fixes-5.18

2022-05-11 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.18.

The following changes since commit 5727375215b0915f28806c337a7ba9835efd340b:

  Merge tag 'drm-msm-fixes-2022-04-30' of https://gitlab.freedesktop.org/drm/msm into drm-fixes (2022-05-06 11:22:03 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-5.18-2022-05-11

for you to fetch changes up to c65b364c52ba352177dde6944f5efaa29bd40b52:

  drm/amdgpu/ctx: only reset stable pstate if the user changed it (v2) (2022-05-11 11:50:43 -0400)


amd-drm-fixes-5.18-2022-05-11:

amdgpu:
- Disable ASPM for VI boards on ADL platforms
- S0ix DCN3.1 display fix
- Resume regression fix
- Stable pstate fix


Alex Deucher (2):
  Revert "drm/amd/pm: keep the BACO feature enabled for suspend"
  drm/amdgpu/ctx: only reset stable pstate if the user changed it (v2)

Eric Yang (1):
  drm/amd/display: undo clearing of z10 related function pointers

Richard Gong (1):
  drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems

 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  5 +
 drivers/gpu/drm/amd/amdgpu/vi.c   | 17 -
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c |  5 -
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  8 +---
 4 files changed, 22 insertions(+), 13 deletions(-)


Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 12:49, Felix Kuehling wrote:

Am 2022-05-11 um 09:49 schrieb Andrey Grodzovsky:





[snip]

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f1a225a20719..4b789bec9670 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -714,16 +714,37 @@ bool kfd_is_locked(void)

void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
{
+   struct kfd_process *p;
+   struct amdkfd_process_info *p_info;
+   unsigned int temp;
+
   if (!kfd->init_complete)
   return;

   /* for runtime suspend, skip locking kfd */
-   if (!run_pm) {
+   if (!run_pm && !drm_dev_is_unplugged(kfd->ddev)) {
   /* For first KFD device suspend all the KFD 
processes */

   if (atomic_inc_return(&kfd_locked) == 1)
   kfd_suspend_all_processes();
   }

+   if (drm_dev_is_unplugged(kfd->ddev)){
+   int idx = srcu_read_lock(&kfd_processes_srcu);
+   pr_debug("cancel restore_userptr_work\n");
+   hash_for_each_rcu(kfd_processes_table, temp, p,
kfd_processes) {
+   if (kfd_process_gpuidx_from_gpuid(p, kfd->id)
>= 0) {
+   p_info = p->kgd_process_info;
+   pr_debug("cancel processes, pid = %d
for gpu_id = %d", pid_nr(p_info->pid), kfd->id);
+ cancel_delayed_work_sync(&p_info->restore_userptr_work);


Is this really necessary? If it is, there are probably other workers,
e.g. related to our SVM code, that would need to be canceled as well.



I deleted this and it seems to be OK. It was previously added to 
suppress restore_userptr_work, which keeps updating PTEs.

Now this is gone with Fix 3. Please let us know if it is OK :) @Felix


Sounds good to me.







+
+ /* send exception signals to the kfd
events waiting in user space */
+ kfd_signal_hw_exception_event(p->pasid);


This makes sense. It basically tells user mode that the application's
GPU state is lost due to a RAS error or a GPU reset, or now a GPU
hot-unplug.


The problem is that it cannot find an event with a type that matches 
HW_EXCEPTION_TYPE, so the driver does **nothing** with the 
default parameter value of send_sigterm = false.
After all, if a “zombie” process (zombie in the sense that it does not 
have a GPU dev) does not exit, kfd resources seem not to be released 
properly and a new kfd process cannot run after plug back.
(I still need to look hard into the rocr/hsakmt/kfd driver code to 
understand the reason. At least I am seeing that the kfd topology 
won’t be cleaned up without the process exiting, so there would be 
a “zombie” kfd node in the topology, which may or may not cause 
issues in hsakmt.)
@Felix Do you have a suggestion/insight on this “zombie” process 
issue? @Andrey suggests it should be OK to have a “zombie” kfd 
process and a “zombie” kfd dev, and the new kfd process should be OK 
to run on the new kfd dev after plugback.



My experience with the graphics stack at least showed that: in a setup 
with 2 GPUs, if I remove a secondary GPU which had a rendering process 
on it, I could plug the secondary GPU back and start a new rendering 
process while the old zombie process was still present. It could be 
that in the KFD case there are some obstacles to this that need to be 
resolved.


I think this may be related to how KFD is tracking GPU resources. Do 
we actually destroy the KFD device structure when the GPU is unplugged?



No, all the device hierarchy (drm_device, amdgpu_device and hence, I 
assume, kfd_device) is kept around until the last drm_put drops the 
refcount to 0 - which happens when the process dies and drops its drm 
file descriptor.



If not, it's still tracking process resource usage of the hanging 
process. This may be a bigger issue here and the solution is probably 
quite involved because of how all the process and device structures 
are related to each other.


Normally the KFD process cleanup is triggered by an MMU notifier when 
the process address space is destroyed. 



Note that the only thing we do is to invalidate all MMIO mappings within 
all the processes that have the GPU mapped into their address space 
(amdgpu_pci_remove->...->amdgpu_device_unmap_mmio) - this will prevent 
the zombie
process from subsequently writing into physical addresses that are not 
assigned to the removed GPU anymore.


Andrey


The kfd_process structure is also reference counted. I'll need to 
check if there is a way to force-delete the KFD process structure when 
a GPU is unplugged. That's going to be tricky, because of how the KFD 
process struct ties together several GPUs.


Regards,
  Felix



Andrey




May 11 09:52:07 NETSYS26 kernel: [52604.845400] amdgpu: cancel 
restore_userptr_work
May 11 09:52:07 NETSYS26 kernel: [52604.845405] amdgpu: sending hw 
exception to pasid = 0x800
May 11 09:52:07 NETSYS26 kernel: [52604.845414] kfd kfd: amdgpu: 

Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-05-11 Thread Felix Kuehling

On 2022-05-11 at 09:49, Andrey Grodzovsky wrote:





[snip]

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f1a225a20719..4b789bec9670 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -714,16 +714,37 @@ bool kfd_is_locked(void)

void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
{
+   struct kfd_process *p;
+   struct amdkfd_process_info *p_info;
+   unsigned int temp;
+
   if (!kfd->init_complete)
   return;

   /* for runtime suspend, skip locking kfd */
-   if (!run_pm) {
+   if (!run_pm && !drm_dev_is_unplugged(kfd->ddev)) {
   /* For first KFD device suspend all the KFD processes */
   if (atomic_inc_return(&kfd_locked) == 1)
   kfd_suspend_all_processes();
   }

+   if (drm_dev_is_unplugged(kfd->ddev)){
+   int idx = srcu_read_lock(&kfd_processes_srcu);
+   pr_debug("cancel restore_userptr_work\n");
+   hash_for_each_rcu(kfd_processes_table, temp, p,
kfd_processes) {
+   if (kfd_process_gpuidx_from_gpuid(p, kfd->id)
>= 0) {
+   p_info = p->kgd_process_info;
+   pr_debug("cancel processes, pid = %d
for gpu_id = %d", pid_nr(p_info->pid), kfd->id);
+ cancel_delayed_work_sync(&p_info->restore_userptr_work);


Is this really necessary? If it is, there are probably other workers,
e.g. related to our SVM code, that would need to be canceled as well.



I deleted this and it seems to be OK. It was previously added to 
suppress restore_userptr_work, which keeps updating PTEs.

Now this is gone with Fix 3. Please let us know if it is OK :) @Felix


Sounds good to me.







+
+ /* send exception signals to the kfd
events waiting in user space */
+ kfd_signal_hw_exception_event(p->pasid);


This makes sense. It basically tells user mode that the application's
GPU state is lost due to a RAS error or a GPU reset, or now a GPU
hot-unplug.


The problem is that it cannot find an event with a type that matches 
HW_EXCEPTION_TYPE, so the driver does **nothing** with the 
default parameter value of send_sigterm = false.
After all, if a “zombie” process (zombie in the sense that it does not 
have a GPU dev) does not exit, kfd resources seem not to be released 
properly and a new kfd process cannot run after plug back.
(I still need to look hard into the rocr/hsakmt/kfd driver code to 
understand the reason. At least I am seeing that the kfd topology 
won’t be cleaned up without the process exiting, so there would be a 
“zombie” kfd node in the topology, which may or may not cause issues 
in hsakmt.)
@Felix Do you have a suggestion/insight on this “zombie” process issue? 
@Andrey suggests it should be OK to have a “zombie” kfd process and a 
“zombie” kfd dev, and the new kfd process should be OK to run on the 
new kfd dev after plugback.



My experience with the graphics stack at least showed that: in a setup 
with 2 GPUs, if I remove a secondary GPU which had a rendering process 
on it, I could plug the secondary GPU back and start a new rendering 
process while the old zombie process was still present. It could be 
that in the KFD case there are some obstacles to this that need to be 
resolved.


I think this may be related to how KFD is tracking GPU resources. Do we 
actually destroy the KFD device structure when the GPU is unplugged? If 
not, it's still tracking process resource usage of the hanging process. 
This may be a bigger issue here and the solution is probably quite 
involved because of how all the process and device structures are 
related to each other.


Normally the KFD process cleanup is triggered by an MMU notifier when 
the process address space is destroyed. The kfd_process structure is 
also reference counted. I'll need to check if there is a way to 
force-delete the KFD process structure when a GPU is unplugged. That's 
going to be tricky, because of how the KFD process struct ties together 
several GPUs.


Regards,
  Felix



Andrey




May 11 09:52:07 NETSYS26 kernel: [52604.845400] amdgpu: cancel 
restore_userptr_work
May 11 09:52:07 NETSYS26 kernel: [52604.845405] amdgpu: sending hw 
exception to pasid = 0x800
May 11 09:52:07 NETSYS26 kernel: [52604.845414] kfd kfd: amdgpu: 
Process 25894 (pasid 0x8001) got unhandled exception






+ kfd_signal_vm_fault_event(kfd, p->pasid, NULL);


This does not make sense. A VM fault indicates an access to a bad
virtual address by the GPU. If a debugger is attached to the process, it
notifies the debugger to investigate what went wrong. If the GPU is
gone, that doesn't make any sense. There is no GPU that could have
issued a bad memory request. And the debugger won't be happy either to
find a VM fault from a GPU that doesn't exist any more.


OK understood.



If the HW-exception event doesn't terminate your process, we may need to
look into how ROCr handles 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 11:39, Christian König wrote:

On 11.05.22 at 17:35, Andrey Grodzovsky wrote:

On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, 
that's debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a 
fix to only insert reset work to pending reset list if it's not 
already there), the point of using list (or array) to which you 
add and from which you remove is that the logic of this is 
encapsulated within reset domain. In your way we need to be 
aware who exactly schedules reset work and explicitly cancel 
them, this also means that for any new source added in the 
future you will need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if 
you share 'reset source' (meaning a delayed work) with another 
client (i.e. RAS and KFD) it means you make the assumption that the 
other client always proceeds with the
reset exactly the same way as you expect. So today we have this 
only in scheduler vs non scheduler reset happening - non scheduler 
reset clients assume the reset is always fully executed in HW 
while scheduler based reset makes shortcuts and not always does HW 
reset hence they cannot share 'reset source' (delayed work). Yes, 
we can always add this in the future if and when such problem will 
arise but no one will remember this then and a new bug will be 
introduced and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item
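
For illustration, a minimal sketch of how the two proposals above could fit 
together - the member names here are hypothetical, not actual amdgpu code. 
The named view serves callers that care about one specific source (e.g. 
debugfs calling flush_work()), while the array view lets the reset path 
bluntly cancel everything:

enum amdgpu_reset_source {
	AMDGPU_RESET_SRC_DEBUGFS,
	AMDGPU_RESET_SRC_RAS,
	AMDGPU_RESET_SRC_MAX
};

struct amdgpu_reset_domain {
	/* ... existing members ... */
	union {
		struct {
			struct work_struct debugfs;
			struct work_struct ras;
		};
		struct work_struct array[AMDGPU_RESET_SRC_MAX];
	} reset_sources;
};

/* named access: flush one specific source */
flush_work(&domain->reset_sources.debugfs);

/* indexed access: bluntly cancel all possible sources */
for (i = 0; i < AMDGPU_RESET_SRC_MAX; ++i)
	cancel_work_sync(&domain->reset_sources.array[i]);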



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain later 
for other drivers.
But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in 
amdgpu_device_gpu_recover.


You should try to avoid that as well.



Why ?




See when the debugfs reset is canceled because of a scheduler reset 
you won't get a useful return code either.



That's true.



What we should do instead is to cache the status of the last reset in 
the reset domain.



Probably an atomic ret in reset_domain then.
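
A minimal sketch of that idea, assuming a hypothetical reset_res member and 
that the reset work handler has the gpu_recover result in r:

struct amdgpu_reset_domain {
	/* ... */
	atomic_t reset_res;	/* cached status of the last reset */
};

/* in the reset work handler, once the reset completes */
atomic_set(&adev->reset_domain->reset_res, r);

/* the debugfs path later reads the cached status back */
r = atomic_read(&adev->reset_domain->reset_res);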

Another 2 points I forgot to ask -

1) What race condition did you have in mind about insertion and extraction 
from the list if it's all done under a lock?


2) This I asked already - why are you so opposed to allocation on the 
stack? I understand the problem with dynamic memory allocations, but why 
the stack? We do multiple allocations on the stack for any function 
we call during GPU reset; what is so special in this case where we allocate 
the work struct on the stack? It's not like the work struct is especially 
big compared to other stuff we allocate on the stack.


Andrey




Regards,
Christian.



Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array 
or other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way 
all this is done 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 11:46, Lazar, Lijo wrote:



On 5/11/2022 9:13 PM, Andrey Grodzovsky wrote:


On 2022-05-11 11:37, Lazar, Lijo wrote:



On 5/11/2022 9:05 PM, Andrey Grodzovsky wrote:


On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, 
that's debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired 
and which hasn't, because that would be racy again. Instead 
just bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a 
fix to only insert reset work to pending reset list if it's 
not already there), the point of using list (or array) to 
which you add and from which you remove is that the logic of 
this is encapsulated within reset domain. In your way we need 
to be aware who exactly schedules reset work and explicitly 
cancel them, this also means that for any new source added in 
the future you will need to remember to add it


I don't think that this is a valid argument. Additionally to 
the schedulers we probably just need less than a handful of 
reset sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if 
you share 'reset source' (meaning a delayed work) with another 
client (i.e. RAS and KFD) it means you make the assumption that the 
other client always proceeds with the
reset exactly the same way as you expect. So today we have this 
only in scheduler vs non scheduler reset happening - non 
scheduler reset clients assume the reset is always fully 
executed in HW while scheduler based reset makes shortcuts and 
not always does HW reset hence they cannot share 'reset source' 
(delayed work). Yes, we can always add this in the future if and 
when such problem will arise but no one will remember this then 
and a new bug will be introduced and will take time to find and 
resolve.


Mhm, so your main concern is that we forget to correctly handle 
the new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index 
for each work item



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain 
later for other drivers.


The current reset domain queue design is not good for a 
hierarchical reset within amdgpu itself :)


Thanks,
Lijo



Not sure what you mean?



It's tied to the TDR queue in the scheduler.

Hierarchical model - start from a reset of the lowest level nodes and on 
failure try a higher level reset. This model doesn't suit that.


Thanks,
Lijo



The TDR queue just provides a single-threaded context to execute all 
resets. It has no restrictions on what exactly you reset within each 
work item on the queue, so I still don't see a problem. I also don't 
understand what lower level vs higher level nodes are in our case. Is 
it single node vs hive?


Andrey





Andrey




But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in 
amdgpu_device_gpu_recover.


Andrey




Not 100% sure if that works, but something like that should do 
the trick.


My main concern is that I don't want to allocate the work items 
on the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array 
or other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way 
all this is done automatically within the reset_domain code and it's 
agnostic to the specific driver and its specific list of reset 
sources. Also in case we would want to 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 9:13 PM, Andrey Grodzovsky wrote:


On 2022-05-11 11:37, Lazar, Lijo wrote:



On 5/11/2022 9:05 PM, Andrey Grodzovsky wrote:


On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, 
that's debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a 
fix to only insert reset work to pending reset list if it's not 
already there), the point of using list (or array) to which you 
add and from which you remove is that the logic of this is 
encapsulated within reset domain. In your way we need to be 
aware who exactly schedules reset work and explicitly cancel 
them, this also means that for any new source added in the 
future you will need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if 
you share 'reset source' (meaning a delayed work) with another 
client (i.e. RAS and KFD) it means you make the assumption that the 
other client always proceeds with the
reset exactly the same way as you expect. So today we have this 
only in scheduler vs non scheduler reset happening - non scheduler 
reset clients assume the reset is always fully executed in HW 
while scheduler based reset makes shortcuts and not always does HW 
reset hence they cannot share 'reset source' (delayed work). Yes, 
we can always add this in the future if and when such problem will 
arise but no one will remember this then and a new bug will be 
introduced and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain later 
for other drivers.


The current reset domain queue design is not good for a hierarchical 
reset within amdgpu itself :)


Thanks,
Lijo



Not sure what you mean?



It's tied to the TDR queue in the scheduler.

Hierarchical model - start from a reset of the lowest level nodes and on 
failure try a higher level reset. This model doesn't suit that.


Thanks,
Lijo


Andrey




But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in 
amdgpu_device_gpu_recover.


Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array 
or other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way 
all this is done automatically within the reset_domain code and it's 
agnostic to the specific driver and its specific list of reset 
sources. Also in case we would want to generalize reset_domain 
to other GPU drivers (which was 
a plan as far as I remember), this explicit mention of each reset 
work for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent 
reset source into the reset domain as well. But I'm not sure 
those additional reset sources should be part of 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 11:37, Lazar, Lijo wrote:



On 5/11/2022 9:05 PM, Andrey Grodzovsky wrote:


On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, 
that's debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a 
fix to only insert reset work to pending reset list if it's not 
already there), the point of using list (or array) to which you 
add and from which you remove is that the logic of this is 
encapsulated within reset domain. In your way we need to be 
aware who exactly schedules reset work and explicitly cancel 
them, this also means that for any new source added in the 
future you will need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if 
you share 'reset source' (meaning a delayed work) with another 
client (i.e. RAS and KFD) it means you make the assumption that the 
other client always proceeds with the
reset exactly the same way as you expect. So today we have this 
only in scheduler vs non scheduler reset happening - non scheduler 
reset clients assume the reset is always fully executed in HW 
while scheduler based reset makes shortcuts and not always does HW 
reset hence they cannot share 'reset source' (delayed work). Yes, 
we can always add this in the future if and when such problem will 
arise but no one will remember this then and a new bug will be 
introduced and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain later 
for other drivers.


The current reset domain queue design is not good for a hierarchical 
reset within amdgpu itself :)


Thanks,
Lijo



Not sure what you mean?

Andrey




But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in 
amdgpu_device_gpu_recover.


Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array 
or other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way 
all this is done automatically within the reset_domain code and it's 
agnostic to the specific driver and its specific list of reset 
sources. Also in case we would want to generalize reset_domain 
to other GPU drivers (which was 
a plan as far as I remember), this explicit mention of each reset 
work for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent 
reset source into the reset domain as well. But I'm not sure 
those additional reset sources should be part of any common 
handling, that is largely amdgpu specific.



So it's for sure more than one source for the reasons described 
above; also note that for the scheduler we already cancel the delayed 
work in drm_sched_stop, so calling them again in amdgpu code is kind 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Christian König

On 11.05.22 at 17:35, Andrey Grodzovsky wrote:

On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a 
fix to only insert reset work to pending reset list if it's not 
already there), the point of using list (or array) to which you 
add and from which you remove is that the logic of this is 
encapsulated within reset domain. In your way we need to be aware 
who exactly schedules reset work and explicitly cancel them, this 
also means that for any new source added in the future you will 
need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if you 
share 'reset source' (meaning a delayed work) with another client 
(i.e. RAS and KFD) it means you make the assumption that the other 
client always proceeds with the
reset exactly the same way as you expect. So today we have this 
only in scheduler vs non scheduler reset happening - non scheduler 
reset clients assume the reset is always fully executed in HW while 
scheduler based reset makes shortcuts and not always does HW reset 
hence they cannot share 'reset source' (delayed work). Yes, we can 
always add this in the future if and when such problem will arise 
but no one will remember this then and a new bug will be introduced 
and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain later 
for other drivers.
But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in amdgpu_device_gpu_recover.


You should try to avoid that as well.

See when the debugfs reset is canceled because of a scheduler reset you 
won't get a useful return code either.


What we should do instead is to cache the status of the last reset in 
the reset domain.


Regards,
Christian.



Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array or 
other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In current way 
all this done automatically within reset_domain code and it's 
agnostic to specific driver and it's specific list of reset 
sources. Also in case we would want to generalize reset_domain to 
other GPU drivers (which was
a plan as far as i remember) this explicit mention of each reset 
works for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent 
reset source into the reset domain as well. But I'm not sure those 
additional reset sources should be part of any common handling, 
that is largely amdgpu specific.



So it's for sure more than one source for the reasons described 
above; also note that for the scheduler we already cancel the delayed work 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 9:05 PM, Andrey Grodzovsky wrote:


On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a fix 
to only insert reset work to pending reset list if it's not 
already there), the point of using list (or array) to which you 
add and from which you remove is that the logic of this is 
encapsulated within reset domain. In your way we need to be aware 
who exactly schedules reset work and explicitly cancel them, this 
also means that for any new source added in the future you will 
need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if you 
share 'reset source' (meaning a delayed work) with another client 
(i.e. RAS and KFD) it means you make the assumption that the other 
client always proceeds with the
reset exactly the same way as you expect. So today we have this only 
in scheduler vs non scheduler reset happening - non scheduler reset 
clients assume the reset is always fully executed in HW while 
scheduler based reset makes shortcuts and not always does HW reset 
hence they cannot share 'reset source' (delayed work). Yes, we can 
always add this in the future if and when such problem will arise 
but no one will remember this then and a new bug will be introduced 
and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain later 
for other drivers.


The current reset domain queue design is not good for a hierarchical 
reset within amdgpu itself :)


Thanks,
Lijo

But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in amdgpu_device_gpu_recover.


Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array or 
other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way 
all this is done automatically within the reset_domain code and it's 
agnostic to the specific driver and its specific list of reset 
sources. Also in case we would want to generalize reset_domain to 
other GPU drivers (which was 
a plan as far as I remember), this explicit mention of each reset 
work for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent reset 
source into the reset domain as well. But I'm not sure those 
additional reset sources should be part of any common handling, 
that is largely amdgpu specific.



So it's for sure more than one source for the reasons described 
above; also note that for the scheduler we already cancel the delayed 
work in drm_sched_stop, so calling them again in amdgpu code is kind of 
superfluous.


Andrey




Christian.



Andrey






Andrey






The only 

Re: [PATCH v2 00/19] DC/DM changes needed for amdgpu PSR-SU

2022-05-11 Thread Alex Deucher
On Tue, May 10, 2022 at 4:45 PM David Zhang  wrote:
>
> changes in v2:
> ---
> - set vsc_packet_rev2 for PSR1 which is safer
> - add exposure of AMD specific DPCD regs for PSR-SU-RC (rate-control)
> - add DC/DM change related to amdgpu PSR-SU-RC
>
>
> David Zhang (18):
>   drm/amd/display: align dmub cmd header to latest dmub FW to support
> PSR-SU
>   drm/amd/display: feed PSR-SU as psr version to dmub FW
>   drm/amd/display: combine dirty rectangles in DMUB FW
>   drm/amd/display: update GSP1 generic info packet for PSRSU
>   drm/amd/display: revise Start/End SDP data
>   drm/amd/display: program PSR2 DPCD Configuration
>   drm/amd/display: Passing Y-granularity to dmub fw
>   drm/amd/display: Set default value of line_capture_indication
>   drm/amd/display: add vline time in micro sec to PSR context
>   drm/amd/display: fix system hang when PSR exits
>   drm/amd/display: Set PSR level to enable ALPM by default
>   drm/amd/display: use HW lock mgr for PSR-SU
>   drm/amd/display: PSRSU+DSC WA for specific TCON
>   drm/amd/display: add shared helpers to update psr config fields to
> power module
>   drm/amd/display: calculate psr config settings in runtime in DM
>   drm/amd/display: update cursor position to DMUB FW
>   drm/amd/display: expose AMD source specific DPCD for FreeSync PSR
> support
>   drm/amd/display: PSR-SU rate control support in DC
>
> Leo Li (1):
>   drm/amd/display: Implement MPO PSR SU

A couple of suggestions from Daniel on IRC:
1.  Might be good to extract the "calculate total crtc damage" code
from i915 in intel_psr2_sel_fetch_update, stuff that into damage
helpers and reuse for i915 and amdgpu
2.  The commit message on "drm/amd/display: Implement MPO PSR SU" is a
bit funny, since if you use the helpers right you always get damage
information; it's just that when it comes from userspace that doesn't
set explicit damage, the damage is always the entire plane.
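
As a rough illustration of point 1 - a sketch built on the existing
drm_damage_helper iterator, not the actual i915 code - accumulating
per-plane damage clips into one crtc-wide rectangle could look like:

	struct drm_atomic_helper_damage_iter iter;
	struct drm_rect clip;
	struct drm_rect total = { .x1 = INT_MAX, .y1 = INT_MAX,
				  .x2 = 0, .y2 = 0 };

	drm_atomic_helper_damage_iter_init(&iter, old_plane_state,
					   new_plane_state);
	drm_atomic_for_each_plane_damage(&iter, &clip) {
		/* grow the total rect to cover this clip */
		total.x1 = min(total.x1, clip.x1);
		total.y1 = min(total.y1, clip.y1);
		total.x2 = max(total.x2, clip.x2);
		total.y2 = max(total.y2, clip.y2);
	}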

Alex

>
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 142 +-
>  .../drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c |  21 +-
>  drivers/gpu/drm/amd/display/dc/core/dc.c  |  54 
>  drivers/gpu/drm/amd/display/dc/core/dc_link.c |  47 +++-
>  drivers/gpu/drm/amd/display/dc/dc_link.h  |   4 +
>  drivers/gpu/drm/amd/display/dc/dc_stream.h|   5 +
>  drivers/gpu/drm/amd/display/dc/dc_types.h |  23 +-
>  .../drm/amd/display/dc/dce/dmub_hw_lock_mgr.c |   2 +
>  drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c |  64 +
>  drivers/gpu/drm/amd/display/dc/dce/dmub_psr.h |   2 +
>  .../gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c |   2 +
>  .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 131 +
>  .../gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c |   2 +
>  .../dc/dcn30/dcn30_dio_stream_encoder.c   |  15 ++
>  drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h  |   1 +
>  .../drm/amd/display/dc/inc/hw/link_encoder.h  |  21 +-
>  .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   | 250 +-
>  .../amd/display/include/ddc_service_types.h   |   1 +
>  .../display/modules/info_packet/info_packet.c |  29 +-
>  .../amd/display/modules/power/power_helpers.c |  84 ++
>  .../amd/display/modules/power/power_helpers.h |   6 +
>  21 files changed, 887 insertions(+), 19 deletions(-)
>
> --
> 2.25.1
>


Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 11:20, Lazar, Lijo wrote:



On 5/11/2022 7:28 PM, Christian König wrote:

On 11.05.22 at 15:43, Andrey Grodzovsky wrote:

On 2022-05-11 03:38, Christian König wrote:

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:

[SNIP]
E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a fix 
to only insert reset work to pending reset list if it's not 
already there), the point of using list (or array) to which you 
add and from which you remove is that the logic of this is 
encapsulated within reset domain. In your way we need to be aware 
who exactly schedules reset work and explicitly cancel them, this 
also means that for any new source added in the future you will 
need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset 
sources, most likely even just one or two is enough.


The only justification I can see of having additional separate 
reset sources would be if somebody wants to know if a specific 
source has been handled or not (e.g. call flush_work() or 
work_pending()). Like in the case of a reset triggered through 
debugfs.



This is indeed one reason, another is as we said before that if you 
share 'reset source' (meaning a delayed work) with another client 
(i.e. RAS and KFD) it means you make the assumption that the other 
client always proceeds with the
reset exactly the same way as you expect. So today we have this only 
in scheduler vs non scheduler reset happening - non scheduler reset 
clients assume the reset is always fully executed in HW while 
scheduler based reset makes shortcuts and not always does HW reset 
hence they cannot share 'reset source' (delayed work). Yes, we can 
always add this in the future if and when such problem will arise 
but no one will remember this then and a new bug will be introduced 
and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the 
new reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
 
 union {
     struct {
         struct work_item debugfs;
         struct work_item ras;
         
     };
     struct work_item array[]
 } reset_sources;
}



If it's only about static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item



Thanks,
Lijo



It's possible, though it makes it harder to generalize reset_domain later 
for other drivers.
But still one caveat: look at amdgpu_recover_work_struct and its usage 
in amdgpu_device_gpu_recover and in gpu_recover_get.
At least for debugfs I need to return the result of the GPU reset, and 
so I cannot store actual work items in the array mentioned above but 
rather pointers to work_item, because I need a way to get back the 
return value from gpu_recover like I do now in amdgpu_device_gpu_recover.


Andrey




Not 100% sure if that works, but something like that should do the 
trick.


My main concern is that I don't want to allocate the work items on 
the stack and dynamic allocation (e.g. kmalloc) is usually not 
possible either.


Additional to that putting/removing work items from a list, array or 
other container is a very common source for race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way 
all this is done automatically within the reset_domain code and it's 
agnostic to the specific driver and its specific list of reset 
sources. Also in case we would want to generalize reset_domain to 
other GPU drivers (which was 
a plan as far as I remember), this explicit mention of each reset 
work for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent reset 
source into the reset domain as well. But I'm not sure those 
additional reset sources should be part of any common handling, 
that is largely amdgpu specific.



So it's for sure more than one source for the reasons described 
above; also note that for the scheduler we already cancel the delayed 
work in drm_sched_stop, so calling them again in amdgpu code is kind of 
superfluous.


Andrey




Christian.



Andrey






Andrey






The only difference is I chose to do the canceling right 
BEFORE the HW reset and not AFTER. I did this because I see a 
possible race where a new reset request is being generated 

Re: [PATCH v6] drm/amdgpu: Ensure the DMA engine is deactivated during set ups

2022-05-11 Thread Alex Deucher
I've applied it with the change to the commit message.  Sorry for the confusion.

Alex

On Wed, May 11, 2022 at 12:05 AM Haohui Mai  wrote:
>
> It should be an identical patch except for the commit message. Do you
> want me to send out a new one? Either way is fine with me.
>
> ~Haohui
>
> On Tue, May 10, 2022 at 10:14 PM Alex Deucher  wrote:
> >
> > On Tue, May 10, 2022 at 6:53 AM Haohui Mai  wrote:
> > >
> > > Hi Alex,
> > >
> > > Is there anything open before it can be merged?
> >
> > Were you going to send an updated patch?
> >
> > Alex
> >
> > >
> > > Thanks,
> > > Haohui
> > >
> > > On Mon, May 9, 2022 at 10:48 PM Alex Deucher  
> > > wrote:
> > > >
> > > > On Fri, May 6, 2022 at 10:30 PM Haohui Mai  wrote:
> > > > >
> > > > > What about
> > > > >
> > > > > Setting the HALT bit of SDMA_F32_CNTL in all paths before programming
> > > > > the ring buffer of the SDMA engine.
> > > > >
> > > >
> > > > Sounds fine to me.
> > > >
> > > > Alex
> > > >
> > > > > No other changes are required in the patch.
> > > > >
> > > > > ~Haohui
> > > > >
> > > > > On Fri, May 6, 2022 at 9:36 PM Alex Deucher  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 6, 2022 at 1:11 AM Haohui Mai  
> > > > > > wrote:
> > > > > > >
> > > > > > > The only thing that matters is that the IP should be halted before
> > > > > > > programming the ring buffers.
> > > > > > >
> > > > > > > What about rephrasing the commit messages to highlight the issue a
> > > > > > > little bit better?
> > > > > >
> > > > > > Yeah, that is fine.  Thanks!
> > > > > >
> > > > > > Alex
> > > > > >
> > > > > > >
> > > > > > > On Fri, May 6, 2022 at 12:33 AM Alex Deucher 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sat, Apr 30, 2022 at 3:34 AM  wrote:
> > > > > > > > >
> > > > > > > > > From: Haohui Mai 
> > > > > > > > >
> > > > > > > > > The patch fully deactivates the DMA engine before setting up 
> > > > > > > > > the ring
> > > > > > > > > buffer to avoid potential data races and crashes.
> > > > > > > >
> > > > > > > > Does this actually fix an issue you are seeing?  I don't think 
> > > > > > > > it will
> > > > > > > > hurt anything, but I also don't think it's strictly necessary.  
> > > > > > > > AFAIK,
> > > > > > > > only the HALT bit really matters.  So the commit message might 
> > > > > > > > be
> > > > > > > > somewhat misleading.
> > > > > > > >
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Haohui Mai 
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 109 
> > > > > > > > > +++--
> > > > > > > > >  1 file changed, 64 insertions(+), 45 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > > > > index 013d2dec81d0..1fac9d8e99de 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > > > > @@ -459,7 +459,6 @@ static void 
> > > > > > > > > sdma_v5_2_ring_emit_fence(struct amdgpu_ring *ring, u64 addr, 
> > > > > > > > > u64 se
> > > > > > > > > }
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > -
> > > > > > > > >  /**
> > > > > > > > >   * sdma_v5_2_gfx_stop - stop the gfx async dma engines
> > > > > > > > >   *
> > > > > > > > > @@ -505,17 +504,21 @@ static void sdma_v5_2_rlc_stop(struct 
> > > > > > > > > amdgpu_device *adev)
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  /**
> > > > > > > > > - * sdma_v5_2_ctx_switch_enable - stop the async dma engines 
> > > > > > > > > context switch
> > > > > > > > > + * sdma_v5_2_ctx_switch_enable_for_instance - start the 
> > > > > > > > > async dma engines
> > > > > > > > > + * context switch for an instance
> > > > > > > > >   *
> > > > > > > > >   * @adev: amdgpu_device pointer
> > > > > > > > > - * @enable: enable/disable the DMA MEs context switch.
> > > > > > > > > + * @instance_idx: the index of the SDMA instance
> > > > > > > > >   *
> > > > > > > > > - * Halt or unhalt the async dma engines context switch.
> > > > > > > > > + * Unhalt the async dma engines context switch.
> > > > > > > > >   */
> > > > > > > > > -static void sdma_v5_2_ctx_switch_enable(struct amdgpu_device 
> > > > > > > > > *adev, bool enable)
> > > > > > > > > +static void sdma_v5_2_ctx_switch_enable_for_instance(struct 
> > > > > > > > > amdgpu_device *adev, int instance_idx)
> > > > > > > > >  {
> > > > > > > > > u32 f32_cntl, phase_quantum = 0;
> > > > > > > > > -   int i;
> > > > > > > > > +
> > > > > > > > > +   if (WARN_ON(instance_idx >= 
> > > > > > > > > adev->sdma.num_instances)) {
> > > > > > > > > +   return;
> > > > > > > > > +   }
> > > > > > > > >
> > > > > > > > > if (amdgpu_sdma_phase_quantum) {
> > > > > > > > > unsigned value = amdgpu_sdma_phase_quantum;
> > > > > > > > > @@ -539,50 +542,68 @@ static void 
> > > 
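
For context on the HALT bit being discussed: a hedged sketch of the idiom,
following the register helpers already used in sdma_v5_2.c rather than
reproducing the patch itself:

	u32 f32_cntl;

	/* halt the engine before (re)programming its ring buffer */
	f32_cntl = RREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_F32_CNTL));
	f32_cntl = REG_SET_FIELD(f32_cntl, SDMA0_F32_CNTL, HALT, 1);
	WREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_F32_CNTL), f32_cntl);

	/* ... program RB base/rptr/wptr and doorbell here ... */

	/* unhalt once the ring is fully set up */
	f32_cntl = REG_SET_FIELD(f32_cntl, SDMA0_F32_CNTL, HALT, 0);
	WREG32(sdma_v5_2_get_reg_offset(adev, i, mmSDMA0_F32_CNTL), f32_cntl);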

Re: [PATCH 1/8] drm: execution context for GEM buffers v2

2022-05-11 Thread Daniel Vetter
On Mon, May 09, 2022 at 05:01:33PM +0200, Christian König wrote:
> On 09.05.22 at 16:31, Daniel Vetter wrote:
> > On Wed, May 04, 2022 at 09:47:32AM +0200, Christian König wrote:
> > > [SNIP]
> > > +/* Make sure we have enough room and add an object to the container */
> > > +static int drm_exec_objects_add(struct drm_exec_objects *container,
> > > + struct drm_gem_object *obj)
> > > +{
> > > + if (unlikely(container->num_objects == container->max_objects)) {
> > > + size_t size = container->max_objects * sizeof(void *);
> > > + void *tmp;
> > > +
> > > + tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> > > + GFP_KERNEL);
> > > + if (!tmp)
> > > + return -ENOMEM;
> > > +
> > > + container->objects = tmp;
> > > + container->max_objects += PAGE_SIZE / sizeof(void *);
> > Might be worth it to inquire the actual allocation size here, since if
> > it's kmalloc the generic buckets only cover doubling of sizes, so once
> > it's big it goes up a lot quicker than PAGE_SIZE.
> > 
> > But also krealloc checks this internally already so maybe better to not
> > break the abstraction.
> 
> How can I actually do this? ksize() only works with kmalloc().
> 
> Or do we had a function to figure out if vmalloc or kmalloc was used by
> kvrealloc()?

kvfree has an is_vmalloc_addr check, so it would boil down to open-coding
that a bit.

Probably not worth the trouble really, otoh looking at kvrealloc it
doesn't use krealloc underneath, so it's not doing that check. Maybe we
should just push that check into kvrealloc for the !vmalloc_addr case.
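
A sketch of what open-coding that could look like, assuming kvrealloc()'s
(old size, new size) signature from the patch above and ksize() semantics
for kmalloc-backed buffers; the helper name is hypothetical:

static int drm_exec_objects_grow(struct drm_exec_objects *container)
{
	size_t size = container->max_objects * sizeof(void *);
	void *tmp;

	/* kmalloc buckets round up, so there may already be spare room */
	if (container->objects && !is_vmalloc_addr(container->objects) &&
	    ksize(container->objects) > size) {
		container->max_objects = ksize(container->objects) /
					 sizeof(void *);
		return 0;
	}

	tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
			GFP_KERNEL);
	if (!tmp)
		return -ENOMEM;

	container->objects = tmp;
	container->max_objects += PAGE_SIZE / sizeof(void *);
	return 0;
}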

> > > [SNIP]
> > > +/**
> > > + * drm_exec_cleanup - cleanup when contention is detected
> > > + * @exec: the drm_exec object to cleanup
> > > + *
> > > + * Cleanup the current state and return true if we should stay inside 
> > > the retry
> > > + * loop, false if there wasn't any contention detected and we can keep 
> > > the
> > > + * objects locked.
> > > + */
> > > +bool drm_exec_cleanup(struct drm_exec *exec)
> > > +{
> > > + if (likely(!exec->contended)) {
> > > + ww_acquire_done(&exec->ticket);
> > > + return false;
> > > + }
> > > +
> > > + if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> > > + exec->contended = NULL;
> > > + ww_acquire_init(&exec->ticket, &reservation_ww_class);
> > Not sure why this is here instead of in _init()? I thought you're playing
> > some really dangerous tricks with re-initting the acquire ctx, which would
> > at least be questionable, but does not look like that.
> 
> That was my initial design, but the problem with this approach is that all
> locks taken between drm_exec_init() and the loop suddenly have a lockdep
> dependency on reservation_ww_class. And that in turn goes boom immediately.
> 
> Took me a moment to realize what's wrong with that as well.

Uh crap, indeed. I think minimally this needs to be documented, but
personally I'm leaning towards drm_exec_prepare_init(), which does this
explicitly.

I do agree we need this split, especially so we can eventually add helpers
for bo lookup, or maybe userptr/hmm prep and things like that, which all
has to be outside of the acquire_ctx.

> > [SNIP]
> > +/**
> > + * drm_exec_has_duplicates - check for duplicated GEM object
> > + * @exec: drm_exec object
> > + *
> > + * Return true if the drm_exec object has encountered some already locked 
> > GEM
> > + * objects while trying to lock them. This can happen if multiple GEM 
> > objects
> > + * share the same underlying resv object.
> > + */
> > +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> > +{
> > +   return exec->duplicates.num_objects > 0;
> > Definitely an aside, but in our i915 efforts to get rid of temporary pins
> > we run into some fun where the eviction code couldn't differentiate from
> > memory we need reserved for the CS and memory we just keep locked because
> > we evicted it and fun stuff like that. So maybe we need a bit more
> > tracking here eventually, but that's only when we have this somehow glued
> > into ttm eviction code.
> 
> Hehe, yeah that's what I was thinking about as well. But then I thought:
> one step at a time.
> 
> > Also the even more massive step would be to glue this into dma-buf so you
> > can do dynamic dma-buf eviction and still keep track of all the buffers. I
> > think with some clever pointer tagging and a bit more indirection we could
> > nest drm_exec structures (so that a driver could insert its entire
> > drm_exec structure with a drm_exec-level callback for handling refcounting
> > and stuff like that).
> 
> I considered in which component to put this quite a bit as well, but then
> intentionally decided against DMA-buf.
> 
> One major reason was that not all buffers which need to be locked this way
> are actually exported as DMA-buf.
> 
> Another reason is that DMA-buf doesn't necessarily need a concept of an
> execution context. As far as I can see that's 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 7:28 PM, Christian König wrote:

Am 11.05.22 um 15:43 schrieb Andrey Grodzovsky:

On 2022-05-11 03:38, Christian König wrote:

Am 10.05.22 um 20:53 schrieb Andrey Grodzovsky:

[SNIP]
E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a fix 
to only insert a reset work into the pending reset list if it's not 
already there); the point of using a list (or array) to which you add 
and from which you remove is that the logic of this is encapsulated 
within the reset domain. In your way we need to be aware of who 
exactly schedules reset work and explicitly cancel it, and this also 
means that for any new source added in the future you will need to 
remember to add it


I don't think that this is a valid argument. In addition to the 
schedulers we probably just need less than a handful of reset 
sources; most likely even just one or two are enough.


The only justification I can see of having additional separate reset 
sources would be if somebody wants to know if a specific source has 
been handled or not (e.g. call flush_work() or work_pending()). Like 
in the case of a reset triggered through debugfs.



This is indeed one reason; another is, as we said before, that if you 
share a 'reset source' (meaning a delayed work) with another client 
(e.g. RAS and KFD) it means you make the assumption that the other 
client always proceeds with the reset exactly the same way as you 
expect. So today we have this only in scheduler vs non-scheduler 
resets - non-scheduler reset clients assume the reset is always fully 
executed in HW, while scheduler-based reset takes shortcuts and does 
not always do a HW reset, hence they cannot share a 'reset source' 
(delayed work). Yes, we can always add this in the future if and when 
such a problem arises, but no one will remember it then and a new bug 
will be introduced and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the new 
reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
     
     union {
         struct {
             struct work_item debugfs;
             struct work_item ras;
             
         };
         struct work_item array[]
     } reset_sources;
}



If it's only about a static array,

enum amdgpu_reset_source {

AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
...
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX

};

struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for 
each work item
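
Filled out a little, that statically indexed variant might look like
this (enum values and the cancel helper are illustrative only):

#include <linux/workqueue.h>

enum amdgpu_reset_source {
	AMDGPU_RESET_SRC_RAS,
	AMDGPU_RESET_SRC_IOFAULT,
	AMDGPU_RESET_SRC_DEBUGFS,
	AMDGPU_RESET_SRC_MAX
};

struct amdgpu_reset_domain {
	/* One statically allocated work item per possible source, so
	 * nothing is added to or removed from a container at runtime.
	 */
	struct work_struct reset_work[AMDGPU_RESET_SRC_MAX];
};

static void amdgpu_reset_domain_cancel_all(struct amdgpu_reset_domain *domain)
{
	int i;

	/* Bluntly cancel every possible source instead of tracking
	 * which ones actually fired.
	 */
	for (i = 0; i < AMDGPU_RESET_SRC_MAX; i++)
		cancel_work_sync(&domain->reset_work[i]);
}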



Thanks,
Lijo


Not 100% sure if that works, but something like that should do the trick.

My main concern is that I don't want to allocate the work items on the 
stack and dynamic allocation (e.g. kmalloc) is usually not possible either.


In addition to that, putting/removing work items from a list, array or 
other container is a very common source of race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way all 
of this is done automatically within the reset_domain code, and it's 
agnostic to the specific driver and its specific list of reset sources. 
Also, in case we would want to generalize reset_domain to other GPU 
drivers (which was a plan as far as I remember), this explicit mention 
of each reset work for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent reset 
source into the reset domain as well. But I'm not sure those 
additional reset sources should be part of any common handling, that 
is largely amdgpu specific.



So it's for sure more than one source for the reasons described above; 
also note that for the scheduler we already cancel the delayed work in 
drm_sched_stop, so cancelling it again in amdgpu code is kind of 
superfluous.


Andrey




Christian.



Andrey






Andrey






The only difference is I chose to do the canceling right BEFORE 
the HW reset and not AFTER. I did this because I see a possible 
race where a new reset request is being generated exactly after 
we finished the HW reset but before we canceled out all pending 
resets - in such a case you would not want to cancel this 'border 
line new' reset request.


Why not? Any new reset request directly after a hardware reset is 
most likely just falsely generated by the reset itself.


Ideally I would cancel all sources after the reset, but before 
starting any new work.


Regards,
Christian.




Andrey




Regards,
Christian.

You can see that if many different reset sources share the same 
work struct, what can happen is that the first to obtain the 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Christian König

Am 11.05.22 um 15:43 schrieb Andrey Grodzovsky:

On 2022-05-11 03:38, Christian König wrote:

Am 10.05.22 um 20:53 schrieb Andrey Grodzovsky:

[SNIP]
E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just 
bluntly reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a fix 
to only insert a reset work into the pending reset list if it's not 
already there); the point of using a list (or array) to which you add 
and from which you remove is that the logic of this is encapsulated 
within the reset domain. In your way we need to be aware of who 
exactly schedules reset work and explicitly cancel it, and this also 
means that for any new source added in the future you will need to 
remember to add it


I don't think that this is a valid argument. In addition to the 
schedulers we probably just need less than a handful of reset 
sources; most likely even just one or two are enough.


The only justification I can see of having additional separate reset 
sources would be if somebody wants to know if a specific source has 
been handled or not (e.g. call flush_work() or work_pending()). Like 
in the case of a reset triggered through debugfs.



This is indeed one reason; another is, as we said before, that if you 
share a 'reset source' (meaning a delayed work) with another client 
(e.g. RAS and KFD) it means you make the assumption that the other 
client always proceeds with the reset exactly the same way as you 
expect. So today we have this only in scheduler vs non-scheduler 
resets - non-scheduler reset clients assume the reset is always fully 
executed in HW, while scheduler-based reset takes shortcuts and does 
not always do a HW reset, hence they cannot share a 'reset source' 
(delayed work). Yes, we can always add this in the future if and when 
such a problem arises, but no one will remember it then and a new bug 
will be introduced and will take time to find and resolve.


Mhm, so your main concern is that we forget to correctly handle the new 
reset sources?


How about we do it like this then:

struct amdgpu_reset_domain {
    
    union {
        struct {
            struct work_item debugfs;
            struct work_item ras;
            
        };
        struct work_item array[]
    } reset_sources;
}

Not 100% sure if that works, but something like that should do the trick.

My main concern is that I don't want to allocate the work items on the 
stack and dynamic allocation (e.g. kmalloc) is usually not possible either.


In addition to that, putting/removing work items from a list, array or 
other container is a very common source of race conditions.


Regards,
Christian.



to the cancellation list which you showed above. In the current way all 
of this is done automatically within the reset_domain code, and it's 
agnostic to the specific driver and its specific list of reset sources. 
Also, in case we would want to generalize reset_domain to other GPU 
drivers (which was a plan as far as I remember), this explicit mention 
of each reset work for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler independent reset 
source into the reset domain as well. But I'm not sure those 
additional reset sources should be part of any common handling, that 
is largely amdgpu specific.



So it's for sure more than one source for the reasons described above; 
also note that for the scheduler we already cancel the delayed work in 
drm_sched_stop, so cancelling it again in amdgpu code is kind of 
superfluous.


Andrey




Christian.



Andrey






Andrey






The only difference is I chose to do the canceling right BEFORE 
the HW reset and not AFTER. I did this because I see a possible 
race where a new reset request is being generated exactly after 
we finished the HW reset but before we canceled out all pending 
resets - in such a case you would not want to cancel this 'border 
line new' reset request.


Why not? Any new reset request directly after a hardware reset is 
most likely just falsely generated by the reset itself.


Ideally I would cancel all sources after the reset, but before 
starting any new work.


Regards,
Christian.




Andrey




Regards,
Christian.

You can see that if many different reset sources share the same 
work struct, what can happen is that the first to obtain the 
lock you describe below might opt out from a full HW reset 
because its bad job did signal, for example, or because its 
hung IP block was able to recover through a SW reset, but in 
the meantime another reset source which needed an actual HW 
reset just silently returned and we end up with an unhandled 
reset request. True that 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Andrey Grodzovsky



On 2022-05-11 03:38, Christian König wrote:

Am 10.05.22 um 20:53 schrieb Andrey Grodzovsky:


On 2022-05-10 13:19, Christian König wrote:

Am 10.05.22 um 19:01 schrieb Andrey Grodzovsky:


On 2022-05-10 12:17, Christian König wrote:

Am 10.05.22 um 18:00 schrieb Andrey Grodzovsky:

[SNIP]
That's one of the reasons why we should have multiple work items 
for job based reset and other reset sources.


See the whole idea is the following:
1. We have one single work queue for each reset domain 
which makes sure that all reset requests execute in order.
2. We have one delayed work item for each scheduler which fires 
when a timeout on a scheduler occurs and eventually calls the 
reset procedure with the last running job.

3. We have one work item for each necessary hard reset.

The delayed work item from the scheduler first tries a soft 
recovery and checks if a hard reset is really necessary. If it's 
not necessary and we can cancel the offending job we skip the 
hard reset.


The hard reset work item doesn't do any of those checks and just 
does a reset no matter what.


When we really do a reset, independent of whether it's triggered by a 
job or another source, we cancel all sources at the end of the reset 
procedure.


This makes sure that a) We only do one reset even when multiple 
sources fire at the same time and b) when any source bails out 
and only does a soft recovery we do a full reset anyway when 
necessary.


That design was outlined multiple times now on the mailing list 
and looks totally clear to me. We should probably document that 
somewhere.
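
As a rough sketch of point 1, the ordering can come from an ordered
workqueue per reset domain (names illustrative, not the actual amdgpu
code):

#include <linux/workqueue.h>

struct amdgpu_reset_domain {
	struct workqueue_struct *wq;
};

static int amdgpu_reset_domain_init(struct amdgpu_reset_domain *domain)
{
	/* An ordered workqueue executes at most one work item at a
	 * time, so all reset requests in the domain run in order.
	 */
	domain->wq = alloc_ordered_workqueue("amdgpu-reset-dom", 0);
	return domain->wq ? 0 : -ENOMEM;
}

static bool amdgpu_reset_domain_schedule(struct amdgpu_reset_domain *domain,
					 struct work_struct *work)
{
	/* Scheduler timeouts, RAS, debugfs etc. all funnel through
	 * the same single-threaded queue.
	 */
	return queue_work(domain->wq, work);
}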



If you look at the patch, what you described above is exactly what is 
happening - since the scheduler's delayed work is different from any 
non-scheduler delayed work, the SW reset which might take place from 
the scheduler's reset will not have any impact on any non-scheduler 
delayed work and will not cancel it. In case the scheduler actually 
reaches the point of a HW reset, it will cancel out all pending reset 
works from any other sources on the same reset domain. A non-scheduler 
reset will always proceed to do a full HW reset and will cancel any 
other pending resets.


Ok, but why do you then need that linked list? The number of reset 
sources should be static and not in any way dynamic.



So the array reset_src[i] holds a pointer to the pending delayed work 
from source i, or NULL if there is no pending work?
What if the same source triggers multiple reset requests, such as 
multiple RAS errors at once - don't set the delayed work pointer in 
arr[RAS_index] if it's already non-NULL?




See using the linked list sounds like you only wanted to cancel 
the reset sources raised so far which would not be correct as far 
as I can see.



Not clear about this one? We do want to cancel those reset sources 
that were raised so far, because we just did a HW reset which should 
fix them anyway. Those which have not raised a reset request so far 
will have a NULL ptr in their respective array index.


And exactly that's what I want to prevent. See you don't care if a 
reset source has fired once, twice, ten times or never. You just 
cancel all of them!


That's why I want to come to a static list of reset sources.

E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and 
which hasn't, because that would be racy again. Instead just bluntly 
reset all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a fix 
to only insert a reset work into the pending reset list if it's not 
already there); the point of using a list (or array) to which you add 
and from which you remove is that the logic of this is encapsulated 
within the reset domain. In your way we need to be aware of who 
exactly schedules reset work and explicitly cancel it, and this also 
means that for any new source added in the future you will need to 
remember to add it


I don't think that this is a valid argument. In addition to the 
schedulers we probably just need less than a handful of reset 
sources; most likely even just one or two are enough.


The only justification I can see of having additional separate reset 
sources would be if somebody wants to know if a specific source has 
been handled or not (e.g. call flush_work() or work_pending()). Like 
in the case of a reset triggered through debugfs.



This is indeed one reason; another is, as we said before, that if you 
share a 'reset source' (meaning a delayed work) with another client (e.g. 
RAS and KFD) it means you make the assumption that the other client always 
proceeds with the reset exactly the same way as you expect. So today we 
have this only in scheduler vs non-scheduler resets - non-scheduler reset 
clients assume the reset is always fully executed in 

[PATCH] drm/amd/pm: consistent approach for smartshift

2022-05-11 Thread Sathishkumar S
create smartshift sysfs attributes from the dGPU device even
on a smartshift 1.0 platform, to be consistent. Do not populate
the attributes on platforms that have an APU only but no dGPU,
or vice versa.

Suggested-by: Alex Deucher 
Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 146 +
 1 file changed, 86 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d3228216b2da..c2406baeef93 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1734,22 +1734,11 @@ static ssize_t amdgpu_get_gpu_metrics(struct device 
*dev,
return size;
 }
 
-/**
- * DOC: smartshift_apu_power
- *
- * The amdgpu driver provides a sysfs API for reporting APU power
- * share if it supports smartshift. The value is expressed as
- * the proportion of stapm limit where stapm limit is the total APU
- * power limit. The result is in percentage. If APU power is 130% of
- * STAPM, then APU is using 30% of the dGPU's headroom.
- */
-
-static ssize_t amdgpu_get_smartshift_apu_power(struct device *dev, struct 
device_attribute *attr,
-  char *buf)
+static int amdgpu_read_powershift_percent(struct amdgpu_device *adev,
+   uint32_t *ss_power, bool 
dgpu_share)
 {
-   struct drm_device *ddev = dev_get_drvdata(dev);
-   struct amdgpu_device *adev = drm_to_adev(ddev);
-   uint32_t ss_power, size;
+   struct drm_device *ddev = adev_to_drm(adev);
+   uint32_t size;
int r = 0;
 
if (amdgpu_in_reset(adev))
@@ -1763,28 +1752,64 @@ static ssize_t amdgpu_get_smartshift_apu_power(struct 
device *dev, struct device
return r;
}
 
-   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_APU_SHARE,
-  (void *)&ss_power, &size);
-   if (r)
-   goto out;
-
-   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   if (dgpu_share)
+   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_DGPU_SHARE,
+  (void *)ss_power, &size);
+   else
+   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_APU_SHARE,
+  (void *)ss_power, &size);
 
-out:
pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
return r;
 }
+/**
+ * DOC: smartshift_apu_power
+ *
+ * The amdgpu driver provides a sysfs API for reporting APU power
+ * shift in percentage if platform supports smartshift. Value 0 means that
+ * there is no powershift and values between [1-100] means that the power
+ * is shifted to APU, the percentage of boost is with respect to APU power
+ * limit on the platform.
+ */
+
+static ssize_t amdgpu_get_smartshift_apu_power(struct device *dev, struct 
device_attribute *attr,
+  char *buf)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = drm_to_adev(ddev);
+   uint32_t ss_power = 0;
+   int r = 0, i;
+
+   r = amdgpu_read_powershift_percent(adev, &ss_power, false);
+   if (!r)
+   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   else if (r == -EOPNOTSUPP) {
+   /* sensor not available on dGPU, try to read from APU */
+   adev = NULL;
+   mutex_lock(&mgpu_info.mutex);
+   for (i = 0; i < mgpu_info.num_gpu; i++) {
+   if (mgpu_info.gpu_ins[i].adev->flags & AMD_IS_APU) {
+   adev = mgpu_info.gpu_ins[i].adev;
+   break;
+   }
+   }
+   mutex_unlock(&mgpu_info.mutex);
+   if (adev && !amdgpu_read_powershift_percent(adev, &ss_power, 
false))
+   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   }
+
+   return r;
+}
 
 /**
  * DOC: smartshift_dgpu_power
  *
- * The amdgpu driver provides a sysfs API for reporting the dGPU power
- * share if the device is in HG and supports smartshift. The value
- * is expressed as the proportion of stapm limit where stapm limit
- * is the total APU power limit. The value is in percentage. If dGPU
- * power is 20% higher than STAPM power(120%), it's using 20% of the
- * APU's power headroom.
+ * The amdgpu driver provides a sysfs API for reporting dGPU power
+ * shift in percentage if platform supports smartshift. Value 0 means that
+ * there is no powershift and values between [1-100] means that the power is
+ * shifted to dGPU, the percentage of boost is with respect to dGPU power
+ * limit on the platform.
  */
 
 static ssize_t amdgpu_get_smartshift_dgpu_power(struct device *dev, struct 
device_attribute *attr,
@@ -1792,31 +1817,27 @@ static ssize_t amdgpu_get_smartshift_dgpu_power(struct 
device *dev, struct devic
 {
struct drm_device *ddev = 

Re: [PATCH] drm/amd/pm: update smartshift powerboost calc for smu13

2022-05-11 Thread Sundararaju, Sathishkumar



On 5/11/2022 5:27 PM, Lazar, Lijo wrote:



On 5/11/2022 5:16 PM, Sathishkumar S wrote:

smartshift apu and dgpu power boost are reported as percentage
with respect to their power limits. adjust the units of power before
calculating the percentage of boost.

Signed-off-by: Sathishkumar S 
---
  .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  | 60 ++-
  1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c

index e2d099409123..35fbeb72c05c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
@@ -276,6 +276,40 @@ static int yellow_carp_mode2_reset(struct 
smu_context *smu)

  return yellow_carp_mode_reset(smu, SMU_RESET_MODE_2);
  }
  +
+static void yellow_carp_get_ss_power_percent(SmuMetrics_t *metrics,
+    uint32_t *apu_percent, uint32_t *dgpu_percent)
+{
+    uint32_t apu_boost = 0;
+    uint32_t dgpu_boost = 0;
+    uint16_t apu_limit = 0;
+    uint16_t dgpu_limit = 0;
+    uint16_t apu_power = 0;
+    uint16_t dgpu_power = 0;
+
+    apu_power = metrics->ApuPower/1000;
+    apu_limit = metrics->StapmOpnLimit;
+    if (apu_power > apu_limit && apu_limit != 0)
+    apu_boost =  ((apu_power - apu_limit) * 100) / apu_limit;
+    apu_boost = (apu_boost > 100) ? 100 : apu_boost;
+
+    dgpu_power = metrics->dGpuPower/1000;


Before submitting (not expecting another version here), you may add a 
comment that APU/dGPU power values are reported in milliwatts and 
STAPM power limits in watts.

Okay, will add the comment here. Thank you.


Reviewed-by: Lijo Lazar 

Thanks,
Lijo


+    if (metrics->StapmCurrentLimit > metrics->StapmOpnLimit)
+    dgpu_limit = metrics->StapmCurrentLimit - 
metrics->StapmOpnLimit;

+    if (dgpu_power > dgpu_limit && dgpu_limit != 0)
+    dgpu_boost = ((dgpu_power - dgpu_limit) * 100) / dgpu_limit;
+    dgpu_boost = (dgpu_boost > 100) ? 100 : dgpu_boost;
+
+    if (dgpu_boost >= apu_boost)
+    apu_boost = 0;
+    else
+    dgpu_boost = 0;
+
+    *apu_percent = apu_boost;
+    *dgpu_percent = dgpu_boost;
+
+}
+
  static int yellow_carp_get_smu_metrics_data(struct smu_context *smu,
  MetricsMember_t member,
  uint32_t *value)
@@ -284,6 +318,8 @@ static int 
yellow_carp_get_smu_metrics_data(struct smu_context *smu,
    SmuMetrics_t *metrics = (SmuMetrics_t 
*)smu_table->metrics_table;

  int ret = 0;
+    uint32_t apu_percent = 0;
+    uint32_t dgpu_percent = 0;
    ret = smu_cmn_get_metrics_table(smu, NULL, false);
  if (ret)
@@ -332,26 +368,18 @@ static int 
yellow_carp_get_smu_metrics_data(struct smu_context *smu,

  *value = metrics->Voltage[1];
  break;
  case METRICS_SS_APU_SHARE:
-    /* return the percentage of APU power with respect to APU's 
power limit.
- * percentage is reported, this isn't boost value. 
Smartshift power

- * boost/shift is only when the percentage is more than 100.
+    /* return the percentage of APU power boost
+ * with respect to APU's power limit.
   */
-    if (metrics->StapmOpnLimit > 0)
-    *value =  (metrics->ApuPower * 100) / 
metrics->StapmOpnLimit;

-    else
-    *value = 0;
+    yellow_carp_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);

+    *value = apu_percent;
  break;
  case METRICS_SS_DGPU_SHARE:
-    /* return the percentage of dGPU power with respect to 
dGPU's power limit.
- * percentage is reported, this isn't boost value. 
Smartshift power

- * boost/shift is only when the percentage is more than 100.
+    /* return the percentage of dGPU power boost
+ * with respect to dGPU's power limit.
   */
-    if ((metrics->dGpuPower > 0) &&
-    (metrics->StapmCurrentLimit > metrics->StapmOpnLimit))
-    *value = (metrics->dGpuPower * 100) /
-  (metrics->StapmCurrentLimit - 
metrics->StapmOpnLimit);

-    else
-    *value = 0;
+    yellow_carp_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);

+    *value = dgpu_percent;
  break;
  default:
  *value = UINT_MAX;



[PATCH 2/2] drm/amdgpu: add drm-client-id to fdinfo

2022-05-11 Thread Christian König
This is enough to get gputop working :)

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 52c2b90925a0..780a48259682 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -55,17 +55,15 @@ static const char *amdgpu_ip_name[AMDGPU_HW_IP_NUM] = {
 
 void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 {
-   struct amdgpu_fpriv *fpriv;
uint64_t vram_mem = 0, gtt_mem = 0, cpu_mem = 0;
struct drm_file *file = f->private_data;
struct amdgpu_device *adev = drm_to_adev(file->minor->dev);
+   struct amdgpu_fpriv *fpriv = file->driver_priv;
+   struct amdgpu_vm *vm = &fpriv->vm;
uint32_t bus, dev, fn, domain, hw_ip;
struct amdgpu_bo *root;
int ret;
 
-   ret = amdgpu_file_to_fpriv(f, &fpriv);
-   if (ret)
-   return;
bus = adev->pdev->bus->number;
domain = pci_domain_nr(adev->pdev->bus);
dev = PCI_SLOT(adev->pdev->devfn);
@@ -93,6 +91,7 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
seq_printf(m, "drm-driver:\t%s\n", file->minor->dev->driver->name);
seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\npasid:\t%u\n", domain, bus,
dev, fn, fpriv->vm.pasid);
+   seq_printf(m, "drm-client-id:\t%Lu\n", vm->immediate.fence_context);
seq_printf(m, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
seq_printf(m, "drm-memory-gtt:\t%llu KiB\n", gtt_mem/1024UL);
seq_printf(m, "drm-memory-cpu:\t%llu KiB\n", cpu_mem/1024UL);
-- 
2.25.1
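
For reference, combined with patch 1/2 this yields the standardized
drm-usage-stats keys in /proc/<pid>/fdinfo/<fd>; an illustrative example
with made-up values:

  drm-driver:     amdgpu
  drm-pdev:       0000:0b:00.0
  pasid:          32770
  drm-client-id:  4711
  drm-memory-vram:        524288 KiB
  drm-memory-gtt: 1024 KiB
  drm-memory-cpu: 0 KiB
  drm-engine-gfx: 123456789 ns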



[PATCH 1/2] drm/amdgpu: Convert to common fdinfo format v3

2022-05-11 Thread Christian König
Convert fdinfo format to one documented in drm-usage-stats.rst.

It turned out that the existing implementation was actually completely
nonsense. The calculated percentages indeed represented the usage of the
engine, but with varying time slices.

So 10% usage for application A could mean something completely different
than 10% usage for application B.

Completely nuke that and just use the now standardized nanosecond
interface.

v2: drop the documentation change for now, nuke percentage calculation
v3: only account for each hw_ip, move the time_spend to the ctx mgr.

Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Christian König 
Cc: David M Nieto 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 234 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|  23 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  41 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|   2 +-
 5 files changed, 153 insertions(+), 149 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 01853431249d..43b97ad3c6be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1239,7 +1239,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 
p->fence = dma_fence_get(&job->base.s_fence->finished);
 
-   amdgpu_ctx_add_fence(p->ctx, entity, p->fence, &seq);
+   seq = amdgpu_ctx_add_fence(&fpriv->ctx_mgr, p->ctx, entity, p->fence);
amdgpu_cs_post_dependencies(p);
 
if ((job->preamble_status & AMDGPU_PREAMBLE_IB_PRESENT) &&
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 8f0e6d93bb9c..5a0d67cc3d75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -135,9 +135,9 @@ static enum amdgpu_ring_priority_level 
amdgpu_ctx_sched_prio_to_ring_prio(int32_
 
 static unsigned int amdgpu_ctx_get_hw_prio(struct amdgpu_ctx *ctx, u32 hw_ip)
 {
-   struct amdgpu_device *adev = ctx->adev;
-   int32_t ctx_prio;
+   struct amdgpu_device *adev = ctx->mgr->adev;
unsigned int hw_prio;
+   int32_t ctx_prio;
 
ctx_prio = (ctx->override_priority == AMDGPU_CTX_PRIORITY_UNSET) ?
ctx->init_priority : ctx->override_priority;
@@ -162,17 +162,49 @@ static unsigned int amdgpu_ctx_get_hw_prio(struct 
amdgpu_ctx *ctx, u32 hw_ip)
return hw_prio;
 }
 
+/* Calculate the time spent on the hw */
+static ktime_t amdgpu_ctx_fence_time(struct dma_fence *fence)
+{
+   struct drm_sched_fence *s_fence;
+
+   if (!fence)
+   return ns_to_ktime(0);
+
+   /* When the fence is not even scheduled it can't have spent time */
+   s_fence = to_drm_sched_fence(fence);
+   if (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &s_fence->scheduled.flags))
+   return ns_to_ktime(0);
+
+   if (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &s_fence->finished.flags))
+   return ktime_sub(ktime_get(), s_fence->scheduled.timestamp);
+
+   return ktime_sub(s_fence->finished.timestamp,
+s_fence->scheduled.timestamp);
+}
+
+static ktime_t amdgpu_ctx_entity_time(struct amdgpu_ctx *ctx,
+ struct amdgpu_ctx_entity *centity)
+{
+   ktime_t res = ns_to_ktime(0);
+   uint32_t i;
+
+   spin_lock(&ctx->ring_lock);
+   for (i = 0; i < amdgpu_sched_jobs; i++) {
+   res = ktime_add(res, amdgpu_ctx_fence_time(centity->fences[i]));
+   }
+   spin_unlock(&ctx->ring_lock);
+   return res;
+}
 
 static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
  const u32 ring)
 {
-   struct amdgpu_device *adev = ctx->adev;
-   struct amdgpu_ctx_entity *entity;
struct drm_gpu_scheduler **scheds = NULL, *sched = NULL;
-   unsigned num_scheds = 0;
-   int32_t ctx_prio;
-   unsigned int hw_prio;
+   struct amdgpu_device *adev = ctx->mgr->adev;
+   struct amdgpu_ctx_entity *entity;
enum drm_sched_priority drm_prio;
+   unsigned int hw_prio, num_scheds;
+   int32_t ctx_prio;
int r;
 
entity = kzalloc(struct_size(entity, fences, amdgpu_sched_jobs),
@@ -182,6 +214,7 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
u32 hw_ip,
 
ctx_prio = (ctx->override_priority == AMDGPU_CTX_PRIORITY_UNSET) ?
ctx->init_priority : ctx->override_priority;
+   entity->hw_ip = hw_ip;
entity->sequence = 1;
hw_prio = amdgpu_ctx_get_hw_prio(ctx, hw_ip);
drm_prio = amdgpu_ctx_to_drm_sched_prio(ctx_prio);
@@ -220,11 +253,29 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
u32 hw_ip,
return r;
 }
 
-static int amdgpu_ctx_init(struct amdgpu_device *adev,
+static ktime_t amdgpu_ctx_fini_entity(struct amdgpu_ctx_entity *entity)
+{
+   ktime_t res = 

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-05-11 Thread Christian König

Am 11.05.22 um 11:05 schrieb Mikhail Gavrilov:

On Fri, Apr 15, 2022 at 1:04 PM Christian König
 wrote:

No, I just couldn't find time during all that bug fixing :)

Sorry for the delay, going to take a look after the Easter holiday here.

Christian.

The message is just for history. The issue was fixed between
b253435746d9a4a and 5.18rc4.


We have implemented a workaround, but still don't know the exact root cause.

If anybody wants to look into this it would be rather helpful to be able 
to reproduce the issue.


Regards,
Christian.


Re: [PATCH] drm/amd/pm: update smartshift powerboost calc for smu12

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 4:27 PM, Sathishkumar S wrote:

smartshift apu and dgpu power boost are reported as percentage with
respect to their power limits. This value[0-100] reflects the boost
for the respective device.

Signed-off-by: Sathishkumar S 


Reviewed-by: Lijo Lazar 

Thanks,
Lijo


---
  .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   | 60 ++-
  1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index fd6c44ece168..012e3bd99cc2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -1119,6 +1119,39 @@ static int renoir_get_power_profile_mode(struct 
smu_context *smu,
return size;
  }
  
+static void renoir_get_ss_power_percent(SmuMetrics_t *metrics,

+   uint32_t *apu_percent, uint32_t 
*dgpu_percent)
+{
+   uint32_t apu_boost = 0;
+   uint32_t dgpu_boost = 0;
+   uint16_t apu_limit = 0;
+   uint16_t dgpu_limit = 0;
+   uint16_t apu_power = 0;
+   uint16_t dgpu_power = 0;
+
+   apu_power = metrics->ApuPower;
+   apu_limit = metrics->StapmOriginalLimit;
+   if (apu_power > apu_limit && apu_limit != 0)
+   apu_boost =  ((apu_power - apu_limit) * 100) / apu_limit;
+   apu_boost = (apu_boost > 100) ? 100 : apu_boost;
+
+   dgpu_power = metrics->dGpuPower;
+   if (metrics->StapmCurrentLimit > metrics->StapmOriginalLimit)
+   dgpu_limit = metrics->StapmCurrentLimit - 
metrics->StapmOriginalLimit;
+   if (dgpu_power > dgpu_limit && dgpu_limit != 0)
+   dgpu_boost = ((dgpu_power - dgpu_limit) * 100) / dgpu_limit;
+   dgpu_boost = (dgpu_boost > 100) ? 100 : dgpu_boost;
+
+   if (dgpu_boost >= apu_boost)
+   apu_boost = 0;
+   else
+   dgpu_boost = 0;
+
+   *apu_percent = apu_boost;
+   *dgpu_percent = dgpu_boost;
+}
+
+
  static int renoir_get_smu_metrics_data(struct smu_context *smu,
   MetricsMember_t member,
   uint32_t *value)
@@ -1127,6 +1160,9 @@ static int renoir_get_smu_metrics_data(struct smu_context 
*smu,
  
  	SmuMetrics_t *metrics = (SmuMetrics_t *)smu_table->metrics_table;

int ret = 0;
+   uint32_t apu_percent = 0;
+   uint32_t dgpu_percent = 0;
+
  
  	ret = smu_cmn_get_metrics_table(smu,

NULL,
@@ -1171,26 +1207,18 @@ static int renoir_get_smu_metrics_data(struct 
smu_context *smu,
*value = metrics->Voltage[1];
break;
case METRICS_SS_APU_SHARE:
-   /* return the percentage of APU power with respect to APU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of APU power boost
+* with respect to APU's power limit.
 */
-   if (metrics->StapmOriginalLimit > 0)
-   *value =  (metrics->ApuPower * 100) / 
metrics->StapmOriginalLimit;
-   else
-   *value = 0;
+   renoir_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = apu_percent;
break;
case METRICS_SS_DGPU_SHARE:
-   /* return the percentage of dGPU power with respect to dGPU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of dGPU power boost
+* with respect to dGPU's power limit.
 */
-   if ((metrics->dGpuPower > 0) &&
-   (metrics->StapmCurrentLimit > metrics->StapmOriginalLimit))
-   *value = (metrics->dGpuPower * 100) /
- (metrics->StapmCurrentLimit - 
metrics->StapmOriginalLimit);
-   else
-   *value = 0;
+   renoir_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = dgpu_percent;
break;
default:
*value = UINT_MAX;



Re: [PATCH] drm/amd/pm: update smartshift powerboost calc for smu13

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 5:16 PM, Sathishkumar S wrote:

smartshift apu and dgpu power boost are reported as percentage
with respect to their power limits. adjust the units of power before
calculating the percentage of boost.

Signed-off-by: Sathishkumar S 
---
  .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  | 60 ++-
  1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
index e2d099409123..35fbeb72c05c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
@@ -276,6 +276,40 @@ static int yellow_carp_mode2_reset(struct smu_context *smu)
return yellow_carp_mode_reset(smu, SMU_RESET_MODE_2);
  }
  
+

+static void yellow_carp_get_ss_power_percent(SmuMetrics_t *metrics,
+   uint32_t *apu_percent, uint32_t 
*dgpu_percent)
+{
+   uint32_t apu_boost = 0;
+   uint32_t dgpu_boost = 0;
+   uint16_t apu_limit = 0;
+   uint16_t dgpu_limit = 0;
+   uint16_t apu_power = 0;
+   uint16_t dgpu_power = 0;
+
+   apu_power = metrics->ApuPower/1000;
+   apu_limit = metrics->StapmOpnLimit;
+   if (apu_power > apu_limit && apu_limit != 0)
+   apu_boost =  ((apu_power - apu_limit) * 100) / apu_limit;
+   apu_boost = (apu_boost > 100) ? 100 : apu_boost;
+
+   dgpu_power = metrics->dGpuPower/1000;


Before submitting (not expecting another version here), you may add a 
comment that APU/dGPU power values are reported in milliwatts and STAPM 
power limits in watts.


Reviewed-by: Lijo Lazar 

Thanks,
Lijo


+   if (metrics->StapmCurrentLimit > metrics->StapmOpnLimit)
+   dgpu_limit = metrics->StapmCurrentLimit - 
metrics->StapmOpnLimit;
+   if (dgpu_power > dgpu_limit && dgpu_limit != 0)
+   dgpu_boost = ((dgpu_power - dgpu_limit) * 100) / dgpu_limit;
+   dgpu_boost = (dgpu_boost > 100) ? 100 : dgpu_boost;
+
+   if (dgpu_boost >= apu_boost)
+   apu_boost = 0;
+   else
+   dgpu_boost = 0;
+
+   *apu_percent = apu_boost;
+   *dgpu_percent = dgpu_boost;
+
+}
+
  static int yellow_carp_get_smu_metrics_data(struct smu_context *smu,
MetricsMember_t member,
uint32_t *value)
@@ -284,6 +318,8 @@ static int yellow_carp_get_smu_metrics_data(struct 
smu_context *smu,
  
  	SmuMetrics_t *metrics = (SmuMetrics_t *)smu_table->metrics_table;

int ret = 0;
+   uint32_t apu_percent = 0;
+   uint32_t dgpu_percent = 0;
  
  	ret = smu_cmn_get_metrics_table(smu, NULL, false);

if (ret)
@@ -332,26 +368,18 @@ static int yellow_carp_get_smu_metrics_data(struct 
smu_context *smu,
*value = metrics->Voltage[1];
break;
case METRICS_SS_APU_SHARE:
-   /* return the percentage of APU power with respect to APU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of APU power boost
+* with respect to APU's power limit.
 */
-   if (metrics->StapmOpnLimit > 0)
-   *value =  (metrics->ApuPower * 100) / 
metrics->StapmOpnLimit;
-   else
-   *value = 0;
+    yellow_carp_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = apu_percent;
break;
case METRICS_SS_DGPU_SHARE:
-   /* return the percentage of dGPU power with respect to dGPU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of dGPU power boost
+* with respect to dGPU's power limit.
 */
-   if ((metrics->dGpuPower > 0) &&
-   (metrics->StapmCurrentLimit > metrics->StapmOpnLimit))
-   *value = (metrics->dGpuPower * 100) /
- (metrics->StapmCurrentLimit - 
metrics->StapmOpnLimit);
-   else
-   *value = 0;
+    yellow_carp_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = dgpu_percent;
break;
default:
*value = UINT_MAX;



[PATCH] drm/amd/pm: update smartshift powerboost calc for smu13

2022-05-11 Thread Sathishkumar S
smartshift apu and dgpu power boost are reported as percentage
with respect to their power limits. adjust the units of power before
calculating the percentage of boost.

Signed-off-by: Sathishkumar S 
---
 .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  | 60 ++-
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
index e2d099409123..35fbeb72c05c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
@@ -276,6 +276,40 @@ static int yellow_carp_mode2_reset(struct smu_context *smu)
return yellow_carp_mode_reset(smu, SMU_RESET_MODE_2);
 }
 
+
+static void yellow_carp_get_ss_power_percent(SmuMetrics_t *metrics,
+   uint32_t *apu_percent, uint32_t 
*dgpu_percent)
+{
+   uint32_t apu_boost = 0;
+   uint32_t dgpu_boost = 0;
+   uint16_t apu_limit = 0;
+   uint16_t dgpu_limit = 0;
+   uint16_t apu_power = 0;
+   uint16_t dgpu_power = 0;
+
+   apu_power = metrics->ApuPower/1000;
+   apu_limit = metrics->StapmOpnLimit;
+   if (apu_power > apu_limit && apu_limit != 0)
+   apu_boost =  ((apu_power - apu_limit) * 100) / apu_limit;
+   apu_boost = (apu_boost > 100) ? 100 : apu_boost;
+
+   dgpu_power = metrics->dGpuPower/1000;
+   if (metrics->StapmCurrentLimit > metrics->StapmOpnLimit)
+   dgpu_limit = metrics->StapmCurrentLimit - 
metrics->StapmOpnLimit;
+   if (dgpu_power > dgpu_limit && dgpu_limit != 0)
+   dgpu_boost = ((dgpu_power - dgpu_limit) * 100) / dgpu_limit;
+   dgpu_boost = (dgpu_boost > 100) ? 100 : dgpu_boost;
+
+   if (dgpu_boost >= apu_boost)
+   apu_boost = 0;
+   else
+   dgpu_boost = 0;
+
+   *apu_percent = apu_boost;
+   *dgpu_percent = dgpu_boost;
+
+}
+
 static int yellow_carp_get_smu_metrics_data(struct smu_context *smu,
MetricsMember_t member,
uint32_t *value)
@@ -284,6 +318,8 @@ static int yellow_carp_get_smu_metrics_data(struct 
smu_context *smu,
 
SmuMetrics_t *metrics = (SmuMetrics_t *)smu_table->metrics_table;
int ret = 0;
+   uint32_t apu_percent = 0;
+   uint32_t dgpu_percent = 0;
 
ret = smu_cmn_get_metrics_table(smu, NULL, false);
if (ret)
@@ -332,26 +368,18 @@ static int yellow_carp_get_smu_metrics_data(struct 
smu_context *smu,
*value = metrics->Voltage[1];
break;
case METRICS_SS_APU_SHARE:
-   /* return the percentage of APU power with respect to APU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of APU power boost
+* with respect to APU's power limit.
 */
-   if (metrics->StapmOpnLimit > 0)
-   *value =  (metrics->ApuPower * 100) / 
metrics->StapmOpnLimit;
-   else
-   *value = 0;
+   yellow_carp_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = apu_percent;
break;
case METRICS_SS_DGPU_SHARE:
-   /* return the percentage of dGPU power with respect to dGPU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of dGPU power boost
+* with respect to dGPU's power limit.
 */
-   if ((metrics->dGpuPower > 0) &&
-   (metrics->StapmCurrentLimit > metrics->StapmOpnLimit))
-   *value = (metrics->dGpuPower * 100) /
- (metrics->StapmCurrentLimit - 
metrics->StapmOpnLimit);
-   else
-   *value = 0;
+   yellow_carp_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = dgpu_percent;
break;
default:
*value = UINT_MAX;
-- 
2.25.1
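
To make the unit handling concrete, here is a worked example with made-up
numbers (ApuPower comes from the metrics table in milliwatts, the STAPM
limits in watts, matching the review comment above):

	uint16_t apu_power = 18000 / 1000;	/* 18 W after the mW -> W conversion */
	uint16_t apu_limit = 15;		/* StapmOpnLimit, already in W */
	uint32_t apu_boost = 0;

	if (apu_power > apu_limit && apu_limit != 0)
		apu_boost = ((apu_power - apu_limit) * 100) / apu_limit;	/* 20 */

So 18 W against a 15 W limit reports a 20% APU boost, and since whichever
of apu_boost/dgpu_boost is smaller gets zeroed, only one side reports a
shift at a time.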



[PATCH] drm/amd/pm: update smartshift powerboost calc for smu12

2022-05-11 Thread Sathishkumar S
smartshift apu and dgpu power boost are reported as percentage with
respect to their power limits. This value[0-100] reflects the boost
for the respective device.

Signed-off-by: Sathishkumar S 
---
 .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   | 60 ++-
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index fd6c44ece168..012e3bd99cc2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -1119,6 +1119,39 @@ static int renoir_get_power_profile_mode(struct 
smu_context *smu,
return size;
 }
 
+static void renoir_get_ss_power_percent(SmuMetrics_t *metrics,
+   uint32_t *apu_percent, uint32_t 
*dgpu_percent)
+{
+   uint32_t apu_boost = 0;
+   uint32_t dgpu_boost = 0;
+   uint16_t apu_limit = 0;
+   uint16_t dgpu_limit = 0;
+   uint16_t apu_power = 0;
+   uint16_t dgpu_power = 0;
+
+   apu_power = metrics->ApuPower;
+   apu_limit = metrics->StapmOriginalLimit;
+   if (apu_power > apu_limit && apu_limit != 0)
+   apu_boost =  ((apu_power - apu_limit) * 100) / apu_limit;
+   apu_boost = (apu_boost > 100) ? 100 : apu_boost;
+
+   dgpu_power = metrics->dGpuPower;
+   if (metrics->StapmCurrentLimit > metrics->StapmOriginalLimit)
+   dgpu_limit = metrics->StapmCurrentLimit - 
metrics->StapmOriginalLimit;
+   if (dgpu_power > dgpu_limit && dgpu_limit != 0)
+   dgpu_boost = ((dgpu_power - dgpu_limit) * 100) / dgpu_limit;
+   dgpu_boost = (dgpu_boost > 100) ? 100 : dgpu_boost;
+
+   if (dgpu_boost >= apu_boost)
+   apu_boost = 0;
+   else
+   dgpu_boost = 0;
+
+   *apu_percent = apu_boost;
+   *dgpu_percent = dgpu_boost;
+}
+
+
 static int renoir_get_smu_metrics_data(struct smu_context *smu,
   MetricsMember_t member,
   uint32_t *value)
@@ -1127,6 +1160,9 @@ static int renoir_get_smu_metrics_data(struct smu_context 
*smu,
 
SmuMetrics_t *metrics = (SmuMetrics_t *)smu_table->metrics_table;
int ret = 0;
+   uint32_t apu_percent = 0;
+   uint32_t dgpu_percent = 0;
+
 
ret = smu_cmn_get_metrics_table(smu,
NULL,
@@ -1171,26 +1207,18 @@ static int renoir_get_smu_metrics_data(struct 
smu_context *smu,
*value = metrics->Voltage[1];
break;
case METRICS_SS_APU_SHARE:
-   /* return the percentage of APU power with respect to APU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of APU power boost
+* with respect to APU's power limit.
 */
-   if (metrics->StapmOriginalLimit > 0)
-   *value =  (metrics->ApuPower * 100) / 
metrics->StapmOriginalLimit;
-   else
-   *value = 0;
+   renoir_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = apu_percent;
break;
case METRICS_SS_DGPU_SHARE:
-   /* return the percentage of dGPU power with respect to dGPU's 
power limit.
-* percentage is reported, this isn't boost value. Smartshift 
power
-* boost/shift is only when the percentage is more than 100.
+   /* return the percentage of dGPU power boost
+* with respect to dGPU's power limit.
 */
-   if ((metrics->dGpuPower > 0) &&
-   (metrics->StapmCurrentLimit > metrics->StapmOriginalLimit))
-   *value = (metrics->dGpuPower * 100) /
- (metrics->StapmCurrentLimit - 
metrics->StapmOriginalLimit);
-   else
-   *value = 0;
+   renoir_get_ss_power_percent(metrics, &apu_percent, 
&dgpu_percent);
+   *value = dgpu_percent;
break;
default:
*value = UINT_MAX;
-- 
2.25.1



[PATCH] drm/amdkfd: allocate MMIO/DOORBELL BOs with AMDGPU_GEM_CREATE_PREEMPTIBLE

2022-05-11 Thread Lang Yu
MMIO/DOORBELL BOs' backing resources (bus address resources that are
used to talk to the GPU) are not managed by the GTT manager, but they
are counted by the GTT manager. That makes no sense.

With the AMDGPU_GEM_CREATE_PREEMPTIBLE flag, such BOs will be managed by
the PREEMPT manager (for preemptible contexts, e.g., KFD). Then they won't
be evicted and don't need to be pinned either.

But we still leave these BOs pinned to indicate that the underlying
resource never moves.

Signed-off-by: Lang Yu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 32 +--
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 8b14c55a0793..fada3b149361 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1518,26 +1518,26 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
} else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_flags = 0;
-   } else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+   } else {
domain = AMDGPU_GEM_DOMAIN_GTT;
alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
alloc_flags = AMDGPU_GEM_CREATE_PREEMPTIBLE;
-   if (!offset || !*offset)
-   return -EINVAL;
-   user_addr = untagged_addr(*offset);
-   } else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
-   KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
-   domain = AMDGPU_GEM_DOMAIN_GTT;
-   alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
-   bo_type = ttm_bo_type_sg;
-   alloc_flags = 0;
-   if (size > UINT_MAX)
+
+   if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+   if (!offset || !*offset)
+   return -EINVAL;
+   user_addr = untagged_addr(*offset);
+   } else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+   KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
+   bo_type = ttm_bo_type_sg;
+   if (size > UINT_MAX)
+   return -EINVAL;
+   sg = create_doorbell_sg(*offset, size);
+   if (!sg)
+   return -ENOMEM;
+   } else {
return -EINVAL;
-   sg = create_doorbell_sg(*offset, size);
-   if (!sg)
-   return -ENOMEM;
-   } else {
-   return -EINVAL;
+   }
}
 
*mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
-- 
2.25.1



Re: [PATCH] drm/amd/pm: consistent approach for smartshift

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 2:18 PM, Sathishkumar S wrote:

always create smartshift attributes from the dgpu device even on SS1.0.
consider units of power in the metrics table and convert if necessary.
the powershift value is a percentage and ranges between 0-100.

Suggested-by: Alex Deucher 
Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 


Can you split the RN, YC and generic changes to 3 separate patches? 
Would be easier to review and revert specifically for an ASIC.


Thanks,
Lijo


---
  drivers/gpu/drm/amd/pm/amdgpu_pm.c| 146 +++---
  .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   |  54 +--
  .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  |  54 +--
  3 files changed, 166 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d3228216b2da..c2406baeef93 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1734,22 +1734,11 @@ static ssize_t amdgpu_get_gpu_metrics(struct device 
*dev,
return size;
  }
  
-/**

- * DOC: smartshift_apu_power
- *
- * The amdgpu driver provides a sysfs API for reporting APU power
- * share if it supports smartshift. The value is expressed as
- * the proportion of stapm limit where stapm limit is the total APU
- * power limit. The result is in percentage. If APU power is 130% of
- * STAPM, then APU is using 30% of the dGPU's headroom.
- */
-
-static ssize_t amdgpu_get_smartshift_apu_power(struct device *dev, struct 
device_attribute *attr,
-  char *buf)
+static int amdgpu_read_powershift_percent(struct amdgpu_device *adev,
+   uint32_t *ss_power, bool 
dgpu_share)
  {
-   struct drm_device *ddev = dev_get_drvdata(dev);
-   struct amdgpu_device *adev = drm_to_adev(ddev);
-   uint32_t ss_power, size;
+   struct drm_device *ddev = adev_to_drm(adev);
+   uint32_t size;
int r = 0;
  
  	if (amdgpu_in_reset(adev))

@@ -1763,28 +1752,64 @@ static ssize_t amdgpu_get_smartshift_apu_power(struct 
device *dev, struct device
return r;
}
  
-	r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_APU_SHARE,

-  (void *)&ss_power, &size);
-   if (r)
-   goto out;
-
-   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   if (dgpu_share)
+   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_DGPU_SHARE,
+  (void *)ss_power, &size);
+   else
+   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_APU_SHARE,
+  (void *)ss_power, &size);
  
-out:

pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
return r;
  }
+/**
+ * DOC: smartshift_apu_power
+ *
+ * The amdgpu driver provides a sysfs API for reporting APU power
+ * shift in percentage if platform supports smartshift. Value 0 means that
+ * there is no powershift and values between [1-100] means that the power
+ * is shifted to APU, the percentage of boost is with respect to APU power
+ * limit on the platform.
+ */
+
+static ssize_t amdgpu_get_smartshift_apu_power(struct device *dev, struct 
device_attribute *attr,
+  char *buf)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = drm_to_adev(ddev);
+   uint32_t ss_power = 0;
+   int r = 0, i;
+
+   r = amdgpu_read_powershift_percent(adev, &ss_power, false);
+   if (!r)
+   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   else if (r == -EOPNOTSUPP) {
+   /* sensor not available on dGPU, try to read from APU */
+   adev = NULL;
+   mutex_lock(&mgpu_info.mutex);
+   for (i = 0; i < mgpu_info.num_gpu; i++) {
+   if (mgpu_info.gpu_ins[i].adev->flags & AMD_IS_APU) {
+   adev = mgpu_info.gpu_ins[i].adev;
+   break;
+   }
+   }
+   mutex_unlock(&mgpu_info.mutex);
+   if (adev && !amdgpu_read_powershift_percent(adev, &ss_power, 
false))
+   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   }
+
+   return r;
+}
  
  /**

   * DOC: smartshift_dgpu_power
   *
- * The amdgpu driver provides a sysfs API for reporting the dGPU power
- * share if the device is in HG and supports smartshift. The value
- * is expressed as the proportion of stapm limit where stapm limit
- * is the total APU power limit. The value is in percentage. If dGPU
- * power is 20% higher than STAPM power(120%), it's using 20% of the
- * APU's power headroom.
+ * The amdgpu driver provides a sysfs API for reporting dGPU power
+ * shift in percentage if platform supports smartshift. Value 0 means that
+ * there is no powershift and values between [1-100] means that the power is
+ * shifted 

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-05-11 Thread Mikhail Gavrilov
On Fri, Apr 15, 2022 at 1:04 PM Christian König
 wrote:
>
> No, I just couldn't find time during all that bug fixing :)
>
> Sorry for the delay, going to take a look after the Easter holiday here.
>
> Christian.

The message is just for history. The issue was fixed between
b253435746d9a4a and 5.18rc4.

-- 
Best Regards,
Mike Gavrilov.


Re: [PATCH 1/2] drm/amdgpu: Convert to common fdinfo format v2

2022-05-11 Thread Christian König

Am 11.05.22 um 10:04 schrieb Tvrtko Ursulin:


On 10/05/2022 17:55, Christian König wrote:

From: Tvrtko Ursulin 


Feels like you should take over ownership since you wrote a lot of 
new, amdgpu-specific code here.


Yeah, probably.




Convert fdinfo format to one documented in drm-usage-stats.rst.

It turned out that the existing implementation was actually completely
nonsense. The calculated percentages indeed represented the usage of the
engine, but with varying time slices.

So 10% usage for application A could mean something completely different
than 10% usage for application B.

Completely nuke that and just use the now standardized nanosecond
interface.

v2: drop the documentation change for now, nuke percentage calculation

Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Christian König 
Cc: David M Nieto 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    | 136 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h    |   8 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  45 ---
  3 files changed, 88 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c

index 8f0e6d93bb9c..988c5a76cad2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -183,6 +183,7 @@ static int amdgpu_ctx_init_entity(struct 
amdgpu_ctx *ctx, u32 hw_ip,

  ctx_prio = (ctx->override_priority == AMDGPU_CTX_PRIORITY_UNSET) ?
  ctx->init_priority : ctx->override_priority;
  entity->sequence = 1;
+    entity->time_spend = 0;
  hw_prio = amdgpu_ctx_get_hw_prio(ctx, hw_ip);
  drm_prio = amdgpu_ctx_to_drm_sched_prio(ctx_prio);
  @@ -666,6 +667,26 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)
  return 0;
  }
  +/* Calculate the time spent on the hw */
+static ktime_t amdgpu_ctx_fence_time(struct dma_fence *fence)
+{
+    struct drm_sched_fence *s_fence;
+
+    if (!fence)
+    return ns_to_ktime(0);
+
+    /* When the fence is not even scheduled it can't have spent time */
+    s_fence = to_drm_sched_fence(fence);
+    if (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, 
&s_fence->scheduled.flags))

+    return ns_to_ktime(0);
+
+    if (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, 
&s_fence->finished.flags))

+    return ktime_sub(ktime_get(), s_fence->scheduled.timestamp);
+
+    return ktime_sub(s_fence->finished.timestamp,
+ s_fence->scheduled.timestamp);
+}
+
  void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
    struct drm_sched_entity *entity,
    struct dma_fence *fence, uint64_t *handle)
@@ -685,6 +706,7 @@ void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
  spin_lock(&ctx->ring_lock);
  centity->fences[idx] = fence;
  centity->sequence++;
+    centity->time_spend += amdgpu_ctx_fence_time(other);
  spin_unlock(&ctx->ring_lock);
    dma_fence_put(other);
@@ -692,6 +714,42 @@ void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
  *handle = seq;
  }
  +static ktime_t amdgpu_ctx_entity_time(struct amdgpu_ctx *ctx,
+  struct amdgpu_ctx_entity *centity)
+{
+    ktime_t res;
+    uint32_t i;
+
+    spin_lock(&centity->ring_lock);
+    res = centity->time_spend;
+    for (i = 0; i < amdgpu_sched_jobs; i++) {
+    res = ktime_add(res, amdgpu_ctx_fence_time(centity->fences[i]));
+    }
+    spin_unlock(&centity->ring_lock);
+    return res;
+}
+
+ktime_t amdgpu_ctx_mgr_usage(struct amdgpu_ctx_mgr *mgr, uint32_t hwip,
+ uint32_t idx)
+{
+    struct amdgpu_ctx_entity *centity;
+    ktime_t res = ns_to_ktime(0);
+    struct amdgpu_ctx *ctx;
+    uint32_t id;
+
+    mutex_lock(&mgr->lock);
+    idr_for_each_entry(&mgr->ctx_handles, ctx, id) {
+
+    centity = ctx->entities[hwip][idx];
+    if (!centity)
+    continue;
+
+    res = ktime_add(res, amdgpu_ctx_entity_time(ctx, centity));
+    }
+    mutex_unlock(&mgr->lock);
+    return res;
+}


The amdgpu specific code I cannot really review - but I can ask a 
question - do you have a concept of contexts which were used with a 
client and then closed/abandoned, in which case would the time they 
spent on the GPU be forgotten or preserved in this accounting flow?


Oh, good point. So I should probably accumulate the time spent by 
garbage-collected fences on the context manager instead of the context 
entity.


In addition to that, I think separating the output by scheduler instance 
doesn't make any sense at all because those are not related to the 
hardware engines. In other words, the SDMA0 queue of process A can go to a 
different hardware engine than the SDMA0 queue of process B, and that 
makes the values not comparable.


Thanks,
Christian.



In i915 we have this so I am accumulating the times spent by contexts 
as they are closed into a data structure associated with the owning 
drm_file.



+
  struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
 struct drm_sched_entity *entity,
   

[PATCH] drm/amd/pm: consistent approach for smartshift

2022-05-11 Thread Sathishkumar S
Always create smartshift attributes from the dGPU device even on SS1.0.
Consider the units of power in the metrics table and convert if necessary.
The powershift value is in percentage, with values ranging between 0 and 100.

Suggested-by: Alex Deucher 
Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 146 +++---
 .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   |  54 +--
 .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  |  54 +--
 3 files changed, 166 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d3228216b2da..c2406baeef93 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1734,22 +1734,11 @@ static ssize_t amdgpu_get_gpu_metrics(struct device 
*dev,
return size;
 }
 
-/**
- * DOC: smartshift_apu_power
- *
- * The amdgpu driver provides a sysfs API for reporting APU power
- * share if it supports smartshift. The value is expressed as
- * the proportion of stapm limit where stapm limit is the total APU
- * power limit. The result is in percentage. If APU power is 130% of
- * STAPM, then APU is using 30% of the dGPU's headroom.
- */
-
-static ssize_t amdgpu_get_smartshift_apu_power(struct device *dev, struct device_attribute *attr,
-  char *buf)
+static int amdgpu_read_powershift_percent(struct amdgpu_device *adev,
+   uint32_t *ss_power, bool dgpu_share)
 {
-   struct drm_device *ddev = dev_get_drvdata(dev);
-   struct amdgpu_device *adev = drm_to_adev(ddev);
-   uint32_t ss_power, size;
+   struct drm_device *ddev = adev_to_drm(adev);
+   uint32_t size;
int r = 0;
 
if (amdgpu_in_reset(adev))
@@ -1763,28 +1752,64 @@ static ssize_t amdgpu_get_smartshift_apu_power(struct 
device *dev, struct device
return r;
}
 
-   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_APU_SHARE,
-  (void *)&ss_power, &size);
-   if (r)
-   goto out;
-
-   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   if (dgpu_share)
+   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_DGPU_SHARE,
+  (void *)ss_power, &size);
+   else
+   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_SS_APU_SHARE,
+  (void *)ss_power, &size);
 
-out:
pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
return r;
 }
+/**
+ * DOC: smartshift_apu_power
+ *
+ * The amdgpu driver provides a sysfs API for reporting the APU power
+ * shift in percentage if the platform supports smartshift. A value of 0 means
+ * that there is no powershift, and values in [1-100] mean that power
+ * is shifted to the APU; the percentage of boost is with respect to the APU
+ * power limit on the platform.
+ */
+
+static ssize_t amdgpu_get_smartshift_apu_power(struct device *dev, struct 
device_attribute *attr,
+  char *buf)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = drm_to_adev(ddev);
+   uint32_t ss_power = 0;
+   int r = 0, i;
+
+   r = amdgpu_read_powershift_percent(adev, &ss_power, false);
+   if (!r)
+   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   else if (r == -EOPNOTSUPP) {
+   /* sensor not available on dGPU, try to read from APU */
+   adev = NULL;
+   mutex_lock(&mgpu_info.mutex);
+   for (i = 0; i < mgpu_info.num_gpu; i++) {
+   if (mgpu_info.gpu_ins[i].adev->flags & AMD_IS_APU) {
+   adev = mgpu_info.gpu_ins[i].adev;
+   break;
+   }
+   }
+   mutex_unlock(&mgpu_info.mutex);
+   if (adev && !amdgpu_read_powershift_percent(adev, &ss_power, false))
+   r = sysfs_emit(buf, "%u%%\n", ss_power);
+   }
+
+   return r;
+}
 
 /**
  * DOC: smartshift_dgpu_power
  *
- * The amdgpu driver provides a sysfs API for reporting the dGPU power
- * share if the device is in HG and supports smartshift. The value
- * is expressed as the proportion of stapm limit where stapm limit
- * is the total APU power limit. The value is in percentage. If dGPU
- * power is 20% higher than STAPM power(120%), it's using 20% of the
- * APU's power headroom.
+ * The amdgpu driver provides a sysfs API for reporting the dGPU power
+ * shift in percentage if the platform supports smartshift. A value of 0 means
+ * that there is no powershift, and values in [1-100] mean that power is
+ * shifted to the dGPU; the percentage of boost is with respect to the dGPU
+ * power limit on the platform.
  */
 
 static ssize_t amdgpu_get_smartshift_dgpu_power(struct device *dev, struct 
device_attribute *attr,
@@ -1792,31 
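As a userspace-side illustration of the DOC text above, a minimal reader of the two sysfs nodes could look like the sketch below (the card0 path is an assumption for the example; the nodes only exist when the platform supports smartshift):

    #include <stdio.h>

    /* Read one smartshift node and return the percentage, or -1 on error.
     * A value of 0 means no power shift; 1-100 is the shifted share. */
    static int read_shift(const char *path)
    {
    	unsigned int pct;
    	FILE *f = fopen(path, "r");

    	if (!f)
    		return -1;
    	if (fscanf(f, "%u%%", &pct) != 1)
    		pct = 0;
    	fclose(f);
    	return (int)pct;
    }

    int main(void)
    {
    	printf("APU shift: %d%%\n",
    	       read_shift("/sys/class/drm/card0/device/smartshift_apu_power"));
    	printf("dGPU shift: %d%%\n",
    	       read_shift("/sys/class/drm/card0/device/smartshift_dgpu_power"));
    	return 0;
    }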

[PATCH v5] drm/amd/pm: support ss metrics read for smu11

2022-05-11 Thread Sathishkumar S
Support reading smartshift APU and dGPU power for smu11-based ASICs.

v2: add new version of SmuMetrics and make calculation more readable (Lijo)
v3: avoid calculations that result in -ve values and skip related checks
v4: use the current power limit on dGPU and exclude smu 11_0_7 (Lijo)
v5: remove redundant code (Lijo)

Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 
Reviewed-by: Lijo Lazar 
---
 .../pmfw_if/smu11_driver_if_sienna_cichlid.h  |  63 +++
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 161 ++
 2 files changed, 187 insertions(+), 37 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
index 08f0bb2af5d2..351a4af429b3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
@@ -1540,11 +1540,74 @@ typedef struct {
 
 } SmuMetrics_V3_t;
 
+typedef struct {
+   uint32_t CurrClock[PPCLK_COUNT];
+
+   uint16_t AverageGfxclkFrequencyPreDs;
+   uint16_t AverageGfxclkFrequencyPostDs;
+   uint16_t AverageFclkFrequencyPreDs;
+   uint16_t AverageFclkFrequencyPostDs;
+   uint16_t AverageUclkFrequencyPreDs;
+   uint16_t AverageUclkFrequencyPostDs;
+
+
+   uint16_t AverageGfxActivity;
+   uint16_t AverageUclkActivity;
+   uint8_t  CurrSocVoltageOffset;
+   uint8_t  CurrGfxVoltageOffset;
+   uint8_t  CurrMemVidOffset;
+   uint8_t  Padding8;
+   uint16_t AverageSocketPower;
+   uint16_t TemperatureEdge;
+   uint16_t TemperatureHotspot;
+   uint16_t TemperatureMem;
+   uint16_t TemperatureVrGfx;
+   uint16_t TemperatureVrMem0;
+   uint16_t TemperatureVrMem1;
+   uint16_t TemperatureVrSoc;
+   uint16_t TemperatureLiquid0;
+   uint16_t TemperatureLiquid1;
+   uint16_t TemperaturePlx;
+   uint16_t Padding16;
+   uint32_t AccCnt;
+   uint8_t  ThrottlingPercentage[THROTTLER_COUNT];
+
+
+   uint8_t  LinkDpmLevel;
+   uint8_t  CurrFanPwm;
+   uint16_t CurrFanSpeed;
+
+   //BACO metrics, PMFW-1721
+   //metrics for D3hot entry/exit and driver ARM msgs
+   uint8_t D3HotEntryCountPerMode[D3HOT_SEQUENCE_COUNT];
+   uint8_t D3HotExitCountPerMode[D3HOT_SEQUENCE_COUNT];
+   uint8_t ArmMsgReceivedCountPerMode[D3HOT_SEQUENCE_COUNT];
+
+   //PMFW-4362
+   uint32_t EnergyAccumulator;
+   uint16_t AverageVclk0Frequency;
+   uint16_t AverageDclk0Frequency;
+   uint16_t AverageVclk1Frequency;
+   uint16_t AverageDclk1Frequency;
+   uint16_t VcnUsagePercentage0;
+   uint16_t VcnUsagePercentage1;
+   uint8_t  PcieRate;
+   uint8_t  PcieWidth;
+   uint16_t AverageGfxclkFrequencyTarget;
+
+   uint8_t  ApuSTAPMSmartShiftLimit;
+   uint8_t  AverageApuSocketPower;
+   uint8_t  ApuSTAPMLimit;
+   uint8_t  Padding8_2;
+
+} SmuMetrics_V4_t;
+
 typedef struct {
   union {
 SmuMetrics_t SmuMetrics;
 SmuMetrics_V2_t SmuMetrics_V2;
 SmuMetrics_V3_t SmuMetrics_V3;
+SmuMetrics_V4_t SmuMetrics_V4;
   };
   uint32_t Spare[1];
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 86ab276b6b0b..d68be8f8850e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -585,6 +585,100 @@ static uint32_t 
sienna_cichlid_get_throttler_status_locked(struct smu_context *s
return throttler_status;
 }
 
+static int sienna_cichlid_get_power_limit(struct smu_context *smu,
+ uint32_t *current_power_limit,
+ uint32_t *default_power_limit,
+ uint32_t *max_power_limit)
+{
+   struct smu_11_0_7_powerplay_table *powerplay_table =
+   (struct smu_11_0_7_powerplay_table 
*)smu->smu_table.power_play_table;
+   uint32_t power_limit, od_percent;
+   uint16_t *table_member;
+
+   GET_PPTABLE_MEMBER(SocketPowerLimitAc, &table_member);
+
+   if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
+   power_limit =
+   table_member[PPT_THROTTLER_PPT0];
+   }
+
+   if (current_power_limit)
+   *current_power_limit = power_limit;
+   if (default_power_limit)
+   *default_power_limit = power_limit;
+
+   if (max_power_limit) {
+   if (smu->od_enabled) {
+   od_percent =
+   le32_to_cpu(powerplay_table->overdrive_table.max[
+   SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
+
+   dev_dbg(smu->adev->dev, "ODSETTING_POWERPERCENTAGE: %d (default: %d)\n",
+   od_percent, 

Re: [PATCH 1/2] drm/amd/pm: add smu feature map support for smu_v13_0_7

2022-05-11 Thread Feng, Kenneth
[AMD Official Use Only - General]

Series is Reviewed-by: Kenneth Feng <kenneth.f...@amd.com>


Best wishes
Kenneth Feng


From: amd-gfx on behalf of Yang Wang
Date: Wednesday, May 11, 2022, 13:09
To: amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin)
Subject: [PATCH 1/2] drm/amd/pm: add smu feature map support for smu_v13_0_7
[CAUTION: External Email]

pp_features can't display full feature information
when these mappings do not exist.

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  | 23 
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  | 58 ---
 2 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 3f40cd6e4165..799050ea7515 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -368,6 +368,29 @@ enum smu_clk_type {
__SMU_DUMMY_MAP(DATA_CALCULATION),  \
__SMU_DUMMY_MAP(DPM_VCLK),  \
__SMU_DUMMY_MAP(DPM_DCLK),  \
+   __SMU_DUMMY_MAP(FW_DATA_READ),  \
+   __SMU_DUMMY_MAP(DPM_GFX_POWER_OPTIMIZER),   \
+   __SMU_DUMMY_MAP(DPM_DCN),   \
+   __SMU_DUMMY_MAP(VMEMP_SCALING), \
+   __SMU_DUMMY_MAP(VDDIO_MEM_SCALING), \
+   __SMU_DUMMY_MAP(MM_DPM),\
+   __SMU_DUMMY_MAP(SOC_MPCLK_DS),  \
+   __SMU_DUMMY_MAP(BACO_MPCLK_DS), \
+   __SMU_DUMMY_MAP(THROTTLERS),\
+   __SMU_DUMMY_MAP(SMARTSHIFT),\
+   __SMU_DUMMY_MAP(GFX_READ_MARGIN),   \
+   __SMU_DUMMY_MAP(GFX_IMU),   \
+   __SMU_DUMMY_MAP(GFX_PCC_DFLL),  \
+   __SMU_DUMMY_MAP(BOOT_TIME_CAL), \
+   __SMU_DUMMY_MAP(BOOT_POWER_OPT),\
+   __SMU_DUMMY_MAP(GFXCLK_SPREAD_SPECTRUM),\
+   __SMU_DUMMY_MAP(SOC_PCC),   \
+   __SMU_DUMMY_MAP(OPTIMIZED_VMIN),\
+   __SMU_DUMMY_MAP(CLOCK_POWER_DOWN_BYPASS),   \
+   __SMU_DUMMY_MAP(MEM_TEMP_READ), \
+   __SMU_DUMMY_MAP(ATHUB_MMHUB_PG),\
+   __SMU_DUMMY_MAP(BACO_CG),   \
+   __SMU_DUMMY_MAP(SOC_CG),

 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(feature)   SMU_FEATURE_##feature##_BIT
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
index 00964b3728fe..7c9e0ba7ab50 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
@@ -131,14 +131,56 @@ static struct cmn2asic_mapping 
smu_v13_0_7_clk_map[SMU_CLK_COUNT] = {
 };

 static struct cmn2asic_mapping smu_v13_0_7_feature_mask_map[SMU_FEATURE_COUNT] 
= {
-   [SMU_FEATURE_DPM_GFXCLK_BIT] = {1, FEATURE_DPM_GFXCLK_BIT},
-   [SMU_FEATURE_DPM_UCLK_BIT] = {1, FEATURE_DPM_UCLK_BIT},
-   [SMU_FEATURE_DPM_FCLK_BIT] = {1, FEATURE_DPM_FCLK_BIT},
-   [SMU_FEATURE_DPM_SOCCLK_BIT] = {1, FEATURE_DPM_SOCCLK_BIT},
-   [SMU_FEATURE_DPM_LINK_BIT] = {1, FEATURE_DPM_LINK_BIT},
-   [SMU_FEATURE_DPM_VCLK_BIT] = {1, FEATURE_MM_DPM_BIT},
-   [SMU_FEATURE_DPM_DCLK_BIT] = {1, FEATURE_MM_DPM_BIT},
-   [SMU_FEATURE_FAN_CONTROL_BIT] = {1, FEATURE_FAN_CONTROL_BIT},
+   FEA_MAP(FW_DATA_READ),
+   FEA_MAP(DPM_GFXCLK),
+   FEA_MAP(DPM_GFX_POWER_OPTIMIZER),
+   FEA_MAP(DPM_UCLK),
+   FEA_MAP(DPM_FCLK),
+   FEA_MAP(DPM_SOCCLK),
+   FEA_MAP(DPM_MP0CLK),
+   FEA_MAP(DPM_LINK),
+   FEA_MAP(DPM_DCN),
+   FEA_MAP(VMEMP_SCALING),
+   FEA_MAP(VDDIO_MEM_SCALING),
+   FEA_MAP(DS_GFXCLK),
+   FEA_MAP(DS_SOCCLK),
+   FEA_MAP(DS_FCLK),
+   FEA_MAP(DS_LCLK),
+   FEA_MAP(DS_DCFCLK),
+   FEA_MAP(DS_UCLK),
+   FEA_MAP(GFX_ULV),
+   FEA_MAP(FW_DSTATE),
+   FEA_MAP(GFXOFF),
+   FEA_MAP(BACO),
+   FEA_MAP(MM_DPM),
+   FEA_MAP(SOC_MPCLK_DS),
+   FEA_MAP(BACO_MPCLK_DS),
+   FEA_MAP(THROTTLERS),
+   FEA_MAP(SMARTSHIFT),
+   FEA_MAP(GTHR),
+   FEA_MAP(ACDC),
+   FEA_MAP(VR0HOT),
+   FEA_MAP(FW_CTF),
+   FEA_MAP(FAN_CONTROL),
+   FEA_MAP(GFX_DCS),
+   FEA_MAP(GFX_READ_MARGIN),
+   FEA_MAP(LED_DISPLAY),
+   FEA_MAP(GFXCLK_SPREAD_SPECTRUM),
+   FEA_MAP(OUT_OF_BAND_MONITOR),
+   FEA_MAP(OPTIMIZED_VMIN),
+   FEA_MAP(GFX_IMU),
+   FEA_MAP(BOOT_TIME_CAL),
+   FEA_MAP(GFX_PCC_DFLL),
+   FEA_MAP(SOC_CG),
+   FEA_MAP(DF_CSTATE),
+   FEA_MAP(GFX_EDC),
+   FEA_MAP(BOOT_POWER_OPT),
+   FEA_MAP(CLOCK_POWER_DOWN_BYPASS),
+   FEA_MAP(DS_VCN),
+   FEA_MAP(BACO_CG),
+   FEA_MAP(MEM_TEMP_READ),
+   FEA_MAP(ATHUB_MMHUB_PG),
+   
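For context on the conversion above: FEA_MAP is the common swsmu helper that ties the generic SMU_FEATURE_* bit to the ASIC-specific FEATURE_* bit from the PMFW headers, roughly along these lines (paraphrased from the swsmu common code; the exact definition may differ):

    #define FEA_MAP(fea) \
    	[SMU_FEATURE_##fea##_BIT] = {1, FEATURE_##fea##_BIT}

So e.g. FEA_MAP(DPM_GFXCLK) expands to the same initializer as the open-coded [SMU_FEATURE_DPM_GFXCLK_BIT] = {1, FEATURE_DPM_GFXCLK_BIT} entry it replaces, which is what lets pp_features resolve and print each enabled firmware feature by name.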

Re: [PATCH] drm/amd/pm: add smu power_limit callback for smu_v13_0_7

2022-05-11 Thread Feng, Kenneth
[AMD Official Use Only - General]

Reviewed-by: Kenneth Feng <kenneth.f...@amd.com>

Best wishes
Kenneth Feng


From: amd-gfx on behalf of Yang Wang
Date: Wednesday, May 11, 2022, 15:01
To: amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin)
Subject: [PATCH] drm/amd/pm: add smu power_limit callback for smu_v13_0_7
[CAUTION: External Email]

- get_power_limit
- set_power_limit

add above callback functions to enable power_cap hwmon node.

Signed-off-by: Yang Wang 
---
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
index 7c9e0ba7ab50..4e1861fb2c6a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
@@ -1367,6 +1367,43 @@ static int smu_v13_0_7_enable_mgpu_fan_boost(struct 
smu_context *smu)
   NULL);
 }

+static int smu_v13_0_7_get_power_limit(struct smu_context *smu,
+  uint32_t *current_power_limit,
+  uint32_t *default_power_limit,
+  uint32_t *max_power_limit)
+{
+   struct smu_table_context *table_context = &smu->smu_table;
+   struct smu_13_0_7_powerplay_table *powerplay_table =
+   (struct smu_13_0_7_powerplay_table *)table_context->power_play_table;
+   PPTable_t *pptable = table_context->driver_pptable;
+   SkuTable_t *skutable = &pptable->SkuTable;
+   uint32_t power_limit, od_percent;
+
+   if (smu_v13_0_get_current_power_limit(smu, &power_limit))
+   power_limit = smu->adev->pm.ac_power ?
+ skutable->SocketPowerLimitAc[PPT_THROTTLER_PPT0] :
+ skutable->SocketPowerLimitDc[PPT_THROTTLER_PPT0];
+
+   if (current_power_limit)
+   *current_power_limit = power_limit;
+   if (default_power_limit)
+   *default_power_limit = power_limit;
+
+   if (max_power_limit) {
+   if (smu->od_enabled) {
+   od_percent = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
+
+   dev_dbg(smu->adev->dev, "ODSETTING_POWERPERCENTAGE: %d (default: %d)\n", od_percent, power_limit);
+
+   power_limit *= (100 + od_percent);
+   power_limit /= 100;
+   }
+   *max_power_limit = power_limit;
+   }
+
+   return 0;
+}
+
 static int smu_v13_0_7_get_power_profile_mode(struct smu_context *smu, char 
*buf)
 {
DpmActivityMonitorCoeffIntExternal_t 
activity_monitor_external[PP_SMC_POWER_PROFILE_COUNT];
@@ -1539,6 +1576,8 @@ static const struct pptable_funcs smu_v13_0_7_ppt_funcs = 
{
.get_fan_control_mode = smu_v13_0_get_fan_control_mode,
.set_fan_control_mode = smu_v13_0_set_fan_control_mode,
.enable_mgpu_fan_boost = smu_v13_0_7_enable_mgpu_fan_boost,
+   .get_power_limit = smu_v13_0_7_get_power_limit,
+   .set_power_limit = smu_v13_0_set_power_limit,
.get_power_profile_mode = smu_v13_0_7_get_power_profile_mode,
.set_power_profile_mode = smu_v13_0_7_set_power_profile_mode,
.set_tool_table_location = smu_v13_0_set_tool_table_location,
--
2.25.1
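As a worked example of the overdrive headroom math in smu_v13_0_7_get_power_limit above (numbers invented for illustration): with a socket power limit of 220 W and an ODSETTING_POWERPERCENTAGE of 15, max_power_limit becomes 220 * (100 + 15) / 100 = 253 W, while current_power_limit and default_power_limit both stay at 220 W. With overdrive disabled, the maximum simply equals the current limit.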


Re: [PATCH] drm/amd/pm: support ss metrics read for smu11

2022-05-11 Thread Sundararaju, Sathishkumar



On 5/11/2022 1:14 PM, Lazar, Lijo wrote:



On 5/11/2022 12:51 PM, Sathishkumar S wrote:

Support reading smartshift APU and dGPU power for smu11-based ASICs.

v2: add new version of SmuMetrics and make calculation more readable (Lijo)

v3: avoid calculations that result in -ve values and skip related checks
v4: use the current power limit on dGPU and exclude smu 11_0_7 (Lijo)

Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 
---
  .../pmfw_if/smu11_driver_if_sienna_cichlid.h  |  63 +++
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 161 ++
  2 files changed, 187 insertions(+), 37 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 


index 08f0bb2af5d2..280d42778f28 100644
--- 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
+++ 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h

@@ -1540,11 +1540,74 @@ typedef struct {
    } SmuMetrics_V3_t;
  +typedef struct {
+    uint32_t CurrClock[PPCLK_COUNT];
+
+    uint16_t AverageGfxclkFrequencyPreDs;
+    uint16_t AverageGfxclkFrequencyPostDs;
+    uint16_t AverageFclkFrequencyPreDs;
+    uint16_t AverageFclkFrequencyPostDs;
+    uint16_t AverageUclkFrequencyPreDs;
+    uint16_t AverageUclkFrequencyPostDs;
+
+
+    uint16_t AverageGfxActivity;
+    uint16_t AverageUclkActivity;
+    uint8_t  CurrSocVoltageOffset;
+    uint8_t  CurrGfxVoltageOffset;
+    uint8_t  CurrMemVidOffset;
+    uint8_t  Padding8;
+    uint16_t AverageSocketPower;
+    uint16_t TemperatureEdge;
+    uint16_t TemperatureHotspot;
+    uint16_t TemperatureMem;
+    uint16_t TemperatureVrGfx;
+    uint16_t TemperatureVrMem0;
+    uint16_t TemperatureVrMem1;
+    uint16_t TemperatureVrSoc;
+    uint16_t TemperatureLiquid0;
+    uint16_t TemperatureLiquid1;
+    uint16_t TemperaturePlx;
+    uint16_t Padding16;
+    uint32_t AccCnt;
+    uint8_t  ThrottlingPercentage[THROTTLER_COUNT];
+
+
+    uint8_t  LinkDpmLevel;
+    uint8_t  CurrFanPwm;
+    uint16_t CurrFanSpeed;
+
+    //BACO metrics, PMFW-1721
+    //metrics for D3hot entry/exit and driver ARM msgs
+    uint8_t D3HotEntryCountPerMode[D3HOT_SEQUENCE_COUNT];
+    uint8_t D3HotExitCountPerMode[D3HOT_SEQUENCE_COUNT];
+    uint8_t ArmMsgReceivedCountPerMode[D3HOT_SEQUENCE_COUNT];
+
+    //PMFW-4362
+    uint32_t EnergyAccumulator;
+    uint16_t AverageVclk0Frequency;
+    uint16_t AverageDclk0Frequency;
+    uint16_t AverageVclk1Frequency;
+    uint16_t AverageDclk1Frequency;
+    uint16_t VcnUsagePercentage0;
+    uint16_t VcnUsagePercentage1;
+    uint8_t  PcieRate;
+    uint8_t  PcieWidth;
+    uint16_t AverageGfxclkFrequencyTarget;
+
+    uint8_t  ApuSTAPMSmartShiftLimit;
+    uint8_t  AverageApuSocketPower;
+    uint8_t  ApuSTAPMLimit;
+    uint8_t  Padding8_2;
+
+} SmuMetrics_V4_t;
+
  typedef struct {
    union {
  SmuMetrics_t SmuMetrics;
  SmuMetrics_V2_t SmuMetrics_V2;
  SmuMetrics_V3_t SmuMetrics_V3;
+    SmuMetrics_V4_t SmuMetrics_V4;


I see some extra indentation here.
I used tabs while the current code here uses 4 spaces; I will match the 
existing code.



    };
    uint32_t Spare[1];
  diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c

index 86ab276b6b0b..503439754f08 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -585,6 +585,102 @@ static uint32_t 
sienna_cichlid_get_throttler_status_locked(struct smu_context *s

  return throttler_status;
  }
  +static int sienna_cichlid_get_power_limit(struct smu_context *smu,
+  uint32_t *current_power_limit,
+  uint32_t *default_power_limit,
+  uint32_t *max_power_limit)
+{
+    struct smu_11_0_7_powerplay_table *powerplay_table =
+    (struct smu_11_0_7_powerplay_table *)smu->smu_table.power_play_table;

+    uint32_t power_limit, od_percent;
+    uint16_t *table_member;
+
+    GET_PPTABLE_MEMBER(SocketPowerLimitAc, &table_member);
+
+    if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
+    power_limit =
+    table_member[PPT_THROTTLER_PPT0];
+    }
+
+    if (current_power_limit)
+    *current_power_limit = power_limit;
+    if (default_power_limit)
+    *default_power_limit = power_limit;
+
+    if (max_power_limit) {
+    if (smu->od_enabled) {
+    od_percent =
+ le32_to_cpu(powerplay_table->overdrive_table.max[
+ SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
+
+    dev_dbg(smu->adev->dev, "ODSETTING_POWERPERCENTAGE: %d (default: %d)\n",
+    od_percent, power_limit);
+
+    power_limit *= (100 + od_percent);
+    power_limit /= 100;
+    }
+    *max_power_limit = power_limit;
+    }
+
+    return 0;
+}
+
+static void 

Re: [PATCH 2/2] drm/amdgpu: add drm-client-id to fdinfo

2022-05-11 Thread Tvrtko Ursulin



On 10/05/2022 17:55, Christian König wrote:

This is enough to get gputop working :)


Super cool! So I guess it's time for me to cleanup that code base a bit.

Regards,

Tvrtko


Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 38cb5eb105ad..4ef23224b617 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -55,17 +55,15 @@ static const char *amdgpu_ip_name[AMDGPU_HW_IP_NUM] = {
  
  void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)

  {
-   struct amdgpu_fpriv *fpriv;
uint64_t vram_mem = 0, gtt_mem = 0, cpu_mem = 0;
struct drm_file *file = f->private_data;
struct amdgpu_device *adev = drm_to_adev(file->minor->dev);
+   struct amdgpu_fpriv *fpriv = file->driver_priv;
+   struct amdgpu_vm *vm = >vm;
uint32_t bus, dev, fn, domain, hw_ip;
struct amdgpu_bo *root;
int ret;
  
-	ret = amdgpu_file_to_fpriv(f, &fpriv);
-   if (ret)
-   return;
bus = adev->pdev->bus->number;
domain = pci_domain_nr(adev->pdev->bus);
dev = PCI_SLOT(adev->pdev->devfn);
@@ -93,6 +91,7 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
seq_printf(m, "drm-driver:\t%s\n", file->minor->dev->driver->name);
seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\npasid:\t%u\n", domain, bus,
dev, fn, fpriv->vm.pasid);
+   seq_printf(m, "drm-client-id:\t%Lu\n", vm->immediate.fence_context);
seq_printf(m, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
seq_printf(m, "drm-memory-gtt:\t%llu KiB\n", gtt_mem/1024UL);
seq_printf(m, "drm-memory-cpu:\t%llu KiB\n", cpu_mem/1024UL);


Re: [PATCH 1/2] drm/amdgpu: Convert to common fdinfo format v2

2022-05-11 Thread Tvrtko Ursulin



On 10/05/2022 17:55, Christian König wrote:

From: Tvrtko Ursulin 


Feels like you should take over ownership since you wrote a lot of new, 
amdgpu-specific code here.



Convert fdinfo format to one documented in drm-usage-stats.rst.

It turned out that the existing implementation was actually completely
nonsense. The calculated percentages indeed represented the usage of the
engine, but with varying time slices.

So 10% usage for application A could mean something completely different
than 10% usage for application B.

Completely nuke that and just use the now standardized nanosecond
interface.

v2: drop the documentation change for now, nuke percentage calculation

Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Christian König 
Cc: David M Nieto 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 136 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|   8 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  45 ---
  3 files changed, 88 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 8f0e6d93bb9c..988c5a76cad2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -183,6 +183,7 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
u32 hw_ip,
ctx_prio = (ctx->override_priority == AMDGPU_CTX_PRIORITY_UNSET) ?
ctx->init_priority : ctx->override_priority;
entity->sequence = 1;
+   entity->time_spend = 0;
hw_prio = amdgpu_ctx_get_hw_prio(ctx, hw_ip);
drm_prio = amdgpu_ctx_to_drm_sched_prio(ctx_prio);
  
@@ -666,6 +667,26 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)

return 0;
  }
  
+/* Calculate the time spent on the hw */

+static ktime_t amdgpu_ctx_fence_time(struct dma_fence *fence)
+{
+   struct drm_sched_fence *s_fence;
+
+   if (!fence)
+   return ns_to_ktime(0);
+
+   /* When the fence is not even scheduled it can't have spent time */
+   s_fence = to_drm_sched_fence(fence);
+   if (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &s_fence->scheduled.flags))
+   return ns_to_ktime(0);
+
+   if (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &s_fence->finished.flags))
+   return ktime_sub(ktime_get(), s_fence->scheduled.timestamp);
+
+   return ktime_sub(s_fence->finished.timestamp,
+s_fence->scheduled.timestamp);
+}
+
  void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
  struct drm_sched_entity *entity,
  struct dma_fence *fence, uint64_t *handle)
@@ -685,6 +706,7 @@ void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
	spin_lock(&centity->ring_lock);
centity->fences[idx] = fence;
centity->sequence++;
+   centity->time_spend += amdgpu_ctx_fence_time(other);
	spin_unlock(&centity->ring_lock);
  
  	dma_fence_put(other);

@@ -692,6 +714,42 @@ void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
*handle = seq;
  }
  
+static ktime_t amdgpu_ctx_entity_time(struct amdgpu_ctx *ctx,

+ struct amdgpu_ctx_entity *centity)
+{
+   ktime_t res;
+   uint32_t i;
+
+   spin_lock(&centity->ring_lock);
+   res = centity->time_spend;
+   for (i = 0; i < amdgpu_sched_jobs; i++) {
+   res = ktime_add(res, amdgpu_ctx_fence_time(centity->fences[i]));
+   }
+   spin_unlock(&centity->ring_lock);
+   return res;
+}
+
+ktime_t amdgpu_ctx_mgr_usage(struct amdgpu_ctx_mgr *mgr, uint32_t hwip,
+uint32_t idx)
+{
+   struct amdgpu_ctx_entity *centity;
+   ktime_t res = ns_to_ktime(0);
+   struct amdgpu_ctx *ctx;
+   uint32_t id;
+
+   mutex_lock(&mgr->lock);
+   idr_for_each_entry(&mgr->ctx_handles, ctx, id) {
+
+   centity = ctx->entities[hwip][idx];
+   if (!centity)
+   continue;
+
+   res = ktime_add(res, amdgpu_ctx_entity_time(ctx, centity));
+   }
+   mutex_unlock(&mgr->lock);
+   return res;
+}


The amdgpu specific code I cannot really review - but I can ask a 
question - do you have a concept of contexts which were used with a 
client and then closed/abandoned, in which case would the time they 
spent on the GPU be forgotten or preserved in this accounting flow?


In i915 we have this so I am accumulating the times spent by contexts as 
they are closed into a data structure associated with the owning drm_file.



+
  struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
   struct drm_sched_entity *entity,
   uint64_t seq)
@@ -869,81 +927,3 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
idr_destroy(>ctx_handles);
mutex_destroy(>lock);
  }
-
-static void amdgpu_ctx_fence_time(struct amdgpu_ctx *ctx,
-   struct amdgpu_ctx_entity *centity, ktime_t *total, 

Re: [PATCH] drm/amd/pm: support ss metrics read for smu11

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 12:51 PM, Sathishkumar S wrote:

Support reading smartshift APU and dGPU power for smu11-based ASICs.

v2: add new version of SmuMetrics and make calculation more readable (Lijo)
v3: avoid calculations that result in -ve values and skip related checks
v4: use the current power limit on dGPU and exclude smu 11_0_7 (Lijo)

Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 
---
  .../pmfw_if/smu11_driver_if_sienna_cichlid.h  |  63 +++
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 161 ++
  2 files changed, 187 insertions(+), 37 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
index 08f0bb2af5d2..280d42778f28 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
@@ -1540,11 +1540,74 @@ typedef struct {
  
  } SmuMetrics_V3_t;
  
+typedef struct {

+   uint32_t CurrClock[PPCLK_COUNT];
+
+   uint16_t AverageGfxclkFrequencyPreDs;
+   uint16_t AverageGfxclkFrequencyPostDs;
+   uint16_t AverageFclkFrequencyPreDs;
+   uint16_t AverageFclkFrequencyPostDs;
+   uint16_t AverageUclkFrequencyPreDs;
+   uint16_t AverageUclkFrequencyPostDs;
+
+
+   uint16_t AverageGfxActivity;
+   uint16_t AverageUclkActivity;
+   uint8_t  CurrSocVoltageOffset;
+   uint8_t  CurrGfxVoltageOffset;
+   uint8_t  CurrMemVidOffset;
+   uint8_t  Padding8;
+   uint16_t AverageSocketPower;
+   uint16_t TemperatureEdge;
+   uint16_t TemperatureHotspot;
+   uint16_t TemperatureMem;
+   uint16_t TemperatureVrGfx;
+   uint16_t TemperatureVrMem0;
+   uint16_t TemperatureVrMem1;
+   uint16_t TemperatureVrSoc;
+   uint16_t TemperatureLiquid0;
+   uint16_t TemperatureLiquid1;
+   uint16_t TemperaturePlx;
+   uint16_t Padding16;
+   uint32_t AccCnt;
+   uint8_t  ThrottlingPercentage[THROTTLER_COUNT];
+
+
+   uint8_t  LinkDpmLevel;
+   uint8_t  CurrFanPwm;
+   uint16_t CurrFanSpeed;
+
+   //BACO metrics, PMFW-1721
+   //metrics for D3hot entry/exit and driver ARM msgs
+   uint8_t D3HotEntryCountPerMode[D3HOT_SEQUENCE_COUNT];
+   uint8_t D3HotExitCountPerMode[D3HOT_SEQUENCE_COUNT];
+   uint8_t ArmMsgReceivedCountPerMode[D3HOT_SEQUENCE_COUNT];
+
+   //PMFW-4362
+   uint32_t EnergyAccumulator;
+   uint16_t AverageVclk0Frequency;
+   uint16_t AverageDclk0Frequency;
+   uint16_t AverageVclk1Frequency;
+   uint16_t AverageDclk1Frequency;
+   uint16_t VcnUsagePercentage0;
+   uint16_t VcnUsagePercentage1;
+   uint8_t  PcieRate;
+   uint8_t  PcieWidth;
+   uint16_t AverageGfxclkFrequencyTarget;
+
+   uint8_t  ApuSTAPMSmartShiftLimit;
+   uint8_t  AverageApuSocketPower;
+   uint8_t  ApuSTAPMLimit;
+   uint8_t  Padding8_2;
+
+} SmuMetrics_V4_t;
+
  typedef struct {
union {
  SmuMetrics_t SmuMetrics;
  SmuMetrics_V2_t SmuMetrics_V2;
  SmuMetrics_V3_t SmuMetrics_V3;
+   SmuMetrics_V4_t SmuMetrics_V4;


I see some extra indentation here .


};
uint32_t Spare[1];
  
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c

index 86ab276b6b0b..503439754f08 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -585,6 +585,102 @@ static uint32_t 
sienna_cichlid_get_throttler_status_locked(struct smu_context *s
return throttler_status;
  }
  
+static int sienna_cichlid_get_power_limit(struct smu_context *smu,

+ uint32_t *current_power_limit,
+ uint32_t *default_power_limit,
+ uint32_t *max_power_limit)
+{
+   struct smu_11_0_7_powerplay_table *powerplay_table =
+   (struct smu_11_0_7_powerplay_table *)smu->smu_table.power_play_table;
+   uint32_t power_limit, od_percent;
+   uint16_t *table_member;
+
+   GET_PPTABLE_MEMBER(SocketPowerLimitAc, &table_member);
+
+   if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
+   power_limit =
+   table_member[PPT_THROTTLER_PPT0];
+   }
+
+   if (current_power_limit)
+   *current_power_limit = power_limit;
+   if (default_power_limit)
+   *default_power_limit = power_limit;
+
+   if (max_power_limit) {
+   if (smu->od_enabled) {
+   od_percent =
+   le32_to_cpu(powerplay_table->overdrive_table.max[
+   SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
+
+   dev_dbg(smu->adev->dev, "ODSETTING_POWERPERCENTAGE: %d (default: %d)\n",
+

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Christian König
It really *is* a NOALLOC feature. In other words there is no latency 
improvement on reads because the cache is always checked, even with the 
noalloc flag set.


The only thing it affects is that misses do not enter the cache and so 
don't cause any additional pressure on evicting cache lines.


You might want to double check with the hardware guys, but I'm something 
like 95% sure that it works this way.


Christian.

On 11.05.22 at 09:22, Marek Olšák wrote:
Bypass means that the contents of the cache are ignored, which 
decreases latency at the cost of no coherency between bypassed and 
normal memory requests. NOA (noalloc) means that the cache is checked 
and can give you cache hits, but misses are not cached and the overall 
latency is higher. I don't know what the hw does, but I hope it was 
misnamed and it really means bypass because there is no point in doing 
cache lookups on every memory request if the driver wants to disable 
caching to *decrease* latency in the situations when the cache isn't 
helping.


Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It just doesn't allocate any
MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
sort
of attribute which decides LLC behaviour]

Thanks,
Lijo

>
> Christian.
>
> On 10.05.22 at 23:21, Marek Olšák wrote:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM Christian König
>>  wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>     Signed-off-by: Christian König 
>>     ---
>>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>      4 files changed, 10 insertions(+)
>>
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     index bf97d8f07f57..d8129626581f 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>     amdgpu_device *adev, uint32_t flags)
>>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>             if (flags & AMDGPU_VM_PAGE_PRT)
>>                     pte_flag |= AMDGPU_PTE_PRT;
>>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>
>>             if (adev->gmc.gmc_funcs->map_mtype)
>>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     index b8c79789e1e4..9077dfccaf3c 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>     amdgpu_device *adev,
>>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>             *flags |= (mapping->flags &
AMDGPU_PTE_MTYPE_NV10_MASK);
>>
>>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>     +
>>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                     *flags |= AMDGPU_PTE_PRT;
>>                     *flags |= AMDGPU_PTE_SNOOPED;
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     index 8d733eeac556..32ee56adb602 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>     amdgpu_device *adev,
>>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>             *flags |= (mapping->flags &
AMDGPU_PTE_MTYPE_NV10_MASK);
>>
>>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>     +
>>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                     *flags |= AMDGPU_PTE_PRT;
>>                     *flags |= AMDGPU_PTE_SNOOPED;
>>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>     b/include/uapi/drm/amdgpu_drm.h
>>     index 57b9d8f0133a..9d71d6330687 100644
>>     --- a/include/uapi/drm/amdgpu_drm.h
>>     +++ b/include/uapi/drm/amdgpu_drm.h
>>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
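To make the proposed uAPI concrete, a userspace mapping call that opts into the flag could look roughly like the sketch below (map_bo_noalloc and its parameters are invented for illustration; AMDGPU_VM_PAGE_NOALLOC only exists with this series applied):

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/amdgpu_drm.h>

    /* Map an existing GEM buffer into the GPU VA space with the proposed
     * NOALLOC hint, so misses on this range don't allocate MALL entries. */
    static int map_bo_noalloc(int fd, uint32_t handle, uint64_t va, uint64_t size)
    {
    	struct drm_amdgpu_gem_va req;

    	memset(&req, 0, sizeof(req));
    	req.handle = handle;
    	req.operation = AMDGPU_VA_OP_MAP;
    	req.flags = AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
    		    AMDGPU_VM_PAGE_NOALLOC;
    	req.va_address = va;
    	req.offset_in_bo = 0;
    	req.map_size = size;

    	return ioctl(fd, DRM_IOCTL_AMDGPU_GEM_VA, &req);
    }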
 

Re: [PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

2022-05-11 Thread Christian König

On 10.05.22 at 20:53, Andrey Grodzovsky wrote:


On 2022-05-10 13:19, Christian König wrote:

On 10.05.22 at 19:01, Andrey Grodzovsky wrote:


On 2022-05-10 12:17, Christian König wrote:

On 10.05.22 at 18:00, Andrey Grodzovsky wrote:

[SNIP]
That's one of the reasons why we should have multiple work items 
for job based reset and other reset sources.


See the whole idea is the following:
1. We have one single queued work queue for each reset domain 
which makes sure that all reset requests execute in order.
2. We have one delayed work item for each scheduler which fires 
when a timeout on a scheduler occurs and eventually calls the 
reset procedure with the last running job.

3. We have one work item for each necessary hard reset.

The delayed work item from the scheduler first tries a soft 
recovery and checks if a hard reset is really necessary. If it's 
not necessary and we can cancel the offending job we skip the 
hard reset.


The hard reset work item doesn't do any of those checks and just 
does a reset no matter what.


When we really do a reset, independent if its triggered by a job 
or other source we cancel all sources at the end of the reset 
procedure.


This makes sure that a) We only do one reset even when multiple 
sources fire at the same time and b) when any source bails out 
and only does a soft recovery we do a full reset anyway when 
necessary.


That design was outlined multiple times now on the mailing list 
and looks totally clear to me. We should probably document that 
somewhere.



If you look at the patch, what you described above is exactly what 
is happening - since the scheduler's delayed work is different from 
any non-scheduler delayed work, a SW reset which might take place 
from the scheduler's reset will not have any impact on any 
non-scheduler delayed work and will not cancel it. In case the 
scheduler actually reaches the point of a HW reset, it will cancel 
out all pending reset works from any other sources in the same 
reset domain. A non-scheduler reset will always proceed to do a full 
HW reset and will cancel any other pending resets.


Ok, but why you then need that linked list? The number of reset 
sources should be static and not in any way dynamic.



So array reset_src[i] holds a pointer to the pending delayed work from 
source i, or NULL if there is no pending work?
What if the same source triggers multiple reset requests, such as 
multiple RAS errors at once - don't set the delayed work pointer in 
arr[RAS_index] if it's already non-NULL?




See, using the linked list sounds like you only wanted to cancel the 
reset sources raised so far, which would not be correct as far as I 
can see.



Not clear about this one? We do want to cancel those reset sources 
that were raised so far, because we just did a HW reset which should 
fix them anyway. Those that have not raised a reset request so far 
will have a NULL ptr at their respective array index.


And exactly that's what I want to prevent. See you don't care if a 
reset source has fired once, twice, ten times or never. You just 
cancel all of them!


That's why I want to come to a static list of reset sources.

E.g. in the reset code (either before or after the reset, that's 
debatable) you do something like this:


for (i = 0; i < num_ring; ++i)
    cancel_delayed_work(ring[i]->scheduler)
cancel_work(adev->ras_work);
cancel_work(adev->iofault_work);
cancel_work(adev->debugfs_work);
...

You don't really need to track which reset source has fired and which 
hasn't, because that would be racy again. Instead just bluntly reset 
all possible sources.


Christian.



I don't say we care if it fired once or twice (I need to add a fix to 
only insert reset work into the pending reset list if it's not already 
there). The point of using a list (or array) to which you add and from 
which you remove is that the logic is encapsulated within the 
reset domain. In your way we need to be aware of who exactly schedules 
reset work and explicitly cancel each one; this also means that for any 
new source added in the future you will need to remember to add it


I don't think that this is a valid argument. Additionally to the 
schedulers we probably just need less than a handful of reset sources, 
most likely even just one or two is enough.


The only justification I can see of having additional separate reset 
sources would be if somebody wants to know if a specific source has been 
handled or not (e.g. call flush_work() or work_pending()). Like in the 
case of a reset triggered through debugfs.


to the cancellation list which you showed above. In the current way all of 
this is done automatically within the reset_domain code, and it's agnostic 
to the specific driver and its specific list of reset sources. Also, in case 
we would want to generalize reset_domain to other GPU drivers (which was 
a plan as far as I remember), this explicit mention of each reset work 
for cancellation is again less suitable in my opinion.


Well we could put the work item for the scheduler 
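Spelling out the cancellation sketch above with real work-queue calls, the post-reset step could look something like this (adev->ras_work and the other non-scheduler fields are invented names for the example; the scheduler's work_tdr delayed work item is the existing TDR timeout):

    /* Sketch only: after a hard reset, bluntly cancel every possible reset
     * source instead of tracking which ones have fired. */
    static void amdgpu_reset_cancel_sources(struct amdgpu_device *adev)
    {
    	int i;

    	/* One delayed work item per scheduler (the TDR timeout). */
    	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
    		if (adev->rings[i])
    			cancel_delayed_work(&adev->rings[i]->sched.work_tdr);
    	}

    	/* Plain work items for the non-scheduler sources; illustrative names. */
    	cancel_work_sync(&adev->ras_work);
    	cancel_work_sync(&adev->iofault_work);
    	cancel_work_sync(&adev->debugfs_work);
    }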

Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Marek Olšák
Bypass means that the contents of the cache are ignored, which decreases
latency at the cost of no coherency between bypassed and normal memory
requests. NOA (noalloc) means that the cache is checked and can give you
cache hits, but misses are not cached and the overall latency is higher. I
don't know what the hw does, but I hope it was misnamed and it really means
bypass because there is no point in doing cache lookups on every memory
request if the driver wants to disable caching to *decrease* latency in the
situations when the cache isn't helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo  wrote:

>
>
> On 5/11/2022 11:36 AM, Christian König wrote:
> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
> > entries on write.
> >
> > How about AMDGPU_VM_PAGE_NO_MALL ?
>
> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
> of attribute which decides LLC behaviour]
>
> Thanks,
> Lijo
>
> >
> > Christian.
> >
> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
> >> A better name would be:
> >> AMDGPU_VM_PAGE_BYPASS_MALL
> >>
> >> Marek
> >>
> >> On Fri, May 6, 2022 at 7:23 AM Christian König
> >>  wrote:
> >>
> >> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
> >> allocation.
> >>
> >> Only compile tested!
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
> >>  include/uapi/drm/amdgpu_drm.h   | 2 ++
> >>  4 files changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> index bf97d8f07f57..d8129626581f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
> >> amdgpu_device *adev, uint32_t flags)
> >> pte_flag |= AMDGPU_PTE_WRITEABLE;
> >> if (flags & AMDGPU_VM_PAGE_PRT)
> >> pte_flag |= AMDGPU_PTE_PRT;
> >> +   if (flags & AMDGPU_VM_PAGE_NOALLOC)
> >> +   pte_flag |= AMDGPU_PTE_NOALLOC;
> >>
> >> if (adev->gmc.gmc_funcs->map_mtype)
> >> pte_flag |= amdgpu_gmc_map_mtype(adev,
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> index b8c79789e1e4..9077dfccaf3c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
> >> amdgpu_device *adev,
> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >> +
> >> if (mapping->flags & AMDGPU_PTE_PRT) {
> >> *flags |= AMDGPU_PTE_PRT;
> >> *flags |= AMDGPU_PTE_SNOOPED;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> index 8d733eeac556..32ee56adb602 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
> >> amdgpu_device *adev,
> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >> *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >> +   *flags &= ~AMDGPU_PTE_NOALLOC;
> >> +   *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >> +
> >> if (mapping->flags & AMDGPU_PTE_PRT) {
> >> *flags |= AMDGPU_PTE_PRT;
> >> *flags |= AMDGPU_PTE_SNOOPED;
> >> diff --git a/include/uapi/drm/amdgpu_drm.h
> >> b/include/uapi/drm/amdgpu_drm.h
> >> index 57b9d8f0133a..9d71d6330687 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
> >>  #define AMDGPU_VM_MTYPE_UC (4 << 5)
> >>  /* Use Read Write MTYPE instead of default MTYPE */
> >>  #define AMDGPU_VM_MTYPE_RW (5 << 5)
> >> +/* don't allocate MALL */
> >> +#define AMDGPU_VM_PAGE_NOALLOC (1 << 9)
> >>
> >>  struct drm_amdgpu_gem_va {
> >> /** GEM object handle */
> >> --
> >> 2.25.1
> >>
> >
>


[PATCH] drm/amd/pm: support ss metrics read for smu11

2022-05-11 Thread Sathishkumar S
Support reading smartshift APU and dGPU power for smu11-based ASICs.

v2: add new version of SmuMetrics and make calculation more readable (Lijo)
v3: avoid calculations that result in -ve values and skip related checks
v4: use the current power limit on dGPU and exclude smu 11_0_7 (Lijo)

Signed-off-by: Sathishkumar S 
Acked-by: Alex Deucher 
---
 .../pmfw_if/smu11_driver_if_sienna_cichlid.h  |  63 +++
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 161 ++
 2 files changed, 187 insertions(+), 37 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
index 08f0bb2af5d2..280d42778f28 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu11_driver_if_sienna_cichlid.h
@@ -1540,11 +1540,74 @@ typedef struct {
 
 } SmuMetrics_V3_t;
 
+typedef struct {
+   uint32_t CurrClock[PPCLK_COUNT];
+
+   uint16_t AverageGfxclkFrequencyPreDs;
+   uint16_t AverageGfxclkFrequencyPostDs;
+   uint16_t AverageFclkFrequencyPreDs;
+   uint16_t AverageFclkFrequencyPostDs;
+   uint16_t AverageUclkFrequencyPreDs;
+   uint16_t AverageUclkFrequencyPostDs;
+
+
+   uint16_t AverageGfxActivity;
+   uint16_t AverageUclkActivity;
+   uint8_t  CurrSocVoltageOffset;
+   uint8_t  CurrGfxVoltageOffset;
+   uint8_t  CurrMemVidOffset;
+   uint8_t  Padding8;
+   uint16_t AverageSocketPower;
+   uint16_t TemperatureEdge;
+   uint16_t TemperatureHotspot;
+   uint16_t TemperatureMem;
+   uint16_t TemperatureVrGfx;
+   uint16_t TemperatureVrMem0;
+   uint16_t TemperatureVrMem1;
+   uint16_t TemperatureVrSoc;
+   uint16_t TemperatureLiquid0;
+   uint16_t TemperatureLiquid1;
+   uint16_t TemperaturePlx;
+   uint16_t Padding16;
+   uint32_t AccCnt;
+   uint8_t  ThrottlingPercentage[THROTTLER_COUNT];
+
+
+   uint8_t  LinkDpmLevel;
+   uint8_t  CurrFanPwm;
+   uint16_t CurrFanSpeed;
+
+   //BACO metrics, PMFW-1721
+   //metrics for D3hot entry/exit and driver ARM msgs
+   uint8_t D3HotEntryCountPerMode[D3HOT_SEQUENCE_COUNT];
+   uint8_t D3HotExitCountPerMode[D3HOT_SEQUENCE_COUNT];
+   uint8_t ArmMsgReceivedCountPerMode[D3HOT_SEQUENCE_COUNT];
+
+   //PMFW-4362
+   uint32_t EnergyAccumulator;
+   uint16_t AverageVclk0Frequency;
+   uint16_t AverageDclk0Frequency;
+   uint16_t AverageVclk1Frequency;
+   uint16_t AverageDclk1Frequency;
+   uint16_t VcnUsagePercentage0;
+   uint16_t VcnUsagePercentage1;
+   uint8_t  PcieRate;
+   uint8_t  PcieWidth;
+   uint16_t AverageGfxclkFrequencyTarget;
+
+   uint8_t  ApuSTAPMSmartShiftLimit;
+   uint8_t  AverageApuSocketPower;
+   uint8_t  ApuSTAPMLimit;
+   uint8_t  Padding8_2;
+
+} SmuMetrics_V4_t;
+
 typedef struct {
   union {
 SmuMetrics_t SmuMetrics;
 SmuMetrics_V2_t SmuMetrics_V2;
 SmuMetrics_V3_t SmuMetrics_V3;
+   SmuMetrics_V4_t SmuMetrics_V4;
   };
   uint32_t Spare[1];
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 86ab276b6b0b..503439754f08 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -585,6 +585,102 @@ static uint32_t 
sienna_cichlid_get_throttler_status_locked(struct smu_context *s
return throttler_status;
 }
 
+static int sienna_cichlid_get_power_limit(struct smu_context *smu,
+ uint32_t *current_power_limit,
+ uint32_t *default_power_limit,
+ uint32_t *max_power_limit)
+{
+   struct smu_11_0_7_powerplay_table *powerplay_table =
+   (struct smu_11_0_7_powerplay_table *)smu->smu_table.power_play_table;
+   uint32_t power_limit, od_percent;
+   uint16_t *table_member;
+
+   GET_PPTABLE_MEMBER(SocketPowerLimitAc, &table_member);
+
+   if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
+   power_limit =
+   table_member[PPT_THROTTLER_PPT0];
+   }
+
+   if (current_power_limit)
+   *current_power_limit = power_limit;
+   if (default_power_limit)
+   *default_power_limit = power_limit;
+
+   if (max_power_limit) {
+   if (smu->od_enabled) {
+   od_percent =
+   le32_to_cpu(powerplay_table->overdrive_table.max[
+   SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
+
+   dev_dbg(smu->adev->dev, "ODSETTING_POWERPERCENTAGE: %d (default: %d)\n",
+   od_percent, power_limit);
+
+   power_limit *= (100 + 

Re: [PATCH] drm/amdgpu: switch DM to atomic fence helpers v2

2022-05-11 Thread Christian König
Ok, since I didn't get any objections I'm pushing this patch to 
drm-misc-next now.


Regards,
Christian.

On 09.05.22 at 09:48, Christian König wrote:

Harry, Nicholas once more a gentle ping on this.

Any objections/comments? Otherwise I'm going to push this through 
drm-misc-next.


Thanks,
Christian.

On 09.05.22 at 09:47, Christian König wrote:

This gives us the standard atomic implicit and explicit fencing rules.

v2: move the wait to amdgpu_dm_atomic_commit_tail

Signed-off-by: Christian König 
Reviewed-by: Daniel Vetter 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Roman Li 
Cc: Qingqing Zhuo 
Cc: Jude Shih 
Cc: Wayne Lin 
Cc: Rodrigo Siqueira 
---
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 23 ---
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c

index a6880dd9c0bb..f9ce8cb45e6d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -83,6 +83,7 @@
  #include 
  #include 
  #include 
+#include <drm/drm_gem_atomic_helper.h>
    #if defined(CONFIG_DRM_AMD_DC_DCN)
  #include "ivsrcid/dcn/irqsrcs_dcn_1_0.h"
@@ -7627,6 +7628,10 @@ static int dm_plane_helper_prepare_fb(struct 
drm_plane *plane,

  goto error_unpin;
  }
  +    r = drm_gem_plane_helper_prepare_fb(plane, new_state);
+    if (unlikely(r != 0))
+    goto error_unpin;
+
  amdgpu_bo_unreserve(rbo);
    afb->address = amdgpu_bo_gpu_offset(rbo);
@@ -9160,7 +9165,6 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,

  struct dm_crtc_state *dm_old_crtc_state =
to_dm_crtc_state(drm_atomic_get_old_crtc_state(state, pcrtc));
  int planes_count = 0, vpos, hpos;
-    long r;
  unsigned long flags;
  struct amdgpu_bo *abo;
  uint32_t target_vblank, last_flip_vblank;
@@ -9235,18 +9239,6 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,

  }
    abo = gem_to_amdgpu_bo(fb->obj[0]);
-
-    /*
- * Wait for all fences on this FB. Do limited wait to avoid
- * deadlock during GPU reset when this fence will not signal
- * but we hold reservation lock for the BO.
- */
-    r = dma_resv_wait_timeout(abo->tbo.base.resv,
-  DMA_RESV_USAGE_WRITE, false,
-  msecs_to_jiffies(5000));
-    if (unlikely(r <= 0))
-    DRM_ERROR("Waiting for fences timed out!");
-
  fill_dc_plane_info_and_addr(
  dm->adev, new_plane_state,
  afb->tiling_flags,
@@ -9591,9 +9583,14 @@ static void 
amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)

  struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
  int crtc_disable_count = 0;
  bool mode_set_reset_required = false;
+    int r;
    trace_amdgpu_dm_atomic_commit_tail_begin(state);
  +    r = drm_atomic_helper_wait_for_fences(dev, state, false);
+    if (unlikely(r))
+    DRM_ERROR("Waiting for fences timed out!");
+
  drm_atomic_helper_update_legacy_modeset_state(dev, state);
    dm_state = dm_atomic_get_new_state(state);






Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE

2022-05-11 Thread Marek Olšák
OK that sounds good.

Marek

On Wed, May 11, 2022 at 2:04 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Hi Marek,
>
> On 10.05.22 at 22:43, Marek Olšák wrote:
>
> A better flag name would be:
> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>
>
> A bit long for my taste and I think the best placement is just a side
> effect.
>
>
> Marek
>
> On Tue, May 10, 2022 at 4:13 PM Marek Olšák  wrote:
>
>> Does this really guarantee VRAM placement? The code doesn't say anything
>> about that.
>>
>
> Yes, see the code here:
>
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 8b7ee1142d9a..1944ef37a61e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>> bp->domain;
>>> bo->allowed_domains = bo->preferred_domains;
>>> if (bp->type != ttm_bo_type_kernel &&
>>> +   !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>> bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>>> bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>
>>
> The only case where this could be circumvented is when you try to allocate
> more than physically available on an APU.
>
> E.g. you only have something like 32 MiB VRAM and request 64 MiB, then the
> GEM code will catch the error and fallback to GTT (IIRC).
>
> Regards,
> Christian.
>


[PATCH] drm/amd/pm: add smu power_limit callback for smu_v13_0_7

2022-05-11 Thread Yang Wang
- get_power_limit
- set_power_limit

add above callback functions to enable power_cap hwmon node.

Signed-off-by: Yang Wang 
---
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
index 7c9e0ba7ab50..4e1861fb2c6a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
@@ -1367,6 +1367,43 @@ static int smu_v13_0_7_enable_mgpu_fan_boost(struct smu_context *smu)
 					       NULL);
 }
 
+static int smu_v13_0_7_get_power_limit(struct smu_context *smu,
+				       uint32_t *current_power_limit,
+				       uint32_t *default_power_limit,
+				       uint32_t *max_power_limit)
+{
+	struct smu_table_context *table_context = &smu->smu_table;
+	struct smu_13_0_7_powerplay_table *powerplay_table =
+		(struct smu_13_0_7_powerplay_table *)table_context->power_play_table;
+	PPTable_t *pptable = table_context->driver_pptable;
+	SkuTable_t *skutable = &pptable->SkuTable;
+	uint32_t power_limit, od_percent;
+
+	if (smu_v13_0_get_current_power_limit(smu, &power_limit))
+		power_limit = smu->adev->pm.ac_power ?
+			      skutable->SocketPowerLimitAc[PPT_THROTTLER_PPT0] :
+			      skutable->SocketPowerLimitDc[PPT_THROTTLER_PPT0];
+
+	if (current_power_limit)
+		*current_power_limit = power_limit;
+	if (default_power_limit)
+		*default_power_limit = power_limit;
+
+	if (max_power_limit) {
+		if (smu->od_enabled) {
+			od_percent = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
+
+			dev_dbg(smu->adev->dev, "ODSETTING_POWERPERCENTAGE: %d (default: %d)\n", od_percent, power_limit);
+
+			power_limit *= (100 + od_percent);
+			power_limit /= 100;
+		}
+		*max_power_limit = power_limit;
+	}
+
+	return 0;
+}
+
 static int smu_v13_0_7_get_power_profile_mode(struct smu_context *smu, char *buf)
 {
 	DpmActivityMonitorCoeffIntExternal_t activity_monitor_external[PP_SMC_POWER_PROFILE_COUNT];
@@ -1539,6 +1576,8 @@ static const struct pptable_funcs smu_v13_0_7_ppt_funcs = {
 	.get_fan_control_mode = smu_v13_0_get_fan_control_mode,
 	.set_fan_control_mode = smu_v13_0_set_fan_control_mode,
 	.enable_mgpu_fan_boost = smu_v13_0_7_enable_mgpu_fan_boost,
+	.get_power_limit = smu_v13_0_7_get_power_limit,
+	.set_power_limit = smu_v13_0_set_power_limit,
 	.get_power_profile_mode = smu_v13_0_7_get_power_profile_mode,
 	.set_power_profile_mode = smu_v13_0_7_set_power_profile_mode,
 	.set_tool_table_location = smu_v13_0_set_tool_table_location,
-- 
2.25.1
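
With get_power_limit/set_power_limit wired into the ppt_funcs table, the
generic SMU code can expose the standard power1_cap hwmon attributes for
this ASIC. Note the overdrive scaling above: max = limit * (100 +
od_percent) / 100, so e.g. a 300 W default limit with a 10% OD allowance
reports a 330 W maximum. A userspace sketch (not part of the patch) that
reads the cap; the hwmon index below is a placeholder, match the hwmon
"name" attribute against "amdgpu" to find the real one:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* Placeholder path; discover the actual index via /sys/class/hwmon. */
	const char *path = "/sys/class/hwmon/hwmon0/power1_cap";
	FILE *f = fopen(path, "r");
	long uw;

	if (!f) {
		perror(path);
		return EXIT_FAILURE;
	}
	if (fscanf(f, "%ld", &uw) != 1) {
		fclose(f);
		fprintf(stderr, "failed to parse %s\n", path);
		return EXIT_FAILURE;
	}
	fclose(f);

	/* hwmon reports power values in microwatts. */
	printf("current power cap: %ld W\n", uw / 1000000);
	return EXIT_SUCCESS;
}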



Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Lazar, Lijo




On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
> entries on write.
>
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more option - AMDGPU_VM_PAGE_LLC_* [LLC = last level cache, * = some
attribute that decides LLC behaviour]

Thanks,
Lijo




Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC

2022-05-11 Thread Christian König
Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL 
entries on write.


How about AMDGPU_VM_PAGE_NO_MALL ?

Christian.

On 10.05.22 at 23:21, Marek Olšák wrote:

A better name would be:
AMDGPU_VM_PAGE_BYPASS_MALL

Marek

On Fri, May 6, 2022 at 7:23 AM Christian König 
 wrote:


Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
allocation.

Only compile tested!

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
 include/uapi/drm/amdgpu_drm.h           | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index bf97d8f07f57..d8129626581f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device *adev, uint32_t flags)
 		pte_flag |= AMDGPU_PTE_WRITEABLE;
 	if (flags & AMDGPU_VM_PAGE_PRT)
 		pte_flag |= AMDGPU_PTE_PRT;
+	if (flags & AMDGPU_VM_PAGE_NOALLOC)
+		pte_flag |= AMDGPU_PTE_NOALLOC;
 
 	if (adev->gmc.gmc_funcs->map_mtype)
 		pte_flag |= amdgpu_gmc_map_mtype(adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index b8c79789e1e4..9077dfccaf3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
 	*flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
 	*flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 
+	*flags &= ~AMDGPU_PTE_NOALLOC;
+	*flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
 	if (mapping->flags & AMDGPU_PTE_PRT) {
 		*flags |= AMDGPU_PTE_PRT;
 		*flags |= AMDGPU_PTE_SNOOPED;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 8d733eeac556..32ee56adb602 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
 	*flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
 	*flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 
+	*flags &= ~AMDGPU_PTE_NOALLOC;
+	*flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
 	if (mapping->flags & AMDGPU_PTE_PRT) {
 		*flags |= AMDGPU_PTE_PRT;
 		*flags |= AMDGPU_PTE_SNOOPED;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 57b9d8f0133a..9d71d6330687 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_UC		(4 << 5)
 /* Use Read Write MTYPE instead of default MTYPE */
 #define AMDGPU_VM_MTYPE_RW		(5 << 5)
+/* don't allocate MALL */
+#define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
 
 struct drm_amdgpu_gem_va {
 	/** GEM object handle */
-- 
2.25.1
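
For illustration (not part of the patch): userspace would request the new
behaviour by setting the flag in the existing GEM_VA map ioctl. Only
AMDGPU_VM_PAGE_NOALLOC is proposed by this series; the struct, the other
flags and drmCommandWriteRead() are the existing uapi/libdrm interface.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>	/* libdrm copy of include/uapi/drm/amdgpu_drm.h */

/* Map a BO so writes through this VA do not allocate MALL entries. */
static int map_bo_noalloc(int fd, uint32_t handle, uint64_t va_address,
			  uint64_t size)
{
	struct drm_amdgpu_gem_va req;

	memset(&req, 0, sizeof(req));
	req.handle = handle;
	req.operation = AMDGPU_VA_OP_MAP;
	req.flags = AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
		    AMDGPU_VM_PAGE_NOALLOC;	/* proposed by this patch */
	req.va_address = va_address;
	req.offset_in_bo = 0;
	req.map_size = size;

	return drmCommandWriteRead(fd, DRM_AMDGPU_GEM_VA,
				   &req, sizeof(req));
}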




Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE

2022-05-11 Thread Christian König

Hi Marek,

On 10.05.22 at 22:43, Marek Olšák wrote:

A better flag name would be:
AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD


A bit long for my taste and I think the best placement is just a side 
effect.




Marek

On Tue, May 10, 2022 at 4:13 PM Marek Olšák  wrote:

Does this really guarantee VRAM placement? The code doesn't say
anything about that.



Yes, see the code here:



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8b7ee1142d9a..1944ef37a61e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	bo->preferred_domains = amdgpu_bo_get_preferred_domain(adev,
 							       bp->domain);
 	bo->allowed_domains = bo->preferred_domains;
 	if (bp->type != ttm_bo_type_kernel &&
+	    !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
 	    bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
 		bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;



The only case where this could be circumvented is when you try to
allocate more than is physically available on an APU.


E.g. if you only have something like 32 MiB of VRAM and request 64 MiB,
then the GEM code will catch the error and fall back to GTT (IIRC).


Regards,
Christian.
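
And the allocation-side counterpart for this series, for illustration:
requesting a VRAM-only discardable BO through the existing GEM_CREATE
ioctl. AMDGPU_GEM_CREATE_DISCARDABLE is the flag proposed in patch 1/3;
per the hunk quoted above it keeps allowed_domains at VRAM instead of
adding the GTT fallback, which is what guarantees the placement.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>	/* libdrm copy of include/uapi/drm/amdgpu_drm.h */

/* Allocate VRAM that the kernel may discard instead of evicting to GTT. */
static int create_discardable_vram_bo(int fd, uint64_t size,
				      uint32_t *handle)
{
	union drm_amdgpu_gem_create args;
	int r;

	memset(&args, 0, sizeof(args));
	args.in.bo_size = size;
	args.in.alignment = 4096;
	args.in.domains = AMDGPU_GEM_DOMAIN_VRAM;
	args.in.domain_flags = AMDGPU_GEM_CREATE_DISCARDABLE; /* proposed */

	r = drmCommandWriteRead(fd, DRM_AMDGPU_GEM_CREATE,
				&args, sizeof(args));
	if (r == 0)
		*handle = args.out.handle;
	return r;
}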