RE: [PATCH] drm/amd/display: Fix white screen page fault for gpuvm

2021-09-13 Thread Liu, Aaron
[AMD Official Use Only]

Verified on Yellow Carp.
Acked-by: Aaron Liu 

--
Best Regards
Aaron Liu

> -----Original Message-----
> From: Kazlauskas, Nicholas 
> Sent: Tuesday, September 14, 2021 3:26 AM
> To: Alex Deucher 
> Cc: amd-gfx list ; Liu, Aaron
> 
> Subject: Re: [PATCH] drm/amd/display: Fix white screen page fault for
> gpuvm
> 
> On 2021-09-13 3:13 p.m., Alex Deucher wrote:
> > Acked-by: Alex Deucher 
> >
> > Can you add a fixes: tag?
> >
> > Alex
> 
> Sure, I think the relevant patch is:
> 
> Fixes: 64b1d0e8d50 ("drm/amd/display: Add DCN3.1 HWSEQ")
> 
> Regards,
> Nicholas Kazlauskas
> 
> >
> > On Mon, Sep 13, 2021 at 3:11 PM Nicholas Kazlauskas
> >  wrote:
> >>
> >> [Why]
> >> The "base_addr_is_mc_addr" field was added for dcn3.1 support but
> >> pa_config was never updated to set it to false.
> >>
> >> Uninitialized memory causes it to be set to true which results in
> >> address mistranslation and white screen.
> >>
> >> [How]
> >> Use memset to ensure all fields are initialized to 0 by default.
> >>
> >> Cc: Aaron Liu 
> >> Signed-off-by: Nicholas Kazlauskas 
> >> ---
> >>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> index 53363728dbb..b0426bb3f2e 100644
> >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> @@ -1125,6 +1125,8 @@ static void
> mmhub_read_system_context(struct amdgpu_device *adev, struct dc_phy_
> >>  uint32_t agp_base, agp_bot, agp_top;
> >>  PHYSICAL_ADDRESS_LOC page_table_start, page_table_end,
> >> page_table_base;
> >>
> >> +   memset(pa_config, 0, sizeof(*pa_config));
> >> +
> >>  logical_addr_low  = min(adev->gmc.fb_start, adev->gmc.agp_start) 
> >> >>
> 18;
> >>  pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);
> >>
> >> --
> >> 2.25.1
> >>


RE: [RFC][PATCH] drm/amdgpu/powerplay/smu10: Add custom profile

2021-09-13 Thread Quan, Evan
[AMD Official Use Only]

The driver can exchange the custom profiling settings with the SMU FW using
the table below:
TABLE_CUSTOM_DPM

And the related data structure is CustomDpmSettings_t.

BR
Evan
> -----Original Message-----
> From: Alex Deucher 
> Sent: Monday, September 13, 2021 11:11 PM
> To: Daniel Gomez ; Huang, Ray ;
> Quan, Evan ; Zhu, Changfeng
> 
> Cc: amd-gfx list ; Maling list - DRI
> developers ; Daniel Gomez
> ; Deucher, Alexander
> ; Koenig, Christian
> ; Pan, Xinhui 
> Subject: Re: [RFC][PATCH] drm/amdgpu/powerplay/smu10: Add custom
> profile
> 
> On Wed, Sep 8, 2021 at 3:23 AM Daniel Gomez  wrote:
> >
> > On Tue, 7 Sept 2021 at 19:23, Alex Deucher 
> wrote:
> > >
> > > On Tue, Sep 7, 2021 at 4:53 AM Daniel Gomez  wrote:
> > > >
> > > > Add custom power profile mode support on smu10.
> > > > Update workload bit list.
> > > > ---
> > > >
> > > > Hi,
> > > >
> > > > I'm trying to add a custom profile for Raven Ridge, but I'm not sure
> > > > whether I need a different parameter than PPSMC_MSG_SetCustomPolicy
> > > > to configure the custom values. The code seemed to support CUSTOM
> > > > for workload types, but it didn't show up in the menu or accept any
> > > > user input. So far I've added that part, but it's unclear to me
> > > > which policy I need for setting these parameters, or whether it's
> > > > possible at all.
> > > >
> > > > After applying the changes I'd configure the CUSTOM mode as follows:
> > > >
> > > > echo manual > /sys/class/drm/card0/device/hwmon/hwmon1/device/power_dpm_force_performance_level
> > > > echo "6 70 90 0 0" > /sys/class/drm/card0/device/hwmon/hwmon1/device/pp_power_profile_mode
> > > >
> > > > Then, using Darren Powell script for testing modes I get the
> > > > following
> > > > output:
> > > >
> > > > 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices,
> > > > Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega
> > > > Mobile Series] [1002:15dd] (rev 83)
> > > > === pp_dpm_sclk ===
> > > > 0: 200Mhz
> > > > 1: 400Mhz *
> > > > 2: 1100Mhz
> > > > === pp_dpm_mclk ===
> > > > 0: 400Mhz
> > > > 1: 933Mhz *
> > > > 2: 1067Mhz
> > > > 3: 1200Mhz
> > > > === pp_power_profile_mode ===
> > > > NUM MODE_NAME BUSY_SET_POINT FPS USE_RLC_BUSY MIN_ACTIVE_LEVEL
> > > >   0 BOOTUP_DEFAULT : 70  60  0  0
> > > >   1 3D_FULL_SCREEN : 70  60  1  3
> > > >   2   POWER_SAVING : 90  60  0  0
> > > >   3  VIDEO : 70  60  0  0
> > > >   4 VR : 70  90  0  0
> > > >   5COMPUTE : 30  60  0  6
> > > >   6 CUSTOM*: 70  90  0  0
> > > >
> > > > As you can also see in my changes, I've also updated the workload
> > > > bit table, but I'm not completely sure about that change. In the
> > > > tests I've done, using bit 5 for WORKLOAD_PPLIB_CUSTOM_BIT leaves
> > > > the GPU sclk locked at around ~36%. So maybe I'm missing a clock
> > > > limit configuration table somewhere. Could you give me some hints
> > > > on how to proceed with this?
> > >
> > > I don't think APUs support customizing the workloads the same way
> > > dGPUs do.  I think they just support predefined profiles.
> > >
> > > Alex
> >
> >
> > Thanks Alex for the quick response. Would it make sense then to remove
> > the custom workload code (PP_SMC_POWER_PROFILE_CUSTOM) from smu10?
> > That workload was added in commit
> > f6f75ebdc06c04d3cfcd100f1b10256a9cdca407 [1] and is not used at all in
> > the code, as it's limited to the PP_SMC_POWER_PROFILE_COMPUTE index.
> > smu10.h also includes the custom workload bit definition, which made
> > it confusing for me to tell whether it was half-supported or, as I
> > understood from your comment, not possible to use at all.
> >
> > Perhaps it could also be mentioned in the documentation [2] (if that's
> > standard practice) that the custom pp_power_profile_mode is only
> > supported on dGPUs.
> >
> > I can send the patches if it makes sense.
> 
> I guess I was thinking of another asic.  @Huang Rui, @changzhu, @Quan,
> Evan can any of you comment on what is required for custom profiles on
> APUs?
> 
> Alex
> 
> 
> >
> > [1]:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c?id=f6f75ebdc06c04d3cfcd100f1b10256a9cdca407
> > [2]:

[PATCH v2 1/1] drm/amdkfd: Add sysfs bitfields and enums to uAPI

2021-09-13 Thread Felix Kuehling
These bits are de-facto part of the uAPI, so declare them in a uAPI header.

The corresponding bit-fields and enums in user mode are defined in
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/include/hsakmttypes.h

HSA_CAP_...   -> HSA_CAPABILITY
HSA_MEM_HEAP_TYPE_... -> HSA_HEAPTYPE
HSA_MEM_FLAGS_... -> HSA_MEMORYPROPERTY
HSA_CACHE_TYPE_...-> HsaCacheType
HSA_IOLINK_TYPE_...   -> HSA_IOLINKTYPE
HSA_IOLINK_FLAGS_...  -> HSA_LINKPROPERTY

Signed-off-by: Felix Kuehling 
---
 MAINTAINERS   |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  46 +
 include/uapi/linux/kfd_sysfs.h| 108 ++
 3 files changed, 110 insertions(+), 45 deletions(-)
 create mode 100644 include/uapi/linux/kfd_sysfs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 84cd16694640..7554ec928ee2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -930,6 +930,7 @@ F:  drivers/gpu/drm/amd/include/kgd_kfd_interface.h
 F: drivers/gpu/drm/amd/include/v9_structs.h
 F: drivers/gpu/drm/amd/include/vi_structs.h
 F: include/uapi/linux/kfd_ioctl.h
+F: include/uapi/linux/kfd_sysfs.h
 
 AMD SPI DRIVER
 M: Sanjay R Mehta 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index a8db017c9b8e..f0cc59d2fd5d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -25,38 +25,11 @@
 
 #include 
 #include 
+#include 
 #include "kfd_crat.h"
 
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
 
-#define HSA_CAP_HOT_PLUGGABLE  0x0001
-#define HSA_CAP_ATS_PRESENT0x0002
-#define HSA_CAP_SHARED_WITH_GRAPHICS   0x0004
-#define HSA_CAP_QUEUE_SIZE_POW20x0008
-#define HSA_CAP_QUEUE_SIZE_32BIT   0x0010
-#define HSA_CAP_QUEUE_IDLE_EVENT   0x0020
-#define HSA_CAP_VA_LIMIT   0x0040
-#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x0080
-#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK0x0f00
-#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT   8
-#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK   0x3000
-#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT  12
-
-#define HSA_CAP_DOORBELL_TYPE_PRE_1_0  0x0
-#define HSA_CAP_DOORBELL_TYPE_1_0  0x1
-#define HSA_CAP_DOORBELL_TYPE_2_0  0x2
-#define HSA_CAP_AQL_QUEUE_DOUBLE_MAP   0x4000
-
-#define HSA_CAP_RESERVED_WAS_SRAM_EDCSUPPORTED 0x0008 /* Old buggy user 
mode depends on this being 0 */
-#define HSA_CAP_MEM_EDCSUPPORTED   0x0010
-#define HSA_CAP_RASEVENTNOTIFY 0x0020
-#define HSA_CAP_ASIC_REVISION_MASK 0x03c0
-#define HSA_CAP_ASIC_REVISION_SHIFT22
-#define HSA_CAP_SRAM_EDCSUPPORTED  0x0400
-#define HSA_CAP_SVMAPI_SUPPORTED   0x0800
-#define HSA_CAP_FLAGS_COHERENTHOSTACCESS   0x1000
-#define HSA_CAP_RESERVED   0xe00f8000
-
 struct kfd_node_properties {
uint64_t hive_id;
uint32_t cpu_cores_count;
@@ -93,17 +66,6 @@ struct kfd_node_properties {
char name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
 };
 
-#define HSA_MEM_HEAP_TYPE_SYSTEM   0
-#define HSA_MEM_HEAP_TYPE_FB_PUBLIC1
-#define HSA_MEM_HEAP_TYPE_FB_PRIVATE   2
-#define HSA_MEM_HEAP_TYPE_GPU_GDS  3
-#define HSA_MEM_HEAP_TYPE_GPU_LDS  4
-#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH  5
-
-#define HSA_MEM_FLAGS_HOT_PLUGGABLE0x0001
-#define HSA_MEM_FLAGS_NON_VOLATILE 0x0002
-#define HSA_MEM_FLAGS_RESERVED 0xfffc
-
 struct kfd_mem_properties {
struct list_headlist;
uint32_theap_type;
@@ -116,12 +78,6 @@ struct kfd_mem_properties {
struct attributeattr;
 };
 
-#define HSA_CACHE_TYPE_DATA0x0001
-#define HSA_CACHE_TYPE_INSTRUCTION 0x0002
-#define HSA_CACHE_TYPE_CPU 0x0004
-#define HSA_CACHE_TYPE_HSACU   0x0008
-#define HSA_CACHE_TYPE_RESERVED0xfff0
-
 struct kfd_cache_properties {
struct list_headlist;
uint32_tprocessor_id_low;
diff --git a/include/uapi/linux/kfd_sysfs.h b/include/uapi/linux/kfd_sysfs.h
new file mode 100644
index ..e1fb78b4bf09
--- /dev/null
+++ b/include/uapi/linux/kfd_sysfs.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT WITH Linux-syscall-note */
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software 

[PATCH] drm/amdkfd: SVM map to gpus check vma boundary

2021-09-13 Thread Philip Yang
An SVM range may include multiple VMAs with different vm_flags. If the prange
page index is the last page of the VMA window (offset + npages), update the
GPU mapping so the GPU page table is created with the same VMA access
permission.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 110c46cd7fac..2e3ee9c46a10 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1178,7 +1178,9 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
for (i = offset; i < offset + npages; i++) {
last_domain = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN;
dma_addr[i] &= ~SVM_RANGE_VRAM_DOMAIN;
+
if ((prange->start + i) < prange->last &&
+   i + 1 < offset + npages &&
last_domain == (dma_addr[i + 1] & SVM_RANGE_VRAM_DOMAIN))
continue;
 
-- 
2.17.1



Re: [PATCH] drm/amd/display: Fix white screen page fault for gpuvm

2021-09-13 Thread Kazlauskas, Nicholas

On 2021-09-13 3:13 p.m., Alex Deucher wrote:

Acked-by: Alex Deucher 

Can you add a fixes: tag?

Alex


Sure, I think the relevant patch is:

Fixes: 64b1d0e8d50 ("drm/amd/display: Add DCN3.1 HWSEQ")

Regards,
Nicholas Kazlauskas



On Mon, Sep 13, 2021 at 3:11 PM Nicholas Kazlauskas
 wrote:


[Why]
The "base_addr_is_mc_addr" field was added for dcn3.1 support but
pa_config was never updated to set it to false.

Uninitialized memory causes it to be set to true which results in
address mistranslation and white screen.

[How]
Use memset to ensure all fields are initialized to 0 by default.

Cc: Aaron Liu 
Signed-off-by: Nicholas Kazlauskas 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 53363728dbb..b0426bb3f2e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1125,6 +1125,8 @@ static void mmhub_read_system_context(struct 
amdgpu_device *adev, struct dc_phy_
 uint32_t agp_base, agp_bot, agp_top;
 PHYSICAL_ADDRESS_LOC page_table_start, page_table_end, page_table_base;

+   memset(pa_config, 0, sizeof(*pa_config));
+
 logical_addr_low  = min(adev->gmc.fb_start, adev->gmc.agp_start) >> 18;
 pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);

--
2.25.1





Re: [PATCH] drm/amdgpu: Remove ununsed variable from amdgpu_ib_pool_init

2021-09-13 Thread Christian König

Am 13.09.21 um 19:27 schrieb Anson Jacob:

Remove unused variable 'size'.

Signed-off-by: Anson Jacob 


Yeah, that's because of the recent change that we now use the same size 
for everything.


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 9274f32c3661..bc1297dcdf97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -300,7 +300,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
   */
  int amdgpu_ib_pool_init(struct amdgpu_device *adev)
  {
-   unsigned size;
int r, i;
  
  	if (adev->ib_pool_ready)




Re: [PATCH] drm/amd/display: Fix white screen page fault for gpuvm

2021-09-13 Thread Alex Deucher
Acked-by: Alex Deucher 

Can you add a fixes: tag?

Alex

On Mon, Sep 13, 2021 at 3:11 PM Nicholas Kazlauskas
 wrote:
>
> [Why]
> The "base_addr_is_mc_addr" field was added for dcn3.1 support but
> pa_config was never updated to set it to false.
>
> Uninitialized memory causes it to be set to true which results in
> address mistranslation and white screen.
>
> [How]
> Use memset to ensure all fields are initialized to 0 by default.
>
> Cc: Aaron Liu 
> Signed-off-by: Nicholas Kazlauskas 
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 53363728dbb..b0426bb3f2e 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1125,6 +1125,8 @@ static void mmhub_read_system_context(struct 
> amdgpu_device *adev, struct dc_phy_
> uint32_t agp_base, agp_bot, agp_top;
> PHYSICAL_ADDRESS_LOC page_table_start, page_table_end, 
> page_table_base;
>
> +   memset(pa_config, 0, sizeof(*pa_config));
> +
> logical_addr_low  = min(adev->gmc.fb_start, adev->gmc.agp_start) >> 
> 18;
> pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);
>
> --
> 2.25.1
>


[PATCH] drm/amd/display: Fix white screen page fault for gpuvm

2021-09-13 Thread Nicholas Kazlauskas
[Why]
The "base_addr_is_mc_addr" field was added for dcn3.1 support but
pa_config was never updated to set it to false.

Uninitialized memory causes it to be set to true which results in
address mistranslation and white screen.

[How]
Use memset to ensure all fields are initialized to 0 by default.

Cc: Aaron Liu 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 53363728dbb..b0426bb3f2e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1125,6 +1125,8 @@ static void mmhub_read_system_context(struct 
amdgpu_device *adev, struct dc_phy_
uint32_t agp_base, agp_bot, agp_top;
PHYSICAL_ADDRESS_LOC page_table_start, page_table_end, page_table_base;
 
+   memset(pa_config, 0, sizeof(*pa_config));
+
logical_addr_low  = min(adev->gmc.fb_start, adev->gmc.agp_start) >> 18;
pt_base = amdgpu_gmc_pd_addr(adev->gart.bo);
 
-- 
2.25.1



Re: [PATCH] drm/amd/display: Add NULL checks for vblank workqueue

2021-09-13 Thread Alex Deucher
On Tue, Sep 7, 2021 at 9:42 PM Mike Lothian  wrote:
>
> Hi
>
> I've just tested this out against Linus's tree and it seems to fix things
>
> Out of interest does Tonga have GPU reset when things go wrong?

Yes, it does.

Alex

>
> Thanks
>
> Mike
>
> On Tue, 7 Sept 2021 at 15:20, Harry Wentland  wrote:
> >
> >
> >
> > On 2021-09-07 10:10 a.m., Nicholas Kazlauskas wrote:
> > > [Why]
> > > If we're running a headless config with 0 links then the vblank
> > > workqueue will be NULL - causing a NULL pointer exception during
> > > any commit.
> > >
> > > [How]
> > > Guard access to the workqueue if it's NULL and don't queue or flush
> > > work if it is.
> > >
> > > Cc: Roman Li 
> > > Cc: Wayne Lin 
> > > Cc: Harry Wentland 
> > > Reported-by: Mike Lothian 
> > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1700
> > > Fixes: 91f86d4cce2 ("drm/amd/display: Use vblank control events for PSR 
> > > enable/disable")
> > > Signed-off-by: Nicholas Kazlauskas 
> >
> > Reviewed-by: Harry Wentland 
> >
> > Harry
> >
> > > ---
> > >  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 32 +++
> > >  1 file changed, 18 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > index 8837259215d..46e08736f94 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > @@ -6185,21 +6185,23 @@ static inline int dm_set_vblank(struct drm_crtc 
> > > *crtc, bool enable)
> > >   return 0;
> > >
> > >  #if defined(CONFIG_DRM_AMD_DC_DCN)
> > > - work = kzalloc(sizeof(*work), GFP_ATOMIC);
> > > - if (!work)
> > > - return -ENOMEM;
> > > + if (dm->vblank_control_workqueue) {
> > > + work = kzalloc(sizeof(*work), GFP_ATOMIC);
> > > + if (!work)
> > > + return -ENOMEM;
> > >
> > > - INIT_WORK(&work->work, vblank_control_worker);
> > > - work->dm = dm;
> > > - work->acrtc = acrtc;
> > > - work->enable = enable;
> > > + INIT_WORK(&work->work, vblank_control_worker);
> > > + work->dm = dm;
> > > + work->acrtc = acrtc;
> > > + work->enable = enable;
> > >
> > > - if (acrtc_state->stream) {
> > > - dc_stream_retain(acrtc_state->stream);
> > > - work->stream = acrtc_state->stream;
> > > - }
> > > + if (acrtc_state->stream) {
> > > + dc_stream_retain(acrtc_state->stream);
> > > + work->stream = acrtc_state->stream;
> > > + }
> > >
> > > - queue_work(dm->vblank_control_workqueue, &work->work);
> > > + queue_work(dm->vblank_control_workqueue, &work->work);
> > > + }
> > >  #endif
> > >
> > >   return 0;
> > > @@ -8809,7 +8811,8 @@ static void amdgpu_dm_commit_planes(struct 
> > > drm_atomic_state *state,
> > >* If PSR or idle optimizations are enabled then flush out
> > >* any pending work before hardware programming.
> > >*/
> > > - flush_workqueue(dm->vblank_control_workqueue);
> > > + if (dm->vblank_control_workqueue)
> > > + flush_workqueue(dm->vblank_control_workqueue);
> > >  #endif
> > >
> > >   bundle->stream_update.stream = acrtc_state->stream;
> > > @@ -9144,7 +9147,8 @@ static void amdgpu_dm_atomic_commit_tail(struct 
> > > drm_atomic_state *state)
> > >   /* if there mode set or reset, disable eDP PSR */
> > >   if (mode_set_reset_required) {
> > >  #if defined(CONFIG_DRM_AMD_DC_DCN)
> > > - flush_workqueue(dm->vblank_control_workqueue);
> > > + if (dm->vblank_control_workqueue)
> > > + 
> > > flush_workqueue(dm->vblank_control_workqueue);
> > >  #endif
> > >   amdgpu_dm_psr_disable_all(dm);
> > >   }
> > >
> >


Re: [PATCH] amd/display: enable panel orientation quirks

2021-09-13 Thread Alex Deucher
Applied.  Thanks!

Alex

On Mon, Sep 13, 2021 at 11:24 AM Harry Wentland  wrote:
>
> On 2021-09-10 11:37 a.m., Simon Ser wrote:
> > This patch allows panel orientation quirks from DRM core to be
> > used. They attach a DRM connector property "panel orientation"
> > which indicates in which direction the panel has been mounted.
> > Some machines have the internal screen mounted with a rotation.
> >
> > Since the panel orientation quirks need the native mode from the
> > EDID, check for it in amdgpu_dm_connector_ddc_get_modes.
> >
> > Signed-off-by: Simon Ser 
> > Cc: Alex Deucher 
> > Cc: Harry Wentland 
> > Cc: Nicholas Kazlauskas 
>
> Reviewed-by: Harry Wentland 
>
> Harry
>
> > ---
> >  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 28 +++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > index 53363728dbbd..a420602f1794 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > @@ -7680,6 +7680,32 @@ static void 
> > amdgpu_dm_connector_add_common_modes(struct drm_encoder *encoder,
> >   }
> >  }
> >
> > +static void amdgpu_set_panel_orientation(struct drm_connector *connector)
> > +{
> > + struct drm_encoder *encoder;
> > + struct amdgpu_encoder *amdgpu_encoder;
> > + const struct drm_display_mode *native_mode;
> > +
> > + if (connector->connector_type != DRM_MODE_CONNECTOR_eDP &&
> > + connector->connector_type != DRM_MODE_CONNECTOR_LVDS)
> > + return;
> > +
> > + encoder = amdgpu_dm_connector_to_encoder(connector);
> > + if (!encoder)
> > + return;
> > +
> > + amdgpu_encoder = to_amdgpu_encoder(encoder);
> > +
> > + native_mode = &amdgpu_encoder->native_mode;
> > + if (native_mode->hdisplay == 0 || native_mode->vdisplay == 0)
> > + return;
> > +
> > + drm_connector_set_panel_orientation_with_quirk(connector,
> > +
> > DRM_MODE_PANEL_ORIENTATION_UNKNOWN,
> > +native_mode->hdisplay,
> > +native_mode->vdisplay);
> > +}
> > +
> >  static void amdgpu_dm_connector_ddc_get_modes(struct drm_connector 
> > *connector,
> > struct edid *edid)
> >  {
> > @@ -7708,6 +7734,8 @@ static void amdgpu_dm_connector_ddc_get_modes(struct 
> > drm_connector *connector,
> >* restored here.
> >*/
> >   amdgpu_dm_update_freesync_caps(connector, edid);
> > +
> > + amdgpu_set_panel_orientation(connector);
> >   } else {
> >   amdgpu_dm_connector->num_modes = 0;
> >   }
> >
>


Re: [PATCH] drm/amdgpu: Remove ununsed variable from amdgpu_ib_pool_init

2021-09-13 Thread Alex Deucher
Reviewed-by: Alex Deucher 

On Mon, Sep 13, 2021 at 1:28 PM Anson Jacob  wrote:
>
> Remove unused variable 'size'.
>
> Signed-off-by: Anson Jacob 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 9274f32c3661..bc1297dcdf97 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -300,7 +300,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   */
>  int amdgpu_ib_pool_init(struct amdgpu_device *adev)
>  {
> -   unsigned size;
> int r, i;
>
> if (adev->ib_pool_ready)
> --
> 2.25.1
>


[PATCH] drm/amdgpu: Remove ununsed variable from amdgpu_ib_pool_init

2021-09-13 Thread Anson Jacob
Remove unused variable 'size'.

Signed-off-by: Anson Jacob 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 9274f32c3661..bc1297dcdf97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -300,7 +300,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
  */
 int amdgpu_ib_pool_init(struct amdgpu_device *adev)
 {
-   unsigned size;
int r, i;
 
if (adev->ib_pool_ready)
-- 
2.25.1



Re: [PATCH] drm/amdkfd: Add dummy function for kgd2kfd_resume_iommu

2021-09-13 Thread Alex Deucher
Reviewed-by: Alex Deucher 

On Mon, Sep 13, 2021 at 12:56 PM Anson Jacob  wrote:
>
> Add dummy function when CONFIG_HSA_AMD is not enabled.
>
> Fixes: 433d2448d57c ("drm/amdkfd: separate kfd_iommu_resume from kfd_resume")
> Signed-off-by: Anson Jacob 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index b40ed399d2cf..3bc52b2c604f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -367,6 +367,11 @@ static inline void kgd2kfd_suspend(struct kfd_dev *kfd, 
> bool run_pm)
>  {
>  }
>
> +static int __maybe_unused kgd2kfd_resume_iommu(struct kfd_dev *kfd)
> +{
> +   return 0;
> +}
> +
>  static inline int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
>  {
> return 0;
> --
> 2.25.1
>


[PATCH] drm/amdkfd: Add dummy function for kgd2kfd_resume_iommu

2021-09-13 Thread Anson Jacob
Add dummy function when CONFIG_HSA_AMD is not enabled.

Fixes: 433d2448d57c ("drm/amdkfd: separate kfd_iommu_resume from kfd_resume")
Signed-off-by: Anson Jacob 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index b40ed399d2cf..3bc52b2c604f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -367,6 +367,11 @@ static inline void kgd2kfd_suspend(struct kfd_dev *kfd, 
bool run_pm)
 {
 }
 
+static int __maybe_unused kgd2kfd_resume_iommu(struct kfd_dev *kfd)
+{
+   return 0;
+}
+
 static inline int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 {
return 0;
-- 
2.25.1



Re: [PATCH 2/2] drm/amdgpu: Demote TMZ unsupported log message from warning to info

2021-09-13 Thread Alex Deucher
Applied.  Thanks.

Alex

On Mon, Sep 13, 2021 at 4:46 AM Paul Menzel  wrote:
>
> As the user cannot do anything about the unsupported Trusted Memory Zone
> (TMZ) feature, do not warn about it, but make it informational, so
> demote the log level from warning to info.
>
> Signed-off-by: Paul Menzel 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index c4c56c57b0c0..bfa0275ff5d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -598,7 +598,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
> break;
> default:
> adev->gmc.tmz_enabled = false;
> -   dev_warn(adev->dev,
> +   dev_info(adev->dev,
>  "Trusted Memory Zone (TMZ) feature not supported by 
> hardware\n");
> break;
> }
> --
> 2.33.0
>


Re: [PATCH] drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count

2021-09-13 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, Sep 9, 2021 at 2:07 PM Lyude Paul  wrote:
>
> Reviewed-by: Lyude Paul 
>
> On Thu, 2021-09-09 at 18:56 +0200, Michel Dänzer wrote:
> > From: Michel Dänzer 
> >
> > This was unusual; normally, inline functions are declared static as
> > well, and defined in a header file if used by multiple compilation
> > units. The latter would be more involved in this case, so just drop
> > the inline declaration for now.
> >
> > Fixes compile failure building for ppc64le on RHEL 8:
> >
> > In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
> >  from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
> > ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function
> > ‘amdgpu_ras_recovery_init’:
> > ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining
> > failed in call
> >  to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not
> > available
> >90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
> >   | ^~~
> > ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
> >  1985 | max_eeprom_records_len =
> > amdgpu_ras_eeprom_max_record_count();
> >   |
> > ^
> >
> > # The function is called amdgpu_ras_eeprom_get_record_max_length on
> > # stable branches
> > Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)"
> > Signed-off-by: Michel Dänzer 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > index 194590252bb9..210f30867870 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > @@ -756,7 +756,7 @@ int amdgpu_ras_eeprom_read(struct
> > amdgpu_ras_eeprom_control *control,
> > return res;
> >  }
> >
> > -inline uint32_t amdgpu_ras_eeprom_max_record_count(void)
> > +uint32_t amdgpu_ras_eeprom_max_record_count(void)
> >  {
> > return RAS_MAX_RECORD_COUNT;
> >  }
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
> > index f95fc61b3021..6bb00578bfbb 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
> > @@ -120,7 +120,7 @@ int amdgpu_ras_eeprom_read(struct
> > amdgpu_ras_eeprom_control *control,
> >  int amdgpu_ras_eeprom_append(struct amdgpu_ras_eeprom_control *control,
> >  struct eeprom_table_record *records, const u32
> > num);
> >
> > -inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
> > +uint32_t amdgpu_ras_eeprom_max_record_count(void);
> >
> >  void amdgpu_ras_debugfs_set_ret_size(struct amdgpu_ras_eeprom_control
> > *control);
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
>


Re: [PATCH] drm/amdkfd: Cast atomic64_read return value

2021-09-13 Thread Michel Dänzer
On 2021-09-13 18:28, Felix Kuehling wrote:
> Am 2021-09-13 um 12:18 p.m. schrieb Michel Dänzer:
>> On 2021-09-13 17:19, Felix Kuehling wrote:
>>> Am 2021-09-13 um 10:19 a.m. schrieb Michel Dänzer:
 From: Michel Dänzer 

 Avoids warning with -Wformat:

   CC [M]  drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.o
 ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c: In function 
 ‘kfd_smi_event_update_thermal_throttling’:
 ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c:224:60: warning: 
 format ‘%llx’ expects argument of type
  ‘long long unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
   224 | len = snprintf(fifo_in, sizeof(fifo_in), "%x %x:%llx\n",
   | ~~~^
   ||
   |long 
 long unsigned int
   | %lx
   225 |KFD_SMI_EVENT_THERMAL_THROTTLE, 
 throttle_bitmask,
   226 |
 atomic64_read(>smu.throttle_int_counter));
   |
 ~~
   ||
   |long int
>>> That's weird. As far as I can see, atomic64_read is defined to return
>>> s64, which should be the same as long long. Which architecture are you
>>> on?
>> This was from a 64-bit powerpc build. atomic64_read returns long there.
>>
>>
> This should be defined as s64:
> 
> ./arch/powerpc/include/asm/atomic.h:static __inline__ s64 atomic64_read(const 
> atomic64_t *v)
> 
> In arch/powerpc/include/uapi/asm/types.h I see this:
> 
> /*
>  * This is here because we used to use l64 for 64bit powerpc
>  * and we don't want to impact user mode with our change to ll64
>  * in the kernel.
>  *
>  * However, some user programs are fine with this.  They can
>  * flag __SANE_USERSPACE_TYPES__ to get int-ll64.h here.
>  */
> #if !defined(__SANE_USERSPACE_TYPES__) && defined(__powerpc64__) && 
> !defined(__KERNEL__)
> # include 
> #else
> # include 
> #endif
> 
> 
> So in kernel mode it should be using int-ll64.h, which defines s64 as
> long-long. The cast to u64 won't help either way. It's either
> unnecessary or it's still unsigned long.

Ah, I see now this is because the RHEL 8 kernel is based on 4.18, where this 
still returned long for powerpc.

I guess I'll have to deal with this downstream, sorry for the noise.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] drm/amdkfd: Cast atomic64_read return value

2021-09-13 Thread Felix Kuehling
Am 2021-09-13 um 12:18 p.m. schrieb Michel Dänzer:
> On 2021-09-13 17:19, Felix Kuehling wrote:
>> Am 2021-09-13 um 10:19 a.m. schrieb Michel Dänzer:
>>> From: Michel Dänzer 
>>>
>>> Avoids warning with -Wformat:
>>>
>>>   CC [M]  drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.o
>>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c: In function 
>>> ‘kfd_smi_event_update_thermal_throttling’:
>>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c:224:60: warning: 
>>> format ‘%llx’ expects argument of type
>>>  ‘long long unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
>>>   224 | len = snprintf(fifo_in, sizeof(fifo_in), "%x %x:%llx\n",
>>>   | ~~~^
>>>   ||
>>>   |long 
>>> long unsigned int
>>>   | %lx
>>>   225 |KFD_SMI_EVENT_THERMAL_THROTTLE, 
>>> throttle_bitmask,
>>>   226 |
>>> atomic64_read(>smu.throttle_int_counter));
>>>   |
>>> ~~
>>>   ||
>>>   |long int
>> That's weird. As far as I can see, atomic64_read is defined to return
>> s64, which should be the same as long long. Which architecture are you
>> on?
> This was from a 64-bit powerpc build. atomic64_read returns long there.
>
>
This should be defined as s64:

./arch/powerpc/include/asm/atomic.h:static __inline__ s64 atomic64_read(const 
atomic64_t *v)

In arch/powerpc/include/uapi/asm/types.h I see this:

/*
 * This is here because we used to use l64 for 64bit powerpc
 * and we don't want to impact user mode with our change to ll64
 * in the kernel.
 *
 * However, some user programs are fine with this.  They can
 * flag __SANE_USERSPACE_TYPES__ to get int-ll64.h here.
 */
#if !defined(__SANE_USERSPACE_TYPES__) && defined(__powerpc64__) && 
!defined(__KERNEL__)
# include 
#else
# include 
#endif


So in kernel mode it should be using int-ll64.h, which defines s64 as
long-long. The cast to u64 won't help either way. It's either
unnecessary or it's still unsigned long.

Regards,
  Felix




Re: [PATCH] drm/amdkfd: Cast atomic64_read return value

2021-09-13 Thread Michel Dänzer
On 2021-09-13 17:19, Felix Kuehling wrote:
> Am 2021-09-13 um 10:19 a.m. schrieb Michel Dänzer:
>> From: Michel Dänzer 
>>
>> Avoids warning with -Wformat:
>>
>>   CC [M]  drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.o
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c: In function 
>> ‘kfd_smi_event_update_thermal_throttling’:
>> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c:224:60: warning: 
>> format ‘%llx’ expects argument of type
>>  ‘long long unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
>>   224 | len = snprintf(fifo_in, sizeof(fifo_in), "%x %x:%llx\n",
>>   | ~~~^
>>   ||
>>   |long long 
>> unsigned int
>>   | %lx
>>   225 |KFD_SMI_EVENT_THERMAL_THROTTLE, 
>> throttle_bitmask,
>>   226 |
>> atomic64_read(>smu.throttle_int_counter));
>>   |~~
>>   ||
>>   |long int
> 
> That's weird. As far as I can see, atomic64_read is defined to return
> s64, which should be the same as long long. Which architecture are you
> on?

This was from a 64-bit powerpc build. atomic64_read returns long there.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


[PATCH v2 08/12] lib: test_hmm add ioctl to get zone device type

2021-09-13 Thread Alex Sierra
A new ioctl command is added to query the zone device type. This will be
used once test_hmm adds support for the zone device public type.

Signed-off-by: Alex Sierra 
---
 lib/test_hmm.c  | 15 ++-
 lib/test_hmm_uapi.h |  7 +++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 6998f10350ea..3cd91ca31dd7 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -82,6 +82,7 @@ struct dmirror_chunk {
 struct dmirror_device {
struct cdev cdevice;
struct hmm_devmem   *devmem;
+   unsigned intzone_device_type;
 
unsigned intdevmem_capacity;
unsigned intdevmem_count;
@@ -468,6 +469,7 @@ static bool dmirror_allocate_chunk(struct dmirror_device 
*mdevice,
if (IS_ERR(res))
goto err_devmem;
 
+   mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
devmem->pagemap.range.start = res->start;
devmem->pagemap.range.end = res->end;
@@ -912,6 +914,15 @@ static int dmirror_snapshot(struct dmirror *dmirror,
return ret;
 }
 
+static int dmirror_get_device_type(struct dmirror *dmirror,
+   struct hmm_dmirror_cmd *cmd)
+{
+   mutex_lock(>mutex);
+   cmd->zone_device_type = dmirror->mdevice->zone_device_type;
+   mutex_unlock(>mutex);
+
+   return 0;
+}
 static long dmirror_fops_unlocked_ioctl(struct file *filp,
unsigned int command,
unsigned long arg)
@@ -952,7 +963,9 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
case HMM_DMIRROR_SNAPSHOT:
ret = dmirror_snapshot(dmirror, );
break;
-
+   case HMM_DMIRROR_GET_MEM_DEV_TYPE:
+   ret = dmirror_get_device_type(dmirror, );
+   break;
default:
return -EINVAL;
}
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 670b4ef2a5b6..ee88701793d5 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -26,6 +26,7 @@ struct hmm_dmirror_cmd {
__u64   npages;
__u64   cpages;
__u64   faults;
+   __u64   zone_device_type;
 };
 
 /* Expose the address space of the calling process through hmm device file */
@@ -33,6 +34,7 @@ struct hmm_dmirror_cmd {
 #define HMM_DMIRROR_WRITE  _IOWR('H', 0x01, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_MIGRATE_IOWR('H', 0x02, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_SNAPSHOT   _IOWR('H', 0x03, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_GET_MEM_DEV_TYPE   _IOWR('H', 0x04, struct hmm_dmirror_cmd)
 
 /*
  * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
@@ -60,4 +62,9 @@ enum {
HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE = 0x30,
 };
 
+enum {
+   /* 0 is reserved to catch uninitialized type fields */
+   HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1,
+};
+
 #endif /* _LIB_TEST_HMM_UAPI_H */
-- 
2.32.0



[PATCH v2 12/12] tools: update test_hmm script to support SP config

2021-09-13 Thread Alex Sierra
Add two more parameters to set the spm_addr_dev0 & spm_addr_dev1
addresses. These two parameters configure the start SP
addresses for each device in the test_hmm driver.
Consequently, this configures the zone device type as public.

Signed-off-by: Alex Sierra 
---
 tools/testing/selftests/vm/test_hmm.sh | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/test_hmm.sh 
b/tools/testing/selftests/vm/test_hmm.sh
index 0647b525a625..3eeabe94399f 100755
--- a/tools/testing/selftests/vm/test_hmm.sh
+++ b/tools/testing/selftests/vm/test_hmm.sh
@@ -40,7 +40,18 @@ check_test_requirements()
 
 load_driver()
 {
-   modprobe $DRIVER > /dev/null 2>&1
+   if [ $# -eq 0 ]; then
+   modprobe $DRIVER > /dev/null 2>&1
+   else
+   if [ $# -eq 2 ]; then
+   modprobe $DRIVER spm_addr_dev0=$1 spm_addr_dev1=$2
+   > /dev/null 2>&1
+   else
+   echo "Missing module parameters. Make sure pass"\
+   "spm_addr_dev0 and spm_addr_dev1"
+   usage
+   fi
+   fi
if [ $? == 0 ]; then
major=$(awk "\$2==\"HMM_DMIRROR\" {print \$1}" /proc/devices)
mknod /dev/hmm_dmirror0 c $major 0
@@ -58,7 +69,7 @@ run_smoke()
 {
echo "Running smoke test. Note, this test provides basic coverage."
 
-   load_driver
+   load_driver $1 $2
$(dirname "${BASH_SOURCE[0]}")/hmm-tests
unload_driver
 }
@@ -75,6 +86,9 @@ usage()
echo "# Smoke testing"
echo "./${TEST_NAME}.sh smoke"
echo
+   echo "# Smoke testing with SPM enabled"
+   echo "./${TEST_NAME}.sh smoke  "
+   echo
exit 0
 }
 
@@ -84,7 +98,7 @@ function run_test()
usage
else
if [ "$1" = "smoke" ]; then
-   run_smoke
+   run_smoke $2 $3
else
usage
fi
-- 
2.32.0
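The argument handling added to load_driver() — zero parameters selects the private device type, exactly two SPM start addresses select the public type — can be sketched as a standalone function (stubbed so it runs without the module; the addresses below are illustrative only):

```shell
#!/bin/sh
# Sketch of the parameter check added to load_driver(): either no
# arguments (private device type) or exactly two SPM start addresses
# (public device type); anything else is an error.
check_spm_args() {
	if [ $# -eq 0 ]; then
		echo "private"
	elif [ $# -eq 2 ]; then
		echo "public spm_addr_dev0=$1 spm_addr_dev1=$2"
	else
		echo "error: pass both spm_addr_dev0 and spm_addr_dev1" >&2
		return 1
	fi
}

check_spm_args
check_spm_args 0x100000000 0x140000000
```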



[PATCH v2 11/12] tools: update hmm-test to support device public type

2021-09-13 Thread Alex Sierra
Test cases such as migrate_fault and migrate_multiple were modified
to migrate explicitly from device to system memory, without the need
for page faults, when using the device public type.

The snapshot test case was updated to read the memory device type
first and, based on that, check the proper returned results. A
migrate_ping_pong test case was added to test explicit migration
from device to system memory for both private and public zone types.

Helpers to migrate from device to system memory and vice versa
were also added.

Signed-off-by: Alex Sierra 
---
 tools/testing/selftests/vm/hmm-tests.c | 142 +
 1 file changed, 124 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/vm/hmm-tests.c 
b/tools/testing/selftests/vm/hmm-tests.c
index 5d1ac691b9f4..477c6283dd1b 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -44,6 +44,8 @@ struct hmm_buffer {
int fd;
uint64_tcpages;
uint64_tfaults;
+   int zone_device_type;
+   boolalloc_to_devmem;
 };
 
 #define TWOMEG (1 << 21)
@@ -133,6 +135,7 @@ static int hmm_dmirror_cmd(int fd,
cmd.addr = (__u64)buffer->ptr;
cmd.ptr = (__u64)buffer->mirror;
cmd.npages = npages;
+   cmd.alloc_to_devmem = buffer->alloc_to_devmem;
 
for (;;) {
ret = ioctl(fd, request, );
@@ -144,6 +147,7 @@ static int hmm_dmirror_cmd(int fd,
}
buffer->cpages = cmd.cpages;
buffer->faults = cmd.faults;
+   buffer->zone_device_type = cmd.zone_device_type;
 
return 0;
 }
@@ -211,6 +215,34 @@ static void hmm_nanosleep(unsigned int n)
nanosleep(, NULL);
 }
 
+static int hmm_migrate_sys_to_dev(int fd,
+  struct hmm_buffer *buffer,
+  unsigned long npages)
+{
+   buffer->alloc_to_devmem = true;
+   return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+}
+
+static int hmm_migrate_dev_to_sys(int fd,
+  struct hmm_buffer *buffer,
+  unsigned long npages)
+{
+   buffer->alloc_to_devmem = false;
+   return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+}
+
+static int hmm_is_private_device(int fd, bool *res)
+{
+   struct hmm_buffer buffer;
+   int ret;
+
+   buffer.ptr = 0;
+   ret = hmm_dmirror_cmd(fd, HMM_DMIRROR_GET_MEM_DEV_TYPE, , 1);
+   *res = (buffer.zone_device_type == HMM_DMIRROR_MEMORY_DEVICE_PRIVATE);
+
+   return ret;
+}
+
 /*
  * Simple NULL test of device open/close.
  */
@@ -875,7 +907,7 @@ TEST_F(hmm, migrate)
ptr[i] = i;
 
/* Migrate memory to device. */
-   ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+   ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
ASSERT_EQ(ret, 0);
ASSERT_EQ(buffer->cpages, npages);
 
@@ -923,7 +955,7 @@ TEST_F(hmm, migrate_fault)
ptr[i] = i;
 
/* Migrate memory to device. */
-   ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+   ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
ASSERT_EQ(ret, 0);
ASSERT_EQ(buffer->cpages, npages);
 
@@ -936,7 +968,7 @@ TEST_F(hmm, migrate_fault)
ASSERT_EQ(ptr[i], i);
 
/* Migrate memory to the device again. */
-   ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+   ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
ASSERT_EQ(ret, 0);
ASSERT_EQ(buffer->cpages, npages);
 
@@ -976,7 +1008,7 @@ TEST_F(hmm, migrate_shared)
ASSERT_NE(buffer->ptr, MAP_FAILED);
 
/* Migrate memory to device. */
-   ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+   ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
ASSERT_EQ(ret, -ENOENT);
 
hmm_buffer_free(buffer);
@@ -1015,7 +1047,7 @@ TEST_F(hmm2, migrate_mixed)
p = buffer->ptr;
 
/* Migrating a protected area should be an error. */
-   ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, npages);
+   ret = hmm_migrate_sys_to_dev(self->fd1, buffer, npages);
ASSERT_EQ(ret, -EINVAL);
 
/* Punch a hole after the first page address. */
@@ -1023,7 +1055,7 @@ TEST_F(hmm2, migrate_mixed)
ASSERT_EQ(ret, 0);
 
/* We expect an error if the vma doesn't cover the range. */
-   ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 3);
+   ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 3);
ASSERT_EQ(ret, -EINVAL);
 
/* Page 2 will be a read-only zero page. */
@@ -1055,13 +1087,13 @@ TEST_F(hmm2, migrate_mixed)
 
/* Now try to migrate pages 2-5 to device 1. */
buffer->ptr = p + 2 * self->page_size;
-   ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 4);
+   ret = 

[PATCH v2 10/12] lib: add support for device public type in test_hmm

2021-09-13 Thread Alex Sierra
Device public type uses device memory that is coherently accessible by
the CPU. This can show up as an SP (special purpose) memory range
in the BIOS e820 memory enumeration. If no SP memory is supported by
the system, it can be faked by setting CONFIG_EFI_FAKE_MEMMAP.

Currently, test_hmm only supports two different SP ranges of at least
256MB size. These can be specified via the efi_fake_mem kernel
parameter, e.g. two SP ranges of 1GB starting at the physical
addresses 0x1 & 0x14000:
efi_fake_mem=1G@0x1:0x4,1G@0x14000:0x4

Signed-off-by: Alex Sierra 
---
 lib/test_hmm.c  | 166 +++-
 lib/test_hmm_uapi.h |  10 ++-
 2 files changed, 113 insertions(+), 63 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index ef27e355738a..e346a48e2509 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -469,6 +469,7 @@ static int dmirror_allocate_chunk(struct dmirror_device 
*mdevice,
unsigned long pfn_first;
unsigned long pfn_last;
void *ptr;
+   int ret = -ENOMEM;
 
devmem = kzalloc(sizeof(*devmem), GFP_KERNEL);
if (!devmem)
@@ -551,7 +552,7 @@ static int dmirror_allocate_chunk(struct dmirror_device 
*mdevice,
}
spin_unlock(>lock);
 
-   return true;
+   return 0;
 
 err_release:
mutex_unlock(>devmem_lock);
@@ -560,7 +561,7 @@ static int dmirror_allocate_chunk(struct dmirror_device 
*mdevice,
 err_devmem:
kfree(devmem);
 
-   return false;
+   return ret;
 }
 
 static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
@@ -569,8 +570,10 @@ static struct page *dmirror_devmem_alloc_page(struct 
dmirror_device *mdevice)
struct page *rpage;
 
/*
-* This is a fake device so we alloc real system memory to store
-* our device memory.
+* For ZONE_DEVICE private type, this is a fake device so we alloc real
+* system memory to store our device memory.
+* For ZONE_DEVICE public type we use the actual dpage to store the data
+* and ignore rpage.
 */
rpage = alloc_page(GFP_HIGHUSER);
if (!rpage)
@@ -603,7 +606,7 @@ static void dmirror_migrate_alloc_and_copy(struct 
migrate_vma *args,
   struct dmirror *dmirror)
 {
struct dmirror_device *mdevice = dmirror->mdevice;
-   const unsigned long *src = args->src;
+   unsigned long *src = args->src;
unsigned long *dst = args->dst;
unsigned long addr;
 
@@ -621,12 +624,18 @@ static void dmirror_migrate_alloc_and_copy(struct 
migrate_vma *args,
 * unallocated pte_none() or read-only zero page.
 */
spage = migrate_pfn_to_page(*src);
-
+   if (spage && is_zone_device_page(spage)) {
+   pr_debug("page already in device spage pfn: 0x%lx\n",
+ page_to_pfn(spage));
+   *src &= ~MIGRATE_PFN_MIGRATE;
+   continue;
+   }
dpage = dmirror_devmem_alloc_page(mdevice);
if (!dpage)
continue;
 
-   rpage = dpage->zone_device_data;
+   rpage = is_device_private_page(dpage) ? dpage->zone_device_data 
:
+   dpage;
if (spage)
copy_highpage(rpage, spage);
else
@@ -638,8 +647,10 @@ static void dmirror_migrate_alloc_and_copy(struct 
migrate_vma *args,
 * the simulated device memory and that page holds the pointer
 * to the mirror.
 */
+   rpage = dpage->zone_device_data;
rpage->zone_device_data = dmirror;
-
+   pr_debug("migrating from sys to dev pfn src: 0x%lx pfn dst: 
0x%lx\n",
+page_to_pfn(spage), page_to_pfn(dpage));
*dst = migrate_pfn(page_to_pfn(dpage)) |
MIGRATE_PFN_LOCKED;
if ((*src & MIGRATE_PFN_WRITE) ||
@@ -673,10 +684,13 @@ static int dmirror_migrate_finalize_and_map(struct 
migrate_vma *args,
continue;
 
/*
-* Store the page that holds the data so the page table
-* doesn't have to deal with ZONE_DEVICE private pages.
+* For ZONE_DEVICE private pages we store the page that
+* holds the data so the page table doesn't have to deal it.
+* For ZONE_DEVICE public pages we store the actual page, since
+* the CPU has coherent access to the page.
 */
-   entry = dpage->zone_device_data;
+   entry = is_device_private_page(dpage) ? dpage->zone_device_data 
:
+   dpage;
if (*dst & MIGRATE_PFN_WRITE)
 

[PATCH v2 09/12] lib: test_hmm add module param for zone device type

2021-09-13 Thread Alex Sierra
In order to configure the device public type in test_hmm, two module
parameters should be passed, which correspond to the SP start address of
each of the two devices: spm_addr_dev0 & spm_addr_dev1. If no parameters
are passed, the private device type is configured.

Signed-off-by: Alex Sierra 
---
v5:
Remove the devmem->pagemap.type = MEMORY_DEVICE_PRIVATE assignment in
dmirror_allocate_chunk that was forcing pagemap.type to
MEMORY_DEVICE_PRIVATE.

v6:
Check for null pointers in the resource and memremap references
in dmirror_allocate_chunk.

v7:
Because the patch "kernel: resource: lookup_resource as exported
symbol" was dropped from this patch series, lookup_resource is no
longer a callable function. It was used in the public device
configuration to get the start and end addresses used to create the
pgmap->range struct. This information is now taken directly from the
spm_addr_devX parameters and the fixed size DEVMEM_CHUNK_SIZE.
---
 lib/test_hmm.c  | 66 +++--
 lib/test_hmm_uapi.h |  1 +
 2 files changed, 47 insertions(+), 20 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 3cd91ca31dd7..ef27e355738a 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -33,6 +33,16 @@
 #define DEVMEM_CHUNK_SIZE  (256 * 1024 * 1024U)
 #define DEVMEM_CHUNKS_RESERVE  16
 
+static unsigned long spm_addr_dev0;
+module_param(spm_addr_dev0, long, 0644);
+MODULE_PARM_DESC(spm_addr_dev0,
+   "Specify start address for SPM (special purpose memory) used 
for device 0. By setting this Generic device type will be used. Make sure 
spm_addr_dev1 is set too");
+
+static unsigned long spm_addr_dev1;
+module_param(spm_addr_dev1, long, 0644);
+MODULE_PARM_DESC(spm_addr_dev1,
+   "Specify start address for SPM (special purpose memory) used 
for device 1. By setting this Generic device type will be used. Make sure 
spm_addr_dev0 is set too");
+
 static const struct dev_pagemap_ops dmirror_devmem_ops;
 static const struct mmu_interval_notifier_ops dmirror_min_ops;
 static dev_t dmirror_dev;
@@ -450,11 +460,11 @@ static int dmirror_write(struct dmirror *dmirror, struct 
hmm_dmirror_cmd *cmd)
return ret;
 }
 
-static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
+static int dmirror_allocate_chunk(struct dmirror_device *mdevice,
   struct page **ppage)
 {
struct dmirror_chunk *devmem;
-   struct resource *res;
+   struct resource *res = NULL;
unsigned long pfn;
unsigned long pfn_first;
unsigned long pfn_last;
@@ -462,17 +472,29 @@ static bool dmirror_allocate_chunk(struct dmirror_device 
*mdevice,
 
devmem = kzalloc(sizeof(*devmem), GFP_KERNEL);
if (!devmem)
-   return false;
+   return -ENOMEM;
 
-   res = request_free_mem_region(_resource, DEVMEM_CHUNK_SIZE,
- "hmm_dmirror");
-   if (IS_ERR(res))
-   goto err_devmem;
+   if (!spm_addr_dev0 && !spm_addr_dev1) {
+   res = request_free_mem_region(_resource, 
DEVMEM_CHUNK_SIZE,
+ "hmm_dmirror");
+   if (IS_ERR_OR_NULL(res))
+   goto err_devmem;
+   devmem->pagemap.range.start = res->start;
+   devmem->pagemap.range.end = res->end;
+   devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
+   mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
+   } else if (spm_addr_dev0 && spm_addr_dev1) {
+   devmem->pagemap.range.start = MINOR(mdevice->cdevice.dev) ?
+   spm_addr_dev0 :
+   spm_addr_dev1;
+   devmem->pagemap.range.end = devmem->pagemap.range.start +
+   DEVMEM_CHUNK_SIZE - 1;
+   devmem->pagemap.type = MEMORY_DEVICE_PUBLIC;
+   mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PUBLIC;
+   } else {
+   pr_err("Both spm_addr_dev parameters should be set\n");
+   }
 
-   mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
-   devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
-   devmem->pagemap.range.start = res->start;
-   devmem->pagemap.range.end = res->end;
devmem->pagemap.nr_range = 1;
devmem->pagemap.ops = _devmem_ops;
devmem->pagemap.owner = mdevice;
@@ -493,10 +515,14 @@ static bool dmirror_allocate_chunk(struct dmirror_device 
*mdevice,
mdevice->devmem_capacity = new_capacity;
mdevice->devmem_chunks = new_chunks;
}
-
ptr = memremap_pages(>pagemap, numa_node_id());
-   if (IS_ERR(ptr))
+   if (IS_ERR_OR_NULL(ptr)) {
+   if (ptr)
+   ret = PTR_ERR(ptr);
+   else
+   ret = -EFAULT;
goto err_release;
+   }
 

[PATCH v2 06/12] drm/amdkfd: add SPM support for SVM

2021-09-13 Thread Alex Sierra
When the CPU is connected through XGMI, it has coherent
access to the VRAM resource. In this case the resource
is taken from a table in the device gmc aperture base.
This resource is used along with the device type, which could
be DEVICE_PRIVATE or DEVICE_PUBLIC, to create the device
page map region.

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
v7:
Remove the lookup_resource call, so the exported symbol for this
function is no longer required. The patch "kernel: resource:
lookup_resource as exported symbol" was dropped.
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 32 +++-
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index ffad39ffa8c6..d0e04f79a06e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -866,7 +866,7 @@ int svm_migrate_init(struct amdgpu_device *adev)
 {
struct kfd_dev *kfddev = adev->kfd.dev;
struct dev_pagemap *pgmap;
-   struct resource *res;
+   struct resource *res = NULL;
unsigned long size;
void *r;
 
@@ -881,22 +881,29 @@ int svm_migrate_init(struct amdgpu_device *adev)
 * should remove reserved size
 */
size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
-   res = devm_request_free_mem_region(adev->dev, _resource, size);
-   if (IS_ERR(res))
-   return -ENOMEM;
+   if (adev->gmc.xgmi.connected_to_cpu) {
+   pgmap->range.start = adev->gmc.aper_base;
+   pgmap->range.end = adev->gmc.aper_base + adev->gmc.aper_size - 
1;
+   pgmap->type = MEMORY_DEVICE_PUBLIC;
+   } else {
+   res = devm_request_free_mem_region(adev->dev, _resource, 
size);
+   if (IS_ERR(res))
+   return -ENOMEM;
+   pgmap->range.start = res->start;
+   pgmap->range.end = res->end;
+   pgmap->type = MEMORY_DEVICE_PRIVATE;
+   }
 
-   pgmap->type = MEMORY_DEVICE_PRIVATE;
pgmap->nr_range = 1;
-   pgmap->range.start = res->start;
-   pgmap->range.end = res->end;
pgmap->ops = _migrate_pgmap_ops;
pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
-   pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   pgmap->flags = 0;
r = devm_memremap_pages(adev->dev, pgmap);
if (IS_ERR(r)) {
pr_err("failed to register HMM device memory\n");
-   devm_release_mem_region(adev->dev, res->start,
-   res->end - res->start + 1);
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE)
+   devm_release_mem_region(adev->dev, res->start,
+   res->end - res->start + 1);
return PTR_ERR(r);
}
 
@@ -915,6 +922,7 @@ void svm_migrate_fini(struct amdgpu_device *adev)
struct dev_pagemap *pgmap = >kfd.dev->pgmap;
 
devm_memunmap_pages(adev->dev, pgmap);
-   devm_release_mem_region(adev->dev, pgmap->range.start,
-   pgmap->range.end - pgmap->range.start + 1);
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE)
+   devm_release_mem_region(adev->dev, pgmap->range.start,
+   pgmap->range.end - pgmap->range.start + 
1);
 }
-- 
2.32.0



[PATCH v2 02/12] mm: remove extra ZONE_DEVICE struct page refcount

2021-09-13 Thread Alex Sierra
From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
Reviewed-by: Christoph Hellwig 
---
v2:
AS: merged this patch into the linux 5.11 version.

v5:
AS: add a condition in try_grab_page to check for the zone device type
when the page ref counter is less than or equal to zero. In the device
zone case, page ref counters are initialized to zero.

v7:
AS: the condition added to try_grab_page in v5 is invalid. It was
supposed to fix the xfstests/generic/413 test; however, there's a known
issue with this test where DIO from a DAX mapped area to non-DAX is
expected to fail.
https://patchwork.kernel.org/project/fstests/patch/1489463960-3579-1-git-send-email-xz...@redhat.com
This condition was removed after rebasing over the patch series
https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com
---
 arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
 fs/dax.c   |  4 +-
 include/linux/dax.h|  2 +-
 include/linux/memremap.h   |  7 +--
 include/linux/mm.h | 11 
 lib/test_hmm.c |  2 +-
 mm/internal.h  |  8 +++
 mm/memremap.c  | 69 +++---
 mm/migrate.c   |  5 --
 mm/page_alloc.c|  3 ++
 mm/swap.c  | 45 ++---
 12 files changed, 45 insertions(+), 115 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..acee67710620 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -711,7 +711,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 
dpage = pfn_to_page(uvmem_pfn);
dpage->zone_device_data = pvt;
-   get_page(dpage);
+   init_page_count(dpage);
lock_page(dpage);
return dpage;
 out_clear:
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 92987daa5e17..8bc7120e1216 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -324,7 +324,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
return NULL;
}
 
-   get_page(page);
+   init_page_count(page);
lock_page(page);
return page;
 }
diff --git a/fs/dax.c b/fs/dax.c
index c387d09e3e5a..1166630b7190 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -571,14 +571,14 @@ static void *grab_mapping_entry(struct xa_state *xas,
 
 /**
  * dax_layout_busy_page_range - find first pinned page in @mapping
- * @mapping: address space to scan for a page with ref count > 1
+ * @mapping: address space to scan for a page with ref count > 0
  * @start: Starting offset. Page containing 'start' is included.
  * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX,
  *   pages from 'start' till the end of file are included.
  *
  * DAX requires ZONE_DEVICE mapped pages. These pages are never
  * 'onlined' to the page allocator so they are considered idle when
- * page->count == 1. A filesystem uses this interface to determine if
+ * page->count == 0. A filesystem uses this interface to determine if
  * any page in the mapping is busy, i.e. for DMA, or other
  * get_user_pages() usages.
  *
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8b5da1d60dbc..05fc982ce153 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -245,7 +245,7 @@ static inline bool dax_mapping(struct address_space 
*mapping)
 
 static inline bool dax_page_unused(struct page *page)
 {
-   return page_ref_count(page) == 1;
+   return page_ref_count(page) == 0;
 }
 
 #define dax_wait_page(_inode, _page, _wait_cb) \
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 45a79da89c5f..77ff5fd0685f 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -66,9 +66,10 @@ enum memory_type {
 
 struct dev_pagemap_ops {
/*
-* Called once the page refcount reaches 1.  (ZONE_DEVICE pages never
-* reach 0 refcount unless there is a refcount bug. This allows the
-* device driver to implement its own memory management.)
+* Called once the page refcount reaches 0. The reference count
+* should be reset to one with init_page_count(page) before reusing
+* the page. This allows the device driver to implement its own
+* memory management.
 */
void (*page_free)(struct page *page);
 
diff --git 

[PATCH v2 05/12] drm/amdkfd: ref count init for device pages

2021-09-13 Thread Alex Sierra
The ref counter of device pages is initialized to zero during memmap
init zone. The first time a new device page is allocated to migrate
data into it, its ref counter needs to be initialized to one.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index dab290a4d19d..ffad39ffa8c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -220,7 +220,8 @@ svm_migrate_get_vram_page(struct svm_range *prange, 
unsigned long pfn)
page = pfn_to_page(pfn);
svm_range_bo_ref(prange->svm_bo);
page->zone_device_data = prange->svm_bo;
-   get_page(page);
+   VM_BUG_ON_PAGE(page_ref_count(page), page);
+   init_page_count(page);
lock_page(page);
 }
 
-- 
2.32.0



[PATCH v2 07/12] drm/amdkfd: public type as sys mem on migration to ram

2021-09-13 Thread Alex Sierra
Public device type memory, on VRAM to RAM migration, has similar access
as system RAM from the CPU. This flag sets the source from the sender
side, which in the public type case should be set to
MIGRATE_VMA_SELECT_DEVICE_PUBLIC.

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index d0e04f79a06e..b5b9ae4e2e27 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -617,9 +617,12 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate.vma = vma;
migrate.start = start;
migrate.end = end;
-   migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
 
+   if (adev->gmc.xgmi.connected_to_cpu)
+   migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PUBLIC;
+   else
+   migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
size *= npages;
buf = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);
-- 
2.32.0



[PATCH v2 04/12] mm: add device public vma selection for memory migration

2021-09-13 Thread Alex Sierra
This case is used to migrate pages from device memory back to system
memory. Device public type memory is cache coherent from the device and
CPU points of view.

Signed-off-by: Alex Sierra 
---
v2:
Condition added for migrations from device public pages.
---
 include/linux/migrate.h | 1 +
 mm/migrate.c| 9 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 4bb4e519e3f5..2fe22596e02c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -156,6 +156,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
 enum migrate_vma_direction {
MIGRATE_VMA_SELECT_SYSTEM = 1 << 0,
MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
+   MIGRATE_VMA_SELECT_DEVICE_PUBLIC = 1 << 2,
 };
 
 struct migrate_vma {
diff --git a/mm/migrate.c b/mm/migrate.c
index 7392648966d2..036baf24b58b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2406,8 +2406,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (is_write_device_private_entry(entry))
mpfn |= MIGRATE_PFN_WRITE;
} else {
-   if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
-   goto next;
pfn = pte_pfn(pte);
if (is_zero_pfn(pfn)) {
mpfn = MIGRATE_PFN_MIGRATE;
@@ -2415,6 +2413,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
goto next;
}
page = vm_normal_page(migrate->vma, addr, pte);
+   if (!is_zone_device_page(page) &&
+   !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
+   goto next;
+   if (is_zone_device_page(page) &&
+   (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PUBLIC) ||
+    page->pgmap->owner != migrate->pgmap_owner))
+   goto next;
mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
}
-- 
2.32.0



[PATCH v2 03/12] mm: add zone device public type memory support

2021-09-13 Thread Alex Sierra
Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI or
CCIX). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra 
---
 include/linux/memremap.h |  8 ++++++++
 include/linux/mm.h       |  8 ++++++++
 mm/memcontrol.c          |  6 +++---
 mm/memory-failure.c      |  6 +++++-
 mm/memremap.c            |  2 ++
 mm/migrate.c             | 19 ++++++++++++-------
 6 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 77ff5fd0685f..431e1b0bc949 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -39,6 +39,13 @@ struct vmem_altmap {
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.rst.
  *
+ * MEMORY_DEVICE_PUBLIC:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CCIX). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However, no one
+ * should be allowed to pin such memory so that it can always be evicted.
+ *
  * MEMORY_DEVICE_FS_DAX:
  * Host memory that has similar access semantics as System RAM i.e. DMA
  * coherent and supports page pinning. In support of coordinating page
@@ -59,6 +66,7 @@ struct vmem_altmap {
 enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
+   MEMORY_DEVICE_PUBLIC,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e24c904deeec..70a932e8a2ee 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1187,6 +1187,14 @@ static inline bool is_device_private_page(const struct page *page)
page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
+static inline bool is_device_page(const struct page *page)
+{
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   is_zone_device_page(page) &&
+   (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
+   page->pgmap->type == MEMORY_DEVICE_PUBLIC);
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 64ada9e650a5..1599ef1a3b03 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5530,8 +5530,8 @@ static int mem_cgroup_move_account(struct page *page,
  *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
  * target for charge migration. if @target is not NULL, the entry is stored
  * in target->ent.
- *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is MEMORY_DEVICE_PRIVATE
- * (so ZONE_DEVICE page and thus not on the lru).
+ *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is MEMORY_DEVICE_PUBLIC
+ * or MEMORY_DEVICE_PRIVATE (so ZONE_DEVICE page and thus not on the lru).
  * For now we such page is charge like a regular page would be as for all
  * intent and purposes it is just special memory taking the place of a
  * regular page.
@@ -5565,7 +5565,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 */
if (page_memcg(page) == mc.from) {
ret = MC_TARGET_PAGE;
-   if (is_device_private_page(page))
+   if (is_device_page(page))
ret = MC_TARGET_DEVICE;
if (target)
target->page = page;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6f5f78885ab4..16cadbabfc99 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1373,12 +1373,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
goto unlock;
}
 
-   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+   switch (pgmap->type) {
+   case MEMORY_DEVICE_PRIVATE:
+   case MEMORY_DEVICE_PUBLIC:
/*
 * TODO: Handle HMM pages which may need coordination
 * with device-side memory.
 */
goto unlock;
+   default:
+   break;
}
 
/*
diff --git a/mm/memremap.c b/mm/memremap.c
index ab949a571e78..685be704b28e 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -294,6 +294,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 
switch (pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
+   case MEMORY_DEVICE_PUBLIC:
if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
WARN(1, "Device private memory not supported\n");

[PATCH v2 00/12] MEMORY_DEVICE_PUBLIC for CPU-accessible coherent device memory

2021-09-13 Thread Alex Sierra
v1:
AMD is building a system architecture for the Frontier supercomputer
with a coherent interconnect between CPUs and GPUs. This hardware
architecture allows the CPUs to coherently access GPU device memory.
We have hardware in our labs and we are working with our partner HPE on
the BIOS, firmware and software for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu
driver registers the memory with devmap as MEMORY_DEVICE_PUBLIC using
devm_memremap_pages.

This patch series adds MEMORY_DEVICE_PUBLIC, which is similar to
MEMORY_DEVICE_GENERIC in that it can be mapped for CPU access, but adds
support for migrating this memory similar to MEMORY_DEVICE_PRIVATE. We
also included and updated two patches from Ralph Campbell (Nvidia),
which change ZONE_DEVICE reference counting as requested in previous
reviews of this patch series (see 
https://patchwork.freedesktop.org/series/90706/).
Finally, we extended hmm_test to cover migration of MEMORY_DEVICE_PUBLIC.

This work is based on HMM and our SVM memory manager, which has landed
in Linux 5.14 recently.

v2:
Major changes on this version:
Fold patches: 'mm: call pgmap->ops->page_free for DEVICE_PUBLIC' and
'mm: add public type support to migrate_vma helpers' into 'mm: add
zone device public type memory support'

Condition added at migrate_vma_collect_pmd, for migrations from
device public pages. Making sure pages are from device zone and with
the proper MIGRATE_VMA_SELECT_DEVICE_PUBLIC flag.
Patch: 'mm: add device public vma selection for memory migration'

Fix logic in 'drm/amdkfd: add SPM support for SVM' to detect error in
both DEVICE_PRIVATE and DEVICE_PUBLIC.

Minor changes: 
Swap patch order 03 and 04.

Additions:
Add VM_BUG_ON_PAGE(page_ref_count(page), page) to patch 'drm/amdkfd:
ref count init for device pages', to make sure the page hasn't been used.

Alex Sierra (10):
  mm: add zone device public type memory support
  mm: add device public vma selection for memory migration
  drm/amdkfd: ref count init for device pages
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: public type as sys mem on migration to ram
  lib: test_hmm add ioctl to get zone device type
  lib: test_hmm add module param for zone device type
  lib: add support for device public type in test_hmm
  tools: update hmm-test to support device public type
  tools: update test_hmm script to support SP config

Ralph Campbell (2):
  ext4/xfs: add page refcount helper
  mm: remove extra ZONE_DEVICE struct page refcount

 arch/powerpc/kvm/book3s_hv_uvmem.c   |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  40 ++--
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |   2 +-
 fs/dax.c |   8 +-
 fs/ext4/inode.c  |   5 +-
 fs/fuse/dax.c|   4 +-
 fs/xfs/xfs_file.c|   4 +-
 include/linux/dax.h  |  10 +
 include/linux/memremap.h |  15 +-
 include/linux/migrate.h  |   1 +
 include/linux/mm.h   |  19 +-
 lib/test_hmm.c   | 247 +++
 lib/test_hmm_uapi.h  |  16 ++
 mm/internal.h|   8 +
 mm/memcontrol.c  |   6 +-
 mm/memory-failure.c  |   6 +-
 mm/memremap.c|  71 ++-
 mm/migrate.c |  33 +--
 mm/page_alloc.c  |   3 +
 mm/swap.c|  45 +
 tools/testing/selftests/vm/hmm-tests.c   | 142 +++--
 tools/testing/selftests/vm/test_hmm.sh   |  20 +-
 22 files changed, 451 insertions(+), 256 deletions(-)

-- 
2.32.0



[PATCH v2 01/12] ext4/xfs: add page refcount helper

2021-09-13 Thread Alex Sierra
From: Ralph Campbell 

There are several places where ZONE_DEVICE struct pages assume a reference
count == 1 means the page is idle and free. Instead of open coding this,
add a helper function to hide this detail.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
Reviewed-by: Christoph Hellwig 
Acked-by: Theodore Ts'o 
Acked-by: Darrick J. Wong 
---
v3:
[AS]: rename dax_layout_is_idle_page func to dax_page_unused

v4:
[AS]: This ref count functionality was missing on fuse/dax.c.
---
 fs/dax.c            |  4 ++--
 fs/ext4/inode.c     |  5 +----
 fs/fuse/dax.c       |  4 +---
 fs/xfs/xfs_file.c   |  4 +---
 include/linux/dax.h | 10 ++++++++++
 5 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 62352cbcf0f4..c387d09e3e5a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -369,7 +369,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
 
-   WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
+   WARN_ON_ONCE(trunc && !dax_page_unused(page));
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
page->mapping = NULL;
page->index = 0;
@@ -383,7 +383,7 @@ static struct page *dax_busy_page(void *entry)
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
 
-   if (page_ref_count(page) > 1)
+   if (!dax_page_unused(page))
return page;
}
return NULL;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fe6045a46599..05ffe6875cb1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3971,10 +3971,7 @@ int ext4_break_layouts(struct inode *inode)
if (!page)
return 0;
 
-   error = ___wait_var_event(&page->_refcount,
-   atomic_read(&page->_refcount) == 1,
-   TASK_INTERRUPTIBLE, 0, 0,
-   ext4_wait_dax_page(ei));
+   error = dax_wait_page(ei, page, ext4_wait_dax_page);
} while (error == 0);
 
return error;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index ff99ab2a3c43..2b1f190ba78a 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -677,9 +677,7 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry,
return 0;
 
*retry = true;
-   return ___wait_var_event(&page->_refcount,
-   atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
-   0, 0, fuse_wait_dax_page(inode));
+   return dax_wait_page(inode, page, fuse_wait_dax_page);
 }
 
 /* dmap_end == 0 leads to unmapping of whole file */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 396ef36dcd0a..182057281086 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -840,9 +840,7 @@ xfs_break_dax_layouts(
return 0;
 
*retry = true;
-   return ___wait_var_event(&page->_refcount,
-   atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
-   0, 0, xfs_wait_dax_page(inode));
+   return dax_wait_page(inode, page, xfs_wait_dax_page);
 }
 
 int
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b52f084aa643..8b5da1d60dbc 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -243,6 +243,16 @@ static inline bool dax_mapping(struct address_space *mapping)
return mapping->host && IS_DAX(mapping->host);
 }
 
+static inline bool dax_page_unused(struct page *page)
+{
+   return page_ref_count(page) == 1;
+}
+
+#define dax_wait_page(_inode, _page, _wait_cb) \
+   ___wait_var_event(&(_page)->_refcount,  \
+   dax_page_unused(_page), \
+   TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode))
+
 #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
 void hmem_register_device(int target_nid, struct resource *r);
 #else
-- 
2.32.0



Re: [PATCH] amd/display: enable panel orientation quirks

2021-09-13 Thread Harry Wentland
On 2021-09-10 11:37 a.m., Simon Ser wrote:
> This patch allows panel orientation quirks from DRM core to be
> used. They attach a DRM connector property "panel orientation"
> which indicates in which direction the panel has been mounted.
> Some machines have the internal screen mounted with a rotation.
> 
> Since the panel orientation quirks need the native mode from the
> EDID, check for it in amdgpu_dm_connector_ddc_get_modes.
> 
> Signed-off-by: Simon Ser 
> Cc: Alex Deucher 
> Cc: Harry Wentland 
> Cc: Nicholas Kazlauskas 

Reviewed-by: Harry Wentland 

Harry

> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 28 +++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 53363728dbbd..a420602f1794 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -7680,6 +7680,32 @@ static void amdgpu_dm_connector_add_common_modes(struct drm_encoder *encoder,
>   }
>  }
>  
> +static void amdgpu_set_panel_orientation(struct drm_connector *connector)
> +{
> + struct drm_encoder *encoder;
> + struct amdgpu_encoder *amdgpu_encoder;
> + const struct drm_display_mode *native_mode;
> +
> + if (connector->connector_type != DRM_MODE_CONNECTOR_eDP &&
> + connector->connector_type != DRM_MODE_CONNECTOR_LVDS)
> + return;
> +
> + encoder = amdgpu_dm_connector_to_encoder(connector);
> + if (!encoder)
> + return;
> +
> + amdgpu_encoder = to_amdgpu_encoder(encoder);
> +
> + native_mode = &amdgpu_encoder->native_mode;
> + if (native_mode->hdisplay == 0 || native_mode->vdisplay == 0)
> + return;
> +
> + drm_connector_set_panel_orientation_with_quirk(connector,
> +DRM_MODE_PANEL_ORIENTATION_UNKNOWN,
> +native_mode->hdisplay,
> +native_mode->vdisplay);
> +}
> +
>  static void amdgpu_dm_connector_ddc_get_modes(struct drm_connector *connector,
> struct edid *edid)
>  {
> @@ -7708,6 +7734,8 @@ static void amdgpu_dm_connector_ddc_get_modes(struct drm_connector *connector,
>* restored here.
>*/
>   amdgpu_dm_update_freesync_caps(connector, edid);
> +
> + amdgpu_set_panel_orientation(connector);
>   } else {
>   amdgpu_dm_connector->num_modes = 0;
>   }
> 



Re: [PATCH 1/1] drm/amdkfd: Add sysfs bitfields and enums to uAPI

2021-09-13 Thread Alex Deucher
On Fri, Sep 10, 2021 at 3:54 PM Felix Kuehling  wrote:
>
> These bits are de-facto part of the uAPI, so declare them in a uAPI header.
>

Please include a link to the userspace that uses this in the commit message.

Alex

> Signed-off-by: Felix Kuehling 
> ---
>  MAINTAINERS   |   1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  46 +
>  include/uapi/linux/kfd_sysfs.h| 108 ++
>  3 files changed, 110 insertions(+), 45 deletions(-)
>  create mode 100644 include/uapi/linux/kfd_sysfs.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 84cd16694640..7554ec928ee2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -930,6 +930,7 @@ F:  drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>  F: drivers/gpu/drm/amd/include/v9_structs.h
>  F: drivers/gpu/drm/amd/include/vi_structs.h
>  F: include/uapi/linux/kfd_ioctl.h
> +F: include/uapi/linux/kfd_sysfs.h
>
>  AMD SPI DRIVER
>  M: Sanjay R Mehta 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index a8db017c9b8e..f0cc59d2fd5d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -25,38 +25,11 @@
>
>  #include 
>  #include 
> +#include 
>  #include "kfd_crat.h"
>
>  #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
>
> -#define HSA_CAP_HOT_PLUGGABLE  0x0001
> -#define HSA_CAP_ATS_PRESENT0x0002
> -#define HSA_CAP_SHARED_WITH_GRAPHICS   0x0004
> -#define HSA_CAP_QUEUE_SIZE_POW20x0008
> -#define HSA_CAP_QUEUE_SIZE_32BIT   0x0010
> -#define HSA_CAP_QUEUE_IDLE_EVENT   0x0020
> -#define HSA_CAP_VA_LIMIT   0x0040
> -#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x0080
> -#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK0x0f00
> -#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT   8
> -#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK   0x3000
> -#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT  12
> -
> -#define HSA_CAP_DOORBELL_TYPE_PRE_1_0  0x0
> -#define HSA_CAP_DOORBELL_TYPE_1_0  0x1
> -#define HSA_CAP_DOORBELL_TYPE_2_0  0x2
> -#define HSA_CAP_AQL_QUEUE_DOUBLE_MAP   0x4000
> -
> -#define HSA_CAP_RESERVED_WAS_SRAM_EDCSUPPORTED 0x0008 /* Old buggy user mode depends on this being 0 */
> -#define HSA_CAP_MEM_EDCSUPPORTED   0x0010
> -#define HSA_CAP_RASEVENTNOTIFY 0x0020
> -#define HSA_CAP_ASIC_REVISION_MASK 0x03c0
> -#define HSA_CAP_ASIC_REVISION_SHIFT22
> -#define HSA_CAP_SRAM_EDCSUPPORTED  0x0400
> -#define HSA_CAP_SVMAPI_SUPPORTED   0x0800
> -#define HSA_CAP_FLAGS_COHERENTHOSTACCESS   0x1000
> -#define HSA_CAP_RESERVED   0xe00f8000
> -
>  struct kfd_node_properties {
> uint64_t hive_id;
> uint32_t cpu_cores_count;
> @@ -93,17 +66,6 @@ struct kfd_node_properties {
> char name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>  };
>
> -#define HSA_MEM_HEAP_TYPE_SYSTEM   0
> -#define HSA_MEM_HEAP_TYPE_FB_PUBLIC1
> -#define HSA_MEM_HEAP_TYPE_FB_PRIVATE   2
> -#define HSA_MEM_HEAP_TYPE_GPU_GDS  3
> -#define HSA_MEM_HEAP_TYPE_GPU_LDS  4
> -#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH  5
> -
> -#define HSA_MEM_FLAGS_HOT_PLUGGABLE0x0001
> -#define HSA_MEM_FLAGS_NON_VOLATILE 0x0002
> -#define HSA_MEM_FLAGS_RESERVED 0xfffc
> -
>  struct kfd_mem_properties {
> struct list_headlist;
> uint32_theap_type;
> @@ -116,12 +78,6 @@ struct kfd_mem_properties {
> struct attributeattr;
>  };
>
> -#define HSA_CACHE_TYPE_DATA0x0001
> -#define HSA_CACHE_TYPE_INSTRUCTION 0x0002
> -#define HSA_CACHE_TYPE_CPU 0x0004
> -#define HSA_CACHE_TYPE_HSACU   0x0008
> -#define HSA_CACHE_TYPE_RESERVED0xfff0
> -
>  struct kfd_cache_properties {
> struct list_headlist;
> uint32_tprocessor_id_low;
> diff --git a/include/uapi/linux/kfd_sysfs.h b/include/uapi/linux/kfd_sysfs.h
> new file mode 100644
> index ..e1fb78b4bf09
> --- /dev/null
> +++ b/include/uapi/linux/kfd_sysfs.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT WITH Linux-syscall-note */
> +/*
> + * Copyright 2021 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the 

Re: [PATCH] drm/amdkfd: Cast atomic64_read return value

2021-09-13 Thread Felix Kuehling
On 2021-09-13 at 10:19 a.m., Michel Dänzer wrote:
> From: Michel Dänzer 
>
> Avoids warning with -Wformat:
>
>   CC [M]  drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.o
> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c: In function ‘kfd_smi_event_update_thermal_throttling’:
> ../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c:224:60: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
>   224 | len = snprintf(fifo_in, sizeof(fifo_in), "%x %llx:%llx\n",
>       | ~~~^
>       |    |
>       |    long long unsigned int
>       | %lx
>   225 | KFD_SMI_EVENT_THERMAL_THROTTLE, throttle_bitmask,
>   226 | atomic64_read(&dev->smu.throttle_int_counter));
>       | ~~
>       | |
>       | long int

That's weird. As far as I can see, atomic64_read is defined to return
s64, which should be the same as long long. Which architecture are you
on? For the record, these are the definitions for x86 and x86_64 on Linux
5.13:

./arch/x86/include/asm/atomic64_32.h:static inline s64 arch_atomic64_read(const atomic64_t *v)
./arch/x86/include/asm/atomic64_64.h:static inline s64 arch_atomic64_read(const atomic64_t *v)

Looks like x86 uses int-ll64.h (64-bit types are long-long). Some other
architectures use int-l64.h (64-bit types are long). On architectures
that use int-l64.h, this patch just casts s64 (long) to u64 (unsigned
long), which doesn't fix the problem.

Regards,
  Felix


>
> Signed-off-by: Michel Dänzer 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> index ed4bc5f844ce..46e1c0cda94c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> @@ -223,7 +223,7 @@ void kfd_smi_event_update_thermal_throttling(struct kfd_dev *dev,
>  
>   len = snprintf(fifo_in, sizeof(fifo_in), "%x %llx:%llx\n",
>  KFD_SMI_EVENT_THERMAL_THROTTLE, throttle_bitmask,
> -atomic64_read(&dev->smu.throttle_int_counter));
> +(u64)atomic64_read(&dev->smu.throttle_int_counter));
>  
>   add_event_to_kfifo(dev, KFD_SMI_EVENT_THERMAL_THROTTLE, fifo_in, len);
>  }


Re: [RFC][PATCH] drm/amdgpu/powerplay/smu10: Add custom profile

2021-09-13 Thread Alex Deucher
On Wed, Sep 8, 2021 at 3:23 AM Daniel Gomez  wrote:
>
> On Tue, 7 Sept 2021 at 19:23, Alex Deucher  wrote:
> >
> > On Tue, Sep 7, 2021 at 4:53 AM Daniel Gomez  wrote:
> > >
> > > Add custom power profile mode support on smu10.
> > > Update workload bit list.
> > > ---
> > >
> > > Hi,
> > >
> > > I'm trying to add custom profile for the Raven Ridge but not sure if
> > > I'd need a different parameter than PPSMC_MSG_SetCustomPolicy to
> > > configure the custom values. The code seemed to support CUSTOM for
> > > workload types but it didn't show up in the menu or accept any user
> > > input parameter. So far, I've added that part but a bit confusing to
> > > me what is the policy I need for setting these parameters or if it's
> > > maybe not possible at all.
> > >
> > > After applying the changes I'd configure the CUSTOM mode as follows:
> > >
> > > echo manual > 
> > > /sys/class/drm/card0/device/hwmon/hwmon1/device/power_dpm_force_performance_level
> > > echo "6 70 90 0 0" > 
> > > /sys/class/drm/card0/device/hwmon/hwmon1/device/pp_power_profile_mode
> > >
> > > Then, using Darren Powell script for testing modes I get the following
> > > output:
> > >
> > > 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
> > > [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] 
> > > [1002:15dd] (rev 83)
> > > === pp_dpm_sclk ===
> > > 0: 200Mhz
> > > 1: 400Mhz *
> > > 2: 1100Mhz
> > > === pp_dpm_mclk ===
> > > 0: 400Mhz
> > > 1: 933Mhz *
> > > 2: 1067Mhz
> > > 3: 1200Mhz
> > > === pp_power_profile_mode ===
> > > NUMMODE_NAME BUSY_SET_POINT FPS USE_RLC_BUSY MIN_ACTIVE_LEVEL
> > >   0 BOOTUP_DEFAULT : 70  60  0  0
> > >   1 3D_FULL_SCREEN : 70  60  1  3
> > >   2   POWER_SAVING : 90  60  0  0
> > >   3  VIDEO : 70  60  0  0
> > >   4 VR : 70  90  0  0
> > >   5COMPUTE : 30  60  0  6
> > >   6 CUSTOM*: 70  90  0  0
> > >
> > > As you can also see in my changes, I've also updated the workload bit
> > > table but I'm not completely sure about that change. With the tests
> > > I've done, using bit 5 for the WORKLOAD_PPLIB_CUSTOM_BIT makes the
> > > gpu sclk locked around ~36%. So, maybe I'm missing a clock limit
> > > configuraton table somewhere. Would you give me some hints to
> > > proceed with this?
> >
> > I don't think APUs support customizing the workloads the same way
> > dGPUs do.  I think they just support predefined profiles.
> >
> > Alex
>
>
> Thanks Alex for the quick response. Would it make sense then to remove
> the custom workload code (PP_SMC_POWER_PROFILE_CUSTOM) from the smu10?
> That workload was added in this commit:
> f6f75ebdc06c04d3cfcd100f1b10256a9cdca407 [1] and is not used at all in
> the code, as it's limited to the PP_SMC_POWER_PROFILE_COMPUTE index.
> smu10.h also includes the custom workload bit definition, which made it
> confusing for me to understand whether it was half-supported or, as I
> understood from your comment, not possible to use at all.
>
> Perhaps it could also be mentioned (if that's standard practice) in the
> documentation [2] that the custom pp_power_profile_mode is only
> supported on dGPUs.
>
> I can send the patches if it makes sense.

I guess I was thinking of another asic.  @Huang Rui, @changzhu, @Quan,
Evan can any of you comment on what is required for custom profiles on
APUs?

Alex


>
> [1]: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c?id=f6f75ebdc06c04d3cfcd100f1b10256a9cdca407
> [2]: 
> https://www.kernel.org/doc/html/latest/gpu/amdgpu.html#pp-power-profile-mode
>
> Daniel
>
> >
> >
> > >
> > > Thanks in advance,
> > > Daniel
> > >
> > >
> > >  drivers/gpu/drm/amd/pm/inc/smu10.h| 14 +++--
> > >  .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  | 57 +--
> > >  .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h  |  1 +
> > >  3 files changed, 61 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/pm/inc/smu10.h b/drivers/gpu/drm/amd/pm/inc/smu10.h
> > > index 9e837a5014c5..b96520528240 100644
> > > --- a/drivers/gpu/drm/amd/pm/inc/smu10.h
> > > +++ b/drivers/gpu/drm/amd/pm/inc/smu10.h
> > > @@ -136,12 +136,14 @@
> > >  #define FEATURE_CORE_CSTATES_MASK (1 << FEATURE_CORE_CSTATES_BIT)
> > >
> > >  /* Workload bits */
> > > -#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 0
> > > -#define WORKLOAD_PPLIB_VIDEO_BIT  2
> > > -#define WORKLOAD_PPLIB_VR_BIT 3
> > > -#define WORKLOAD_PPLIB_COMPUTE_BIT4
> > > -#define WORKLOAD_PPLIB_CUSTOM_BIT 5
> > > -#define WORKLOAD_PPLIB_COUNT  6
> > > +#define WORKLOAD_DEFAULT_BIT  0
> > > +#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
> > > +#define 

Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Andrey Grodzovsky

Please add a tag V2 in description explaining what was the delta from V1.
Other then that looks good to me.

Andrey

On 2021-09-12 7:48 p.m., xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
}
  
  	/* Avoid accidently unparking the sched thread during GPU reset */

-   r = down_read_killable(&adev->reset_sem);
+   r = down_write_killable(&adev->reset_sem);
if (r)
return r;
  
@@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

kthread_unpark(ring->sched.thread);
}
  
-   up_read(&adev->reset_sem);
+   up_write(&adev->reset_sem);
  
  	pm_runtime_mark_last_busy(dev->dev);

pm_runtime_put_autosuspend(dev->dev);


Re: [PATCH] drm/amdgpu: use generic fb helpers instead of setting up AMD own's.

2021-09-13 Thread Alex Deucher
On Thu, Sep 9, 2021 at 11:25 PM Evan Quan  wrote:
>
> With the shadow buffer support from generic framebuffer emulation, it's
> possible now to have runpm kicked when no update for console.
>
> Change-Id: I285472c9100ee6f649d3f3f3548f402b9cd34eaf
> Signed-off-by: Evan Quan 
> Acked-by: Christian König 

Reviewed-by: Alex Deucher 

> --
> v1->v2:
>   - rename amdgpu_align_pitch as amdgpu_gem_align_pitch to align with
> other APIs from the same file (Alex)
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  12 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |  11 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  13 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c  | 388 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  30 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h|  20 -
>  7 files changed, 50 insertions(+), 426 deletions(-)
>  delete mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 8d0748184a14..73a2151ee43f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -45,7 +45,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
> amdgpu_atombios.o atombios_crtc.o amdgpu_connectors.o \
> atom.o amdgpu_fence.o amdgpu_ttm.o amdgpu_object.o amdgpu_gart.o \
> amdgpu_encoders.o amdgpu_display.o amdgpu_i2c.o \
> -   amdgpu_fb.o amdgpu_gem.o amdgpu_ring.o \
> +   amdgpu_gem.o amdgpu_ring.o \
> amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o amdgpu_test.o \
> atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
> atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 682d459e992a..bcc308b7f826 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3695,8 +3695,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> /* Get a log2 for easy divisions. */
> adev->mm_stats.log2_max_MBps = ilog2(max(1u, max_MBps));
>
> -   amdgpu_fbdev_init(adev);
> -
> r = amdgpu_pm_sysfs_init(adev);
> if (r) {
> adev->pm_sysfs_en = false;
> @@ -3854,8 +3852,6 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> amdgpu_ucode_sysfs_fini(adev);
> sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>
> -   amdgpu_fbdev_fini(adev);
> -
> amdgpu_irq_fini_hw(adev);
>
> amdgpu_device_ip_fini_early(adev);
> @@ -3931,7 +3927,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
> drm_kms_helper_poll_disable(dev);
>
> if (fbcon)
> -   amdgpu_fbdev_set_suspend(adev, 1);
> +   drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
>
> cancel_delayed_work_sync(&adev->delayed_init_work);
>
> @@ -4009,7 +4005,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> flush_delayed_work(&adev->delayed_init_work);
>
> if (fbcon)
> -   amdgpu_fbdev_set_suspend(adev, 0);
> +   drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);
>
> drm_kms_helper_poll_enable(dev);
>
> @@ -4638,7 +4634,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
> if (r)
> goto out;
>
> -   amdgpu_fbdev_set_suspend(tmp_adev, 0);
> +   drm_fb_helper_set_suspend_unlocked(adev_to_drm(tmp_adev)->fb_helper, false);
>
> /*
>  * The GPU enters bad state once faulty pages
> @@ -5025,7 +5021,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  */
> amdgpu_unregister_gpu_instance(tmp_adev);
>
> -   amdgpu_fbdev_set_suspend(tmp_adev, 1);
> +   drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
>
> /* disable ras on ALL IPs */
> if (!need_emergency_restart &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 7a7316731911..58bfc7f00d76 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -1572,13 +1572,10 @@ int amdgpu_display_suspend_helper(struct amdgpu_device *adev)
> continue;
> }
> robj = gem_to_amdgpu_bo(fb->obj[0]);
> -   /* don't unpin kernel fb objects */
> -   if (!amdgpu_fbdev_robj_is_fb(adev, robj)) {
> -   r = amdgpu_bo_reserve(robj, true);
> -   if (r == 0) {
> -  

[PATCH] drm/amdkfd: Cast atomic64_read return value

2021-09-13 Thread Michel Dänzer
From: Michel Dänzer 

Avoids warning with -Wformat:

  CC [M]  drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.o
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c: In function ‘kfd_smi_event_update_thermal_throttling’:
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_smi_events.c:224:60: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
  224 | len = snprintf(fifo_in, sizeof(fifo_in), "%x %llx:%llx\n",
  225 |                KFD_SMI_EVENT_THERMAL_THROTTLE, throttle_bitmask,
  226 |                atomic64_read(&dev->smu.throttle_int_counter));

Signed-off-by: Michel Dänzer 
---
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index ed4bc5f844ce..46e1c0cda94c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -223,7 +223,7 @@ void kfd_smi_event_update_thermal_throttling(struct kfd_dev 
*dev,
 
len = snprintf(fifo_in, sizeof(fifo_in), "%x %llx:%llx\n",
   KFD_SMI_EVENT_THERMAL_THROTTLE, throttle_bitmask,
-  atomic64_read(&dev->smu.throttle_int_counter));
+  (u64)atomic64_read(&dev->smu.throttle_int_counter));
 
add_event_to_kfifo(dev, KFD_SMI_EVENT_THERMAL_THROTTLE, fifo_in, len);
 }
-- 
2.33.0
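The cast above fixes a genuine portability trap: in the kernel, atomic64_read() returns s64, which is a plain long on 64-bit builds but long long on 32-bit ones, so a %llx format only matches on every target after an explicit cast. A minimal user-space sketch of the same fix (fake_atomic64_read and format_event are hypothetical stand-ins, not kernel or KFD API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for atomic64_read(): on 64-bit Linux the
 * kernel's s64 is a plain long, so passing the result straight to
 * %llx is exactly what -Wformat complains about. */
static long fake_atomic64_read(void)
{
	return 0x1234;
}

/* Mirrors the fixed snprintf call: cast the 64-bit values so the
 * format string is correct on both 32- and 64-bit builds. */
static int format_event(char *buf, size_t len, unsigned int throttle_bitmask)
{
	return snprintf(buf, len, "%x %llx:%llx\n",
			1 /* event id */,
			(unsigned long long)throttle_bitmask,
			(unsigned long long)fake_atomic64_read());
}
```

Casting at the call site, rather than changing the format string, keeps the emitted text identical across architectures.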



Re: RE: RE: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Lazar, Lijo

Thanks for the clarification Xinhui.

Based on Christian's explanation, what I understood is - this is an 
exceptional case in debugfs calls and the other goal is to avoid 
maintenance of one more lock just to support this API. I no longer have 
any issues with this approach.


Thanks,
Lijo

On 9/13/2021 12:53 PM, Pan, Xinhui wrote:

[AMD Official Use Only]

Of course the IB test can hang the GPU.
But it waits on the fence with a specific timeout, and it does not depend on the
GPU scheduler.
So the IB test must return.


From: Lazar, Lijo 
Sent: September 13, 2021 15:15
To: Christian König; Koenig, Christian; Pan, Xinhui; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: RE: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 12:21 PM, Christian König wrote:

Keep in mind that we don't try to avoid contention here. The goal is
rather to have as few locks as possible to avoid the extra overhead in
the hot path.

Contention is completely irrelevant for the debug and device reset since
that are rarely occurring events and performance doesn't matter for them.

It is perfectly reasonable to take the write side of the reset lock as
necessary when we need to make sure that we don't have concurrent device
access.


The original code has down_read, which gave the impression that there is
some protection to avoid access during reset. Basically, I would like to
avoid this as a precedent for this sort of usage for any debugfs call.
The reset semaphore is supposed to be a 'protect all' thing and provides a
shortcut.

BTW, a question about a hypothetical case - what happens if the test
itself causes a hang and needs to trigger a reset? Will there be a chance
for the lock to be released (or will a submit call hang
indefinitely) for the actual reset to be executed?

Thanks,
Lijo



Regards,
Christian.

Am 13.09.21 um 08:43 schrieb Lazar, Lijo:

There are other interfaces to emulate the exact reset process, or at
least this is not the one we are using for doing any sort of reset
through debugfs.

In any case, the expectation is reset thread takes the write side of
the lock and it's already done somewhere else.

Reset semaphore is supposed to protect the device from concurrent
access (any sort of resource usage is thus protected by default). Then
the same logic can be applied for any other call and that is not a
reasonable ask.

Thanks,
Lijo

On 9/13/2021 12:07 PM, Christian König wrote:

That's complete nonsense.

The debugfs interface emulates parts of the reset procedure for
testing and we absolutely need to take the same locks as the reset to
avoid corruption of the involved objects.

Regards,
Christian.

Am 13.09.21 um 08:25 schrieb Lazar, Lijo:

This is a debugfs interface, and adding another writer contention in
debugfs over an actual reset is a lazy fix. This shouldn't be executed
in the first place and should not take precedence over any reset.

Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from
concurrent modification, so taking the write side of it is
perfectly valid here.

Christian.

Am 13.09.21 um 06:42 schrieb Pan, Xinhui:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding one amdgpu_ring.direct_access_mutex before
we issue test_ib on each ring.

From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
}

> /* Avoid accidently unparking the sched thread during GPU reset */
> - r = down_read_killable(&adev->reset_sem);
> + r = down_write_killable(&adev->reset_sem);

There are many ioctls and debugfs calls which take this lock and, as you
know, the purpose is to avoid them while there is a reset. The purpose is
*not to* fix any concurrency issues those calls themselves have
otherwise, and fixing those concurrency issues this way is just lazy and
not acceptable.

This will take away any fairness given to the writer in this rw
lock, and
that is supposed to be the reset thread.

This will take away any fairness given to the writer in this rw
lock and
that is supposed to be the reset thread.

Thanks,
Lijo


if (r)
return r;

@@ -1387,7 +1387,7 @@ static int
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
kthread_unpark(ring->sched.thread);
}


RE: [PATCH v4 2/3] drm/amdgpu: VCE avoid memory allocation during IB test

2021-09-13 Thread Liu, Leo
[AMD Official Use Only]

256-byte alignment is for the video HW that comes with GFX9, so it should be fine in 
general.

Regards,
Leo


-Original Message-
From: Koenig, Christian 
Sent: September 13, 2021 5:04 AM
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Liu, Leo ; 
Zhu, James 
Subject: Re: [PATCH v4 2/3] drm/amdgpu: VCE avoid memory allocation during IB 
test

Am 13.09.21 um 10:42 schrieb xinhui pan:
> alloc extra msg from direct IB pool.
>
> Signed-off-by: xinhui pan 

It would be cleaner if Leo could confirm that 256 byte alignment would work as 
well.

But either way Reviewed-by: Christian König 

Regards,
Christian.

> ---
> change from v1:
> msg is allocated separately.
> msg is aligned to gpu page boundary
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 27 -
>   1 file changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index e9fdf49d69e8..caa4d3420e00 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -82,7 +82,6 @@ MODULE_FIRMWARE(FIRMWARE_VEGA20);
>
>   static void amdgpu_vce_idle_work_handler(struct work_struct *work);
>   static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t 
> handle,
> -  struct amdgpu_bo *bo,
>struct dma_fence **fence);
>   static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t 
> handle,
> bool direct, struct dma_fence **fence); 
> @@ -441,12 +440,12
> @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, struct drm_file 
> *filp)
>* Open up a stream for HW test
>*/
>   static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t 
> handle,
> -  struct amdgpu_bo *bo,
>struct dma_fence **fence)
>   {
>   const unsigned ib_size_dw = 1024;
>   struct amdgpu_job *job;
>   struct amdgpu_ib *ib;
> + struct amdgpu_ib ib_msg;
>   struct dma_fence *f = NULL;
>   uint64_t addr;
>   int i, r;
> @@ -456,9 +455,17 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
>   if (r)
>   return r;
>
> - ib = &job->ibs[0];
> + memset(&ib_msg, 0, sizeof(ib_msg));
> + /* only one gpu page is needed, alloc +1 page to make addr aligned. */
> + r = amdgpu_ib_get(ring->adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
> +   AMDGPU_IB_POOL_DIRECT,
> +   &ib_msg);
> + if (r)
> + goto err;
>
> - addr = amdgpu_bo_gpu_offset(bo);
> + ib = &job->ibs[0];
> + /* let addr point to page boundary */
> + addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg.gpu_addr);
>
>   /* stitch together an VCE create msg */
>   ib->length_dw = 0;
> @@ -498,6 +505,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
>   ib->ptr[i] = 0x0;
>
>   r = amdgpu_job_submit_direct(job, ring, &f);
> + amdgpu_ib_free(ring->adev, &ib_msg, f);
>   if (r)
>   goto err;
>
> @@ -1134,20 +1142,13 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
>   int amdgpu_vce_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>   {
>   struct dma_fence *fence = NULL;
> - struct amdgpu_bo *bo = NULL;
>   long r;
>
>   /* skip vce ring1/2 ib test for now, since it's not reliable */
> if (ring != &ring->adev->vce.ring[0])
>   return 0;
>
> - r = amdgpu_bo_create_reserved(ring->adev, 512, PAGE_SIZE,
> -   AMDGPU_GEM_DOMAIN_VRAM,
> -   &bo, NULL, NULL);
> - if (r)
> - return r;
> -
> - r = amdgpu_vce_get_create_msg(ring, 1, bo, NULL);
> + r = amdgpu_vce_get_create_msg(ring, 1, NULL);
>   if (r)
>   goto error;
>
> @@ -1163,8 +1164,6 @@ int amdgpu_vce_ring_test_ib(struct amdgpu_ring
> *ring, long timeout)
>
>   error:
>   dma_fence_put(fence);
> - amdgpu_bo_unreserve(bo);
> - amdgpu_bo_free_kernel(&bo, NULL, NULL);
>   return r;
>   }
>
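The "+1 page" trick in the quoted patch relies on a simple invariant: rounding an address up to the next GPU page boundary advances it by at most one page minus one byte, so a buffer of size plus one extra page always contains a page-aligned window of the requested size. A user-space sketch of that arithmetic (GPU_PAGE_SIZE and page_aligned_window are illustrative stand-ins, not the driver's actual macros):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GPU_PAGE_SIZE 4096u
/* Same rounding as AMDGPU_GPU_PAGE_ALIGN(): up to the next page. */
#define GPU_PAGE_ALIGN(a) \
	(((uintptr_t)(a) + GPU_PAGE_SIZE - 1) & ~(uintptr_t)(GPU_PAGE_SIZE - 1))

/* Given a buffer of `got` bytes at `base` (arbitrary alignment),
 * return a page-aligned pointer to a window of `need` bytes inside it.
 * Allocating "need + one page", as the patch does, always leaves
 * enough room for the aligned window. */
static void *page_aligned_window(void *base, size_t need, size_t got)
{
	uintptr_t aligned = GPU_PAGE_ALIGN(base);

	/* The aligned window must still fit inside the allocation. */
	assert(aligned + need <= (uintptr_t)base + got);
	return (void *)aligned;
}
```

This is why the IB only "needs one gpu page" but the patch asks `amdgpu_ib_get()` for two.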



Re: RE: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Christian König

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is perfectly 
valid here.


Christian.

Am 13.09.21 um 06:42 schrieb Pan, Xinhui:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding one amdgpu_ring.direct_access_mutex before we issue 
test_ib on each ring.

From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
   }

   /* Avoid accidently unparking the sched thread during GPU reset */
- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);

There are many ioctls and debugfs calls which take this lock and, as you
know, the purpose is to avoid them while there is a reset. The purpose is
*not to* fix any concurrency issues those calls themselves have
otherwise, and fixing those concurrency issues this way is just lazy and
not acceptable.

This will take away any fairness given to the writer in this rw lock and
that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
   kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);
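Xinhui's point earlier in the thread - that the IB test cannot hold reset_sem forever because its fence wait is bounded - follows from the contract of dma_fence_wait_timeout(), which returns 0 on timeout and the remaining time on signal. A toy user-space model of that contract (wait_fence_timeout and the predicates are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of dma_fence_wait_timeout(): poll a predicate for at most
 * `timeout` ticks.  Returns the remaining ticks (> 0) if the fence
 * signaled, or 0 on timeout.  Either way it terminates, which is why
 * the debugfs IB test is guaranteed to release reset_sem eventually,
 * even if the submitted IB hangs the GPU. */
static long wait_fence_timeout(bool (*fence_signaled)(void), long timeout)
{
	while (timeout > 0) {
		if (fence_signaled())
			return timeout;	/* signaled with time to spare */
		timeout--;		/* one tick of waiting elapsed */
	}
	return 0;			/* timed out, e.g. the GPU hung */
}

static bool never_signals(void)    { return false; }
static bool already_signaled(void) { return true; }
```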





Re: [PATCH v4 3/3] drm/amdgpu: VCN avoid memory allocation during IB test

2021-09-13 Thread Christian König

Am 13.09.21 um 10:42 schrieb xinhui pan:

alloc extra msg from direct IB pool.

Reviewed-by: Christian König 
Signed-off-by: xinhui pan 


Reviewed-by: Christian König 


---
change from v1:
msg is aligned to gpu page boundary
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 97 +++--
  1 file changed, 44 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 561296a85b43..b60b8fe5bf67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -541,15 +541,14 @@ int amdgpu_vcn_dec_sw_ring_test_ring(struct amdgpu_ring 
*ring)
  }
  
  static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,

-  struct amdgpu_bo *bo,
+  struct amdgpu_ib *ib_msg,
   struct dma_fence **fence)
  {
struct amdgpu_device *adev = ring->adev;
struct dma_fence *f = NULL;
struct amdgpu_job *job;
struct amdgpu_ib *ib;
-   uint64_t addr;
-   void *msg = NULL;
+   uint64_t addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg->gpu_addr);
int i, r;
  
  	r = amdgpu_job_alloc_with_ib(adev, 64,

@@ -558,8 +557,6 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
goto err;
  
	ib = &job->ibs[0];

-   addr = amdgpu_bo_gpu_offset(bo);
-   msg = amdgpu_bo_kptr(bo);
ib->ptr[0] = PACKET0(adev->vcn.internal.data0, 0);
ib->ptr[1] = addr;
ib->ptr[2] = PACKET0(adev->vcn.internal.data1, 0);
@@ -576,9 +573,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
if (r)
goto err_free;
  
-	amdgpu_bo_fence(bo, f, false);

-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_free_kernel(&bo, NULL, (void **)&msg);
+   amdgpu_ib_free(adev, ib_msg, f);
  
  	if (fence)

*fence = dma_fence_get(f);
@@ -588,27 +583,26 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring 
*ring,
  
  err_free:

amdgpu_job_free(job);
-
  err:
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_free_kernel(&bo, NULL, (void **)&msg);
+   amdgpu_ib_free(adev, ib_msg, f);
return r;
  }
  
  static int amdgpu_vcn_dec_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,

-struct amdgpu_bo **bo)
+   struct amdgpu_ib *ib)
  {
struct amdgpu_device *adev = ring->adev;
uint32_t *msg;
int r, i;
  
-	*bo = NULL;

-   r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
- bo, NULL, (void **)&msg);
+   memset(ib, 0, sizeof(*ib));
+   r = amdgpu_ib_get(adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
+   AMDGPU_IB_POOL_DIRECT,
+   ib);
if (r)
return r;
  
+	msg = (uint32_t *)AMDGPU_GPU_PAGE_ALIGN((unsigned long)ib->ptr);

msg[0] = cpu_to_le32(0x0028);
msg[1] = cpu_to_le32(0x0038);
msg[2] = cpu_to_le32(0x0001);
@@ -630,19 +624,20 @@ static int amdgpu_vcn_dec_get_create_msg(struct 
amdgpu_ring *ring, uint32_t hand
  }
  
  static int amdgpu_vcn_dec_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,

- struct amdgpu_bo **bo)
+ struct amdgpu_ib *ib)
  {
struct amdgpu_device *adev = ring->adev;
uint32_t *msg;
int r, i;
  
-	*bo = NULL;

-   r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
- bo, NULL, (void **)&msg);
+   memset(ib, 0, sizeof(*ib));
+   r = amdgpu_ib_get(adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
+   AMDGPU_IB_POOL_DIRECT,
+   ib);
if (r)
return r;
  
+	msg = (uint32_t *)AMDGPU_GPU_PAGE_ALIGN((unsigned long)ib->ptr);

msg[0] = cpu_to_le32(0x0028);
msg[1] = cpu_to_le32(0x0018);
msg[2] = cpu_to_le32(0x);
@@ -658,21 +653,21 @@ static int amdgpu_vcn_dec_get_destroy_msg(struct 
amdgpu_ring *ring, uint32_t han
  int amdgpu_vcn_dec_ring_test_ib(struct amdgpu_ring *ring, long timeout)
  {
struct dma_fence *fence = NULL;
-   struct amdgpu_bo *bo;
+   struct amdgpu_ib ib;
long r;
  
-	r = amdgpu_vcn_dec_get_create_msg(ring, 1, &bo);

+   r = amdgpu_vcn_dec_get_create_msg(ring, 1, &ib);
	if (r)
		goto error;
  
-	r = amdgpu_vcn_dec_send_msg(ring, bo, NULL);

+   r = amdgpu_vcn_dec_send_msg(ring, &ib, NULL);
	if (r)
		goto error;
-   r = amdgpu_vcn_dec_get_destroy_msg(ring, 1, &bo);
+   r = amdgpu_vcn_dec_get_destroy_msg(ring, 1, &ib);
	if (r)
		goto error;
  
-	r = amdgpu_vcn_dec_send_msg(ring, bo, &fence);

+   r = 

Re: [PATCH v4 2/3] drm/amdgpu: VCE avoid memory allocation during IB test

2021-09-13 Thread Christian König

Am 13.09.21 um 10:42 schrieb xinhui pan:

alloc extra msg from direct IB pool.

Signed-off-by: xinhui pan 


It would be cleaner if Leo could confirm that 256 byte alignment would 
work as well.


But either way Reviewed-by: Christian König 

Regards,
Christian.


---
change from v1:
msg is allocated separately.
msg is aligned to gpu page boundary
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 27 -
  1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index e9fdf49d69e8..caa4d3420e00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -82,7 +82,6 @@ MODULE_FIRMWARE(FIRMWARE_VEGA20);
  
  static void amdgpu_vce_idle_work_handler(struct work_struct *work);

  static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t 
handle,
-struct amdgpu_bo *bo,
 struct dma_fence **fence);
  static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t 
handle,
  bool direct, struct dma_fence **fence);
@@ -441,12 +440,12 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, 
struct drm_file *filp)
   * Open up a stream for HW test
   */
  static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t 
handle,
-struct amdgpu_bo *bo,
 struct dma_fence **fence)
  {
const unsigned ib_size_dw = 1024;
struct amdgpu_job *job;
struct amdgpu_ib *ib;
+   struct amdgpu_ib ib_msg;
struct dma_fence *f = NULL;
uint64_t addr;
int i, r;
@@ -456,9 +455,17 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle,
if (r)
return r;
  
-	ib = &job->ibs[0];

+   memset(&ib_msg, 0, sizeof(ib_msg));
+   /* only one gpu page is needed, alloc +1 page to make addr aligned. */
+   r = amdgpu_ib_get(ring->adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
+ AMDGPU_IB_POOL_DIRECT,
+ &ib_msg);
+   if (r)
+   goto err;
  
-	addr = amdgpu_bo_gpu_offset(bo);

+   ib = &job->ibs[0];
+   /* let addr point to page boundary */
+   addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg.gpu_addr);
  
  	/* stitch together an VCE create msg */

ib->length_dw = 0;
@@ -498,6 +505,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle,
ib->ptr[i] = 0x0;
  
	r = amdgpu_job_submit_direct(job, ring, &f);

+   amdgpu_ib_free(ring->adev, &ib_msg, f);
if (r)
goto err;
  
@@ -1134,20 +1142,13 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)

  int amdgpu_vce_ring_test_ib(struct amdgpu_ring *ring, long timeout)
  {
struct dma_fence *fence = NULL;
-   struct amdgpu_bo *bo = NULL;
long r;
  
  	/* skip vce ring1/2 ib test for now, since it's not reliable */

	if (ring != &ring->adev->vce.ring[0])
return 0;
  
-	r = amdgpu_bo_create_reserved(ring->adev, 512, PAGE_SIZE,

- AMDGPU_GEM_DOMAIN_VRAM,
- &bo, NULL, NULL);
-   if (r)
-   return r;
-
-   r = amdgpu_vce_get_create_msg(ring, 1, bo, NULL);
+   r = amdgpu_vce_get_create_msg(ring, 1, NULL);
if (r)
goto error;
  
@@ -1163,8 +1164,6 @@ int amdgpu_vce_ring_test_ib(struct amdgpu_ring *ring, long timeout)
  
  error:

dma_fence_put(fence);
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_free_kernel(&bo, NULL, NULL);
return r;
  }
  




Re: [PATCH v4 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-13 Thread Christian König

Am 13.09.21 um 10:42 schrieb xinhui pan:

move BO allocation in sw_init.

Signed-off-by: xinhui pan 
---
change from v3:
drop the bo resv lock in ib test.
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 102 
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |   1 +
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |  11 +--
  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |  11 +--
  4 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index d451c359606a..b0fbd5a1d5af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -134,6 +134,51 @@ MODULE_FIRMWARE(FIRMWARE_VEGA12);
  MODULE_FIRMWARE(FIRMWARE_VEGA20);
  
  static void amdgpu_uvd_idle_work_handler(struct work_struct *work);

+static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
+
+static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
+  uint32_t size,
+  struct amdgpu_bo **bo_ptr)
+{
+   struct ttm_operation_ctx ctx = { true, false };
+   struct amdgpu_bo *bo = NULL;
+   void *addr;
+   int r;
+
+   r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_GTT,
+ &bo, NULL, &addr);
+   if (r)
+   return r;
+
+   if (adev->uvd.address_64_bit)
+   goto succ;
+
+   amdgpu_bo_kunmap(bo);
+   amdgpu_bo_unpin(bo);
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
+   amdgpu_uvd_force_into_uvd_segment(bo);
+   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (r)
+   goto err;
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM);
+   if (r)
+   goto err_pin;
+   r = amdgpu_bo_kmap(bo, &addr);
+   if (r)
+   goto err_kmap;
+succ:
+   amdgpu_bo_unreserve(bo);
+   *bo_ptr = bo;
+   return 0;
+err_kmap:
+   amdgpu_bo_unpin(bo);
+err_pin:
+err:
+   amdgpu_bo_unreserve(bo);
+   amdgpu_bo_unref(&bo);
+   return r;
+}
  
  int amdgpu_uvd_sw_init(struct amdgpu_device *adev)

  {
@@ -302,6 +347,10 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 
0))
adev->uvd.address_64_bit = true;
  
+	r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);

+   if (r)
+   return r;
+
switch (adev->asic_type) {
case CHIP_TONGA:
adev->uvd.use_ctx_buf = adev->uvd.fw_version >= FW_1_65_10;
@@ -324,6 +373,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
  
  int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)

  {
+   void *addr = amdgpu_bo_kptr(adev->uvd.ib_bo);
int i, j;
  
	drm_sched_entity_destroy(&adev->uvd.entity);

@@ -342,6 +392,7 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i)
 amdgpu_ring_fini(&adev->uvd.inst[j].ring_enc[i]);
}
+   amdgpu_bo_free_kernel(&adev->uvd.ib_bo, NULL, &addr);
release_firmware(adev->uvd.fw);
  
  	return 0;

@@ -1080,23 +1131,10 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring 
*ring, struct amdgpu_bo *bo,
unsigned offset_idx = 0;
unsigned offset[3] = { UVD_BASE_SI, 0, 0 };
  
-	amdgpu_bo_kunmap(bo);

-   amdgpu_bo_unpin(bo);
-
-   if (!ring->adev->uvd.address_64_bit) {
-   struct ttm_operation_ctx ctx = { true, false };
-
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
-   amdgpu_uvd_force_into_uvd_segment(bo);
-   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-   if (r)
-   goto err;
-   }
-
r = amdgpu_job_alloc_with_ib(adev, 64, direct ? AMDGPU_IB_POOL_DIRECT :
 AMDGPU_IB_POOL_DELAYED, &job);
if (r)
-   goto err;
+   return r;
  
  	if (adev->asic_type >= CHIP_VEGA10) {

offset_idx = 1 + ring->me;
@@ -1148,8 +1186,6 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, 
struct amdgpu_bo *bo,
}
  
  	amdgpu_bo_fence(bo, f, false);

-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_unref(&bo);
  
  	if (fence)

*fence = dma_fence_get(f);
@@ -1159,10 +1195,6 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, 
struct amdgpu_bo *bo,
  
  err_free:

amdgpu_job_free(job);
-
-err:
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_unref(&bo);
return r;
  }
  
@@ -1173,16 +1205,11 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,

  struct dma_fence **fence)
  {
struct amdgpu_device *adev = ring->adev;
-   struct amdgpu_bo *bo = NULL;
+   struct amdgpu_bo *bo = adev->uvd.ib_bo;
 

[PATCH 1/2] drm/amdgpu: Clarify that TMZ unsupported message is due to hardware

2021-09-13 Thread Paul Menzel
The warning

amdgpu 0000:05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported

leaves the reader wondering if anything can be done about it. As it’s
unsupported by the hardware, and nothing can be done about it, mention that
in the log message.

amdgpu 0000:05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported by hardware

Signed-off-by: Paul Menzel 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index c7797eac83c3..c4c56c57b0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -599,7 +599,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
default:
adev->gmc.tmz_enabled = false;
dev_warn(adev->dev,
-"Trusted Memory Zone (TMZ) feature not supported\n");
+"Trusted Memory Zone (TMZ) feature not supported by hardware\n");
break;
}
 }
-- 
2.33.0



[PATCH 2/2] drm/amdgpu: Demote TMZ unsupported log message from warning to info

2021-09-13 Thread Paul Menzel
As the user cannot do anything about the unsupported Trusted Memory Zone
(TMZ) feature, do not warn about it, but make it informational, so
demote the log level from warning to info.

Signed-off-by: Paul Menzel 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index c4c56c57b0c0..bfa0275ff5d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -598,7 +598,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
break;
default:
adev->gmc.tmz_enabled = false;
-   dev_warn(adev->dev,
+   dev_info(adev->dev,
 "Trusted Memory Zone (TMZ) feature not supported by hardware\n");
break;
}
-- 
2.33.0



[PATCH v4 3/3] drm/amdgpu: VCN avoid memory allocation during IB test

2021-09-13 Thread xinhui pan
alloc extra msg from direct IB pool.

Reviewed-by: Christian König 
Signed-off-by: xinhui pan 
---
change from v1:
msg is aligned to gpu page boundary
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 97 +++--
 1 file changed, 44 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 561296a85b43..b60b8fe5bf67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -541,15 +541,14 @@ int amdgpu_vcn_dec_sw_ring_test_ring(struct amdgpu_ring 
*ring)
 }
 
 static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
-  struct amdgpu_bo *bo,
+  struct amdgpu_ib *ib_msg,
   struct dma_fence **fence)
 {
struct amdgpu_device *adev = ring->adev;
struct dma_fence *f = NULL;
struct amdgpu_job *job;
struct amdgpu_ib *ib;
-   uint64_t addr;
-   void *msg = NULL;
+   uint64_t addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg->gpu_addr);
int i, r;
 
r = amdgpu_job_alloc_with_ib(adev, 64,
@@ -558,8 +557,6 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
goto err;
 
	ib = &job->ibs[0];
-   addr = amdgpu_bo_gpu_offset(bo);
-   msg = amdgpu_bo_kptr(bo);
ib->ptr[0] = PACKET0(adev->vcn.internal.data0, 0);
ib->ptr[1] = addr;
ib->ptr[2] = PACKET0(adev->vcn.internal.data1, 0);
@@ -576,9 +573,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
if (r)
goto err_free;
 
-   amdgpu_bo_fence(bo, f, false);
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_free_kernel(&bo, NULL, (void **)&msg);
+   amdgpu_ib_free(adev, ib_msg, f);
 
if (fence)
*fence = dma_fence_get(f);
@@ -588,27 +583,26 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring 
*ring,
 
 err_free:
amdgpu_job_free(job);
-
 err:
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_free_kernel(&bo, NULL, (void **)&msg);
+   amdgpu_ib_free(adev, ib_msg, f);
return r;
 }
 
 static int amdgpu_vcn_dec_get_create_msg(struct amdgpu_ring *ring, uint32_t 
handle,
-struct amdgpu_bo **bo)
+   struct amdgpu_ib *ib)
 {
struct amdgpu_device *adev = ring->adev;
uint32_t *msg;
int r, i;
 
-   *bo = NULL;
-   r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
- bo, NULL, (void **)&msg);
+   memset(ib, 0, sizeof(*ib));
+   r = amdgpu_ib_get(adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
+   AMDGPU_IB_POOL_DIRECT,
+   ib);
if (r)
return r;
 
+   msg = (uint32_t *)AMDGPU_GPU_PAGE_ALIGN((unsigned long)ib->ptr);
msg[0] = cpu_to_le32(0x0028);
msg[1] = cpu_to_le32(0x0038);
msg[2] = cpu_to_le32(0x0001);
@@ -630,19 +624,20 @@ static int amdgpu_vcn_dec_get_create_msg(struct 
amdgpu_ring *ring, uint32_t hand
 }
 
 static int amdgpu_vcn_dec_get_destroy_msg(struct amdgpu_ring *ring, uint32_t 
handle,
- struct amdgpu_bo **bo)
+ struct amdgpu_ib *ib)
 {
struct amdgpu_device *adev = ring->adev;
uint32_t *msg;
int r, i;
 
-   *bo = NULL;
-   r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
- bo, NULL, (void **)&msg);
+   memset(ib, 0, sizeof(*ib));
+   r = amdgpu_ib_get(adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
+   AMDGPU_IB_POOL_DIRECT,
+   ib);
if (r)
return r;
 
+   msg = (uint32_t *)AMDGPU_GPU_PAGE_ALIGN((unsigned long)ib->ptr);
msg[0] = cpu_to_le32(0x0028);
msg[1] = cpu_to_le32(0x0018);
msg[2] = cpu_to_le32(0x);
@@ -658,21 +653,21 @@ static int amdgpu_vcn_dec_get_destroy_msg(struct 
amdgpu_ring *ring, uint32_t han
 int amdgpu_vcn_dec_ring_test_ib(struct amdgpu_ring *ring, long timeout)
 {
struct dma_fence *fence = NULL;
-   struct amdgpu_bo *bo;
+   struct amdgpu_ib ib;
long r;
 
-   r = amdgpu_vcn_dec_get_create_msg(ring, 1, &bo);
+   r = amdgpu_vcn_dec_get_create_msg(ring, 1, &ib);
	if (r)
		goto error;
 
-   r = amdgpu_vcn_dec_send_msg(ring, bo, NULL);
+   r = amdgpu_vcn_dec_send_msg(ring, &ib, NULL);
	if (r)
		goto error;
-   r = amdgpu_vcn_dec_get_destroy_msg(ring, 1, &bo);
+   r = amdgpu_vcn_dec_get_destroy_msg(ring, 1, &ib);
	if (r)
		goto error;
 
-   r = amdgpu_vcn_dec_send_msg(ring, bo, &fence);
+   r = amdgpu_vcn_dec_send_msg(ring, &ib, &fence);
if (r)
goto error;
 

[PATCH v4 2/3] drm/amdgpu: VCE avoid memory allocation during IB test

2021-09-13 Thread xinhui pan
alloc extra msg from direct IB pool.

Signed-off-by: xinhui pan 
---
change from v1:
msg is allocated separately.
msg is aligned to gpu page boundary
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 27 -
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index e9fdf49d69e8..caa4d3420e00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -82,7 +82,6 @@ MODULE_FIRMWARE(FIRMWARE_VEGA20);
 
 static void amdgpu_vce_idle_work_handler(struct work_struct *work);
 static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
-struct amdgpu_bo *bo,
 struct dma_fence **fence);
 static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t 
handle,
  bool direct, struct dma_fence **fence);
@@ -441,12 +440,12 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, 
struct drm_file *filp)
  * Open up a stream for HW test
  */
 static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
-struct amdgpu_bo *bo,
 struct dma_fence **fence)
 {
const unsigned ib_size_dw = 1024;
struct amdgpu_job *job;
struct amdgpu_ib *ib;
+   struct amdgpu_ib ib_msg;
struct dma_fence *f = NULL;
uint64_t addr;
int i, r;
@@ -456,9 +455,17 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle,
if (r)
return r;
 
-   ib = &job->ibs[0];
+   memset(&ib_msg, 0, sizeof(ib_msg));
+   /* only one gpu page is needed, alloc +1 page to make addr aligned. */
+   r = amdgpu_ib_get(ring->adev, NULL, AMDGPU_GPU_PAGE_SIZE * 2,
+ AMDGPU_IB_POOL_DIRECT,
+ &ib_msg);
+   if (r)
+   goto err;
 
-   addr = amdgpu_bo_gpu_offset(bo);
+   ib = &job->ibs[0];
+   /* let addr point to page boundary */
+   addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg.gpu_addr);
 
/* stitch together an VCE create msg */
ib->length_dw = 0;
@@ -498,6 +505,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle,
ib->ptr[i] = 0x0;
 
r = amdgpu_job_submit_direct(job, ring, &f);
+   amdgpu_ib_free(ring->adev, &ib_msg, f);
if (r)
goto err;
 
@@ -1134,20 +1142,13 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
 int amdgpu_vce_ring_test_ib(struct amdgpu_ring *ring, long timeout)
 {
struct dma_fence *fence = NULL;
-   struct amdgpu_bo *bo = NULL;
long r;
 
/* skip vce ring1/2 ib test for now, since it's not reliable */
if (ring != &ring->adev->vce.ring[0])
return 0;
 
-   r = amdgpu_bo_create_reserved(ring->adev, 512, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_VRAM,
- &bo, NULL, NULL);
-   if (r)
-   return r;
-
-   r = amdgpu_vce_get_create_msg(ring, 1, bo, NULL);
+   r = amdgpu_vce_get_create_msg(ring, 1, NULL);
if (r)
goto error;
 
@@ -1163,8 +1164,6 @@ int amdgpu_vce_ring_test_ib(struct amdgpu_ring *ring, 
long timeout)
 
 error:
dma_fence_put(fence);
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_free_kernel(&bo, NULL, NULL);
return r;
 }
 
-- 
2.25.1



[PATCH v4 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-13 Thread xinhui pan
Move the BO allocation into sw_init.

Signed-off-by: xinhui pan 
---
change from v3:
drop the bo resv lock in ib test.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 102 
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |   1 +
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |  11 +--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |  11 +--
 4 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index d451c359606a..b0fbd5a1d5af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -134,6 +134,51 @@ MODULE_FIRMWARE(FIRMWARE_VEGA12);
 MODULE_FIRMWARE(FIRMWARE_VEGA20);
 
 static void amdgpu_uvd_idle_work_handler(struct work_struct *work);
+static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
+
+static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
+  uint32_t size,
+  struct amdgpu_bo **bo_ptr)
+{
+   struct ttm_operation_ctx ctx = { true, false };
+   struct amdgpu_bo *bo = NULL;
+   void *addr;
+   int r;
+
+   r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_GTT,
+ &bo, NULL, &addr);
+   if (r)
+   return r;
+
+   if (adev->uvd.address_64_bit)
+   goto succ;
+
+   amdgpu_bo_kunmap(bo);
+   amdgpu_bo_unpin(bo);
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
+   amdgpu_uvd_force_into_uvd_segment(bo);
+   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (r)
+   goto err;
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM);
+   if (r)
+   goto err_pin;
+   r = amdgpu_bo_kmap(bo, &addr);
+   if (r)
+   goto err_kmap;
+succ:
+   amdgpu_bo_unreserve(bo);
+   *bo_ptr = bo;
+   return 0;
+err_kmap:
+   amdgpu_bo_unpin(bo);
+err_pin:
+err:
+   amdgpu_bo_unreserve(bo);
+   amdgpu_bo_unref(&bo);
+   return r;
+}
 
 int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 {
@@ -302,6 +347,10 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 
0))
adev->uvd.address_64_bit = true;
 
+   r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
+   if (r)
+   return r;
+
switch (adev->asic_type) {
case CHIP_TONGA:
adev->uvd.use_ctx_buf = adev->uvd.fw_version >= FW_1_65_10;
@@ -324,6 +373,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 
 int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
 {
+   void *addr = amdgpu_bo_kptr(adev->uvd.ib_bo);
int i, j;
 
drm_sched_entity_destroy(&adev->uvd.entity);
@@ -342,6 +392,7 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i)
amdgpu_ring_fini(&adev->uvd.inst[j].ring_enc[i]);
}
+   amdgpu_bo_free_kernel(&adev->uvd.ib_bo, NULL, &addr);
release_firmware(adev->uvd.fw);
 
return 0;
@@ -1080,23 +1131,10 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring 
*ring, struct amdgpu_bo *bo,
unsigned offset_idx = 0;
unsigned offset[3] = { UVD_BASE_SI, 0, 0 };
 
-   amdgpu_bo_kunmap(bo);
-   amdgpu_bo_unpin(bo);
-
-   if (!ring->adev->uvd.address_64_bit) {
-   struct ttm_operation_ctx ctx = { true, false };
-
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
-   amdgpu_uvd_force_into_uvd_segment(bo);
-   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-   if (r)
-   goto err;
-   }
-
r = amdgpu_job_alloc_with_ib(adev, 64, direct ? AMDGPU_IB_POOL_DIRECT :
 AMDGPU_IB_POOL_DELAYED, &job);
if (r)
-   goto err;
+   return r;
 
if (adev->asic_type >= CHIP_VEGA10) {
offset_idx = 1 + ring->me;
@@ -1148,8 +1186,6 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, 
struct amdgpu_bo *bo,
}
 
amdgpu_bo_fence(bo, f, false);
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_unref(&bo);
 
if (fence)
*fence = dma_fence_get(f);
@@ -1159,10 +1195,6 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, 
struct amdgpu_bo *bo,
 
 err_free:
amdgpu_job_free(job);
-
-err:
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_unref(&bo);
return r;
 }
 
@@ -1173,16 +1205,11 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, 
uint32_t handle,
  struct dma_fence **fence)
 {
struct amdgpu_device *adev = ring->adev;
-   struct amdgpu_bo *bo = NULL;
+   struct amdgpu_bo *bo = adev->uvd.ib_bo;
uint32_t *msg;
int r, i;
 
-

Re: [PATCH 1/2] drm/amdgpu: Clarify that TMZ unsupported message is due to hardware

2021-09-13 Thread Christian König

Am 13.09.21 um 10:34 schrieb Paul Menzel:

The warning

 amdgpu :05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not 
supported

leaves the reader wondering if anything can be done about it. As it's
unsupported by the hardware, and nothing can be done about it, mention that
in the log message.

 amdgpu :05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not 
supported by hardware


I think we should just completely remove the message instead.

Christian.



Signed-off-by: Paul Menzel 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index c7797eac83c3..c4c56c57b0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -599,7 +599,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
default:
adev->gmc.tmz_enabled = false;
dev_warn(adev->dev,
-"Trusted Memory Zone (TMZ) feature not supported\n");
+"Trusted Memory Zone (TMZ) feature not supported by hardware\n");
break;
}
  }




Re: [PATCH 1/1] drm/radeon: pass drm dev radeon_agp_head_init directly

2021-09-13 Thread Christian König

Am 13.09.21 um 10:27 schrieb Nirmoy Das:

Pass the drm dev directly, as rdev->ddev only gets initialized later,
in radeon_device_init().

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214375
Signed-off-by: Nirmoy Das 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 0473583dcdac..482fb0ae6cb5 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -119,7 +119,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
  #endif
  
  	if (pci_find_capability(pdev, PCI_CAP_ID_AGP))

-   rdev->agp = radeon_agp_head_init(rdev->ddev);
+   rdev->agp = radeon_agp_head_init(dev);
if (rdev->agp) {
rdev->agp->agp_mtrr = arch_phys_wc_add(
rdev->agp->agp_info.aper_base,




[PATCH 1/1] drm/radeon: pass drm dev radeon_agp_head_init directly

2021-09-13 Thread Nirmoy Das
Pass the drm dev directly, as rdev->ddev only gets initialized later,
in radeon_device_init().

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214375
Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 0473583dcdac..482fb0ae6cb5 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -119,7 +119,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
 #endif
 
if (pci_find_capability(pdev, PCI_CAP_ID_AGP))
-   rdev->agp = radeon_agp_head_init(rdev->ddev);
+   rdev->agp = radeon_agp_head_init(dev);
if (rdev->agp) {
rdev->agp->agp_mtrr = arch_phys_wc_add(
rdev->agp->agp_info.aper_base,
-- 
2.32.0



Re: [RFC PATCH v2] drm/ttm: Try to check if new ttm man out of bounds during compile

2021-09-13 Thread Christian König

Am 13.09.21 um 10:09 schrieb xinhui pan:

Let TTM detect when a vendor sets a new ttm manager out of bounds by
adding BUILD_BUG_ON checks.

Signed-off-by: xinhui pan 


Yeah, that looks better. Reviewed-by: Christian König 



Going to push that to drm-misc-next.

Thanks,
Christian.


---
  drivers/gpu/drm/ttm/ttm_range_manager.c |  8 
  include/drm/ttm/ttm_device.h|  3 +++
  include/drm/ttm/ttm_range_manager.h | 18 --
  3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index 03395386e8a7..f2d702b66749 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -138,7 +138,7 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
   * Initialise a generic range manager for the selected memory type.
   * The range manager is installed for this device in the type slot.
   */
-int ttm_range_man_init(struct ttm_device *bdev,
+int ttm_range_man_init_nocheck(struct ttm_device *bdev,
   unsigned type, bool use_tt,
   unsigned long p_size)
  {
@@ -163,7 +163,7 @@ int ttm_range_man_init(struct ttm_device *bdev,
ttm_resource_manager_set_used(man, true);
return 0;
  }
-EXPORT_SYMBOL(ttm_range_man_init);
+EXPORT_SYMBOL(ttm_range_man_init_nocheck);
  
  /**

   * ttm_range_man_fini
@@ -173,7 +173,7 @@ EXPORT_SYMBOL(ttm_range_man_init);
   *
   * Remove the generic range manager from a slot and tear it down.
   */
-int ttm_range_man_fini(struct ttm_device *bdev,
+int ttm_range_man_fini_nocheck(struct ttm_device *bdev,
   unsigned type)
  {
struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
@@ -197,4 +197,4 @@ int ttm_range_man_fini(struct ttm_device *bdev,
kfree(rman);
return 0;
  }
-EXPORT_SYMBOL(ttm_range_man_fini);
+EXPORT_SYMBOL(ttm_range_man_fini_nocheck);
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..6f23724f5a06 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -285,12 +285,15 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
  static inline struct ttm_resource_manager *
  ttm_manager_type(struct ttm_device *bdev, int mem_type)
  {
+   BUILD_BUG_ON(__builtin_constant_p(mem_type)
+&& mem_type >= TTM_NUM_MEM_TYPES);
return bdev->man_drv[mem_type];
  }
  
  static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,

  struct ttm_resource_manager *manager)
  {
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
bdev->man_drv[type] = manager;
  }
  
diff --git a/include/drm/ttm/ttm_range_manager.h b/include/drm/ttm/ttm_range_manager.h

index 22b6fa42ac20..7963b957e9ef 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -4,6 +4,7 @@
  #define _TTM_RANGE_MANAGER_H_
  
  #include 

+#include 
  #include 
  
  /**

@@ -33,10 +34,23 @@ to_ttm_range_mgr_node(struct ttm_resource *res)
return container_of(res, struct ttm_range_mgr_node, base);
  }
  
-int ttm_range_man_init(struct ttm_device *bdev,

+int ttm_range_man_init_nocheck(struct ttm_device *bdev,
   unsigned type, bool use_tt,
   unsigned long p_size);
-int ttm_range_man_fini(struct ttm_device *bdev,
+int ttm_range_man_fini_nocheck(struct ttm_device *bdev,
   unsigned type);
+static __always_inline int ttm_range_man_init(struct ttm_device *bdev,
+  unsigned int type, bool use_tt,
+  unsigned long p_size)
+{
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
+   return ttm_range_man_init_nocheck(bdev, type, use_tt, p_size);
+}
  
+static __always_inline int ttm_range_man_fini(struct ttm_device *bdev,

+  unsigned int type)
+{
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
+   return ttm_range_man_fini_nocheck(bdev, type);
+}
  #endif




[RFC PATCH v2] drm/ttm: Try to check if new ttm man out of bounds during compile

2021-09-13 Thread xinhui pan
Let TTM detect when a vendor sets a new ttm manager out of bounds by
adding BUILD_BUG_ON checks.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/ttm/ttm_range_manager.c |  8 
 include/drm/ttm/ttm_device.h|  3 +++
 include/drm/ttm/ttm_range_manager.h | 18 --
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index 03395386e8a7..f2d702b66749 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -138,7 +138,7 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
  * Initialise a generic range manager for the selected memory type.
  * The range manager is installed for this device in the type slot.
  */
-int ttm_range_man_init(struct ttm_device *bdev,
+int ttm_range_man_init_nocheck(struct ttm_device *bdev,
   unsigned type, bool use_tt,
   unsigned long p_size)
 {
@@ -163,7 +163,7 @@ int ttm_range_man_init(struct ttm_device *bdev,
ttm_resource_manager_set_used(man, true);
return 0;
 }
-EXPORT_SYMBOL(ttm_range_man_init);
+EXPORT_SYMBOL(ttm_range_man_init_nocheck);
 
 /**
  * ttm_range_man_fini
@@ -173,7 +173,7 @@ EXPORT_SYMBOL(ttm_range_man_init);
  *
  * Remove the generic range manager from a slot and tear it down.
  */
-int ttm_range_man_fini(struct ttm_device *bdev,
+int ttm_range_man_fini_nocheck(struct ttm_device *bdev,
   unsigned type)
 {
struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
@@ -197,4 +197,4 @@ int ttm_range_man_fini(struct ttm_device *bdev,
kfree(rman);
return 0;
 }
-EXPORT_SYMBOL(ttm_range_man_fini);
+EXPORT_SYMBOL(ttm_range_man_fini_nocheck);
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..6f23724f5a06 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -285,12 +285,15 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
 static inline struct ttm_resource_manager *
 ttm_manager_type(struct ttm_device *bdev, int mem_type)
 {
+   BUILD_BUG_ON(__builtin_constant_p(mem_type)
+&& mem_type >= TTM_NUM_MEM_TYPES);
return bdev->man_drv[mem_type];
 }
 
 static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
  struct ttm_resource_manager *manager)
 {
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
bdev->man_drv[type] = manager;
 }
 
diff --git a/include/drm/ttm/ttm_range_manager.h 
b/include/drm/ttm/ttm_range_manager.h
index 22b6fa42ac20..7963b957e9ef 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -4,6 +4,7 @@
 #define _TTM_RANGE_MANAGER_H_
 
 #include 
+#include 
 #include 
 
 /**
@@ -33,10 +34,23 @@ to_ttm_range_mgr_node(struct ttm_resource *res)
return container_of(res, struct ttm_range_mgr_node, base);
 }
 
-int ttm_range_man_init(struct ttm_device *bdev,
+int ttm_range_man_init_nocheck(struct ttm_device *bdev,
   unsigned type, bool use_tt,
   unsigned long p_size);
-int ttm_range_man_fini(struct ttm_device *bdev,
+int ttm_range_man_fini_nocheck(struct ttm_device *bdev,
   unsigned type);
+static __always_inline int ttm_range_man_init(struct ttm_device *bdev,
+  unsigned int type, bool use_tt,
+  unsigned long p_size)
+{
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
+   return ttm_range_man_init_nocheck(bdev, type, use_tt, p_size);
+}
 
+static __always_inline int ttm_range_man_fini(struct ttm_device *bdev,
+  unsigned int type)
+{
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
+   return ttm_range_man_fini_nocheck(bdev, type);
+}
 #endif
-- 
2.25.1



[PATCH] drm/amdgpu: Update PSP TA unload function

2021-09-13 Thread Candice Li
Update PSP TA unload function to use PSP TA context as input argument.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index bc861f2fe0ecf6..7d09b28889afef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -46,7 +46,7 @@ static int psp_sysfs_init(struct amdgpu_device *adev);
 static void psp_sysfs_fini(struct amdgpu_device *adev);
 
 static int psp_load_smu_fw(struct psp_context *psp);
-static int psp_ta_unload(struct psp_context *psp, uint32_t session_id);
+static int psp_ta_unload(struct psp_context *psp, struct ta_context *context);
 static int psp_ta_load(struct psp_context *psp, struct ta_context *context);
 static int psp_rap_terminate(struct psp_context *psp);
 static int psp_securedisplay_terminate(struct psp_context *psp);
@@ -816,12 +816,12 @@ static void psp_prep_ta_unload_cmd_buf(struct 
psp_gfx_cmd_resp *cmd,
cmd->cmd.cmd_unload_ta.session_id = session_id;
 }
 
-static int psp_ta_unload(struct psp_context *psp, uint32_t session_id)
+static int psp_ta_unload(struct psp_context *psp, struct ta_context *context)
 {
int ret;
struct psp_gfx_cmd_resp *cmd = acquire_psp_cmd_buf(psp);
 
-   psp_prep_ta_unload_cmd_buf(cmd, session_id);
+   psp_prep_ta_unload_cmd_buf(cmd, context->session_id);
 
ret = psp_cmd_submit_buf(psp, NULL, cmd, psp->fence_buf_mc_addr);
 
@@ -832,7 +832,7 @@ static int psp_ta_unload(struct psp_context *psp, uint32_t 
session_id)
 
 static int psp_asd_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->asd_context.session_id);
+   return psp_ta_unload(psp, &psp->asd_context);
 }
 
 static int psp_asd_terminate(struct psp_context *psp)
@@ -984,7 +984,7 @@ static int psp_xgmi_load(struct psp_context *psp)
 
 static int psp_xgmi_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->xgmi_context.context.session_id);
+   return psp_ta_unload(psp, &psp->xgmi_context.context);
 }
 
 int psp_xgmi_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
@@ -1275,7 +1275,7 @@ static int psp_ras_load(struct psp_context *psp)
 
 static int psp_ras_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->ras_context.context.session_id);
+   return psp_ta_unload(psp, &psp->ras_context.context);
 }
 
 int psp_ras_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
@@ -1540,7 +1540,7 @@ static int psp_hdcp_initialize(struct psp_context *psp)
 
 static int psp_hdcp_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->hdcp_context.context.session_id);
+   return psp_ta_unload(psp, &psp->hdcp_context.context);
 }
 
 int psp_hdcp_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
@@ -1632,7 +1632,7 @@ static int psp_dtm_initialize(struct psp_context *psp)
 
 static int psp_dtm_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->dtm_context.context.session_id);
+   return psp_ta_unload(psp, &psp->dtm_context.context);
 }
 
 int psp_dtm_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
@@ -1690,7 +1690,7 @@ static int psp_rap_load(struct psp_context *psp)
 
 static int psp_rap_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->rap_context.context.session_id);
+   return psp_ta_unload(psp, &psp->rap_context.context);
 }
 
 static int psp_rap_initialize(struct psp_context *psp)
@@ -1805,7 +1805,7 @@ static int psp_securedisplay_load(struct psp_context *psp)
 
 static int psp_securedisplay_unload(struct psp_context *psp)
 {
-   return psp_ta_unload(psp, psp->securedisplay_context.context.session_id);
+   return psp_ta_unload(psp, &psp->securedisplay_context.context);
 }
 
 static int psp_securedisplay_initialize(struct psp_context *psp)
-- 
2.17.1



[PATCH] drm/amdgpu: Conform ASD header/loading to generic TA systems

2021-09-13 Thread Candice Li
Update asd_context structure and add asd_initialize function to
conform ASD header/loading to generic TA systems.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 60 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 10 ++---
 2 files changed, 26 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 071dadf3a4509f..bc861f2fe0ecf6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -47,6 +47,7 @@ static void psp_sysfs_fini(struct amdgpu_device *adev);
 
 static int psp_load_smu_fw(struct psp_context *psp);
 static int psp_ta_unload(struct psp_context *psp, uint32_t session_id);
+static int psp_ta_load(struct psp_context *psp, struct ta_context *context);
 static int psp_rap_terminate(struct psp_context *psp);
 static int psp_securedisplay_terminate(struct psp_context *psp);
 
@@ -781,23 +782,14 @@ static int psp_rl_load(struct amdgpu_device *adev)
return ret;
 }
 
-static void psp_prep_asd_load_cmd_buf(struct psp_gfx_cmd_resp *cmd,
-   uint64_t asd_mc, uint32_t size)
+static int psp_asd_load(struct psp_context *psp)
 {
-   cmd->cmd_id = GFX_CMD_ID_LOAD_ASD;
-   cmd->cmd.cmd_load_ta.app_phy_addr_lo = lower_32_bits(asd_mc);
-   cmd->cmd.cmd_load_ta.app_phy_addr_hi = upper_32_bits(asd_mc);
-   cmd->cmd.cmd_load_ta.app_len = size;
-
-   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_lo = 0;
-   cmd->cmd.cmd_load_ta.cmd_buf_phy_addr_hi = 0;
-   cmd->cmd.cmd_load_ta.cmd_buf_len = 0;
+   return psp_ta_load(psp, &psp->asd_context);
 }
 
-static int psp_asd_load(struct psp_context *psp)
+static int psp_asd_initialize(struct psp_context *psp)
 {
int ret;
-   struct psp_gfx_cmd_resp *cmd;
 
/* If PSP version doesn't match ASD version, asd loading will be failed.
 * add workaround to bypass it for sriov now.
@@ -806,22 +798,13 @@ static int psp_asd_load(struct psp_context *psp)
if (amdgpu_sriov_vf(psp->adev) || !psp->asd_context.bin_desc.size_bytes)
return 0;
 
-   cmd = acquire_psp_cmd_buf(psp);
+   psp->asd_context.mem_context.shared_mc_addr  = 0;
+   psp->asd_context.mem_context.shared_mem_size = PSP_ASD_SHARED_MEM_SIZE;
+   psp->asd_context.ta_load_type= GFX_CMD_ID_LOAD_ASD;
 
-   psp_copy_fw(psp, psp->asd_context.bin_desc.start_addr,
-   psp->asd_context.bin_desc.size_bytes);
-
-   psp_prep_asd_load_cmd_buf(cmd, psp->fw_pri_mc_addr,
- psp->asd_context.bin_desc.size_bytes);
-
-   ret = psp_cmd_submit_buf(psp, NULL, cmd,
-psp->fence_buf_mc_addr);
-   if (!ret) {
-   psp->asd_context.asd_initialized = true;
-   psp->asd_context.session_id = cmd->resp.session_id;
-   }
-
-   release_psp_cmd_buf(psp);
+   ret = psp_asd_load(psp);
+   if (!ret)
+   psp->asd_context.initialized = true;
 
return ret;
 }
@@ -859,13 +842,13 @@ static int psp_asd_terminate(struct psp_context *psp)
if (amdgpu_sriov_vf(psp->adev))
return 0;
 
-   if (!psp->asd_context.asd_initialized)
+   if (!psp->asd_context.initialized)
return 0;
 
ret = psp_asd_unload(psp);
 
if (!ret)
-   psp->asd_context.asd_initialized = false;
+   psp->asd_context.initialized = false;
 
return ret;
 }
@@ -903,7 +886,7 @@ static void psp_prep_ta_load_cmd_buf(struct 
psp_gfx_cmd_resp *cmd,
 uint64_t ta_bin_mc,
 struct ta_context *context)
 {
-   cmd->cmd_id = GFX_CMD_ID_LOAD_TA;
+   cmd->cmd_id = context->ta_load_type;
cmd->cmd.cmd_load_ta.app_phy_addr_lo= lower_32_bits(ta_bin_mc);
cmd->cmd.cmd_load_ta.app_phy_addr_hi= upper_32_bits(ta_bin_mc);
cmd->cmd.cmd_load_ta.app_len= context->bin_desc.size_bytes;
@@ -970,8 +953,7 @@ static int psp_ta_invoke(struct psp_context *psp,
return ret;
 }
 
-static int psp_ta_load(struct psp_context *psp,
-  struct ta_context *context)
+static int psp_ta_load(struct psp_context *psp, struct ta_context *context)
 {
int ret;
struct psp_gfx_cmd_resp *cmd;
@@ -981,9 +963,7 @@ static int psp_ta_load(struct psp_context *psp,
psp_copy_fw(psp, context->bin_desc.start_addr,
context->bin_desc.size_bytes);
 
-   psp_prep_ta_load_cmd_buf(cmd,
-psp->fw_pri_mc_addr,
-context);
+   psp_prep_ta_load_cmd_buf(cmd, psp->fw_pri_mc_addr, context);
 
ret = psp_cmd_submit_buf(psp, NULL, cmd,
 psp->fence_buf_mc_addr);
@@ -1051,6 +1031,7 @@ int 

Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Pan, Xinhui
[AMD Official Use Only]

Of course the IB test can hang the GPU.
But it waits for the fence with a specific timeout, and it does not depend
on the GPU scheduler.
So the IB test is guaranteed to return.


From: Lazar, Lijo 
Sent: September 13, 2021 15:15
To: Christian König; Koenig, Christian; Pan, Xinhui; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 12:21 PM, Christian König wrote:
> Keep in mind that we don't try to avoid contention here. The goal is
> rather to have as few locks as possible to avoid the extra overhead in
> the hot path.
>
> Contention is completely irrelevant for the debug and device reset since
> that are rarely occurring events and performance doesn't matter for them.
>
> It is perfectly reasonable to take the write side of the reset lock as
> necessary when we need to make sure that we don't have concurrent device
> access.

The original code has down_read which gave the impression that there is
some protection to avoid access during reset. Basically would like to
avoid this as a precedence for this sort of usage for any debugfs call.
Reset semaphore is supposed to be a 'protect all' thing and provides a
shortcut.

BTW, question about a hypothetical case - what happens if the test
itself causes a hang and need to trigger a reset? Will there be chance
for the lock to be released (whether a submit call will hang
indefinitely) for the actual reset to be executed?

Thanks,
Lijo

>
> Regards,
> Christian.
>
> Am 13.09.21 um 08:43 schrieb Lazar, Lijo:
>> There are other interfaces to emulate the exact reset process, or
>> at least this is not the one we are using for doing any sort of reset
>> through debugfs.
>>
>> In any case, the expectation is reset thread takes the write side of
>> the lock and it's already done somewhere else.
>>
>> Reset semaphore is supposed to protect the device from concurrent
>> access (any sort of resource usage is thus protected by default). Then
>> the same logic can be applied for any other call and that is not a
>> reasonable ask.
>>
>> Thanks,
>> Lijo
>>
>> On 9/13/2021 12:07 PM, Christian König wrote:
>>> That's complete nonsense.
>>>
>>> The debugfs interface emulates parts of the reset procedure for
>>> testing and we absolutely need to take the same locks as the reset to
>>> avoid corruption of the involved objects.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 13.09.21 um 08:25 schrieb Lazar, Lijo:
 This is a debugfs interface and adding another writer contention in
 debugfs over an actual reset is lazy fix. This shouldn't be executed
 in the first place and should not take precedence over any reset.

 Thanks,
 Lijo


 On 9/13/2021 11:52 AM, Christian König wrote:
> NAK, this is not the lazy way to fix it at all.
>
> The reset semaphore protects the scheduler and ring objects from
> concurrent modification, so taking the write side of it is
> perfectly valid here.
>
> Christian.
>
> Am 13.09.21 um 06:42 schrieb Pan, Xinhui:
>> [AMD Official Use Only]
>>
>> yep, that is a lazy way to fix it.
>>
>> I am thinking of adding one amdgpu_ring.direct_access_mutex before
>> we issue test_ib on each ring.
>> 
>> From: Lazar, Lijo 
>> Sent: September 13, 2021 12:00
>> To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander; Koenig, Christian
>> Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
>>
>>
>>
>> On 9/13/2021 5:18 AM, xinhui pan wrote:
>>> Direct IB submission should be exclusive. So use write lock.
>>>
>>> Signed-off-by: xinhui pan 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>>>1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> index 19323b4cce7b..be5d12ed3db1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> @@ -1358,7 +1358,7 @@ static int
>>> amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
>>>}
>>>
>>>/* Avoid accidently unparking the sched thread during GPU
>>> reset */
>>> - r = down_read_killable(&adev->reset_sem);
>>> + r = down_write_killable(&adev->reset_sem);
>> There are many ioctls and debugfs calls which takes this lock and
>> as you
>> know the purpose is to avoid them while there is a reset. The
>> purpose is
>> *not to* fix any concurrency issues those calls themselves have
>> otherwise and fixing those concurrency issues this way is just
>> lazy and
>> not acceptable.
>>
>> This will take away any fairness given to the writer in this rw
>> lock and
>> that is supposed to be the reset thread.
>>
>> 

Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Christian König

Am 13.09.21 um 09:15 schrieb Lazar, Lijo:

On 9/13/2021 12:21 PM, Christian König wrote:
Keep in mind that we don't try to avoid contention here. The goal is 
rather to have as few locks as possible to avoid the extra overhead 
in the hot path.


Contention is completely irrelevant for the debug and device reset 
since that are rarely occurring events and performance doesn't matter 
for them.


It is perfectly reasonable to take the write side of the reset lock 
as necessary when we need to make sure that we don't have concurrent 
device access.


The original code has down_read which gave the impression that there 
is some protection to avoid access during reset. Basically would like 
to avoid this as a precedence for this sort of usage for any debugfs 
call. Reset semaphore is supposed to be a 'protect all' thing and 
provides a shortcut.


Yeah, that's indeed a very valid fear. We had to reject that approach 
for multiple IOCTL, sysfs and debugfs accesses countless times now.


But in the case here it is indeed the right thing to do; the only 
alternative would be to allocate an entity and use that for pushing the 
IBs through the scheduler.




BTW, a question about a hypothetical case - what happens if the test 
itself causes a hang and needs to trigger a reset? Will there be a 
chance for the lock to be released (or will the submit call hang 
indefinitely) for the actual reset to be executed?


Not sure if we added some timeout, but essentially it should hang 
forever, yes.


Regards,
Christian.



Thanks,
Lijo



Regards,
Christian.

On 13.09.21 at 08:43, Lazar, Lijo wrote:
There are other interfaces to emulate the exact reset process, or at 
least this is not the one we are using for doing any sort of reset 
through debugfs.


In any case, the expectation is that the reset thread takes the write 
side of the lock, and that's already done somewhere else.


The reset semaphore is supposed to protect the device from concurrent 
access (any sort of resource usage is thus protected by default). Then 
the same logic can be applied to any other call, and that is not a 
reasonable ask.


Thanks,
Lijo

On 9/13/2021 12:07 PM, Christian König wrote:

That's complete nonsense.

The debugfs interface emulates parts of the reset procedure for 
testing and we absolutely need to take the same locks as the reset 
to avoid corruption of the involved objects.


Regards,
Christian.

On 13.09.21 at 08:25, Lazar, Lijo wrote:
This is a debugfs interface, and adding another writer contention in 
debugfs over an actual reset is a lazy fix. This shouldn't be executed 
in the first place and should not take precedence over any reset.


Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is 
perfectly valid here.


Christian.

On 13.09.21 at 06:42, Pan, Xinhui wrote:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding one amdgpu_ring.direct_access_mutex 
before we issue test_ib on each ring.


From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int 
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

   }

   /* Avoid accidentally unparking the sched thread during
GPU reset */

- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
There are many ioctls and debugfs calls which take this lock, and as
you know the purpose is to avoid them while there is a reset. The
purpose is *not* to fix any concurrency issues those calls themselves
have otherwise; fixing those concurrency issues this way is just lazy
and not acceptable.

This will take away any fairness given to the writer in this rw lock,
and that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int 
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);











Re: [PATCH v3 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:34PM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
> new file mode 100644
> index ..3c9bacd3c3f3
> --- /dev/null
> +++ b/arch/x86/kernel/cc_platform.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Confidential Computing Platform Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +bool cc_platform_has(enum cc_attr attr)
> +{
> + if (sme_me_mask)

Why are you still checking the sme_me_mask here? AFAIR, we said that
we'll do that only when the KVM folks come with a valid use case...

> + return amd_cc_platform_has(attr);
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(cc_platform_has);
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..18fe19916bc3 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -389,6 +390,26 @@ bool noinstr sev_es_active(void)
>   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
>  }
>  
> +bool amd_cc_platform_has(enum cc_attr attr)
> +{
> + switch (attr) {
> + case CC_ATTR_MEM_ENCRYPT:
> + return sme_me_mask != 0;

No need for the "!= 0"

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 2/8] mm: Introduce a function to check for confidential computing features

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:33PM -0500, Tom Lendacky wrote:
> In prep for other confidential computing technologies, introduce a generic

preparation

> helper function, cc_platform_has(), that can be used to check for specific
> active confidential computing attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks to
> the code (e.g. if (sev_active() || tdx_active())).

...

> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> new file mode 100644
> index ..253f3ea66cd8
> --- /dev/null
> +++ b/include/linux/cc_platform.h
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Confidential Computing Platform Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _CC_PLATFORM_H

_LINUX_CC_PLATFORM_H

> +#define _CC_PLATFORM_H

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
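

A minimal userspace sketch of the dispatch pattern under review: one generic cc_platform_has() routes to a vendor helper, instead of callers open-coding if (sev_active() || tdx_active()). The mask value and the attribute set here are illustrative assumptions, not the kernel's actual values:

```c
#include <stdbool.h>

enum cc_attr { CC_ATTR_MEM_ENCRYPT, CC_ATTR_GUEST_MEM_ENCRYPT };

/* Pretend SME is active; in the kernel this mask is set up at boot. */
static unsigned long sme_me_mask = 1UL << 47;

/* Vendor-specific backend. */
static bool amd_cc_platform_has(enum cc_attr attr)
{
	switch (attr) {
	case CC_ATTR_MEM_ENCRYPT:
		return sme_me_mask;	/* per Boris: no "!= 0" needed */
	default:
		return false;
	}
}

/* Generic front end that callers use instead of technology-specific
 * checks scattered through the code. */
static bool cc_platform_has(enum cc_attr attr)
{
	if (sme_me_mask)
		return amd_cc_platform_has(attr);
	return false;
}
```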


Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)

2021-09-13 Thread Robin Murphy

On 2021-09-10 11:09, Guchun Chen wrote:

Vendors will define their own memory types on top of TTM_PL_PRIV,
but call ttm_set_driver_manager directly without checking the mem_type
value when setting up a memory manager. So add such a check to catch
the case when mem_type is out of array bounds.

v2: lower check level to WARN_ON

Signed-off-by: Leslie Shi 
Signed-off-by: Guchun Chen 
---
  include/drm/ttm/ttm_device.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..aa79953c807c 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
  static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
  struct ttm_resource_manager *manager)
  {
+   WARN_ON(type >= TTM_NUM_MEM_TYPES);


Nit: I know nothing about this code, but from the context alone it would 
seem sensible to do


if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
return;

to avoid making the subsequent assignment when we *know* it's invalid 
and likely to corrupt memory.


Robin.
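
The gated-assignment shape Robin suggests can be sketched in plain C. WARN_ON here is a hypothetical userspace stand-in for the kernel macro (which likewise evaluates to the condition's value), and set_driver_manager only loosely mirrors the TTM code:

```c
#include <stdio.h>

#define TTM_NUM_MEM_TYPES 8

/* Userspace stand-in for the kernel's WARN_ON(): logs when the
 * condition holds and evaluates to it, so it can gate a statement. */
#define WARN_ON(cond) ({                                   \
	int __warned = !!(cond);                           \
	if (__warned)                                      \
		fprintf(stderr, "WARNING: %s\n", #cond);   \
	__warned;                                          \
})

static void *man_drv[TTM_NUM_MEM_TYPES];

/* Robin's shape: warn *and* skip the out-of-bounds store. */
static int set_driver_manager(int type, void *manager)
{
	if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
		return -1;
	man_drv[type] = manager;
	return 0;
}
```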


bdev->man_drv[type] = manager;
  }
  



Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Lazar, Lijo




On 9/13/2021 12:21 PM, Christian König wrote:
Keep in mind that we don't try to avoid contention here. The goal is 
rather to have as few locks as possible to avoid the extra overhead in 
the hot path.


Contention is completely irrelevant for the debug and device reset since 
those are rarely occurring events and performance doesn't matter for them.


It is perfectly reasonable to take the write side of the reset lock as 
necessary when we need to make sure that we don't have concurrent device 
access.


The original code has down_read, which gave the impression that there is 
some protection to avoid access during reset. Basically I would like to 
avoid this as a precedent for this sort of usage for any debugfs call. 
The reset semaphore is supposed to be a 'protect all' thing and provides a 
shortcut.


BTW, a question about a hypothetical case - what happens if the test 
itself causes a hang and needs to trigger a reset? Will there be a chance 
for the lock to be released (or will the submit call hang 
indefinitely) for the actual reset to be executed?


Thanks,
Lijo



Regards,
Christian.

On 13.09.21 at 08:43, Lazar, Lijo wrote:
There are other interfaces to emulate the exact reset process, or at 
least this is not the one we are using for doing any sort of reset 
through debugfs.


In any case, the expectation is that the reset thread takes the write side 
of the lock, and that's already done somewhere else.


The reset semaphore is supposed to protect the device from concurrent 
access (any sort of resource usage is thus protected by default). Then 
the same logic can be applied to any other call, and that is not a 
reasonable ask.


Thanks,
Lijo

On 9/13/2021 12:07 PM, Christian König wrote:

That's complete nonsense.

The debugfs interface emulates parts of the reset procedure for 
testing and we absolutely need to take the same locks as the reset to 
avoid corruption of the involved objects.


Regards,
Christian.

On 13.09.21 at 08:25, Lazar, Lijo wrote:
This is a debugfs interface, and adding another writer contention in 
debugfs over an actual reset is a lazy fix. This shouldn't be executed 
in the first place and should not take precedence over any reset.


Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is 
perfectly valid here.


Christian.

On 13.09.21 at 06:42, Pan, Xinhui wrote:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding one amdgpu_ring.direct_access_mutex before 
we issue test_ib on each ring.


From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int 
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

   }

   /* Avoid accidentally unparking the sched thread during GPU
reset */

- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
There are many ioctls and debugfs calls which take this lock, and as
you know the purpose is to avoid them while there is a reset. The
purpose is *not* to fix any concurrency issues those calls themselves
have otherwise; fixing those concurrency issues this way is just lazy
and not acceptable.

This will take away any fairness given to the writer in this rw lock,
and that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int 
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

   kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);









Re: Re: Re: [PATCH] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Christian König

Yeah, because we avoid the need to allocate an entity otherwise.

Ok, all that comes swapped back into my head once more.

As far as I can see that should work, but I would ask Andrey as well 
since he now takes care of GPU reset.


Christian.

On 13.09.21 at 08:55, Pan, Xinhui wrote:

[AMD Official Use Only]

These IB tests are all using direct IB submission including the delayed init 
work.

From: Koenig, Christian 
Sent: September 13, 2021 14:19
To: Pan, Xinhui; Christian König; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: Re: [PATCH] drm/amdgpu: Fix a race of IB test

Well, is the delayed init work using direct submission or submission
through the scheduler?

If the latter, we have the down_write of the reset semaphore pulled in
through the scheduler dependency.

Anyway just having the sync before taking the lock should work.

Christian.

On 11.09.21 at 12:18, Pan, Xinhui wrote:

[AMD Official Use Only]

For the possible deadlock, we can just move flush_delayed_work above 
down_write; not a big thing.
But I am not aware why the delayed init work would try to lock reset_sem.

The delayed init work is enqueued when the device resumes. It calls 
amdgpu_ib_ring_tests directly. We need one sync method.
But I see device resume itself would flush it. So there is no race between them 
as userspace is still frozen.

I will drop this flush in V2.

From: Christian König 
Sent: September 11, 2021 15:45
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu: Fix a race of IB test



On 11.09.21 at 03:55, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..acbe02928791 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,10 +1358,15 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
}

/* Avoid accidentally unparking the sched thread during GPU reset */
- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
if (r)
return r;

+ /* Avoid concurrent IB test but do not cancel it, as I don't know whether we
+  * would add more code in the delayed init work.
+  */
+ flush_delayed_work(&adev->delayed_init_work);
+

That won't work. It's at least theoretically possible that the delayed
init work waits for the reset_sem which we are holding here.

Very unlikely to happen, but lockdep might be able to point that out
with a nice backtrace in the logs.

On the other hand, the delayed init work and the direct IB test through
this interface should work at the same time, so I would just drop it.

Christian.


/* hold on the scheduler */
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -1387,7 +1392,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
kthread_unpark(ring->sched.thread);
}

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);




Re: Re: [PATCH] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Pan, Xinhui
[AMD Official Use Only]

These IB tests are all using direct IB submission including the delayed init 
work.

From: Koenig, Christian 
Sent: September 13, 2021 14:19
To: Pan, Xinhui; Christian König; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: Re: [PATCH] drm/amdgpu: Fix a race of IB test

Well, is the delayed init work using direct submission or submission
through the scheduler?

If the latter, we have the down_write of the reset semaphore pulled in
through the scheduler dependency.

Anyway just having the sync before taking the lock should work.

Christian.

On 11.09.21 at 12:18, Pan, Xinhui wrote:
> [AMD Official Use Only]
>
> For the possible deadlock, we can just move flush_delayed_work above 
> down_write; not a big thing.
> But I am not aware why the delayed init work would try to lock reset_sem.
>
> The delayed init work is enqueued when the device resumes. It calls 
> amdgpu_ib_ring_tests directly. We need one sync method.
> But I see device resume itself would flush it. So there is no race between 
> them as userspace is still frozen.
>
> I will drop this flush in V2.
> 
> From: Christian König 
> Sent: September 11, 2021 15:45
> To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander; Koenig, Christian
> Subject: Re: [PATCH] drm/amdgpu: Fix a race of IB test
>
>
>
> On 11.09.21 at 03:55, xinhui pan wrote:
>> Direct IB submission should be exclusive. So use write lock.
>>
>> Signed-off-by: xinhui pan 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 9 +++--
>>1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 19323b4cce7b..acbe02928791 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> @@ -1358,10 +1358,15 @@ static int amdgpu_debugfs_test_ib_show(struct 
>> seq_file *m, void *unused)
>>}
>>
>>/* Avoid accidentally unparking the sched thread during GPU reset */
>> - r = down_read_killable(&adev->reset_sem);
>> + r = down_write_killable(&adev->reset_sem);
>>if (r)
>>return r;
>>
>> + /* Avoid concurrent IB test but do not cancel it, as I don't know 
>> whether we
>> +  * would add more code in the delayed init work.
>> +  */
>> + flush_delayed_work(&adev->delayed_init_work);
>> +
> That won't work. It's at least theoretically possible that the delayed
> init work waits for the reset_sem which we are holding here.
>
> Very unlikely to happen, but lockdep might be able to point that out
> with a nice backtrace in the logs.
>
> On the other hand, the delayed init work and the direct IB test through
> this interface should work at the same time, so I would just drop it.
>
> Christian.
>
>>/* hold on the scheduler */
>>for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
>>struct amdgpu_ring *ring = adev->rings[i];
>> @@ -1387,7 +1392,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
>> *m, void *unused)
>>kthread_unpark(ring->sched.thread);
>>}
>>
>> - up_read(&adev->reset_sem);
>> + up_write(&adev->reset_sem);
>>
>>pm_runtime_mark_last_busy(dev->dev);
>>pm_runtime_put_autosuspend(dev->dev);



Re: Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-13 Thread Christian König
Ah, missed the local variable in amdgpu_uvd_create_msg_bo_helper(). 
Please add a comment on that.


Apart from that looks good to me.

Regards,
Christian.

On 13.09.21 at 08:51, Pan, Xinhui wrote:

[AMD Official Use Only]

1) Of course I can drop the bo resv lock as long as we fix the race of IB test. 
Will do it in v4.

2) amdgpu_uvd_create_msg_bo_helper always uses a local variable *bo = NULL 
passed to bo_create, and assigns it to **bo_ptr on success. Of course, I will 
make the code easier to understand.


From: Koenig, Christian 
Sent: September 13, 2021 14:31
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

On 11.09.21 at 03:34, xinhui pan wrote:

move BO allocation in sw_init.

Signed-off-by: xinhui pan 
---
change from v2:
use reservation trylock for direct IB test.
change from v1:
only use pre-allocated BO for direct IB submission.
and take its reservation lock to avoid any potential race.
better safe than sorry.
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 104 +---
   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |   1 +
   drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |   8 +-
   drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |   8 +-
   4 files changed, 79 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index d451c359606a..a4b3dd6b38c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -134,6 +134,51 @@ MODULE_FIRMWARE(FIRMWARE_VEGA12);
   MODULE_FIRMWARE(FIRMWARE_VEGA20);

   static void amdgpu_uvd_idle_work_handler(struct work_struct *work);
+static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
+
+static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
+uint32_t size,
+struct amdgpu_bo **bo_ptr)
+{
+ struct ttm_operation_ctx ctx = { true, false };
+ struct amdgpu_bo *bo = NULL;
+ void *addr;
+ int r;
+
+ r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_GTT,
+   &bo, NULL, &addr);
+ if (r)
+ return r;
+
+ if (adev->uvd.address_64_bit) {
+ *bo_ptr = bo;
+ return 0;
+ }
+
+ amdgpu_bo_kunmap(bo);
+ amdgpu_bo_unpin(bo);
+ amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
+ amdgpu_uvd_force_into_uvd_segment(bo);
+ r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+ if (r)
+ goto err;
+ r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM);
+ if (r)
+ goto err_pin;
+ r = amdgpu_bo_kmap(bo, &addr);
+ if (r)
+ goto err_kmap;
+ *bo_ptr = bo;
+ return 0;
+err_kmap:
+ amdgpu_bo_unpin(bo);
+err_pin:
+err:
+ amdgpu_bo_unreserve(bo);
+ amdgpu_bo_unref(&bo);
+ return r;
+}

   int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
   {
@@ -302,6 +347,11 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
   if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 
0))
   adev->uvd.address_64_bit = true;

+ r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
+ if (r)
+ return r;
+ amdgpu_bo_unreserve(adev->uvd.ib_bo);
+
   switch (adev->asic_type) {
   case CHIP_TONGA:
   adev->uvd.use_ctx_buf = adev->uvd.fw_version >= FW_1_65_10;
@@ -324,6 +374,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)

   int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
   {
+ void *addr = amdgpu_bo_kptr(adev->uvd.ib_bo);
   int i, j;

   drm_sched_entity_destroy(&adev->uvd.entity);
@@ -342,6 +393,7 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
   for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i)
   amdgpu_ring_fini(&adev->uvd.inst[j].ring_enc[i]);
   }
+ amdgpu_bo_free_kernel(&adev->uvd.ib_bo, NULL, &addr);
   release_firmware(adev->uvd.fw);

   return 0;
@@ -1080,23 +1132,10 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring 
*ring, struct amdgpu_bo *bo,
   unsigned offset_idx = 0;
   unsigned offset[3] = { UVD_BASE_SI, 0, 0 };

- amdgpu_bo_kunmap(bo);
- amdgpu_bo_unpin(bo);
-
- if (!ring->adev->uvd.address_64_bit) {
- struct ttm_operation_ctx ctx = { true, false };
-
- amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
- amdgpu_uvd_force_into_uvd_segment(bo);
- r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
- if (r)
- goto err;
- }
-
   r = amdgpu_job_alloc_with_ib(adev, 64, direct ? AMDGPU_IB_POOL_DIRECT :
AMDGPU_IB_POOL_DELAYED, &job);
   if (r)
- goto err;
+ return r;

   if (adev->asic_type >= CHIP_VEGA10) {
   

Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Christian König
Keep in mind that we don't try to avoid contention here. The goal is 
rather to have as few locks as possible to avoid the extra overhead in 
the hot path.


Contention is completely irrelevant for the debug and device reset since 
those are rarely occurring events and performance doesn't matter for them.


It is perfectly reasonable to take the write side of the reset lock as 
necessary when we need to make sure that we don't have concurrent device 
access.


Regards,
Christian.

On 13.09.21 at 08:43, Lazar, Lijo wrote:
There are other interfaces to emulate the exact reset process, or at 
least this is not the one we are using for doing any sort of reset 
through debugfs.


In any case, the expectation is that the reset thread takes the write side of 
the lock, and that's already done somewhere else.


The reset semaphore is supposed to protect the device from concurrent 
access (any sort of resource usage is thus protected by default). Then 
the same logic can be applied to any other call, and that is not a 
reasonable ask.


Thanks,
Lijo

On 9/13/2021 12:07 PM, Christian König wrote:

That's complete nonsense.

The debugfs interface emulates parts of the reset procedure for 
testing and we absolutely need to take the same locks as the reset to 
avoid corruption of the involved objects.


Regards,
Christian.

On 13.09.21 at 08:25, Lazar, Lijo wrote:
This is a debugfs interface, and adding another writer contention in 
debugfs over an actual reset is a lazy fix. This shouldn't be executed 
in the first place and should not take precedence over any reset.


Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is 
perfectly valid here.


Christian.

On 13.09.21 at 06:42, Pan, Xinhui wrote:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding one amdgpu_ring.direct_access_mutex before 
we issue test_ib on each ring.


From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int 
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

   }

   /* Avoid accidentally unparking the sched thread during GPU
reset */

- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
There are many ioctls and debugfs calls which take this lock, and as
you know the purpose is to avoid them while there is a reset. The
purpose is *not* to fix any concurrency issues those calls themselves
have otherwise; fixing those concurrency issues this way is just lazy
and not acceptable.

This will take away any fairness given to the writer in this rw lock,
and that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int 
amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)

   kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);









Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-13 Thread Pan, Xinhui
[AMD Official Use Only]

1) Of course I can drop the bo resv lock as long as we fix the race of IB test. 
Will do it in v4.

2) amdgpu_uvd_create_msg_bo_helper always uses a local variable *bo = NULL 
passed to bo_create, and assigns it to **bo_ptr on success. Of course, I will 
make the code easier to understand.


From: Koenig, Christian 
Sent: September 13, 2021 14:31
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

On 11.09.21 at 03:34, xinhui pan wrote:
> move BO allocation in sw_init.
>
> Signed-off-by: xinhui pan 
> ---
> change from v2:
> use reservation trylock for direct IB test.
> change from v1:
> only use pre-allocated BO for direct IB submission.
> and take its reservation lock to avoid any potential race.
> better safe than sorry.
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 104 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |   1 +
>   drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |   8 +-
>   drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |   8 +-
>   4 files changed, 79 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index d451c359606a..a4b3dd6b38c6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -134,6 +134,51 @@ MODULE_FIRMWARE(FIRMWARE_VEGA12);
>   MODULE_FIRMWARE(FIRMWARE_VEGA20);
>
>   static void amdgpu_uvd_idle_work_handler(struct work_struct *work);
> +static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
> +
> +static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
> +uint32_t size,
> +struct amdgpu_bo **bo_ptr)
> +{
> + struct ttm_operation_ctx ctx = { true, false };
> + struct amdgpu_bo *bo = NULL;
> + void *addr;
> + int r;
> +
> + r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
> +   AMDGPU_GEM_DOMAIN_GTT,
> +   &bo, NULL, &addr);
> + if (r)
> + return r;
> +
> + if (adev->uvd.address_64_bit) {
> + *bo_ptr = bo;
> + return 0;
> + }
> +
> + amdgpu_bo_kunmap(bo);
> + amdgpu_bo_unpin(bo);
> + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
> + amdgpu_uvd_force_into_uvd_segment(bo);
> + r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> + if (r)
> + goto err;
> + r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM);
> + if (r)
> + goto err_pin;
> + r = amdgpu_bo_kmap(bo, &addr);
> + if (r)
> + goto err_kmap;
> + *bo_ptr = bo;
> + return 0;
> +err_kmap:
> + amdgpu_bo_unpin(bo);
> +err_pin:
> +err:
> + amdgpu_bo_unreserve(bo);
> + amdgpu_bo_unref(&bo);
> + return r;
> +}
>
>   int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>   {
> @@ -302,6 +347,11 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>   if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 
> 0))
>   adev->uvd.address_64_bit = true;
>
> + r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);
> + if (r)
> + return r;
> + amdgpu_bo_unreserve(adev->uvd.ib_bo);
> +
>   switch (adev->asic_type) {
>   case CHIP_TONGA:
>   adev->uvd.use_ctx_buf = adev->uvd.fw_version >= FW_1_65_10;
> @@ -324,6 +374,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>
>   int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
>   {
> + void *addr = amdgpu_bo_kptr(adev->uvd.ib_bo);
>   int i, j;
>
>   drm_sched_entity_destroy(&adev->uvd.entity);
> @@ -342,6 +393,7 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
>   for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i)
>   amdgpu_ring_fini(&adev->uvd.inst[j].ring_enc[i]);
>   }
> + amdgpu_bo_free_kernel(&adev->uvd.ib_bo, NULL, &addr);
>   release_firmware(adev->uvd.fw);
>
>   return 0;
> @@ -1080,23 +1132,10 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring 
> *ring, struct amdgpu_bo *bo,
>   unsigned offset_idx = 0;
>   unsigned offset[3] = { UVD_BASE_SI, 0, 0 };
>
> - amdgpu_bo_kunmap(bo);
> - amdgpu_bo_unpin(bo);
> -
> - if (!ring->adev->uvd.address_64_bit) {
> - struct ttm_operation_ctx ctx = { true, false };
> -
> - amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
> - amdgpu_uvd_force_into_uvd_segment(bo);
> - r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> - if (r)
> - goto err;
> - }
> -
>   r = amdgpu_job_alloc_with_ib(adev, 64, direct ? AMDGPU_IB_POOL_DIRECT :
>AMDGPU_IB_POOL_DELAYED, &job);
>   if (r)
> - goto err;
> + return r;
>
>   if (adev->asic_type >= 

Re: [RFC PATCH] drm/ttm: Try to check if new ttm man out of bounds during compile

2021-09-13 Thread Pan, Xinhui
[AMD Official Use Only]

ttm_range_man_init/fini are exported; someone else might use them via 
find_symbol. I just want to not break things.

Developers usually compile the whole kernel, so add a checked version of 
ttm_range_man_init/fini via the wrappers.
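
The compile-time bound check the wrappers aim for can be sketched with C11's _Static_assert, a userspace cousin of the kernel's BUILD_BUG_ON; the numeric values below are illustrative assumptions, not the real defines:

```c
#define TTM_PL_PRIV       4	/* first vendor-private memory type */
#define TTM_NUM_MEM_TYPES 8	/* size of the man_drv[] array */

/* Vendor-defined types stacked on top of TTM_PL_PRIV, as amdgpu does. */
enum { AMDGPU_PL_GDS = TTM_PL_PRIV, AMDGPU_PL_GWS, AMDGPU_PL_OA };

/* Fails the build, rather than corrupting memory at runtime, if a
 * vendor type runs past the array bound. */
_Static_assert(AMDGPU_PL_OA < TTM_NUM_MEM_TYPES,
	       "vendor memory type exceeds TTM_NUM_MEM_TYPES");
```

The __builtin_constant_p() guard in the RFC extends this idea to the inline accessors, where the index may or may not be a compile-time constant.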


From: Christian König 
Sent: September 13, 2021 14:35
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian; dan...@ffwll.ch; dri-de...@lists.freedesktop.org; Chen, 
Guchun
Subject: Re: [RFC PATCH] drm/ttm: Try to check if new ttm man out of bounds during 
compile



On 13.09.21 at 05:36, xinhui pan wrote:
> Allow TTM to know if a vendor sets a new ttm manager out of bounds by
> adding BUILD_BUG_ON.

I really like the part in the inline functions, but the wrappers around
the ttm_range_man_init/fini look a bit awkward off hand.

Christian.

>
> Signed-off-by: xinhui pan 
> ---
>   drivers/gpu/drm/ttm/ttm_range_manager.c |  2 ++
>   include/drm/ttm/ttm_device.h|  3 +++
>   include/drm/ttm/ttm_range_manager.h | 10 ++
>   3 files changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
> b/drivers/gpu/drm/ttm/ttm_range_manager.c
> index 03395386e8a7..47e304719b88 100644
> --- a/drivers/gpu/drm/ttm/ttm_range_manager.c
> +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
> @@ -127,6 +127,8 @@ static const struct ttm_resource_manager_func 
> ttm_range_manager_func = {
>   .debug = ttm_range_man_debug
>   };
>
> +#undef ttm_range_man_init
> +#undef ttm_range_man_fini
>   /**
>* ttm_range_man_init
>*
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 07d722950d5b..6f23724f5a06 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -285,12 +285,15 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
> ttm_operation_ctx *ctx,
>   static inline struct ttm_resource_manager *
>   ttm_manager_type(struct ttm_device *bdev, int mem_type)
>   {
> + BUILD_BUG_ON(__builtin_constant_p(mem_type)
> +  && mem_type >= TTM_NUM_MEM_TYPES);
>   return bdev->man_drv[mem_type];
>   }
>
>   static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
> struct ttm_resource_manager *manager)
>   {
> + BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
>   bdev->man_drv[type] = manager;
>   }
>
> diff --git a/include/drm/ttm/ttm_range_manager.h 
> b/include/drm/ttm/ttm_range_manager.h
> index 22b6fa42ac20..9250ade54e2c 100644
> --- a/include/drm/ttm/ttm_range_manager.h
> +++ b/include/drm/ttm/ttm_range_manager.h
> @@ -38,5 +38,15 @@ int ttm_range_man_init(struct ttm_device *bdev,
>  unsigned long p_size);
>   int ttm_range_man_fini(struct ttm_device *bdev,
>  unsigned type);
> +#define ttm_range_man_init(bdev, type, use_tt, size) ({  \
> + BUILD_BUG_ON(__builtin_constant_p(type) \
> + && type >= TTM_NUM_MEM_TYPES);  \
> + ttm_range_man_init(bdev, type, use_tt, size);   \
> +})
> +#define ttm_range_man_fini(bdev, type) ({\
> + BUILD_BUG_ON(__builtin_constant_p(type) \
> + && type >= TTM_NUM_MEM_TYPES);  \
> + ttm_range_man_fini(bdev, type); \
> +})
>
>   #endif



Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Lazar, Lijo
There are other interfaces to emulate the exact reset process, or at
least this is not the one we use for doing any sort of reset through
debugfs.


In any case, the expectation is reset thread takes the write side of the 
lock and it's already done somewhere else.


The reset semaphore is supposed to protect the device from concurrent
access (any sort of resource usage is thus protected by default). Then
the same logic could be applied to any other call, and that is not a
reasonable ask.


Thanks,
Lijo

On 9/13/2021 12:07 PM, Christian König wrote:

That's complete nonsense.

The debugfs interface emulates parts of the reset procedure for testing 
and we absolutely need to take the same locks as the reset to avoid 
corruption of the involved objects.


Regards,
Christian.

Am 13.09.21 um 08:25 schrieb Lazar, Lijo:
This is a debugfs interface, and adding another writer contention in 
debugfs over an actual reset is a lazy fix. This shouldn't be executed 
in the first place and should not take precedence over any reset.


Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is perfectly 
valid here.


Christian.

Am 13.09.21 um 06:42 schrieb Pan, Xinhui:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding an amdgpu_ring.direct_access_mutex to take before
we issue test_ib on each ring.


From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct 
seq_file *m, void *unused)

   }

   /* Avoid accidently unparking the sched thread during GPU 
reset */

> - r = down_read_killable(&adev->reset_sem);
> + r = down_write_killable(&adev->reset_sem);
There are many ioctls and debugfs calls which take this lock, and as
you know the purpose is to avoid them while there is a reset. The
purpose is *not to* fix any concurrency issues those calls themselves
have otherwise, and fixing those concurrency issues this way is just
lazy and not acceptable.

This will take away any fairness given to the writer in this rw lock,
and that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct 
seq_file *m, void *unused)

   kthread_unpark(ring->sched.thread);
   }

> - up_read(&adev->reset_sem);
> + up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);







Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Christian König

That's complete nonsense.

The debugfs interface emulates parts of the reset procedure for testing 
and we absolutely need to take the same locks as the reset to avoid 
corruption of the involved objects.


Regards,
Christian.

Am 13.09.21 um 08:25 schrieb Lazar, Lijo:
This is a debugfs interface, and adding another writer contention in 
debugfs over an actual reset is a lazy fix. This shouldn't be executed 
in the first place and should not take precedence over any reset.


Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is perfectly 
valid here.


Christian.

Am 13.09.21 um 06:42 schrieb Pan, Xinhui:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding an amdgpu_ring.direct_access_mutex to take before
we issue test_ib on each ring.


From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct 
seq_file *m, void *unused)

   }

   /* Avoid accidently unparking the sched thread during GPU 
reset */

- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
There are many ioctls and debugfs calls which take this lock, and as
you know the purpose is to avoid them while there is a reset. The
purpose is *not to* fix any concurrency issues those calls themselves
have otherwise, and fixing those concurrency issues this way is just
lazy and not acceptable.

This will take away any fairness given to the writer in this rw lock,
and that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct 
seq_file *m, void *unused)

   kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);







Re: [RFC PATCH] drm/ttm: Try to check if new ttm man out of bounds during compile

2021-09-13 Thread Christian König




Am 13.09.21 um 05:36 schrieb xinhui pan:

Let TTM detect when a vendor sets a new ttm manager out of bounds by
adding BUILD_BUG_ON.


I really like the part in the inline functions, but the wrappers around 
the ttm_range_man_init/fini look a bit awkward off hand.


Christian.



Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/ttm/ttm_range_manager.c |  2 ++
  include/drm/ttm/ttm_device.h|  3 +++
  include/drm/ttm/ttm_range_manager.h | 10 ++
  3 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index 03395386e8a7..47e304719b88 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -127,6 +127,8 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
.debug = ttm_range_man_debug
  };
  
+#undef ttm_range_man_init

+#undef ttm_range_man_fini
  /**
   * ttm_range_man_init
   *
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..6f23724f5a06 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -285,12 +285,15 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
  static inline struct ttm_resource_manager *
  ttm_manager_type(struct ttm_device *bdev, int mem_type)
  {
+   BUILD_BUG_ON(__builtin_constant_p(mem_type)
+&& mem_type >= TTM_NUM_MEM_TYPES);
return bdev->man_drv[mem_type];
  }
  
  static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,

  struct ttm_resource_manager *manager)
  {
+   BUILD_BUG_ON(__builtin_constant_p(type) && type >= TTM_NUM_MEM_TYPES);
bdev->man_drv[type] = manager;
  }
  
diff --git a/include/drm/ttm/ttm_range_manager.h b/include/drm/ttm/ttm_range_manager.h

index 22b6fa42ac20..9250ade54e2c 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -38,5 +38,15 @@ int ttm_range_man_init(struct ttm_device *bdev,
   unsigned long p_size);
  int ttm_range_man_fini(struct ttm_device *bdev,
   unsigned type);
+#define ttm_range_man_init(bdev, type, use_tt, size) ({\
+   BUILD_BUG_ON(__builtin_constant_p(type) \
+   && type >= TTM_NUM_MEM_TYPES);   \
+   ttm_range_man_init(bdev, type, use_tt, size);   \
+})
+#define ttm_range_man_fini(bdev, type) ({  \
+   BUILD_BUG_ON(__builtin_constant_p(type) \
+   && type >= TTM_NUM_MEM_TYPES);   \
+   ttm_range_man_fini(bdev, type); \
+})
  
  #endif




Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test

2021-09-13 Thread Christian König

Am 11.09.21 um 03:34 schrieb xinhui pan:

Move BO allocation into sw_init.

Signed-off-by: xinhui pan 
---
change from v2:
use reservation trylock for direct IB test.
change from v1:
only use pre-allocated BO for direct IB submission.
and take its reservation lock to avoid any potential race.
better safe than sorry.
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 104 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h |   1 +
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c   |   8 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c   |   8 +-
  4 files changed, 79 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index d451c359606a..a4b3dd6b38c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -134,6 +134,51 @@ MODULE_FIRMWARE(FIRMWARE_VEGA12);
  MODULE_FIRMWARE(FIRMWARE_VEGA20);
  
  static void amdgpu_uvd_idle_work_handler(struct work_struct *work);

+static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo);
+
+static int amdgpu_uvd_create_msg_bo_helper(struct amdgpu_device *adev,
+  uint32_t size,
+  struct amdgpu_bo **bo_ptr)
+{
+   struct ttm_operation_ctx ctx = { true, false };
+   struct amdgpu_bo *bo = NULL;
+   void *addr;
+   int r;
+
+   r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_GTT,
+ &bo, NULL, &addr);
+   if (r)
+   return r;
+
+   if (adev->uvd.address_64_bit) {
+   *bo_ptr = bo;
+   return 0;
+   }
+
+   amdgpu_bo_kunmap(bo);
+   amdgpu_bo_unpin(bo);
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
+   amdgpu_uvd_force_into_uvd_segment(bo);
+   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (r)
+   goto err;
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM);
+   if (r)
+   goto err_pin;
+   r = amdgpu_bo_kmap(bo, &addr);
+   if (r)
+   goto err_kmap;
+   *bo_ptr = bo;
+   return 0;
+err_kmap:
+   amdgpu_bo_unpin(bo);
+err_pin:
+err:
+   amdgpu_bo_unreserve(bo);
+   amdgpu_bo_unref(&bo);
+   return r;
+}
  
  int amdgpu_uvd_sw_init(struct amdgpu_device *adev)

  {
@@ -302,6 +347,11 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
if (!amdgpu_device_ip_block_version_cmp(adev, AMD_IP_BLOCK_TYPE_UVD, 5, 
0))
adev->uvd.address_64_bit = true;
  
+	r = amdgpu_uvd_create_msg_bo_helper(adev, 128 << 10, &adev->uvd.ib_bo);

+   if (r)
+   return r;
+   amdgpu_bo_unreserve(adev->uvd.ib_bo);
+
switch (adev->asic_type) {
case CHIP_TONGA:
adev->uvd.use_ctx_buf = adev->uvd.fw_version >= FW_1_65_10;
@@ -324,6 +374,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
  
  int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)

  {
+   void *addr = amdgpu_bo_kptr(adev->uvd.ib_bo);
int i, j;
  
	drm_sched_entity_destroy(&adev->uvd.entity);

@@ -342,6 +393,7 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i)
	amdgpu_ring_fini(&adev->uvd.inst[j].ring_enc[i]);
}
+   amdgpu_bo_free_kernel(&adev->uvd.ib_bo, NULL, &addr);
release_firmware(adev->uvd.fw);
  
  	return 0;

@@ -1080,23 +1132,10 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring 
*ring, struct amdgpu_bo *bo,
unsigned offset_idx = 0;
unsigned offset[3] = { UVD_BASE_SI, 0, 0 };
  
-	amdgpu_bo_kunmap(bo);

-   amdgpu_bo_unpin(bo);
-
-   if (!ring->adev->uvd.address_64_bit) {
-   struct ttm_operation_ctx ctx = { true, false };
-
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
-   amdgpu_uvd_force_into_uvd_segment(bo);
-   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-   if (r)
-   goto err;
-   }
-
r = amdgpu_job_alloc_with_ib(adev, 64, direct ? AMDGPU_IB_POOL_DIRECT :
 AMDGPU_IB_POOL_DELAYED, &job);
if (r)
-   goto err;
+   return r;
  
  	if (adev->asic_type >= CHIP_VEGA10) {

offset_idx = 1 + ring->me;
@@ -1148,8 +1187,6 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, 
struct amdgpu_bo *bo,
}
  
  	amdgpu_bo_fence(bo, f, false);

-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_unref(&bo);
  
  	if (fence)

*fence = dma_fence_get(f);
@@ -1159,10 +1196,6 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, 
struct amdgpu_bo *bo,
  
  err_free:

amdgpu_job_free(job);
-
-err:
-   amdgpu_bo_unreserve(bo);
-   amdgpu_bo_unref(&bo);
return r;
  }
  
@@ -1173,16 +1206,16 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, uint32_t 

Re: [PATCH v2] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Lazar, Lijo
This is a debugfs interface, and adding another writer contention in 
debugfs over an actual reset is a lazy fix. This shouldn't be executed in 
the first place and should not take precedence over any reset.


Thanks,
Lijo


On 9/13/2021 11:52 AM, Christian König wrote:

NAK, this is not the lazy way to fix it at all.

The reset semaphore protects the scheduler and ring objects from 
concurrent modification, so taking the write side of it is perfectly 
valid here.


Christian.

Am 13.09.21 um 06:42 schrieb Pan, Xinhui:

[AMD Official Use Only]

yep, that is a lazy way to fix it.

I am thinking of adding an amdgpu_ring.direct_access_mutex to take before we
issue test_ib on each ring.


From: Lazar, Lijo 
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test



On 9/13/2021 5:18 AM, xinhui pan wrote:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct 
seq_file *m, void *unused)

   }

   /* Avoid accidently unparking the sched thread during GPU 
reset */

- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);

There are many ioctls and debugfs calls which take this lock and as you
know the purpose is to avoid them while there is a reset. The purpose is
*not to* fix any concurrency issues those calls themselves have
otherwise and fixing those concurrency issues this way is just lazy and
not acceptable.

This will take away any fairness given to the writer in this rw lock and
that is supposed to be the reset thread.

Thanks,
Lijo


   if (r)
   return r;

@@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct 
seq_file *m, void *unused)

   kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);





Re: [RFC PATCH 1/2] drm/amdgpu: Introduce ring lock

2021-09-13 Thread Christian König

NAK, that is exactly what we try to avoid here.

Christian.

Am 13.09.21 um 07:55 schrieb xinhui pan:

This is used for direct IB submission to ring.

Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
  2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index ab2351ba9574..f97a28a49120 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -184,6 +184,8 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
else if (ring == >sdma.instance[0].page)
sched_hw_submission = 256;
  
	mutex_init(&ring->lock);

+
if (ring->adev == NULL) {
if (adev->num_rings >= AMDGPU_MAX_RINGS)
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4d380e79752c..544766429b5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -215,6 +215,7 @@ struct amdgpu_ring {
struct amdgpu_fence_driver  fence_drv;
struct drm_gpu_schedulersched;
  
+	struct mutex		lock;

struct amdgpu_bo*ring_obj;
volatile uint32_t   *ring;
unsignedrptr_offs;




Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)

2021-09-13 Thread Christian König
Well it will crash later on when accessing the invalid offset, so not 
much gained.


But either way works for me.

Christian.

Am 13.09.21 um 04:35 schrieb Chen, Guchun:

[Public]

Thanks for your suggestion, Robin. Do you agree with this as well, Christian 
and Xinhui?

Regards,
Guchun

-Original Message-
From: Robin Murphy 
Sent: Saturday, September 11, 2021 2:25 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; Koenig, Christian ; Pan, Xinhui 
; Deucher, Alexander 
Cc: Shi, Leslie 
Subject: Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when 
array bounds (v2)

On 2021-09-10 11:09, Guchun Chen wrote:

Vendors will define their own memory types on top of TTM_PL_PRIV, but
call ttm_set_driver_manager directly without checking the mem_type
value when setting up a memory manager. So add a check to catch the
case when mem_type is out of array bounds.

v2: lower check level to WARN_ON

Signed-off-by: Leslie Shi 
Signed-off-by: Guchun Chen 
---
   include/drm/ttm/ttm_device.h | 1 +
   1 file changed, 1 insertion(+)

diff --git a/include/drm/ttm/ttm_device.h
b/include/drm/ttm/ttm_device.h index 07d722950d5b..aa79953c807c 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
   static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
  struct ttm_resource_manager *manager)
   {
+   WARN_ON(type >= TTM_NUM_MEM_TYPES);

Nit: I know nothing about this code, but from the context alone it would seem 
sensible to do

if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
return;

to avoid making the subsequent assignment when we *know* it's invalid and 
likely to corrupt memory.

Robin.


bdev->man_drv[type] = manager;
   }
   





Re: [PATCH] drm/amdgpu: Fix a race of IB test

2021-09-13 Thread Christian König
Well is the delayed init work using direct submission or submission 
through the scheduler?


If the latter, we have the down_write of the reset semaphore pulled in
through the scheduler dependency.


Anyway just having the sync before taking the lock should work.

Christian.

Am 11.09.21 um 12:18 schrieb Pan, Xinhui:

[AMD Official Use Only]

For the possible deadlock, we can just move flush_delayed_work above
down_write; not a big thing.
But I am not aware of why the delayed init work would try to lock reset_sem.

The delayed init work is enqueued when the device resumes. It calls
amdgpu_ib_ring_tests directly, so we need some sync method.
But I see device resume itself would flush it, so there is no race between
them as userspace is still frozen.

I will drop this flush in V2.

发件人: Christian König 
发送时间: 2021年9月11日 15:45
收件人: Pan, Xinhui; amd-gfx@lists.freedesktop.org
抄送: Deucher, Alexander; Koenig, Christian
主题: Re: [PATCH] drm/amdgpu: Fix a race of IB test



Am 11.09.21 um 03:55 schrieb xinhui pan:

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 9 +++--
   1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..acbe02928791 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,10 +1358,15 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
   }

   /* Avoid accidently unparking the sched thread during GPU reset */
- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
   if (r)
   return r;

+ /* Avoid concurrent IB tests but do not cancel the work, as I don't know
+  * whether we would add more code in the delayed init work.
+  */
+ flush_delayed_work(>delayed_init_work);
+

That won't work. It's at least theoretically possible that the delayed
init work waits for the reset_sem which we are holding here.

Very unlikely to happen, but lockdep might be able to point that out
with a nice backtrace in the logs.

On the other hand delayed init work and direct IB test through this
interface should work at the same time, so I would just drop it.

Christian.


   /* hold on the scheduler */
   for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
   struct amdgpu_ring *ring = adev->rings[i];
@@ -1387,7 +1392,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
*m, void *unused)
   kthread_unpark(ring->sched.thread);
   }

- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);

   pm_runtime_mark_last_busy(dev->dev);
   pm_runtime_put_autosuspend(dev->dev);