Re: [PATCH 03/13] drm: add Makefile support for drm buddy

2021-10-22 Thread Arunpravin



On 20/10/21 1:51 pm, Thomas Zimmermann wrote:
> Hi
> 
> Am 20.10.21 um 00:53 schrieb Arunpravin:
>> - Include drm buddy to DRM root Makefile
>> - Add drm buddy init and exit function calls
>>to drm core
> 
> Is there a hard requirement to have this code in the core?
> 
> IMHO there's already too much code in the DRM core that should rather go 
> into helpers. The allocator should live in it's own module and driver 
> should init it if needed.
> 
Hi Thomas,
I will check on this
@Daniel, Could you please share your opinion

Regards,
Arun
>>
>> Signed-off-by: Arunpravin 
>> ---
>>   drivers/gpu/drm/Makefile  | 2 +-
>>   drivers/gpu/drm/drm_drv.c | 3 +++
>>   2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 0dff40bb863c..dc61e91a3154 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -18,7 +18,7 @@ drm-y   := drm_aperture.o drm_auth.o drm_cache.o \
>>  drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \
>>  drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \
>>  drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \
>> -drm_managed.o drm_vblank_work.o
>> +drm_managed.o drm_vblank_work.o drm_buddy.o
>>   
>>   drm-$(CONFIG_DRM_LEGACY) += drm_agpsupport.o drm_bufs.o drm_context.o 
>> drm_dma.o \
>>  drm_legacy_misc.o drm_lock.o drm_memory.o 
>> drm_scatter.o \
>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>> index 7a5097467ba5..6707eec21bef 100644
>> --- a/drivers/gpu/drm/drm_drv.c
>> +++ b/drivers/gpu/drm/drm_drv.c
>> @@ -43,6 +43,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
> 
> In alphabetical order please. Here and everywhere else.
> 
> Best regards
> Thomas
> 
>>   
>>   #include "drm_crtc_internal.h"
>>   #include "drm_internal.h"
>> @@ -1034,6 +1035,7 @@ static void drm_core_exit(void)
>>  drm_sysfs_destroy();
>>  idr_destroy(_minors_idr);
>>  drm_connector_ida_destroy();
>> +drm_buddy_module_exit();
>>   }
>>   
>>   static int __init drm_core_init(void)
>> @@ -1043,6 +1045,7 @@ static int __init drm_core_init(void)
>>  drm_connector_ida_init();
>>  idr_init(_minors_idr);
>>  drm_memcpy_init_early();
>> +drm_buddy_module_init();
>>   
>>  ret = drm_sysfs_init();
>>  if (ret < 0) {
>>
> 


Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-22 Thread Andrey Grodzovsky



What do you mean by underflow in this case ? You mean use after free 
because of extra dma_fence_put() ?


On 2021-10-22 4:14 a.m., JingWen Chen wrote:

ping

On 2021/10/22 AM11:33, Jingwen Chen wrote:

[Why]
In advance tdr mode, the real bad job will be resubmitted twice, while
in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job
is put one more time than other jobs.

[How]
Adding dma_fence_get before resbumit job in
amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs

Signed-off-by: Jingwen Chen 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..975f069f6fe8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4841,6 +4841,9 @@ static void amdgpu_device_recheck_guilty_jobs(
  
  		/* clear job's guilty and depend the folowing step to decide the real one */

drm_sched_reset_karma(s_job);
+   /* for the real bad job, it will be resubmitted twice, adding a 
dma_fence_get
+* to make sure fence is balanced */



But that put in drm_sched_resubmit_jobs_ext is for the previous parent 
fence.
fence = sched->ops->run_job(s_job); returns a new HW fence and the put 
drops the refcount on the old one.


Andrey



+   dma_fence_get(s_job->s_fence->parent);
drm_sched_resubmit_jobs_ext(>sched, 1);
  
  		ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout);

@@ -4876,6 +4879,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  
  		/* got the hw fence, signal finished fence */

atomic_dec(ring->sched.score);
+   dma_fence_put(s_job->s_fence->parent);
dma_fence_get(_job->s_fence->finished);
dma_fence_signal(_job->s_fence->finished);
dma_fence_put(_job->s_fence->finished);


[pull] amdgpu, amdkfd drm-next-5.16

2021-10-22 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.16.

The following changes since commit 48737ac4d70faffeb516e2a9847e24f9a7eee05f:

  drm/amdgpu/psp: add some missing cases to 
psp_check_pmfw_centralized_cstate_management (2021-10-13 22:51:41 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-5.16-2021-10-22

for you to fetch changes up to 41ad36623fabe7d02c9f89aff077dd4c8ba5d602:

  amd/display: remove ChromeOS workaround (2021-10-22 10:11:52 -0400)


amd-drm-next-5.16-2021-10-22:

amdgpu:
- PSP fix for resume
- XGMI fixes
- Interrupt fix in device tear down
- Renoir USB-C DP alt mode fix for resume
- DP 2.0 fixes
- Yellow Carp display fixes
- Misc display fixes
- RAS fixes
- IP Discovery enumeration fixes
- VGH fixes
- SR-IOV fixes
- Revert ChromeOS workaround in display code
- Cyan Skillfish fixes

amdkfd:
- Fix error handling in gpu memory allocation
- Fix build warnings with some configs
- SVM fixes


Aaron Liu (1):
  drm/amdgpu: support B0 external revision id for yellow carp

Agustin Gutierrez (2):
  Revert "drm/amd/display: Fix error in dmesg at boot"
  Revert "drm/amd/display: Add helper for blanking all dp displays"

Alex Deucher (15):
  drm/amdgpu/pm: properly handle sclk for profiling modes on vangogh
  drm/amdgpu/gfx10: fix typo in gfx_v10_0_update_gfx_clock_gating()
  drm/amdkfd: protect hawaii_device_info with CONFIG_DRM_AMDGPU_CIK
  drm/amdkfd: protect raven_device_info with KFD_SUPPORT_IOMMU_V2
  drm/amdgpu: drop soc15_set_ip_blocks()
  drm/amdgpu: drop nv_set_ip_blocks()
  drm/amdgpu: drop navi reg init functions
  drm/amdgpu/vcn3.0: handle harvesting in firmware setup
  drm/amdgpu: Consolidate VCN firmware setup code
  drm/amdgpu/vcn2.0: remove intermediate variable
  drm/amdgpu/vcn3.0: remove intermediate variable
  drm/amdgpu: Workaround harvesting info for some navy flounder boards
  drm/amdgpu/swsmu: handle VCN harvesting for VCN SMU setup
  drm/amdgpu/smu11.0: add missing IP version check
  drm/amdgpu/nbio7.4: use original HDP_FLUSH bits

Anthony Koo (2):
  drm/amd/display: Change initializer to single brace
  drm/amd/display: [FW Promotion] Release 0.0.88

Aric Cyr (2):
  drm/amd/display: Validate plane rects before use
  drm/amd/display: 3.2.157

Eric Yang (1):
  drm/amd/display: increase Z9 latency to workaround underflow in Z9

Guchun Chen (1):
  drm/amdgpu/discovery: parse hw_id_name for SDMA instance 2 and 3

Hansen (1):
  drm/amd/display: Fix DP2 SE and LE SYMCLK selection for B0 PHY

Huang Rui (1):
  drm/amdgpu: remove grbm cam index/data operations for gfx v10

Jake Wang (6):
  drm/amd/display: Disable dpp root clock when not being used
  drm/amd/display: Disable dsc root clock when not being used
  drm/amd/display: Disable dpstreamclk, symclk32_se, and symclk32_le
  drm/amd/display: Removed z10 save after dsc disable
  drm/amd/display: Moved dccg init to after bios golden init
  drm/amd/display: Disable hdmistream and hdmichar clocks

Jimmy Kizito (2):
  drm/amd/display: Clear encoder assignment for copied streams
  drm/amd/display: Do not skip link training on DP quick hot plug

Jingwen Chen (1):
  drm/amd/amdgpu: add dummy_page_addr to sriov msg

Jonathan Kim (1):
  drm/amdkfd: map gpu hive id to xgmi connected cpu

Josip Pavic (1):
  drm/amd/display: do not compare integers of different widths

Kent Russell (1):
  drm/amdgpu: Clarify error when hitting bad page threshold

Lai, Derek (1):
  drm/amd/display: Removed power down on boot from DCN31

Lang Yu (1):
  drm/amdkfd: Fix an inappropriate error handling in allloc memory of gpu

Lijo Lazar (1):
  drm/amd/pm: Disable fan control if not supported

Michael Strauss (1):
  drm/amd/display: Clean Up VPG Low Mem Power

Mikita Lipski (1):
  drm/amd/display: Add missing PSR state

Nevenko Stupar (1):
  drm/amd/display: Add bios parser support for latest firmware_info

Nicholas Kazlauskas (2):
  drm/amd/display: Fix prefetch bandwidth calculation for DCN3.1
  drm/amd/display: Require immediate flip support for DCN3.1 planes

Nikola Cornij (2):
  drm/amd/display: Limit display scaling to up to true 4k for DCN 3.1
  drm/amd/display: Increase watermark latencies for DCN3.1

Philip Yang (2):
  drm/amdkfd: clarify the origin of cpages returned by migration functions
  drm/amdkfd: debug message to count successfully migrated pages

Qing Wang (1):
  drm/amdgpu: replace snprintf in show functions with sysfs_emit

Roman Li (1):
  drm/amd/display: Fully switch to dmub for all dcn21 asics

Simon Ser (1):
  amd/display: remove ChromeOS workaround

Tao Zhou (3):
  drm/amdgpu: load PSP RL in resume path
  drm/amdgpu: centralize checking for RAS TA 

Re: [PATCH] drm/amdkfd: restore userptr ignore bad address error

2021-10-22 Thread Felix Kuehling
Am 2021-10-22 um 1:06 p.m. schrieb Philip Yang:
> The userptr can be unmapped by application and still registered to
> driver, restore userptr work return user pages will get -EFAULT bad
> address error. Pretend this error as succeed. GPU access this userptr
> will have VM fault later, it is better than application soft hangs with
> stalled user mode queues.
>
> Signed-off-by: Philip Yang 

Reviewed-by: Felix Kuehling 


> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 27 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  3 +++
>  2 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index cdf46bd0d8d5..6f01c6145a87 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2041,19 +2041,26 @@ static int update_invalid_user_pages(struct 
> amdkfd_process_info *process_info,
>   /* Get updated user pages */
>   ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
>   if (ret) {
> - pr_debug("%s: Failed to get user pages: %d\n",
> - __func__, ret);
> + pr_debug("Failed %d to get user pages\n", ret);
> +
> + /* Return -EFAULT bad address error as success. It will
> +  * fail later with a VM fault if the GPU tries to access
> +  * it. Better than hanging indefinitely with stalled
> +  * user mode queues.
> +  *
> +  * Return other error -EBUSY or -ENOMEM to retry restore
> +  */
> + if (ret != -EFAULT)
> + return ret;
> + } else {
>  
> - /* Return error -EBUSY or -ENOMEM, retry restore */
> - return ret;
> + /*
> +  * FIXME: Cannot ignore the return code, must hold
> +  * notifier_lock
> +  */
> + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
>   }
>  
> - /*
> -  * FIXME: Cannot ignore the return code, must hold
> -  * notifier_lock
> -  */
> - amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
> -
>   /* Mark the BO as valid unless it was invalidated
>* again concurrently.
>*/
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index d784f8d3a834..ae6694f2c73d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -693,6 +693,9 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   r = amdgpu_hmm_range_get_pages(>notifier, mm, pages, start,
>  ttm->num_pages, >range, readonly,
>  false, NULL);
> + if (r)
> + pr_debug("failed %d to get user pages 0x%llx\n", r, start);
> +
>  out_putmm:
>   mmput(mm);
>  


[PATCH] drm/amdkfd: restore userptr ignore bad address error

2021-10-22 Thread Philip Yang
The userptr can be unmapped by application and still registered to
driver, restore userptr work return user pages will get -EFAULT bad
address error. Pretend this error as succeed. GPU access this userptr
will have VM fault later, it is better than application soft hangs with
stalled user mode queues.

Signed-off-by: Philip Yang 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 27 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  3 +++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index cdf46bd0d8d5..6f01c6145a87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2041,19 +2041,26 @@ static int update_invalid_user_pages(struct 
amdkfd_process_info *process_info,
/* Get updated user pages */
ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
if (ret) {
-   pr_debug("%s: Failed to get user pages: %d\n",
-   __func__, ret);
+   pr_debug("Failed %d to get user pages\n", ret);
+
+   /* Return -EFAULT bad address error as success. It will
+* fail later with a VM fault if the GPU tries to access
+* it. Better than hanging indefinitely with stalled
+* user mode queues.
+*
+* Return other error -EBUSY or -ENOMEM to retry restore
+*/
+   if (ret != -EFAULT)
+   return ret;
+   } else {
 
-   /* Return error -EBUSY or -ENOMEM, retry restore */
-   return ret;
+   /*
+* FIXME: Cannot ignore the return code, must hold
+* notifier_lock
+*/
+   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
}
 
-   /*
-* FIXME: Cannot ignore the return code, must hold
-* notifier_lock
-*/
-   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
-
/* Mark the BO as valid unless it was invalidated
 * again concurrently.
 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d784f8d3a834..ae6694f2c73d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -693,6 +693,9 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
struct page **pages)
r = amdgpu_hmm_range_get_pages(>notifier, mm, pages, start,
   ttm->num_pages, >range, readonly,
   false, NULL);
+   if (r)
+   pr_debug("failed %d to get user pages 0x%llx\n", r, start);
+
 out_putmm:
mmput(mm);
 
-- 
2.17.1



Re: FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()

2021-10-22 Thread Andrey Grodzovsky



On 2021-10-21 7:33 p.m., Yu, Lang wrote:

[AMD Official Use Only]




-Original Message-
From: Grodzovsky, Andrey 
Sent: Thursday, October 21, 2021 11:18 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Subject: Re: FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in
amdgpu_device_fini_sw()

On 2021-10-21 3:19 a.m., Yu, Lang wrote:


[AMD Official Use Only]




-Original Message-
From: Yu, Lang 
Sent: Thursday, October 21, 2021 3:18 PM
To: Grodzovsky, Andrey 
Cc: Deucher, Alexander ; Koenig, Christian
; Huang, Ray ; Yu, Lang

Subject: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in
amdgpu_device_fini_sw()

amdgpu_fence_driver_sw_fini() should be executed before
amdgpu_device_ip_fini(), otherwise fence driver resource won't be
properly freed as adev->rings have been tore down.


Cam you clarify more where exactly the memleak happens ?

Andrey

See amdgpu_fence_driver_sw_fini(), ring->fence_drv.fences will only be freed
when adev->rings[i] is not NULL.

void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
{
unsigned int i, j;

for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !ring->fence_drv.initialized)
continue;

if (!ring->no_scheduler)
drm_sched_fini(>sched);

for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
dma_fence_put(ring->fence_drv.fences[j]);
kfree(ring->fence_drv.fences);
ring->fence_drv.fences = NULL;
ring->fence_drv.initialized = false;
}
}

If amdgpu_device_ip_fini() is executed before amdgpu_fence_driver_sw_fini(),
amdgpu_device_ip_fini() will call gfx_vX_0_sw_fini()
then call amdgpu_ring_fini() and set adev->rings[i] to NULL.
Nothing will be freed in amdgpu_fence_driver_sw_fini().
ring->fence_drv.fences  memory leak happened!

void amdgpu_ring_fini(struct amdgpu_ring *ring)
{
..
ring->adev->rings[ring->idx] = NULL;
}

Regards,
Lang



Got it, Looks good to me.

Reviewed-by: Andrey Grodzovsky 

Andrey






Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early
and late")

Signed-off-by: Lang Yu 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..5654c4790773 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3843,8 +3843,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device
*adev)

void amdgpu_device_fini_sw(struct amdgpu_device *adev)  {
-   amdgpu_device_ip_fini(adev);
amdgpu_fence_driver_sw_fini(adev);
+   amdgpu_device_ip_fini(adev);
release_firmware(adev->firmware.gpu_info_fw);
adev->firmware.gpu_info_fw = NULL;
adev->accel_working = false;
--
2.25.1


Re: [PATCH] amd/display: remove ChromeOS workaround

2021-10-22 Thread Alex Deucher
On Fri, Oct 22, 2021 at 11:02 AM Simon Ser  wrote:
>
> On Friday, October 22nd, 2021 at 15:58, Alex Deucher  
> wrote:
>
> > > Agreed that this patch is good but we'll need to also revert the 
> > > is_chromeos w/a.
> >
> > I've reverted that and applied this one. Thanks!
>
> Ah, didn't realize I needed to revert that one too. Thank you!

Well it was only applied to our amd-staging-drm-next branch, not to upstream.

Alex


Re: [PATCH] amd/display: remove ChromeOS workaround

2021-10-22 Thread Simon Ser
On Friday, October 22nd, 2021 at 15:58, Alex Deucher  
wrote:

> > Agreed that this patch is good but we'll need to also revert the 
> > is_chromeos w/a.
>
> I've reverted that and applied this one. Thanks!

Ah, didn't realize I needed to revert that one too. Thank you!


Re: [PATCH] amd/display: remove ChromeOS workaround

2021-10-22 Thread Alex Deucher
On Thu, Oct 21, 2021 at 2:19 PM Harry Wentland  wrote:
>
>
>
> On 2021-10-21 13:55, Rodrigo Siqueira Jordao wrote:
> > Hi Simon,
> >
> > I tested this patch and it lgtm. I also agree to revert it.
> >
> > Btw, did you send the revert patch for "amd/display: only require overlay 
> > plane to cover whole CRTC on ChromeOS"? I think we need to revert it as 
> > well.
> >
>
> Agreed that this patch is good but we'll need to also revert the is_chromeos 
> w/a.

I've reverted that and applied this one.  Thanks!

Alex

>
> This patch is
> Reviewed-by: Harry Wentland 
>
> Harry
>
> > Sean, Mark
> >
> > For ChromeOS, we should ignore this patch. Do we need to take any action to 
> > avoid landing this patch on ChromeOS tree?
> >
> > Thanks
> > Siqueira
> >
> > On 2021-10-14 11:35 a.m., Simon Ser wrote:
> >> This reverts commits ddab8bd788f5 ("drm/amd/display: Fix two cursor 
> >> duplication
> >> when using overlay") and e7d9560aeae5 ("Revert "drm/amd/display: Fix 
> >> overlay
> >> validation by considering cursors"").
> >>
> >> tl;dr ChromeOS uses the atomic interface for everything except the cursor. 
> >> This
> >> is incorrect and forces amdgpu to disable some hardware features. Let's 
> >> revert
> >> the ChromeOS-specific workaround in mainline and allow the Chrome team to 
> >> keep
> >> it internally in their own tree.
> >>
> >> See [1] for more details. This patch is an alternative to [2], which added
> >> ChromeOS detection.
> >>
> >> [1]: 
> >> https://lore.kernel.org/amd-gfx/JIQ_93_cHcshiIDsrMU1huBzx9P9LVQxucx8hQArpQu7Wk5DrCl_vTXj_Q20m_L-8C8A5dSpNcSJ8ehfcCrsQpfB5QG_Spn14EYkH9chtg0=@emersion.fr/>>
> >>  [2]: 
> >> https://lore.kernel.org/amd-gfx/20211011151609.452132-1-cont...@emersion.fr/>>
> >>  Signed-off-by: Simon Ser 
> >> Cc: Alex Deucher 
> >> Cc: Harry Wentland 
> >> Cc: Nicholas Kazlauskas 
> >> Cc: Bas Nieuwenhuizen 
> >> Cc: Rodrigo Siqueira 
> >> Cc: Sean Paul 
> >> Fixes: ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when 
> >> using overlay")
> >> Fixes: e7d9560aeae5 ("Revert "drm/amd/display: Fix overlay validation by 
> >> considering cursors"")
> >> ---
> >>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 51 ---
> >>   1 file changed, 51 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> index 20065a145851..014c5a9fe461 100644
> >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> @@ -10628,53 +10628,6 @@ static int add_affected_mst_dsc_crtcs(struct 
> >> drm_atomic_state *state, struct drm
> >>   }
> >>   #endif
> >>   -static int validate_overlay(struct drm_atomic_state *state)
> >> -{
> >> -int i;
> >> -struct drm_plane *plane;
> >> -struct drm_plane_state *new_plane_state;
> >> -struct drm_plane_state *primary_state, *overlay_state = NULL;
> >> -
> >> -/* Check if primary plane is contained inside overlay */
> >> -for_each_new_plane_in_state_reverse(state, plane, new_plane_state, i) 
> >> {
> >> -if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
> >> -if (drm_atomic_plane_disabling(plane->state, new_plane_state))
> >> -return 0;
> >> -
> >> -overlay_state = new_plane_state;
> >> -continue;
> >> -}
> >> -}
> >> -
> >> -/* check if we're making changes to the overlay plane */
> >> -if (!overlay_state)
> >> -return 0;
> >> -
> >> -/* check if overlay plane is enabled */
> >> -if (!overlay_state->crtc)
> >> -return 0;
> >> -
> >> -/* find the primary plane for the CRTC that the overlay is enabled on 
> >> */
> >> -primary_state = drm_atomic_get_plane_state(state, 
> >> overlay_state->crtc->primary);
> >> -if (IS_ERR(primary_state))
> >> -return PTR_ERR(primary_state);
> >> -
> >> -/* check if primary plane is enabled */
> >> -if (!primary_state->crtc)
> >> -return 0;
> >> -
> >> -/* Perform the bounds check to ensure the overlay plane covers the 
> >> primary */
> >> -if (primary_state->crtc_x < overlay_state->crtc_x ||
> >> -primary_state->crtc_y < overlay_state->crtc_y ||
> >> -primary_state->crtc_x + primary_state->crtc_w > 
> >> overlay_state->crtc_x + overlay_state->crtc_w ||
> >> -primary_state->crtc_y + primary_state->crtc_h > 
> >> overlay_state->crtc_y + overlay_state->crtc_h) {
> >> -DRM_DEBUG_ATOMIC("Overlay plane is enabled with hardware cursor 
> >> but does not fully cover primary plane\n");
> >> -return -EINVAL;
> >> -}
> >> -
> >> -return 0;
> >> -}
> >> -
> >>   /**
> >>* amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM.
> >>* @dev: The DRM device
> >> @@ -10856,10 +10809,6 @@ static int amdgpu_dm_atomic_check(struct 
> >> drm_device *dev,
> >>   goto fail;
> >>   }
> >>   -ret = validate_overlay(state);
> >> -

Re: [PATCH] drm/amd/pm: Disable fan control if not supported

2021-10-22 Thread Alex Deucher
On Fri, Oct 22, 2021 at 3:44 AM Lijo Lazar  wrote:
>
> On arcturus, not all platforms use PMFW based fan control. On such
> ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon
> knobs for fan control also as it is not possible to report or control
> fan speed on such platforms through driver.
>
> Signed-off-by: Lijo Lazar 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> index 082f01893f3d..fd1d30a93db5 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> @@ -436,6 +436,19 @@ static void arcturus_check_bxco_support(struct 
> smu_context *smu)
> }
>  }
>
> +static void arcturus_check_fan_support(struct smu_context *smu)
> +{
> +   struct smu_table_context *table_context = >smu_table;
> +   PPTable_t *pptable = table_context->driver_pptable;
> +
> +   /* No sort of fan control possible if PPTable has it disabled */
> +   smu->adev->pm.no_fan =
> +   !(pptable->FeaturesToRun[0] & FEATURE_FAN_CONTROL_MASK);
> +   if (smu->adev->pm.no_fan)
> +   dev_info_once(smu->adev->dev,
> + "PMFW based fan control disabled");
> +}
> +
>  static int arcturus_check_powerplay_table(struct smu_context *smu)
>  {
> struct smu_table_context *table_context = >smu_table;
> @@ -443,6 +456,7 @@ static int arcturus_check_powerplay_table(struct 
> smu_context *smu)
> table_context->power_play_table;
>
> arcturus_check_bxco_support(smu);
> +   arcturus_check_fan_support(smu);
>
> table_context->thermal_controller_type =
> powerplay_table->thermal_controller_type;
> --
> 2.17.1
>


Re: [PATCH v4 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr

2021-10-22 Thread Das, Nirmoy



On 10/22/2021 1:46 PM, Christian König wrote:
Reviewed-by: Christian König  for the entire 
series.


But please keep in mind that this here needs extensive testing.



I will do multiple round of test again on gfx7-10 cards and gfx9 APU 
that I have but I don't have gfx6 and sriov cards to test.



Regards,

Nirmoy



Thanks,
Christian.

Am 22.10.21 um 12:54 schrieb Nirmoy Das:

Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.

v4: remove unused adev.
v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 22 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++--
  4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 41ce86244144..2b53d86aebac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct 
amdgpu_device *adev,


  amdgpu_virt_init_data_exchange(adev);
  /* we need recover gart prior to run SMC/CP/SDMA resume */
- amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT));
+    amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

  r = amdgpu_device_fw_loading(adev);
  if (r)
@@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,

  amdgpu_inc_vram_lost(tmp_adev);
  }

-    r = 
amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, 
TTM_PL_TT));

+    r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
  if (r)
  goto out;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c

index c18f16b3be9c..9151950e0cc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,10 +77,8 @@ static ssize_t 
amdgpu_mem_info_gtt_used_show(struct device *dev,

  {
  struct drm_device *ddev = dev_get_drvdata(dev);
  struct amdgpu_device *adev = drm_to_adev(ddev);
-    struct ttm_resource_manager *man;

-    man = ttm_manager_type(>mman.bdev, TTM_PL_TT);
-    return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man));
+    return sysfs_emit(buf, "%llu\n", 
amdgpu_gtt_mgr_usage(>mman.gtt_mgr));

  }

  static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
@@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct 
ttm_resource_manager *man,

  /**
   * amdgpu_gtt_mgr_usage - return usage of GTT domain
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Return how many bytes are used in the GTT domain
   */
-uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
+uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr)
  {
-    struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-    s64 result = man->size - atomic64_read(>available);
+    s64 result;
+
+    result = mgr->manager.size - atomic64_read(>available);

  return (result > 0 ? result : 0) * PAGE_SIZE;
  }
@@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct 
ttm_resource_manager *man)

  /**
   * amdgpu_gtt_mgr_recover - re-init gart
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Re-init the gart for each known BO in the GTT.
   */
-int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man)
+int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
  {
-    struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-    struct amdgpu_device *adev;
  struct amdgpu_gtt_node *node;
  struct drm_mm_node *mm_node;
+    struct amdgpu_device *adev;
  int r = 0;

  adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -267,7 +265,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,


  drm_printf(printer, "man size:%llu pages, gtt available:%lld 
pages, usage:%lluMB\n",

 man->size, (u64)atomic64_read(>available),
-   amdgpu_gtt_mgr_usage(man) >> 20);
+   amdgpu_gtt_mgr_usage(mgr) >> 20);
  }

  static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

index d2955ea4a62b..603ce32db5c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, 
void *data, struct drm_file *filp)
  ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, 
TTM_PL_VRAM));

  return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
  case 

Re: [Intel-gfx] [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

2021-10-22 Thread Ville Syrjälä
On Fri, Oct 22, 2021 at 03:01:52PM +0300, Ville Syrjälä wrote:
> On Fri, Oct 22, 2021 at 12:25:33PM +0200, Claudio Suarez wrote:
> > On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote:
> > > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote:
> > > > drm_get_edid() internally calls to drm_connector_update_edid_property()
> > > > and then drm_add_display_info(), which parses the EDID.
> > > > This happens in the function intel_hdmi_set_edid() and
> > > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()).
> > > > 
> > > > Once EDID is parsed, the monitor HDMI support information is available
> > > > through drm_display_info.is_hdmi. Retriving the same information with
> > > > drm_detect_hdmi_monitor() is less efficient. Change to
> > > > drm_display_info.is_hdmi
> > > 
> > > I meant we need to examine all call chains that can lead to
> > > .detect() to make sure all of them do in fact update the
> > > display_info beforehand.
> > 
> > Well, I studied it carefully and, yes, all call chains that can lead to
> > drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info
> > beforehand. In the case that this doesn't happen, the code is unchanged.
> > 
> > Do you want I explain the changes in the code here again ? Or do you want
> > to me change the commit message to be more clear ? In the first case, I can
> > write here a detailed explanation. In the second case I can make a longer 
> > commit
> > message.
> > 
> > Or both?
> 
> I want all those call chains explained in the commit message,
> otherwise I have no easy way to confirm whether the change
> is correct or not.

Hmm. OK, so I had a bit of a dig around and seems that what we do now
.detect()->drm_get_edid()->drm_connector_update_edid_property()->drm_add_display_info()

Now the question is when did that start happening? Looks like it was
commit 4b4df570b41d ("drm: Update edid-derived drm_display_info fields
at edid property set [v2]") that started to call drm_add_display_info()
from drm_connector_update_edid_property(), and then commit 5186421cbfe2
("drm: Introduce epoch counter to drm_connector") started to call
drm_connector_update_edid_property() from drm_get_edid(). Before both
of those commits were in place display_info would still contain
some stale garbage during .detect().

That is the story I think we want in these commit messages since it
a) explains why the old code was directly parsing the edid instead
b) why it's now safe to change this

PS. connector->force handling in drm_get_edid() looks a bit busted
since it doesn't call drm_connector_update_edid_property() at all
in some cases. I think there might be some path that leads there
anywya if/when we change connector->force, but we should fix
drm_get_edid() to do the right thing regarless.

-- 
Ville Syrjälä
Intel


Re: [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

2021-10-22 Thread Ville Syrjälä
On Fri, Oct 22, 2021 at 12:25:33PM +0200, Claudio Suarez wrote:
> On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote:
> > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote:
> > > drm_get_edid() internally calls to drm_connector_update_edid_property()
> > > and then drm_add_display_info(), which parses the EDID.
> > > This happens in the function intel_hdmi_set_edid() and
> > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()).
> > > 
> > > Once EDID is parsed, the monitor HDMI support information is available
> > > through drm_display_info.is_hdmi. Retriving the same information with
> > > drm_detect_hdmi_monitor() is less efficient. Change to
> > > drm_display_info.is_hdmi
> > 
> > I meant we need to examine all call chains that can lead to
> > .detect() to make sure all of them do in fact update the
> > display_info beforehand.
> 
> Well, I studied it carefully and, yes, all call chains that can lead to
> drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info
> beforehand. In the case that this doesn't happen, the code is unchanged.
> 
> Do you want I explain the changes in the code here again ? Or do you want
> to me change the commit message to be more clear ? In the first case, I can
> write here a detailed explanation. In the second case I can make a longer 
> commit
> message.
> 
> Or both?

I want all those call chains explained in the commit message,
otherwise I have no easy way to confirm whether the change
is correct or not.

-- 
Ville Syrjälä
Intel


Re: [PATCH v4 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr

2021-10-22 Thread Christian König
Reviewed-by: Christian König  for the entire 
series.


But please keep in mind that this here needs extensive testing.

Thanks,
Christian.

Am 22.10.21 um 12:54 schrieb Nirmoy Das:

Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.

v4: remove unused adev.
v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 22 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++--
  4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..2b53d86aebac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,

amdgpu_virt_init_data_exchange(adev);
/* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT));
+   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT));
+   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
if (r)
goto out;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c18f16b3be9c..9151950e0cc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device 
*dev,
  {
struct drm_device *ddev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(ddev);
-   struct ttm_resource_manager *man;

-   man = ttm_manager_type(>mman.bdev, TTM_PL_TT);
-   return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man));
+   return sysfs_emit(buf, "%llu\n", 
amdgpu_gtt_mgr_usage(>mman.gtt_mgr));
  }

  static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
@@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct 
ttm_resource_manager *man,
  /**
   * amdgpu_gtt_mgr_usage - return usage of GTT domain
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Return how many bytes are used in the GTT domain
   */
-uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
+uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr)
  {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   s64 result = man->size - atomic64_read(>available);
+   s64 result;
+
+   result = mgr->manager.size - atomic64_read(>available);

return (result > 0 ? result : 0) * PAGE_SIZE;
  }
@@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager 
*man)
  /**
   * amdgpu_gtt_mgr_recover - re-init gart
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Re-init the gart for each known BO in the GTT.
   */
-int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man)
+int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
  {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   struct amdgpu_device *adev;
struct amdgpu_gtt_node *node;
struct drm_mm_node *mm_node;
+   struct amdgpu_device *adev;
int r = 0;

adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -267,7 +265,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,

drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, 
usage:%lluMB\n",
   man->size, (u64)atomic64_read(>available),
-  amdgpu_gtt_mgr_usage(man) >> 20);
+  amdgpu_gtt_mgr_usage(mgr) >> 20);
  }

  static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d2955ea4a62b..603ce32db5c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM));
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_GTT_USAGE:

Re: [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

2021-10-22 Thread Claudio Suarez
On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote:
> On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote:
> > drm_get_edid() internally calls to drm_connector_update_edid_property()
> > and then drm_add_display_info(), which parses the EDID.
> > This happens in the function intel_hdmi_set_edid() and
> > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()).
> > 
> > Once EDID is parsed, the monitor HDMI support information is available
> > through drm_display_info.is_hdmi. Retriving the same information with
> > drm_detect_hdmi_monitor() is less efficient. Change to
> > drm_display_info.is_hdmi
> 
> I meant we need to examine all call chains that can lead to
> .detect() to make sure all of them do in fact update the
> display_info beforehand.

Well, I studied it carefully and, yes, all call chains that can lead to
drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info
beforehand. In the case that this doesn't happen, the code is unchanged.

Do you want I explain the changes in the code here again ? Or do you want
to me change the commit message to be more clear ? In the first case, I can
write here a detailed explanation. In the second case I can make a longer commit
message.

Or both?

Best Regards,
Claudio Suarez.




Re: [PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case

2021-10-22 Thread Lazar, Lijo




On 10/21/2021 7:26 PM, Russell, Kent wrote:

[AMD Official Use Only]




-Original Message-
From: Lazar, Lijo 
Sent: Thursday, October 21, 2021 1:25 AM
To: Russell, Kent ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Joshi, Mukul 
Subject: Re: [PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case



On 10/20/2021 10:05 PM, Kent Russell wrote:

If the bad_page_threshold kernel parameter is set to -2,
continue to post the GPU. Print a warning to dmesg that this action has
been done, and that page retirement will obviously not work for said GPU

Cc: Luben Tuikov 
Cc: Mukul Joshi 
Signed-off-by: Kent Russell 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 13 +
   1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c

index 1ede0f0d6f55..31852330c1db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1115,11 +1115,16 @@ int amdgpu_ras_eeprom_init(struct

amdgpu_ras_eeprom_control *control,

res = amdgpu_ras_eeprom_correct_header_tag(control,
   
RAS_TABLE_HDR_VAL);
} else {
-   *exceed_err_limit = true;
-   dev_err(adev->dev,
-   "RAS records:%d exceed threshold:%d, "
-   "GPU will not be initialized. Replace this GPU 
or increase the

threshold",

+   dev_err(adev->dev, "RAS records:%d exceed threshold:%d",
control->ras_num_recs, 
ras->bad_page_cnt_threshold);
+   if (amdgpu_bad_page_threshold == -2) {
+   dev_warn(adev->dev, "GPU will be initialized 
due to

bad_page_threshold = -2.");

+   dev_warn(adev->dev, "Page retirement will not 
work for

this GPU in this state.");

Now, this looks as good as booting with 0 = disable bad page retirement.
I thought page retirement will work as long as EEPROM has space, but it
won't bother about the threshold. If the intent is to ignore bad page
retirement, then 0 is good enough and -2 is not required.

Also, when user passes threshold=-2, what is the threshold being
compared against to say *exceed_err_limit = true?


My thought on having the -2 option is that we'll still enable page retirement, 
we just won't shut the GPU down when it hits the threshold. The bad pages will 
still be retired as we hit them, it will just never disable the GPU. The 
comment about retirement not working is definitely incorrect now (leftover from 
previous local patches), so I'll remove that. In this case, I don't think we'd 
ever exceed the error limit. exceed_err_limit is only really used when we are 
disabling the GPU, so we wouldn't want to set that to true. Otherwise we 
wouldn't be loading the bad pages in gpu_recovery_init, and we'll still return 
0 from gpu_recovery_init.



Probably, this is too late as this patch is superseded. Replying to this 
to keep the context.


I realized that the question went in the wrong direction because of 
*exceed_err_limit = true. Saw it in the earlier line but didn't realize 
that it was removed.


Anyway, main question I wanted to ask was -

	"When user passes threshold=-2, what is the threshold being compared 
against?"


Looked it up today and saw this piece of code.
if (amdgpu_bad_page_threshold < 0) {
u64 val = adev->gmc.mc_vram_size;

do_div(val, RAS_BAD_PAGE_COVER);
con->bad_page_cnt_threshold = min(lower_32_bits(val),
  max_count);


For a user who has been specifying say bad_page_threshold=100 in the 
command line and unable to boot, now this is doing something else.


The comparison in next boot is done against a different threshold. In 
fact, it won't even come to this comparison logic in the next boot as 
the threshold is raised by default when this parameter is specified.


The only case this parameter takes effect is when a user is unable to 
boot because bad page retirements have exceeded the "default threshold". 
In other cases it just resets the comparison limit. Hope that is the 
intention.


Thanks,
Lijo



  Kent


Thanks,
Lijo


+   res = 0;
+   } else {
+   *exceed_err_limit = true;
+   dev_err(adev->dev, "GPU will not be 
initialized. Replace this

GPU or increase the threshold.");

+   }
}
} else {
DRM_INFO("Creating a new EEPROM table");



[PATCH 1/1] drm/amdgpu: remove unnecessary checks

2021-10-22 Thread Nirmoy Das
amdgpu_ttm_backend_bind() only needed for TTM_PL_TT
and AMDGPU_PL_PREEMPT.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d784f8d3a834..eb872fc4ad92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -914,11 +914,6 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
 ttm->num_pages, bo_mem, ttm);
}
 
-   if (bo_mem->mem_type == AMDGPU_PL_GDS ||
-   bo_mem->mem_type == AMDGPU_PL_GWS ||
-   bo_mem->mem_type == AMDGPU_PL_OA)
-   return -EINVAL;
-
if (bo_mem->mem_type != TTM_PL_TT ||
!amdgpu_gtt_mgr_has_gart_addr(bo_mem)) {
gtt->offset = AMDGPU_BO_INVALID_OFFSET;
-- 
2.32.0



Re: [PATCH v4 3/3] drm/amdgpu: recover gart table at resume

2021-10-22 Thread Nirmoy



On 10/22/21 12:54 PM, Nirmoy Das wrote:

Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.

v4: return amdgpu_bo_create_kernel() directly without checking
its return value.
v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 82 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  3 +-
  7 files changed, 11 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b53d86aebac..f0c70e9d37fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
if (!adev->in_s0ix)
amdgpu_amdkfd_suspend(adev, adev->in_runpm);

-   /* First evict vram memory */
amdgpu_device_evict_resources(adev);

amdgpu_fence_driver_hw_fini(adev);

amdgpu_device_ip_suspend_phase2(adev);
-   /* This second call to evict device resources is to evict
-* the gart page table using the CPU.
-*/
-   amdgpu_device_evict_resources(adev);

return 0;
  }
@@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
goto error;

amdgpu_virt_init_data_exchange(adev);
-   /* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
-   if (r)
-   goto out;
-
r = amdgpu_device_fw_loading(tmp_adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index d3e4203f6217..3525f87dc1af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -116,78 +116,12 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device 
*adev)
  {
int r;



I forgot to remove this unused r and now resent[1] this patch again but 
forgot add in-reply-to.



https://patchwork.freedesktop.org/patch/460939/


Regards,

Nirmoy



-   if (adev->gart.bo == NULL) {
-   struct amdgpu_bo_param bp;
-
-   memset(, 0, sizeof(bp));
-   bp.size = adev->gart.table_size;
-   bp.byte_align = PAGE_SIZE;
-   bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
-   bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
-   bp.type = ttm_bo_type_kernel;
-   bp.resv = NULL;
-   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
-
-   r = amdgpu_bo_create(adev, , >gart.bo);
-   if (r) {
-   return r;
-   }
-   }
-   return 0;
-}
-
-/**
- * amdgpu_gart_table_vram_pin - pin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Pin the GART page table in vram so it will not be moved
- * by the memory manager (pcie r4xx, r5xx+).  These asics require the
- * gart table to be in video memory.
- * Returns 0 for success, error for failure.
- */
-int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
-
-   r = amdgpu_bo_reserve(adev->gart.bo, false);
-   if (unlikely(r != 0))
-   return r;
-   r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM);
-   if (r) {
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-   }
-   r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr);
-   if (r)
-   amdgpu_bo_unpin(adev->gart.bo);
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-}
-
-/**
- * amdgpu_gart_table_vram_unpin - unpin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Unpin the GART page table in vram (pcie r4xx, r5xx+).
- * These asics require the gart table to be in video memory.
- */
-void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev)
-{
-   int r;
+   

[PATCH 3/3] drm/amdgpu: recover gart table at resume

2021-10-22 Thread Nirmoy Das
Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.

v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()
Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 84 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  3 +-
 7 files changed, 11 insertions(+), 99 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b53d86aebac..f0c70e9d37fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
if (!adev->in_s0ix)
amdgpu_amdkfd_suspend(adev, adev->in_runpm);
 
-   /* First evict vram memory */
amdgpu_device_evict_resources(adev);
 
amdgpu_fence_driver_hw_fini(adev);
 
amdgpu_device_ip_suspend_phase2(adev);
-   /* This second call to evict device resources is to evict
-* the gart page table using the CPU.
-*/
-   amdgpu_device_evict_resources(adev);
 
return 0;
 }
@@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
goto error;
 
amdgpu_virt_init_data_exchange(adev);
-   /* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);
 
r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}
 
-   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
-   if (r)
-   goto out;
-
r = amdgpu_device_fw_loading(tmp_adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index d3e4203f6217..645950a653a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -114,80 +114,12 @@ void amdgpu_gart_dummy_page_fini(struct amdgpu_device 
*adev)
  */
 int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev)
 {
-   int r;
-
-   if (adev->gart.bo == NULL) {
-   struct amdgpu_bo_param bp;
-
-   memset(, 0, sizeof(bp));
-   bp.size = adev->gart.table_size;
-   bp.byte_align = PAGE_SIZE;
-   bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
-   bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
-   bp.type = ttm_bo_type_kernel;
-   bp.resv = NULL;
-   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
-
-   r = amdgpu_bo_create(adev, , >gart.bo);
-   if (r) {
-   return r;
-   }
-   }
-   return 0;
-}
-
-/**
- * amdgpu_gart_table_vram_pin - pin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Pin the GART page table in vram so it will not be moved
- * by the memory manager (pcie r4xx, r5xx+).  These asics require the
- * gart table to be in video memory.
- * Returns 0 for success, error for failure.
- */
-int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
-
-   r = amdgpu_bo_reserve(adev->gart.bo, false);
-   if (unlikely(r != 0))
-   return r;
-   r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM);
-   if (r) {
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-   }
-   r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr);
-   if (r)
-   amdgpu_bo_unpin(adev->gart.bo);
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-}
-
-/**
- * amdgpu_gart_table_vram_unpin - unpin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Unpin the GART page table in vram (pcie r4xx, r5xx+).
- * These asics require the gart table to be in video memory.
- */
-void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev)
-{
-   int r;
+   if (adev->gart.bo != NULL)
+   return 0;
 
-   if (adev->gart.bo == NULL) {
-   return;
-   }
-   r = amdgpu_bo_reserve(adev->gart.bo, true);
-   if (likely(r == 0)) {
-   

[PATCH 2/3] drm/amdgpu: do not pass ttm_resource_manager to vram_mgr

2021-10-22 Thread Nirmoy Das
Do not allow exported amdgpu_vram_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call vram_mgr functions.

v2: pass adev's vram_mgr instead of adev
Signed-off-by: Nirmoy Das 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  5 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 10 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  5 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 40 
 7 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7077f21f0021..df818e145d9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -531,9 +531,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int 
dma_buf_fd,
 uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
-   struct ttm_resource_manager *vram_man = 
ttm_manager_type(>mman.bdev, TTM_PL_VRAM);

-   return amdgpu_vram_mgr_usage(vram_man);
+   return amdgpu_vram_mgr_usage(>mman.vram_mgr);
 }

 uint64_t amdgpu_amdkfd_get_hive_id(struct kgd_dev *kgd)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 76fe5b71e35d..7e745164a624 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -298,7 +298,6 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
 {
s64 time_us, increment_us;
u64 free_vram, total_vram, used_vram;
-   struct ttm_resource_manager *vram_man = 
ttm_manager_type(>mman.bdev, TTM_PL_VRAM);
/* Allow a maximum of 200 accumulated ms. This is basically per-IB
 * throttling.
 *
@@ -315,7 +314,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
}

total_vram = adev->gmc.real_vram_size - 
atomic64_read(>vram_pin_size);
-   used_vram = amdgpu_vram_mgr_usage(vram_man);
+   used_vram = amdgpu_vram_mgr_usage(>mman.vram_mgr);
free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;

spin_lock(>mm_stats.lock);
@@ -362,7 +361,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
if (!amdgpu_gmc_vram_full_visible(>gmc)) {
u64 total_vis_vram = adev->gmc.visible_vram_size;
u64 used_vis_vram =
- amdgpu_vram_mgr_vis_usage(vram_man);
+ amdgpu_vram_mgr_vis_usage(>mman.vram_mgr);

if (used_vis_vram < total_vis_vram) {
u64 free_vis_vram = total_vis_vram - used_vis_vram;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 603ce32db5c5..b426e03ad630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -672,10 +672,10 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
ui64 = atomic64_read(>num_vram_cpu_page_faults);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_VRAM_USAGE:
-   ui64 = amdgpu_vram_mgr_usage(ttm_manager_type(>mman.bdev, 
TTM_PL_VRAM));
+   ui64 = amdgpu_vram_mgr_usage(>mman.vram_mgr);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_VIS_VRAM_USAGE:
-   ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM));
+   ui64 = amdgpu_vram_mgr_vis_usage(>mman.vram_mgr);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_GTT_USAGE:
ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr);
@@ -709,8 +709,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
}
case AMDGPU_INFO_MEMORY: {
struct drm_amdgpu_memory_info mem;
-   struct ttm_resource_manager *vram_man =
-   ttm_manager_type(>mman.bdev, TTM_PL_VRAM);
struct ttm_resource_manager *gtt_man =
ttm_manager_type(>mman.bdev, TTM_PL_TT);
memset(, 0, sizeof(mem));
@@ -719,7 +717,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
atomic64_read(>vram_pin_size) -
AMDGPU_VM_RESERVED_VRAM;
mem.vram.heap_usage =
-   amdgpu_vram_mgr_usage(vram_man);
+   amdgpu_vram_mgr_usage(>mman.vram_mgr);
mem.vram.max_allocation = mem.vram.usable_heap_size 

[PATCH v4 3/3] drm/amdgpu: recover gart table at resume

2021-10-22 Thread Nirmoy Das
Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.

v4: return amdgpu_bo_create_kernel() directly without checking
its return value.
v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()
Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 82 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  3 +-
 7 files changed, 11 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b53d86aebac..f0c70e9d37fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
if (!adev->in_s0ix)
amdgpu_amdkfd_suspend(adev, adev->in_runpm);

-   /* First evict vram memory */
amdgpu_device_evict_resources(adev);

amdgpu_fence_driver_hw_fini(adev);

amdgpu_device_ip_suspend_phase2(adev);
-   /* This second call to evict device resources is to evict
-* the gart page table using the CPU.
-*/
-   amdgpu_device_evict_resources(adev);

return 0;
 }
@@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
goto error;

amdgpu_virt_init_data_exchange(adev);
-   /* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
-   if (r)
-   goto out;
-
r = amdgpu_device_fw_loading(tmp_adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index d3e4203f6217..3525f87dc1af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -116,78 +116,12 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device 
*adev)
 {
int r;

-   if (adev->gart.bo == NULL) {
-   struct amdgpu_bo_param bp;
-
-   memset(, 0, sizeof(bp));
-   bp.size = adev->gart.table_size;
-   bp.byte_align = PAGE_SIZE;
-   bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
-   bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
-   bp.type = ttm_bo_type_kernel;
-   bp.resv = NULL;
-   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
-
-   r = amdgpu_bo_create(adev, , >gart.bo);
-   if (r) {
-   return r;
-   }
-   }
-   return 0;
-}
-
-/**
- * amdgpu_gart_table_vram_pin - pin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Pin the GART page table in vram so it will not be moved
- * by the memory manager (pcie r4xx, r5xx+).  These asics require the
- * gart table to be in video memory.
- * Returns 0 for success, error for failure.
- */
-int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
-
-   r = amdgpu_bo_reserve(adev->gart.bo, false);
-   if (unlikely(r != 0))
-   return r;
-   r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM);
-   if (r) {
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-   }
-   r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr);
-   if (r)
-   amdgpu_bo_unpin(adev->gart.bo);
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-}
-
-/**
- * amdgpu_gart_table_vram_unpin - unpin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Unpin the GART page table in vram (pcie r4xx, r5xx+).
- * These asics require the gart table to be in video memory.
- */
-void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev)
-{
-   int r;
+   if (adev->gart.bo != NULL)
+   return 0;

-   if (adev->gart.bo == NULL) {
-   return;
-   }
-   r = amdgpu_bo_reserve(adev->gart.bo, true);
-   if (likely(r == 0)) {
-   

[PATCH v4 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr

2021-10-22 Thread Nirmoy Das
Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.

v4: remove unused adev.
v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 22 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++--
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..2b53d86aebac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,

amdgpu_virt_init_data_exchange(adev);
/* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT));
+   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT));
+   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
if (r)
goto out;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c18f16b3be9c..9151950e0cc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device 
*dev,
 {
struct drm_device *ddev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(ddev);
-   struct ttm_resource_manager *man;

-   man = ttm_manager_type(>mman.bdev, TTM_PL_TT);
-   return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man));
+   return sysfs_emit(buf, "%llu\n", 
amdgpu_gtt_mgr_usage(>mman.gtt_mgr));
 }

 static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
@@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct 
ttm_resource_manager *man,
 /**
  * amdgpu_gtt_mgr_usage - return usage of GTT domain
  *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
  *
  * Return how many bytes are used in the GTT domain
  */
-uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
+uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr)
 {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   s64 result = man->size - atomic64_read(>available);
+   s64 result;
+
+   result = mgr->manager.size - atomic64_read(>available);

return (result > 0 ? result : 0) * PAGE_SIZE;
 }
@@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager 
*man)
 /**
  * amdgpu_gtt_mgr_recover - re-init gart
  *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
  *
  * Re-init the gart for each known BO in the GTT.
  */
-int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man)
+int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
 {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   struct amdgpu_device *adev;
struct amdgpu_gtt_node *node;
struct drm_mm_node *mm_node;
+   struct amdgpu_device *adev;
int r = 0;

adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -267,7 +265,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,

drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, 
usage:%lluMB\n",
   man->size, (u64)atomic64_read(>available),
-  amdgpu_gtt_mgr_usage(man) >> 20);
+  amdgpu_gtt_mgr_usage(mgr) >> 20);
 }

 static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d2955ea4a62b..603ce32db5c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM));
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_GTT_USAGE:
-   ui64 = amdgpu_gtt_mgr_usage(ttm_manager_type(>mman.bdev, 
TTM_PL_TT));
+   ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr);
return copy_to_user(out, , min(size, 8u)) ? 

Re: [PATCH v3 3/3] drm/amdgpu: recover gart table at resume

2021-10-22 Thread Christian König

Am 22.10.21 um 11:32 schrieb Nirmoy Das:

Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.

v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 80 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  3 +-
  7 files changed, 12 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b53d86aebac..f0c70e9d37fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
if (!adev->in_s0ix)
amdgpu_amdkfd_suspend(adev, adev->in_runpm);

-   /* First evict vram memory */
amdgpu_device_evict_resources(adev);

amdgpu_fence_driver_hw_fini(adev);

amdgpu_device_ip_suspend_phase2(adev);
-   /* This second call to evict device resources is to evict
-* the gart page table using the CPU.
-*/
-   amdgpu_device_evict_resources(adev);

return 0;
  }
@@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
goto error;

amdgpu_virt_init_data_exchange(adev);
-   /* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
-   if (r)
-   goto out;
-
r = amdgpu_device_fw_loading(tmp_adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index d3e4203f6217..679eec122bb5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -116,78 +116,16 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device 
*adev)
  {
int r;

-   if (adev->gart.bo == NULL) {
-   struct amdgpu_bo_param bp;
-
-   memset(, 0, sizeof(bp));
-   bp.size = adev->gart.table_size;
-   bp.byte_align = PAGE_SIZE;
-   bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
-   bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
-   bp.type = ttm_bo_type_kernel;
-   bp.resv = NULL;
-   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
-
-   r = amdgpu_bo_create(adev, , >gart.bo);
-   if (r) {
-   return r;
-   }
-   }
-   return 0;
-}
-
-/**
- * amdgpu_gart_table_vram_pin - pin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Pin the GART page table in vram so it will not be moved
- * by the memory manager (pcie r4xx, r5xx+).  These asics require the
- * gart table to be in video memory.
- * Returns 0 for success, error for failure.
- */
-int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
+   if (adev->gart.bo != NULL)
+   return 0;

-   r = amdgpu_bo_reserve(adev->gart.bo, false);
-   if (unlikely(r != 0))
-   return r;
-   r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM);
+   r = amdgpu_bo_create_kernel(adev,  adev->gart.table_size, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_VRAM, >gart.bo,
+   NULL, (void *)>gart.ptr);
if (r) {
-   amdgpu_bo_unreserve(adev->gart.bo);
return r;
}


Just "return amdgpu_bo_create_kernel(..);" should do it now as far as I 
can see.


Apart from that looks really good now.

Christian.


-   r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr);
-   if (r)
-   amdgpu_bo_unpin(adev->gart.bo);
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-}
-
-/**
- * amdgpu_gart_table_vram_unpin - unpin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Unpin the GART page table in vram (pcie r4xx, r5xx+).

Re: [PATCH v3 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr

2021-10-22 Thread Nirmoy



On 10/22/21 11:44 AM, Christian König wrote:

Am 22.10.21 um 11:32 schrieb Nirmoy Das:

Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.

v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 23 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++--
  4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 41ce86244144..2b53d86aebac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct 
amdgpu_device *adev,


  amdgpu_virt_init_data_exchange(adev);
  /* we need recover gart prior to run SMC/CP/SDMA resume */
- amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT));
+    amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

  r = amdgpu_device_fw_loading(adev);
  if (r)
@@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,

  amdgpu_inc_vram_lost(tmp_adev);
  }

-    r = 
amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, 
TTM_PL_TT));

+    r = amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
  if (r)
  goto out;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c

index c18f16b3be9c..62e14efa61fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,10 +77,8 @@ static ssize_t 
amdgpu_mem_info_gtt_used_show(struct device *dev,

  {
  struct drm_device *ddev = dev_get_drvdata(dev);
  struct amdgpu_device *adev = drm_to_adev(ddev);
-    struct ttm_resource_manager *man;

-    man = ttm_manager_type(>mman.bdev, TTM_PL_TT);
-    return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man));
+    return sysfs_emit(buf, "%llu\n", 
amdgpu_gtt_mgr_usage(>mman.gtt_mgr));

  }

  static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
@@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct 
ttm_resource_manager *man,

  /**
   * amdgpu_gtt_mgr_usage - return usage of GTT domain
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Return how many bytes are used in the GTT domain
   */
-uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
+uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr)
  {
-    struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-    s64 result = man->size - atomic64_read(>available);
+    s64 result;
+
+    result = mgr->manager.size - atomic64_read(>available);

  return (result > 0 ? result : 0) * PAGE_SIZE;
  }
@@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct 
ttm_resource_manager *man)

  /**
   * amdgpu_gtt_mgr_recover - re-init gart
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Re-init the gart for each known BO in the GTT.
   */
-int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man)
+int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
  {
-    struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-    struct amdgpu_device *adev;
  struct amdgpu_gtt_node *node;
  struct drm_mm_node *mm_node;
+    struct amdgpu_device *adev;
  int r = 0;

  adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -260,6 +258,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,

   struct drm_printer *printer)
  {
  struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
+    struct amdgpu_device *adev = container_of(mgr, typeof(*adev), 
mman.gtt_mgr);


I think that variable is now unused, isn't it?

Apart from that looks good to me now.



Yes it is. I will remove it. Strangely GCC didn't complain.


Nirmoy




Christian.



  spin_lock(>lock);
  drm_mm_print(>mm, printer);
@@ -267,7 +266,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,


  drm_printf(printer, "man size:%llu pages, gtt available:%lld 
pages, usage:%lluMB\n",

 man->size, (u64)atomic64_read(>available),
-   amdgpu_gtt_mgr_usage(man) >> 20);
+   amdgpu_gtt_mgr_usage(mgr) >> 20);
  }

  static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

index d2955ea4a62b..603ce32db5c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, 
void 

Re: [PATCH v2 2/3] drm/amdgpu: do not pass ttm_resource_manager to vram_mgr

2021-10-22 Thread Christian König

Am 22.10.21 um 11:32 schrieb Nirmoy Das:

Do not allow exported amdgpu_vram_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call vram_mgr functions.

v2: pass adev's vram_mgr instead of adev

Signed-off-by: Nirmoy Das 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   |  3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  5 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 10 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  |  6 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  8 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  5 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 40 
  7 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7077f21f0021..df818e145d9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -531,9 +531,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int 
dma_buf_fd,
  uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
-   struct ttm_resource_manager *vram_man = 
ttm_manager_type(>mman.bdev, TTM_PL_VRAM);

-   return amdgpu_vram_mgr_usage(vram_man);
+   return amdgpu_vram_mgr_usage(>mman.vram_mgr);
  }

  uint64_t amdgpu_amdkfd_get_hive_id(struct kgd_dev *kgd)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 76fe5b71e35d..7e745164a624 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -298,7 +298,6 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
  {
s64 time_us, increment_us;
u64 free_vram, total_vram, used_vram;
-   struct ttm_resource_manager *vram_man = 
ttm_manager_type(>mman.bdev, TTM_PL_VRAM);
/* Allow a maximum of 200 accumulated ms. This is basically per-IB
 * throttling.
 *
@@ -315,7 +314,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
}

total_vram = adev->gmc.real_vram_size - 
atomic64_read(>vram_pin_size);
-   used_vram = amdgpu_vram_mgr_usage(vram_man);
+   used_vram = amdgpu_vram_mgr_usage(>mman.vram_mgr);
free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;

spin_lock(>mm_stats.lock);
@@ -362,7 +361,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
if (!amdgpu_gmc_vram_full_visible(>gmc)) {
u64 total_vis_vram = adev->gmc.visible_vram_size;
u64 used_vis_vram =
- amdgpu_vram_mgr_vis_usage(vram_man);
+ amdgpu_vram_mgr_vis_usage(>mman.vram_mgr);

if (used_vis_vram < total_vis_vram) {
u64 free_vis_vram = total_vis_vram - used_vis_vram;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 603ce32db5c5..b426e03ad630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -672,10 +672,10 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
ui64 = atomic64_read(>num_vram_cpu_page_faults);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_VRAM_USAGE:
-   ui64 = amdgpu_vram_mgr_usage(ttm_manager_type(>mman.bdev, 
TTM_PL_VRAM));
+   ui64 = amdgpu_vram_mgr_usage(>mman.vram_mgr);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_VIS_VRAM_USAGE:
-   ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM));
+   ui64 = amdgpu_vram_mgr_vis_usage(>mman.vram_mgr);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_GTT_USAGE:
ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr);
@@ -709,8 +709,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
}
case AMDGPU_INFO_MEMORY: {
struct drm_amdgpu_memory_info mem;
-   struct ttm_resource_manager *vram_man =
-   ttm_manager_type(>mman.bdev, TTM_PL_VRAM);
struct ttm_resource_manager *gtt_man =
ttm_manager_type(>mman.bdev, TTM_PL_TT);
memset(, 0, sizeof(mem));
@@ -719,7 +717,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
atomic64_read(>vram_pin_size) -
AMDGPU_VM_RESERVED_VRAM;
mem.vram.heap_usage =
-   amdgpu_vram_mgr_usage(vram_man);
+   amdgpu_vram_mgr_usage(>mman.vram_mgr);
   

Re: [PATCH v3 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr

2021-10-22 Thread Christian König

Am 22.10.21 um 11:32 schrieb Nirmoy Das:

Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.

v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 23 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++--
  4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..2b53d86aebac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,

amdgpu_virt_init_data_exchange(adev);
/* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT));
+   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT));
+   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
if (r)
goto out;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c18f16b3be9c..62e14efa61fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device 
*dev,
  {
struct drm_device *ddev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(ddev);
-   struct ttm_resource_manager *man;

-   man = ttm_manager_type(>mman.bdev, TTM_PL_TT);
-   return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man));
+   return sysfs_emit(buf, "%llu\n", 
amdgpu_gtt_mgr_usage(>mman.gtt_mgr));
  }

  static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
@@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct 
ttm_resource_manager *man,
  /**
   * amdgpu_gtt_mgr_usage - return usage of GTT domain
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Return how many bytes are used in the GTT domain
   */
-uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
+uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr)
  {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   s64 result = man->size - atomic64_read(>available);
+   s64 result;
+
+   result = mgr->manager.size - atomic64_read(>available);

return (result > 0 ? result : 0) * PAGE_SIZE;
  }
@@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager 
*man)
  /**
   * amdgpu_gtt_mgr_recover - re-init gart
   *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
   *
   * Re-init the gart for each known BO in the GTT.
   */
-int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man)
+int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
  {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   struct amdgpu_device *adev;
struct amdgpu_gtt_node *node;
struct drm_mm_node *mm_node;
+   struct amdgpu_device *adev;
int r = 0;

adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -260,6 +258,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,
 struct drm_printer *printer)
  {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
+   struct amdgpu_device *adev = container_of(mgr, typeof(*adev), 
mman.gtt_mgr);


I think that variable is now unused, isn't it?

Apart from that looks good to me now.

Christian.



spin_lock(>lock);
drm_mm_print(>mm, printer);
@@ -267,7 +266,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,

drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, 
usage:%lluMB\n",
   man->size, (u64)atomic64_read(>available),
-  amdgpu_gtt_mgr_usage(man) >> 20);
+  amdgpu_gtt_mgr_usage(mgr) >> 20);
  }

  static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d2955ea4a62b..603ce32db5c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -678,7 

[PATCH v2 2/3] drm/amdgpu: do not pass ttm_resource_manager to vram_mgr

2021-10-22 Thread Nirmoy Das
Do not allow exported amdgpu_vram_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call vram_mgr functions.

v2: pass adev's vram_mgr instead of adev

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  5 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 10 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  5 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 40 
 7 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7077f21f0021..df818e145d9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -531,9 +531,8 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int 
dma_buf_fd,
 uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
-   struct ttm_resource_manager *vram_man = 
ttm_manager_type(>mman.bdev, TTM_PL_VRAM);

-   return amdgpu_vram_mgr_usage(vram_man);
+   return amdgpu_vram_mgr_usage(>mman.vram_mgr);
 }

 uint64_t amdgpu_amdkfd_get_hive_id(struct kgd_dev *kgd)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 76fe5b71e35d..7e745164a624 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -298,7 +298,6 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
 {
s64 time_us, increment_us;
u64 free_vram, total_vram, used_vram;
-   struct ttm_resource_manager *vram_man = 
ttm_manager_type(>mman.bdev, TTM_PL_VRAM);
/* Allow a maximum of 200 accumulated ms. This is basically per-IB
 * throttling.
 *
@@ -315,7 +314,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
}

total_vram = adev->gmc.real_vram_size - 
atomic64_read(>vram_pin_size);
-   used_vram = amdgpu_vram_mgr_usage(vram_man);
+   used_vram = amdgpu_vram_mgr_usage(>mman.vram_mgr);
free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;

spin_lock(>mm_stats.lock);
@@ -362,7 +361,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
amdgpu_device *adev,
if (!amdgpu_gmc_vram_full_visible(>gmc)) {
u64 total_vis_vram = adev->gmc.visible_vram_size;
u64 used_vis_vram =
- amdgpu_vram_mgr_vis_usage(vram_man);
+ amdgpu_vram_mgr_vis_usage(>mman.vram_mgr);

if (used_vis_vram < total_vis_vram) {
u64 free_vis_vram = total_vis_vram - used_vis_vram;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 603ce32db5c5..b426e03ad630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -672,10 +672,10 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
ui64 = atomic64_read(>num_vram_cpu_page_faults);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_VRAM_USAGE:
-   ui64 = amdgpu_vram_mgr_usage(ttm_manager_type(>mman.bdev, 
TTM_PL_VRAM));
+   ui64 = amdgpu_vram_mgr_usage(>mman.vram_mgr);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_VIS_VRAM_USAGE:
-   ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, TTM_PL_VRAM));
+   ui64 = amdgpu_vram_mgr_vis_usage(>mman.vram_mgr);
return copy_to_user(out, , min(size, 8u)) ? -EFAULT : 0;
case AMDGPU_INFO_GTT_USAGE:
ui64 = amdgpu_gtt_mgr_usage(>mman.gtt_mgr);
@@ -709,8 +709,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
}
case AMDGPU_INFO_MEMORY: {
struct drm_amdgpu_memory_info mem;
-   struct ttm_resource_manager *vram_man =
-   ttm_manager_type(>mman.bdev, TTM_PL_VRAM);
struct ttm_resource_manager *gtt_man =
ttm_manager_type(>mman.bdev, TTM_PL_TT);
memset(, 0, sizeof(mem));
@@ -719,7 +717,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
atomic64_read(>vram_pin_size) -
AMDGPU_VM_RESERVED_VRAM;
mem.vram.heap_usage =
-   amdgpu_vram_mgr_usage(vram_man);
+   amdgpu_vram_mgr_usage(>mman.vram_mgr);
mem.vram.max_allocation = mem.vram.usable_heap_size * 3 / 4;


[PATCH v3 3/3] drm/amdgpu: recover gart table at resume

2021-10-22 Thread Nirmoy Das
Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.

v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 80 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  3 +-
 7 files changed, 12 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b53d86aebac..f0c70e9d37fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3935,16 +3935,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
if (!adev->in_s0ix)
amdgpu_amdkfd_suspend(adev, adev->in_runpm);

-   /* First evict vram memory */
amdgpu_device_evict_resources(adev);

amdgpu_fence_driver_hw_fini(adev);

amdgpu_device_ip_suspend_phase2(adev);
-   /* This second call to evict device resources is to evict
-* the gart page table using the CPU.
-*/
-   amdgpu_device_evict_resources(adev);

return 0;
 }
@@ -4286,8 +4281,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
goto error;

amdgpu_virt_init_data_exchange(adev);
-   /* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,10 +4597,6 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
-   if (r)
-   goto out;
-
r = amdgpu_device_fw_loading(tmp_adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index d3e4203f6217..679eec122bb5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -116,78 +116,16 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device 
*adev)
 {
int r;

-   if (adev->gart.bo == NULL) {
-   struct amdgpu_bo_param bp;
-
-   memset(, 0, sizeof(bp));
-   bp.size = adev->gart.table_size;
-   bp.byte_align = PAGE_SIZE;
-   bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
-   bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
-   bp.type = ttm_bo_type_kernel;
-   bp.resv = NULL;
-   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
-
-   r = amdgpu_bo_create(adev, , >gart.bo);
-   if (r) {
-   return r;
-   }
-   }
-   return 0;
-}
-
-/**
- * amdgpu_gart_table_vram_pin - pin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Pin the GART page table in vram so it will not be moved
- * by the memory manager (pcie r4xx, r5xx+).  These asics require the
- * gart table to be in video memory.
- * Returns 0 for success, error for failure.
- */
-int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
+   if (adev->gart.bo != NULL)
+   return 0;

-   r = amdgpu_bo_reserve(adev->gart.bo, false);
-   if (unlikely(r != 0))
-   return r;
-   r = amdgpu_bo_pin(adev->gart.bo, AMDGPU_GEM_DOMAIN_VRAM);
+   r = amdgpu_bo_create_kernel(adev,  adev->gart.table_size, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_VRAM, >gart.bo,
+   NULL, (void *)>gart.ptr);
if (r) {
-   amdgpu_bo_unreserve(adev->gart.bo);
return r;
}
-   r = amdgpu_bo_kmap(adev->gart.bo, >gart.ptr);
-   if (r)
-   amdgpu_bo_unpin(adev->gart.bo);
-   amdgpu_bo_unreserve(adev->gart.bo);
-   return r;
-}
-
-/**
- * amdgpu_gart_table_vram_unpin - unpin gart page table in vram
- *
- * @adev: amdgpu_device pointer
- *
- * Unpin the GART page table in vram (pcie r4xx, r5xx+).
- * These asics require the gart table to be in video memory.
- */
-void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev)
-{
-   int r;
-
-   if (adev->gart.bo == NULL) {
- 

[PATCH v3 1/3] drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr

2021-10-22 Thread Nirmoy Das
Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.

v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 23 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++--
 4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..2b53d86aebac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4287,7 +4287,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,

amdgpu_virt_init_data_exchange(adev);
/* we need recover gart prior to run SMC/CP/SDMA resume */
-   amdgpu_gtt_mgr_recover(ttm_manager_type(>mman.bdev, TTM_PL_TT));
+   amdgpu_gtt_mgr_recover(>mman.gtt_mgr);

r = amdgpu_device_fw_loading(adev);
if (r)
@@ -4604,7 +4604,7 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
amdgpu_inc_vram_lost(tmp_adev);
}

-   r = 
amdgpu_gtt_mgr_recover(ttm_manager_type(_adev->mman.bdev, TTM_PL_TT));
+   r = 
amdgpu_gtt_mgr_recover(_adev->mman.gtt_mgr);
if (r)
goto out;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c18f16b3be9c..62e14efa61fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,10 +77,8 @@ static ssize_t amdgpu_mem_info_gtt_used_show(struct device 
*dev,
 {
struct drm_device *ddev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(ddev);
-   struct ttm_resource_manager *man;

-   man = ttm_manager_type(>mman.bdev, TTM_PL_TT);
-   return sysfs_emit(buf, "%llu\n", amdgpu_gtt_mgr_usage(man));
+   return sysfs_emit(buf, "%llu\n", 
amdgpu_gtt_mgr_usage(>mman.gtt_mgr));
 }

 static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
@@ -206,14 +204,15 @@ static void amdgpu_gtt_mgr_del(struct 
ttm_resource_manager *man,
 /**
  * amdgpu_gtt_mgr_usage - return usage of GTT domain
  *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
  *
  * Return how many bytes are used in the GTT domain
  */
-uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
+uint64_t amdgpu_gtt_mgr_usage(struct amdgpu_gtt_mgr *mgr)
 {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   s64 result = man->size - atomic64_read(>available);
+   s64 result;
+
+   result = mgr->manager.size - atomic64_read(>available);

return (result > 0 ? result : 0) * PAGE_SIZE;
 }
@@ -221,16 +220,15 @@ uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager 
*man)
 /**
  * amdgpu_gtt_mgr_recover - re-init gart
  *
- * @man: TTM memory type manager
+ * @mgr: amdgpu_gtt_mgr pointer
  *
  * Re-init the gart for each known BO in the GTT.
  */
-int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man)
+int amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
 {
-   struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   struct amdgpu_device *adev;
struct amdgpu_gtt_node *node;
struct drm_mm_node *mm_node;
+   struct amdgpu_device *adev;
int r = 0;

adev = container_of(mgr, typeof(*adev), mman.gtt_mgr);
@@ -260,6 +258,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,
 struct drm_printer *printer)
 {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
+   struct amdgpu_device *adev = container_of(mgr, typeof(*adev), 
mman.gtt_mgr);

spin_lock(>lock);
drm_mm_print(>mm, printer);
@@ -267,7 +266,7 @@ static void amdgpu_gtt_mgr_debug(struct 
ttm_resource_manager *man,

drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, 
usage:%lluMB\n",
   man->size, (u64)atomic64_read(>available),
-  amdgpu_gtt_mgr_usage(man) >> 20);
+  amdgpu_gtt_mgr_usage(mgr) >> 20);
 }

 static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d2955ea4a62b..603ce32db5c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -678,7 +678,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
ui64 = 
amdgpu_vram_mgr_vis_usage(ttm_manager_type(>mman.bdev, 

Re: [PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

2021-10-22 Thread JingWen Chen
ping

On 2021/10/22 AM11:33, Jingwen Chen wrote:
> [Why]
> In advance tdr mode, the real bad job will be resubmitted twice, while
> in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job
> is put one more time than other jobs.
>
> [How]
> Adding dma_fence_get before resbumit job in
> amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs
>
> Signed-off-by: Jingwen Chen 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 41ce86244144..975f069f6fe8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4841,6 +4841,9 @@ static void amdgpu_device_recheck_guilty_jobs(
>  
>   /* clear job's guilty and depend the folowing step to decide 
> the real one */
>   drm_sched_reset_karma(s_job);
> + /* for the real bad job, it will be resubmitted twice, adding a 
> dma_fence_get
> +  * to make sure fence is balanced */
> + dma_fence_get(s_job->s_fence->parent);
>   drm_sched_resubmit_jobs_ext(>sched, 1);
>  
>   ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, 
> ring->sched.timeout);
> @@ -4876,6 +4879,7 @@ static void amdgpu_device_recheck_guilty_jobs(
>  
>   /* got the hw fence, signal finished fence */
>   atomic_dec(ring->sched.score);
> + dma_fence_put(s_job->s_fence->parent);
>   dma_fence_get(_job->s_fence->finished);
>   dma_fence_signal(_job->s_fence->finished);
>   dma_fence_put(_job->s_fence->finished);


[PATCH] drm/amd/pm: Disable fan control if not supported

2021-10-22 Thread Lijo Lazar
On arcturus, not all platforms use PMFW based fan control. On such
ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon
knobs for fan control also as it is not possible to report or control
fan speed on such platforms through driver.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 082f01893f3d..fd1d30a93db5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -436,6 +436,19 @@ static void arcturus_check_bxco_support(struct smu_context 
*smu)
}
 }
 
+static void arcturus_check_fan_support(struct smu_context *smu)
+{
+   struct smu_table_context *table_context = >smu_table;
+   PPTable_t *pptable = table_context->driver_pptable;
+
+   /* No sort of fan control possible if PPTable has it disabled */
+   smu->adev->pm.no_fan =
+   !(pptable->FeaturesToRun[0] & FEATURE_FAN_CONTROL_MASK);
+   if (smu->adev->pm.no_fan)
+   dev_info_once(smu->adev->dev,
+ "PMFW based fan control disabled");
+}
+
 static int arcturus_check_powerplay_table(struct smu_context *smu)
 {
struct smu_table_context *table_context = >smu_table;
@@ -443,6 +456,7 @@ static int arcturus_check_powerplay_table(struct 
smu_context *smu)
table_context->power_play_table;
 
arcturus_check_bxco_support(smu);
+   arcturus_check_fan_support(smu);
 
table_context->thermal_controller_type =
powerplay_table->thermal_controller_type;
-- 
2.17.1