RE: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU active time per entity"

2023-08-16 Thread Chen, Guchun
[Public]

Hi Xinhui,

That patch has been reverted on Linux mainline.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/scheduler/sched_main.c?h=v6.5-rc6=baad10973fdb442912af676de3348e80bd8fe602

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx  On Behalf Of
> xinhui pan
> Sent: Thursday, August 17, 2023 1:05 PM
> To: amd-...@lists.freedesktop.org
> Cc: Pan, Xinhui ; dri-devel@lists.freedesktop.org;
> Tuikov, Luben ; airl...@gmail.com; Koenig,
> Christian ; l.st...@pengutronix.de
> Subject: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU
> active time per entity"
>
> This patch partially reverts commit df622729ddbf ("drm/scheduler: track GPU
> active time per entity"), which touches the entity without holding any reference.
>
> I noticed a memory overwrite on the GPU scheduler side.
> The scenario is as follows:
> A (drm_sched_main)                B (vm fini)
> drm_sched_job_begin               drm_sched_entity_kill
>   // job in pending_list            wait_for_completion
> complete_all                      ...
> ...                               kfree entity
> drm_sched_get_cleanup_job
>   // fetch job from pending_list
>   access job->entity  // memory overwritten
>
> Since we cannot guarantee that the entity is still alive in this case,
> let's revert it for now.
>
> Signed-off-by: xinhui pan 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 602361c690c9..1b3f1a6a8514 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -907,12 +907,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>
>   spin_unlock(&sched->job_list_lock);
>
> - if (job) {
> - job->entity->elapsed_ns += ktime_to_ns(
> - ktime_sub(job->s_fence->finished.timestamp,
> -   job->s_fence->scheduled.timestamp));
> - }
> -
>   return job;
>  }
>
> --
> 2.34.1
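The race above comes down to a job holding a raw back-pointer to an entity whose lifetime it does not control. As an illustration only (this is not what the revert does, and none of the names below are DRM scheduler API), here is a minimal userspace sketch, assuming a simple non-atomic reference count in the spirit of the kernel's kref, of how a job that holds its own reference keeps the entity alive until its elapsed time has been accounted:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scheduler entity (not the kernel struct). */
struct entity {
	int refcount;        /* simplified, non-atomic reference count */
	uint64_t elapsed_ns; /* accumulated GPU active time */
};

/* Hypothetical stand-in for a job with a back-pointer to its entity. */
struct job {
	struct entity *entity;
	uint64_t scheduled_ns; /* timestamp when the job was scheduled */
	uint64_t finished_ns;  /* timestamp when the job finished */
};

static int entities_freed; /* counts frees so the behaviour can be checked */

static struct entity *entity_get(struct entity *e)
{
	e->refcount++;
	return e;
}

static void entity_put(struct entity *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0) {
		free(e);
		entities_freed++;
	}
}

/* Cleanup path: account the time, then drop the job's own reference. */
static uint64_t job_cleanup(struct job *j)
{
	uint64_t total;

	j->entity->elapsed_ns += j->finished_ns - j->scheduled_ns;
	total = j->entity->elapsed_ns;
	entity_put(j->entity);
	j->entity = NULL;
	return total;
}
```

With the extra reference, the owner's early free only drops a count; the memory is released when the job's cleanup drops the last reference. A kernel version would additionally need atomic counting (for example kref) and suitable locking.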


RE: [PATCH] drm/ttm: check null pointer before accessing when swapping

2023-07-27 Thread Chen, Guchun
[Public]

> -----Original Message-----
> From: Koenig, Christian 
> Sent: Thursday, July 27, 2023 3:28 PM
> To: Alex Deucher ; Chen, Guchun
> 
> Cc: Deucher, Alexander ; airl...@gmail.com;
> dan...@ffwll.ch; dri-devel@lists.freedesktop.org; Mikhail Gavrilov
> 
> Subject: Re: [PATCH] drm/ttm: check null pointer before accessing when
> swapping
>
> Am 24.07.23 um 15:36 schrieb Alex Deucher:
> > On Sun, Jul 23, 2023 at 10:43 PM Guchun Chen 
> wrote:
> >> Add a check to avoid null pointer dereference as below:
> >>
> >> [   90.002283] general protection fault, probably for non-canonical
> >> address 0xdc00:  [#1] PREEMPT SMP KASAN NOPTI
> >> [   90.002292] KASAN: null-ptr-deref in range
> >> [0x-0x0007]
> >> [   90.002346]  ? exc_general_protection+0x159/0x240
> >> [   90.002352]  ? asm_exc_general_protection+0x26/0x30
> >> [   90.002357]  ? ttm_bo_evict_swapout_allowable+0x322/0x5e0 [ttm]
> >> [   90.002365]  ? ttm_bo_evict_swapout_allowable+0x42e/0x5e0 [ttm]
> >> [   90.002373]  ttm_bo_swapout+0x134/0x7f0 [ttm]
> >> [   90.002383]  ? __pfx_ttm_bo_swapout+0x10/0x10 [ttm]
> >> [   90.002391]  ? lock_acquire+0x44d/0x4f0
> >> [   90.002398]  ? ttm_device_swapout+0xa5/0x260 [ttm]
> >> [   90.002412]  ? lock_acquired+0x355/0xa00
> >> [   90.002416]  ? do_raw_spin_trylock+0xb6/0x190
> >> [   90.002421]  ? __pfx_lock_acquired+0x10/0x10
> >> [   90.002426]  ? ttm_global_swapout+0x25/0x210 [ttm]
> >> [   90.002442]  ttm_device_swapout+0x198/0x260 [ttm]
> >> [   90.002456]  ? __pfx_ttm_device_swapout+0x10/0x10 [ttm]
> >> [   90.002472]  ttm_global_swapout+0x75/0x210 [ttm]
> >> [   90.002486]  ttm_tt_populate+0x187/0x3f0 [ttm]
> >> [   90.002501]  ttm_bo_handle_move_mem+0x437/0x590 [ttm]
> >> [   90.002517]  ttm_bo_validate+0x275/0x430 [ttm]
> >> [   90.002530]  ? __pfx_ttm_bo_validate+0x10/0x10 [ttm]
> >> [   90.002544]  ? kasan_save_stack+0x33/0x60
> >> [   90.002550]  ? kasan_set_track+0x25/0x30
> >> [   90.002554]  ? __kasan_kmalloc+0x8f/0xa0
> >> [   90.002558]  ? amdgpu_gtt_mgr_new+0x81/0x420 [amdgpu]
> >> [   90.003023]  ? ttm_resource_alloc+0xf6/0x220 [ttm]
> >> [   90.003038]  amdgpu_bo_pin_restricted+0x2dd/0x8b0 [amdgpu]
> >> [   90.003210]  ? __x64_sys_ioctl+0x131/0x1a0
> >> [   90.003210]  ? do_syscall_64+0x60/0x90
> >>
> >> Fixes: a2848d08742c ("drm/ttm: never consider pinned BOs for
> >> eviction")
> >> Tested-by: Mikhail Gavrilov 
> >> Signed-off-by: Guchun Chen 
> > Reviewed-by: Alex Deucher 
>
> Reviewed-by: Christian König 
>
> Has this already been pushed to drm-misc-next?
>
> Thanks,
> Christian.

Not yet, Christian, as I don't have push permission. I saw you were on
vacation, so I planned to ping you to push it once you are back and fully
recharged.

Regards,
Guchun

> >
> >> ---
> >>   drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> >> b/drivers/gpu/drm/ttm/ttm_bo.c index 7139a522b2f3..54e3083076b7
> >> 100644
> >> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >> @@ -519,7 +519,8 @@ static bool
> ttm_bo_evict_swapout_allowable(struct
> >> ttm_buffer_object *bo,
> >>
> >>  if (bo->pin_count) {
> >>  *locked = false;
> >> -   *busy = false;
> >> +   if (busy)
> >> +   *busy = false;
> >>  return false;
> >>  }
> >>
> >> --
> >> 2.25.1
> >>
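The fix is the usual guard for an optional output pointer: a caller may pass NULL for busy (as the swapout path apparently does, judging by the trace), so the callee must check before writing through it. A minimal userspace sketch of the pattern; the function and fields are hypothetical and only loosely modelled on ttm_bo_evict_swapout_allowable(), not the real TTM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified buffer object. */
struct bo {
	unsigned int pin_count;
	bool reserved_by_other; /* stand-in for "lock is contended" */
};

/*
 * Returns true if the BO may be evicted or swapped out.
 * 'locked' is mandatory; 'busy' is optional and may be NULL.
 */
static bool evict_swapout_allowable(const struct bo *bo, bool *locked, bool *busy)
{
	assert(locked != NULL);

	if (bo->pin_count) {
		*locked = false;
		if (busy) /* optional out-parameter: only write if provided */
			*busy = false;
		return false;
	}

	*locked = !bo->reserved_by_other;
	if (busy)
		*busy = bo->reserved_by_other;
	return *locked;
}
```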



RE: [PATCH] drm/amdgpu: display/Kconfig: replace leading spaces with tab

2023-06-07 Thread Chen, Guchun
[Public]

It's
https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-staging-drm-next?ref_type=heads.
The latest patches, including yours, will be pushed to this branch after a while.

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx  On Behalf Of Sui
> Jingfeng
> Sent: Wednesday, June 7, 2023 2:34 PM
> To: Alex Deucher 
> Cc: Li, Sun peng (Leo) ; David Airlie
> ; Pan, Xinhui ; Siqueira, Rodrigo
> ; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; amd-...@lists.freedesktop.org; Daniel Vetter
> ; Deucher, Alexander ;
> Wentland, Harry ; Koenig, Christian
> 
> Subject: Re: [PATCH] drm/amdgpu: display/Kconfig: replace leading spaces
> with tab
>
> https://cgit.freedesktop.org/amd/drm-amd/
>
>
> This one has not been updated for a long time.
>
>
> On 2023/6/7 14:31, Sui Jingfeng wrote:
> > Hi,
> >
> > On 2023/6/7 03:15, Alex Deucher wrote:
> >> Applied.  Thanks!
> >
> > Where is the official drm/amdgpu branch? I can't find it on the
> > internet.
> >
> > Sorry for asking this silly question.
>
> >
> >> Alex
> >>
> >> On Tue, Jun 6, 2023 at 9:33 AM Sui Jingfeng 
> >> wrote:
> >>> This patch replaces the leading spaces with tabs to keep them aligned
> >>> with the rest of the config options. No functional change.
> >>>
> >>> Signed-off-by: Sui Jingfeng 
> >>> ---
> >>>   drivers/gpu/drm/amd/display/Kconfig | 17 +++--
> >>>   1 file changed, 7 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/display/Kconfig
> >>> b/drivers/gpu/drm/amd/display/Kconfig
> >>> index 2d8e55e29637..04ccfc70d583 100644
> >>> --- a/drivers/gpu/drm/amd/display/Kconfig
> >>> +++ b/drivers/gpu/drm/amd/display/Kconfig
> >>> @@ -42,16 +42,13 @@ config DEBUG_KERNEL_DC
> >>>Choose this option if you want to hit kdgb_break in assert.
> >>>
> >>>   config DRM_AMD_SECURE_DISPLAY
> >>> -bool "Enable secure display support"
> >>> -depends on DEBUG_FS
> >>> -depends on DRM_AMD_DC_FP
> >>> -help
> >>> -Choose this option if you want to
> >>> -support secure display
> >>> -
> >>> -This option enables the calculation
> >>> -of crc of specific region via debugfs.
> >>> -Cooperate with specific DMCU FW.
> >>> +   bool "Enable secure display support"
> >>> +   depends on DEBUG_FS
> >>> +   depends on DRM_AMD_DC_FP
> >>> +   help
> >>> + Choose this option if you want to support secure display
> >>>
> >>> + This option enables the calculation of crc of specific region via
> >>> + debugfs. Cooperate with specific DMCU FW.
> >>>
> >>>   endmenu
> >>> --
> >>> 2.25.1
> >>>
> --
> Jingfeng



RE: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields

2023-05-31 Thread Chen, Guchun
[Public]

> -----Original Message-----
> From: amd-gfx  On Behalf Of Ma
> Jun
> Sent: Wednesday, May 31, 2023 1:31 PM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig,
> Christian 
> Cc: Ma, Jun 
> Subject: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
>
> Remove redundant assignment code for ttm->caching as it's overwritten
>
> just a few lines later.

Please drop the blank line in the above message. With that fixed, the patch is: 
Reviewed-by: Guchun Chen 

Regards,
Guchun

> v2:
>  - Update the commit message.
>
> Signed-off-by: Ma Jun 
> ---
>  drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 02b812dacc5d..45a44544b656 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>  unsigned long extra_pages)
>  {
>   ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) +
> extra_pages;
> - ttm->caching = ttm_cached;
>   ttm->page_flags = page_flags;
>   ttm->dma_address = NULL;
>   ttm->swap_storage = NULL;
> --
> 2.34.1



RE: [PATCH 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit

2023-04-25 Thread Chen, Guchun
Looks like you can drop the macro 'AMDGPU_DEFAULT_GTT_SIZE_MB' as well.

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Mukul Joshi
> Sent: Wednesday, April 26, 2023 9:53 AM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Joshi, Mukul ; Kuehling, Felix
> ; Koenig, Christian 
> Subject: [PATCH 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit
> 
> Use the helper function in TTM to get the TTM memory limit and set the GTT
> size equal to the TTM memory limit.
> 
> Signed-off-by: Mukul Joshi 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 25 ++---
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index ce34b73d05bc..ac220c779fc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1807,26 +1807,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>   DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
>(unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
> 
> - /* Compute GTT size, either based on 1/2 the size of RAM size
> -  * or whatever the user passed on module init */
> - if (amdgpu_gtt_size == -1) {
> - struct sysinfo si;
> -
> - si_meminfo(&si);
> - /* Certain GL unit tests for large textures can cause problems
> -  * with the OOM killer since there is no way to link this memory
> -  * to a process.  This was originally mitigated (but not necessarily
> -  * eliminated) by limiting the GTT size.  The problem is this limit
> -  * is often too low for many modern games so just make the limit 1/2
> -  * of system memory which aligns with TTM. The OOM accounting needs
> -  * to be addressed, but we shouldn't prevent common 3D applications
> -  * from being usable just to potentially mitigate that corner case.
> -  */
> - gtt_size = max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
> -(u64)si.totalram * si.mem_unit / 2);
> - } else {
> + /* Compute GTT size, either based on TTM limit
> +  * or whatever the user passed on module init.
> +  */
> + if (amdgpu_gtt_size == -1)
> + gtt_size = ttm_tt_pages_limit() << PAGE_SHIFT;
> + else
>   gtt_size = (uint64_t)amdgpu_gtt_size << 20;
> - }
> 
>   /* Initialize GTT memory pool */
>   r = amdgpu_gtt_mgr_init(adev, gtt_size);
> --
> 2.35.1
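For reference, the size selection after this patch reduces to the arithmetic below. This is a userspace sketch, not amdgpu code: compute_gtt_size() is a hypothetical helper, its ttm_pages_limit argument stands in for ttm_tt_pages_limit(), and a 4 KiB page (page_shift = 12) is assumed in the example values. A parameter of -1 selects the TTM limit; any other value is taken as MiB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched logic:
 * -1 means "use the TTM page limit", otherwise the value is in MiB.
 */
static uint64_t compute_gtt_size(int gtt_size_mb, uint64_t ttm_pages_limit,
				 unsigned int page_shift)
{
	assert(gtt_size_mb == -1 || gtt_size_mb >= 0);

	if (gtt_size_mb == -1)
		return ttm_pages_limit << page_shift; /* pages -> bytes */

	/* Widen before shifting so large MiB values do not overflow an int. */
	return (uint64_t)gtt_size_mb << 20;
}
```

The cast before `<< 20` matters: shifting a plain int by 20 would overflow for values of 2048 MiB and above.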



RE: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-25 Thread Chen, Guchun
From a coding style perspective, this lock/unlock handling should be put into
amdgpu_ctx_priority_override.

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx  On Behalf Of Chia-
> I Wu
> Sent: Wednesday, April 26, 2023 8:48 AM
> To: dri-devel@lists.freedesktop.org
> Cc: Pan, Xinhui ; linux-ker...@vger.kernel.org;
> sta...@vger.kernel.org; amd-...@lists.freedesktop.org; Daniel Vetter
> ; Deucher, Alexander ;
> David Airlie ; Koenig, Christian
> 
> Subject: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED
> 
> Signed-off-by: Chia-I Wu 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> index e9b45089a28a6..863b2a34b2d64 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> @@ -38,6 +38,7 @@ static int
> amdgpu_sched_process_priority_override(struct amdgpu_device *adev,  {
>   struct fd f = fdget(fd);
>   struct amdgpu_fpriv *fpriv;
> + struct amdgpu_ctx_mgr *mgr;
>   struct amdgpu_ctx *ctx;
>   uint32_t id;
>   int r;
> @@ -51,8 +52,11 @@ static int
> amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
>   return r;
>   }
> 
> - idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
> + mgr = &fpriv->ctx_mgr;
> + mutex_lock(&mgr->lock);
> + idr_for_each_entry(&mgr->ctx_handles, ctx, id)
>   amdgpu_ctx_priority_override(ctx, priority);
> + mutex_unlock(&mgr->lock);
> 
>   fdput(f);
>   return 0;
> --
> 2.40.0.634.g4ca3ef3211-goog
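The patch's core change is to hold the context manager's lock for the whole walk over the context handles. A minimal userspace model, assuming a pthread mutex and a fixed array in place of the kernel mutex and IDR; all names are hypothetical, not amdgpu API:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_CTX 8

/* Hypothetical stand-ins for amdgpu_ctx / amdgpu_ctx_mgr. */
struct ctx {
	int in_use;
	int priority;
};

struct ctx_mgr {
	pthread_mutex_t lock; /* protects 'handles' */
	struct ctx handles[MAX_CTX];
};

static void ctx_mgr_init(struct ctx_mgr *mgr)
{
	memset(mgr->handles, 0, sizeof(mgr->handles));
	pthread_mutex_init(&mgr->lock, NULL);
}

static void ctx_priority_override(struct ctx *ctx, int priority)
{
	assert(ctx->in_use); /* only live contexts are visited */
	ctx->priority = priority;
}

/* Walk every live context with the manager lock held for the whole loop. */
static void process_priority_override(struct ctx_mgr *mgr, int priority)
{
	size_t i;

	pthread_mutex_lock(&mgr->lock);
	for (i = 0; i < MAX_CTX; i++) {
		if (mgr->handles[i].in_use)
			ctx_priority_override(&mgr->handles[i], priority);
	}
	pthread_mutex_unlock(&mgr->lock);
}
```

Holding the lock across the loop prevents a concurrent context create or destroy from changing the table mid-walk; where exactly the lock/unlock pair lives is the coding-style question raised in the review above.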



RE: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-25 Thread Chen, Guchun
After reviewing the whole history, I think the attached patch may fix your
problem. Could you please give it a try?

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Mikhail Gavrilov
> Sent: Tuesday, April 25, 2023 9:20 PM
> To: Koenig, Christian 
> Cc: Daniel Vetter ; dri-devel  de...@lists.freedesktop.org>; amd-gfx list ;
> Linux List Kernel Mailing 
> Subject: Re: BUG: KASAN: null-ptr-deref in
> drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
> 
> On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov
>  wrote:
> >
> > Important don't give up.
> > https://youtu.be/25zhHBGIHJ8 [40 min]
> > https://youtu.be/utnDR26eYBY [50 min]
> > https://youtu.be/DJQ_tiimW6g [12 min]
> > https://youtu.be/Y6AH1oJKivA [6 min]
> > Yes, the issue is always reproducible, but sometimes it does not
> > happen on the first attempt.
> > I also uploaded other videos which prove that the issue definitely
> > exists if someone launches those games in turn.
> > Reproducibility is only a matter of time.
> >
> > Anyway I didn't want you to spend so much time trying to reproduce it.
> > This monkey business fits me more than you.
> > It would be better if I could collect more useful info.
> 
> Christian,
> Did you manage to reproduce the problem?
> 
> Over the weekend I hit a slab-use-after-free in
> amdgpu_vm_handle_moved.
> I was not playing any games at the time.
> The Xwayland process was affected, which led to a desktop hang.
> 
> 
> ==
> BUG: KASAN: slab-use-after-free in amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
> Read of size 8 at addr 888295c66190 by task Xwayland:cs0/173185
> 
> CPU: 21 PID: 173185 Comm: Xwayland:cs0 Tainted: GWL ---  ---  6.3.0-0.rc7.20230420gitcb0856346a60.59.fc39.x86_64+debug #1
> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4601 02/02/2023
> Call Trace:
>  
>  dump_stack_lvl+0x76/0xd0
>  print_report+0xcf/0x670
>  ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>  ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>  kasan_report+0xa8/0xe0
>  ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>  amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>  amdgpu_cs_ioctl+0x2b7e/0x5630 [amdgpu]
>  ? __pfx___lock_acquire+0x10/0x10
>  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>  ? mark_lock+0x101/0x16e0
>  ? __lock_acquire+0xe54/0x59f0
>  ? __pfx_lock_release+0x10/0x10
>  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>  drm_ioctl_kernel+0x1fc/0x3d0
>  ? __pfx_drm_ioctl_kernel+0x10/0x10
>  drm_ioctl+0x4c5/0xaa0
>  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>  ? __pfx_drm_ioctl+0x10/0x10
>  ? _raw_spin_unlock_irqrestore+0x66/0x80
>  ? lockdep_hardirqs_on+0x81/0x110
>  ? _raw_spin_unlock_irqrestore+0x4f/0x80
>  amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>  __x64_sys_ioctl+0x131/0x1a0
>  do_syscall_64+0x60/0x90
>  ? do_syscall_64+0x6c/0x90
>  ? lockdep_hardirqs_on+0x81/0x110
>  ? do_syscall_64+0x6c/0x90
>  ? lockdep_hardirqs_on+0x81/0x110
>  ? do_syscall_64+0x6c/0x90
>  ? lockdep_hardirqs_on+0x81/0x110
>  ? do_syscall_64+0x6c/0x90
>  ? lockdep_hardirqs_on+0x81/0x110
>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7ffb71b0892d
> Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00
> 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00
> f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> RSP: 002b:7ffb677fe840 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 7ffb677fe9f8 RCX: 7ffb71b0892d
> RDX: 7ffb677fe900 RSI: c0186444 RDI: 000d
> RBP: 7ffb677fe890 R08: 7ffb677fea50 R09: 7ffb677fe8e0
> R10: 556c4611bec0 R11: 0246 R12: 7ffb677fe900
> R13: c0186444 R14: 000d R15: 7ffb677fe9f8
> 
> 
> Allocated by task 173181:
>  kasan_save_stack+0x33/0x60
>  kasan_set_track+0x25/0x30
>  __kasan_kmalloc+0x8f/0xa0
>  __kmalloc_node+0x65/0x160
>  amdgpu_bo_create+0x31e/0xfb0 [amdgpu]
>  amdgpu_bo_create_user+0xca/0x160 [amdgpu]
>  amdgpu_gem_create_ioctl+0x398/0x980 [amdgpu]
>  drm_ioctl_kernel+0x1fc/0x3d0
>  drm_ioctl+0x4c5/0xaa0
>  amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>  __x64_sys_ioctl+0x131/0x1a0
>  do_syscall_64+0x60/0x90
>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Freed by task 173185:
>  kasan_save_stack+0x33/0x60
>  kasan_set_track+0x25/0x30
>  kasan_save_free_info+0x2e/0x50
>  __kasan_slab_free+0x10b/0x1a0
>  slab_free_freelist_hook+0x11e/0x1d0
>  __kmem_cache_free+0xc0/0x2e0
>  ttm_bo_release+0x667/0x9e0 [ttm]
>  amdgpu_bo_unref+0x35/0x70 [amdgpu]
>  amdgpu_gem_object_free+0x73/0xb0 [amdgpu]
>  drm_gem_handle_delete+0xe3/0x150
>  drm_ioctl_kernel+0x1fc/0x3d0
>  drm_ioctl+0x4c5/0xaa0
>  amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>  __x64_sys_ioctl+0x131/0x1a0
>  do_syscall_64+0x60/0x90
>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Last potentially related work creation:
>  kasan_save_stack+0x33/0x60
>  __kasan_record_aux_stack+0x97/0xb0
>  

RE: [PATCH v3 2/2] drm/probe_helper: warning on poll_enabled for issue catching

2023-03-10 Thread Chen, Guchun


> -----Original Message-----
> From: Jani Nikula 
> Sent: Friday, March 10, 2023 8:05 PM
> To: Chen, Guchun ; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Deucher,
> Alexander ; Zhang, Hawking
> ; dmitry.barysh...@linaro.org;
> spassw...@web.de; m...@fireburn.co.uk
> Cc: Chen, Guchun 
> Subject: Re: [PATCH v3 2/2] drm/probe_helper: warning on poll_enabled for
> issue catching
> 
> On Fri, 10 Mar 2023, Guchun Chen  wrote:
> > Warn on poll_enabled in order to catch issues in other drivers and to
> > ensure the proper call sequence of the polling functions.
> >
> > v2: drop Fixes tag in commit message
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
> > Reported-by: Bert Karwatzki 
> > Suggested-by: Dmitry Baryshkov 
> > Signed-off-by: Guchun Chen 
> > ---
> >  drivers/gpu/drm/drm_probe_helper.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_probe_helper.c
> > b/drivers/gpu/drm/drm_probe_helper.c
> > index 8127be134c39..85e0e80d4a52 100644
> > --- a/drivers/gpu/drm/drm_probe_helper.c
> > +++ b/drivers/gpu/drm/drm_probe_helper.c
> > @@ -852,6 +852,8 @@
> EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> >   */
> >  void drm_kms_helper_poll_disable(struct drm_device *dev)  {
> > +   WARN_ON(!dev->mode_config.poll_enabled);
> 
> Please address all previous review comments [1].

Sorry for missing your previous review email. I will address it in the next patch set.

Regards,
Guchun

> BR,
> Jani.
> 
> 
> [1] https://lore.kernel.org/r/87o7p3bde6@intel.com
> 
> 
> > +
> > if (dev->mode_config.poll_running)
> > drm_kms_helper_disable_hpd(dev);
> 
> --
> Jani Nikula, Intel Open Source Graphics Center


RE: [PATCH] drm/amdgpu: resove reboot exception for si oland

2023-03-10 Thread Chen, Guchun


> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Zhenneng Li
> Sent: Friday, March 10, 2023 3:40 PM
> To: Deucher, Alexander 
> Cc: David Airlie ; Pan, Xinhui ;
> linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; Zhenneng Li
> ; amd-...@lists.freedesktop.org; Daniel Vetter
> ; Koenig, Christian 
> Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
> 
> During reboot testing on an arm64 platform, the GPU may fail to initialize on boot.
> 
> The error messages are as follows:
> [6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block  failed -22
> [7.006919][ 7] [  T295] amdgpu :04:00.0: amdgpu_device_ip_late_init failed
> [7.014224][ 7] [  T295] amdgpu :04:00.0: Fatal error during GPU init
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..dee51c757ac0 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
>   if (!adev->pm.dpm_enabled)
>   return 0;
> 
> - ret = si_set_temperature_range(adev);
> - if (ret)
> - return ret;

si_set_temperature_range should be platform-agnostic. Can you please
elaborate?

Regards,
Guchun

>  #if 0 //TODO ?
>   si_dpm_powergate_uvd(adev, true);
>  #endif
> --
> 2.25.1



RE: [PATCH 2/2] drm/probe_helper: warning on poll_enabled for issue catching

2023-03-09 Thread Chen, Guchun


> -----Original Message-----
> From: Dmitry Baryshkov 
> Sent: Thursday, March 9, 2023 4:49 PM
> To: Chen, Guchun ; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Deucher,
> Alexander ; Zhang, Hawking
> ; spassw...@web.de; m...@fireburn.co.uk
> Subject: Re: [PATCH 2/2] drm/probe_helper: warning on poll_enabled for
> issue catching
> 
> On 09/03/2023 07:48, Guchun Chen wrote:
> > Warn on poll_enabled in order to catch issues in other drivers and to
> > ensure the proper call sequence of the polling functions.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
> > Fixes: a4e771729a51("drm/probe_helper: sort out poll_running vs
> > poll_enabled")
> 
> Previously it was suggested that this is not a fix, so the Fixes header is
> incorrect.
> 
> Also please use -vN when preparing/sending patchsets. This is v2.

Will fix these in V3.
 
Regards,
Guchun

> > Reported-by: Bert Karwatzki 
> > Suggested-by: Dmitry Baryshkov 
> > Signed-off-by: Guchun Chen 
> > ---
> >   drivers/gpu/drm/drm_probe_helper.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_probe_helper.c
> > b/drivers/gpu/drm/drm_probe_helper.c
> > index 8127be134c39..85e0e80d4a52 100644
> > --- a/drivers/gpu/drm/drm_probe_helper.c
> > +++ b/drivers/gpu/drm/drm_probe_helper.c
> > @@ -852,6 +852,8 @@
> EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> >*/
> >   void drm_kms_helper_poll_disable(struct drm_device *dev)
> >   {
> > +   WARN_ON(!dev->mode_config.poll_enabled);
> > +
> > if (dev->mode_config.poll_running)
> > drm_kms_helper_disable_hpd(dev);
> >
> 
> --
> With best wishes
> Dmitry
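A small userspace model of what the added WARN_ON() is meant to catch: a driver that disables polling without ever having initialized it. The struct, the counter and warn_on() are hypothetical stand-ins for drm_mode_config and WARN_ON(), and poll_disable() is a much simplified drm_kms_helper_poll_disable():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified mode_config state. */
struct mode_config {
	bool poll_enabled; /* set by poll_init() */
	bool poll_running; /* set while the poll worker is active */
};

static int warn_count; /* stand-in for WARN_ON() reporting */

static void warn_on(bool cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
}

static void poll_init(struct mode_config *mc)
{
	mc->poll_enabled = true;
	mc->poll_running = true;
}

static void poll_disable(struct mode_config *mc)
{
	assert(mc != NULL);
	/* Catch callers that disable polling without ever initializing it. */
	warn_on(!mc->poll_enabled, "poll_disable() without poll_init()");
	if (mc->poll_running)
		mc->poll_running = false;
}
```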



RE: [PATCH 1/2] drm/amdgpu: add flag to enable/disable poll in suspend/resume path

2023-03-08 Thread Chen, Guchun
Relying on dc_enabled is simpler; thanks for your suggestion. I will
send v2 to address this.

Regards,
Guchun

-----Original Message-----
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, March 9, 2023 12:29 AM
To: Chen, Guchun 
Cc: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
dmitry.barysh...@linaro.org; spassw...@web.de; Deucher, Alexander 
; Zhang, Hawking 
Subject: Re: [PATCH 1/2] drm/amdgpu: add flag to enable/disable poll in 
suspend/resume path

On Wed, Mar 8, 2023 at 7:17 AM Guchun Chen  wrote:
>
> Some AMD ASICs with reliable hotplug support don't call
> drm_kms_helper_poll_init in the driver init sequence. However, because the
> suspend/resume path is shared by all ASICs and output_poll_work->func is
> not set for these ASICs, a warning is triggered when suspending.
>
> [   90.656049]  
> [   90.656050]  ? console_unlock+0x4d/0x100
> [   90.656053]  ? __irq_work_queue_local+0x27/0x60
> [   90.656056]  ? irq_work_queue+0x2b/0x50
> [   90.656057]  ? __wake_up_klogd+0x40/0x60
> [   90.656059]  __cancel_work_timer+0xed/0x180
> [   90.656061]  drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
> [   90.656072]  amdgpu_device_suspend+0x81/0x170 [amdgpu]
> [   90.656180]  amdgpu_pmops_runtime_suspend+0xb5/0x1b0 [amdgpu]
> [   90.656269]  pci_pm_runtime_suspend+0x61/0x1b0
>
> So add a use_kms_poll flag as an initialization check in amdgpu code
> before calling drm_kms_helper_poll_disable/drm_kms_helper_poll_enable
> in the suspend/resume path.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
> Fixes: a4e771729a51("drm/probe_helper: sort out poll_running vs 
> poll_enabled")
> Reported-by: Bert Karwatzki 
> Suggested-by: Dmitry Baryshkov 
> Signed-off-by: Guchun Chen 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h   | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c   | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v11_0.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c  | 1 +
>  drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 1 +
>  7 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c4a4e2fe6681..74af0b8c0d08 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4145,7 +4145,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
> if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
> DRM_WARN("smart shift update failed\n");
>
> -   drm_kms_helper_poll_disable(dev);
> +   if (adev->mode_info.use_kms_poll)
> +   drm_kms_helper_poll_disable(dev);
>
> if (fbcon)
>         drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
> @@ -4243,7 +4244,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> if (fbcon)
>         drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);
>
> -   drm_kms_helper_poll_enable(dev);
> +   if (adev->mode_info.use_kms_poll)
> +   drm_kms_helper_poll_enable(dev);
>

Since polling is only enabled for analog outputs and DC doesn't support any 
analog outputs, I think we can simplify this to:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c4a4e2fe6681..74af0b8c0d08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4145,7 +4145,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
  if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
  DRM_WARN("smart shift update failed\n");

- drm_kms_helper_poll_disable(dev);
+ if (!adev->dc_enabled)
+ drm_kms_helper_poll_disable(dev);

  if (fbcon)
  drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
@@ -4243,7 +4244,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
  if (fbcon)
  drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);

- drm_kms_helper_poll_enable(dev);
+ if (!adev->dc_enabled)
+ drm_kms_helper_poll_enable(dev);

  amdgpu_ras_resume(adev);

Alternatively, we could also just move drm_kms_helper_poll_disable() into 
amdgpu_display_suspend_helper() and drm_kms_helper_poll_enable() into 
amdgpu_display_resume_helper(), but I'm not sure if the ordering here is 
important or not off hand.

Alex



> amdgpu_ras_resume(adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
> index 32fe05c81

RE: [PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Chen, Guchun
Reviewed-by: Guchun Chen 

Regards,
Guchun

-----Original Message-----
From: Guilherme G. Piccoli  
Sent: Thursday, February 2, 2023 12:48 AM
To: amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Deucher, Alexander 
; Koenig, Christian ; Pan, 
Xinhui ; ker...@gpiccoli.net; kernel-...@igalia.com; 
Guilherme G. Piccoli ; Chen, Guchun ; 
Tuikov, Luben ; Limonciello, Mario 

Subject: [PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched 
init/fini

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine;
that function is expected to be called only after its init counterpart,
drm_sched_init(), has executed successfully.

We recently hit a driver probe failure on the Steam Deck, and drm_sched_fini()
was called even though its counterpart had never been called, causing the
following oops:

amdgpu: probe of :04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a given 
ring before calling its fini counter-part.

Ideally we would use sched.ready for that, since that field is the last thing
set in drm_sched_init(). But amdgpu seems to "override" the meaning of that
field: in the oops above, for example, a GFX ring caused the crash, and its
sched.ready field had been set to true in the ring init routine regardless of
the state of the DRM scheduler. Hence, we ended up using sched.ops, as per
Christian's suggestion [0].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb...@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König 
Cc: Guchun Chen 
Cc: Luben Tuikov 
Cc: Mario Limonciello 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..3b962cb680a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler)
+   /*
+* Notice we check for sched.ops since there's some
+* override on the meaning of sched.ready by amdgpu.
+* The natural check would be sched.ready, which is
+* set as drm_sched_init() finishes...
+*/
+   if (!ring->no_scheduler && ring->sched.ops)
drm_sched_fini(&ring->sched);
 
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
--
2.39.0
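The failure mode and the fix can be modelled in userspace. In this sketch (all types and functions are hypothetical stand-ins for drm_gpu_scheduler, amdgpu_ring, drm_sched_init() and drm_sched_fini()), ring init marks every scheduler ready, but probe fails after only two schedulers are initialized. Keying fini off ops, as the patch does, tears down exactly those two, whereas keying it off ready would hit the assertion in sched_fini().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 4

/* Hypothetical stand-in for the scheduler backend ops table. */
struct sched_ops {
	int unused;
};

/* Hypothetical stand-in for struct drm_gpu_scheduler. */
struct scheduler {
	const struct sched_ops *ops; /* non-NULL only after sched_init() */
	bool ready;                  /* may also be set by ring init code */
};

/* Hypothetical stand-in for struct amdgpu_ring. */
struct ring {
	bool fence_initialized;
	bool no_scheduler;
	struct scheduler sched;
};

static const struct sched_ops default_ops;
static int fini_calls;

static void sched_init(struct scheduler *s)
{
	s->ops = &default_ops;
	s->ready = true;
}

static void sched_fini(struct scheduler *s)
{
	assert(s->ops != NULL); /* fini without init is the bug being fixed */
	s->ops = NULL;
	s->ready = false;
	fini_calls++;
}

/* Initialize only the first 'ok' schedulers, as if probe failed midway. */
static void init_schedulers(struct ring *rings, size_t ok)
{
	size_t i;

	for (i = 0; i < ok; i++)
		if (!rings[i].no_scheduler)
			sched_init(&rings[i].sched);
}

static void fence_driver_sw_fini(struct ring *rings, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct ring *ring = &rings[i];

		if (!ring->fence_initialized)
			continue;
		/* 'ready' is unreliable here, so key off 'ops' instead. */
		if (!ring->no_scheduler && ring->sched.ops)
			sched_fini(&ring->sched);
	}
}
```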



RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Chen, Guchun
Hi Christian,

Do you think it makes sense that we set 'ring->sched.ready' to true in each
ring init, even before drm_sched_init is executed in
amdgpu_device_init_schedulers? After all, 'ready' is a member of the GPU
scheduler structure.

Regards,
Guchun

-----Original Message-----
From: Koenig, Christian  
Sent: Tuesday, January 31, 2023 6:59 PM
To: Chen, Guchun ; Alex Deucher ; 
Guilherme G. Piccoli 
Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui 
; dri-devel@lists.freedesktop.org; Tuikov, Luben 
; Limonciello, Mario ; 
kernel-...@igalia.com; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched 
init/fini

On 31.01.23 at 10:17, Chen, Guchun wrote:
> Hi Piccoli,
>
> Please ignore my request of full dmesg log. I can reproduce the issue and get 
> the same failure callstack by returning early with an error code prior to 
> amdgpu_device_init_schedulers.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Chen, Guchun
> Sent: Tuesday, January 31, 2023 2:37 PM
> To: Alex Deucher ; Guilherme G. Piccoli 
> 
> Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui 
> ; dri-devel@lists.freedesktop.org; Tuikov, Luben 
> ; Limonciello, Mario 
> ; kernel-...@igalia.com; Deucher, Alexander 
> ; Koenig, Christian 
> 
> Subject: RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching 
> drm_sched init/fini
>
> Hi Piccoli,
>
> I agree with Alex's point: using ring->sched.name for such a check is not a 
> good way. BTW, can you please attach a full dmesg log from the bad case to help me 
> understand more?
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Alex Deucher 
> Sent: Tuesday, January 31, 2023 6:30 AM
> To: Guilherme G. Piccoli 
> Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun 
> ; Pan, Xinhui ; 
> dri-devel@lists.freedesktop.org; Tuikov, Luben ; 
> Limonciello, Mario ; kernel-...@igalia.com; 
> Deucher, Alexander ; Koenig, Christian 
> 
> Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching 
> drm_sched init/fini
>
> On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli  
> wrote:
>> + Luben
>>
>> (sorry, missed that in the first submission).
>>
>> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
>>> Currently amdgpu calls drm_sched_fini() from the fence driver sw 
>>> fini routine - such function is expected to be called only after the 
>>> respective init function - drm_sched_init() - was executed successfully.
>>>
>>> It happens that we recently faced a driver probe failure on the Steam Deck, 
>>> and drm_sched_fini() was called even though its counterpart had not been 
>>> called previously, causing the following oops:
>>>
>>> amdgpu: probe of :04:00.0 failed with error -110
>>> BUG: kernel NULL pointer dereference, address: 0090
>>> PGD 0 P4D 0
>>> Oops: 0002 [#1] PREEMPT SMP NOPTI
>>> CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
>>> Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
>>> RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
>>> [...]
>>> Call Trace:
>>>   
>>>   amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
>>>   amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
>>>   amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
>>>   devm_drm_dev_init_release+0x49/0x70
>>>   [...]
>>>
>>> To prevent that, check if the drm_sched was properly initialized for 
>>> a given ring before calling its fini counter-part.
>>>
>>> Notice that ideally we'd use sched.ready for that; this field is set as 
>>> the last thing in drm_sched_init(). But amdgpu seems to "override"
>>> the meaning of this field - in the above oops, for example, it was a 
>>> GFX ring causing the crash, and the sched.ready field was set to 
>>> true in the ring init routine, regardless of the state of the DRM 
>>> scheduler. Hence, we ended up using another sched field.
>>> Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
>>> Cc: Andrey Grodzovsky 
>>> Cc: Guchun Chen 
>>> Cc: Mario Limonciello 
>>> Signed-off-by: Guilherme G. Piccoli 
>>> ---
>>>
>>>
>>> Hi folks, first of all thanks in advance for reviews / comments!
>>> Notice that I've used the Fixes tag mostly to get this into stable; I 
>>> didn't find a good candidate patch that added the call to 
>>> drm_sched_fini(), as it was reaching way too old commits...

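The concern raised at the top of this thread, setting `ring->sched.ready` in each ring init ahead of `drm_sched_init()` in `amdgpu_device_init_schedulers`, can be shown with a tiny model. The types below are hypothetical, not amdgpu code, and `initialized` is ground truth that exists only for the illustration: once the driver writes `ready` early, a teardown guarded on `ready` fires even when probe fails before scheduler init, which is the same situation reproduced above by returning an error before `amdgpu_device_init_schedulers`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: `ready` has two writers, so it stops meaning
 * "scheduler init completed". */
struct overload_sched {
	bool ready;
	bool initialized; /* ground truth, for illustration only */
};

/* Driver ring init setting ready early, as described in the thread. */
static void overload_ring_init(struct overload_sched *s)
{
	s->ready = true;
}

/* Scheduler init: the point after which teardown becomes legal. */
static void overload_sched_init(struct overload_sched *s)
{
	s->initialized = true;
	s->ready = true;
}

/* True when a fini path guarded only by `ready` would tear down a
 * scheduler that was never initialized. */
static bool overload_ready_guard_misfires(const struct overload_sched *s)
{
	return s->ready && !s->initialized;
}
```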
RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Chen, Guchun
Hi Piccoli,

Please ignore my request of full dmesg log. I can reproduce the issue and get 
the same failure callstack by returning early with an error code prior to 
amdgpu_device_init_schedulers.

Regards,
Guchun

-----Original Message-----
From: Chen, Guchun 
Sent: Tuesday, January 31, 2023 2:37 PM
To: Alex Deucher ; Guilherme G. Piccoli 

Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui 
; dri-devel@lists.freedesktop.org; Tuikov, Luben 
; Limonciello, Mario ; 
kernel-...@igalia.com; Deucher, Alexander ; Koenig, 
Christian 
Subject: RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched 
init/fini

Hi Piccoli,

I agree with Alex's point: using ring->sched.name for such a check is not a good 
way. BTW, can you please attach a full dmesg log from the bad case to help me 
understand more?

Regards,
Guchun

-----Original Message-----
From: Alex Deucher 
Sent: Tuesday, January 31, 2023 6:30 AM
To: Guilherme G. Piccoli 
Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun 
; Pan, Xinhui ; 
dri-devel@lists.freedesktop.org; Tuikov, Luben ; 
Limonciello, Mario ; kernel-...@igalia.com; Deucher, 
Alexander ; Koenig, Christian 

Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched 
init/fini

On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli  
wrote:
>
> + Luben
>
> (sorry, missed that in the first submission).
>
> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> > Currently amdgpu calls drm_sched_fini() from the fence driver sw 
> > fini routine - such function is expected to be called only after the 
> > respective init function - drm_sched_init() - was executed successfully.
> >
> > It happens that we recently faced a driver probe failure on the Steam Deck, 
> > and drm_sched_fini() was called even though its counterpart had not been 
> > called previously, causing the following oops:
> >
> > amdgpu: probe of :04:00.0 failed with error -110
> > BUG: kernel NULL pointer dereference, address: 0090
> > PGD 0 P4D 0
> > Oops: 0002 [#1] PREEMPT SMP NOPTI
> > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
> > Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
> > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
> > [...]
> > Call Trace:
> >  
> >  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
> >  amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
> >  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
> >  devm_drm_dev_init_release+0x49/0x70
> >  [...]
> >
> > To prevent that, check if the drm_sched was properly initialized for 
> > a given ring before calling its fini counter-part.
> >
> > Notice that ideally we'd use sched.ready for that; this field is set as 
> > the last thing in drm_sched_init(). But amdgpu seems to "override"
> > the meaning of this field - in the above oops, for example, it was a 
> > GFX ring causing the crash, and the sched.ready field was set to 
> > true in the ring init routine, regardless of the state of the DRM 
> > scheduler. Hence, we ended up using another sched field.
> > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
> > Cc: Andrey Grodzovsky 
> > Cc: Guchun Chen 
> > Cc: Mario Limonciello 
> > Signed-off-by: Guilherme G. Piccoli 
> > ---
> >
> >
> > Hi folks, first of all thanks in advance for reviews / comments!
> > Notice that I've used the Fixes tag mostly to get this into stable; I 
> > didn't find a good candidate patch that added the call to 
> > drm_sched_fini(), as it was reaching way too old commits... so
> > 067f44c8b459 seems a good candidate - or maybe not?
> >
> > Now, with regard to sched.ready, I spent a bit of time figuring out what 
> > was happening... would it be feasible to stop using it to mark 
> > some kind of ring status? I think it should be possible to add a flag 
> > to the ring structure for that, and free sched.ready from being 
> > manipulated by the amdgpu driver. What are your thoughts on that?

It's been a while, but IIRC, we used to have a ring->ready field in the driver 
which at some point got migrated out of the driver into the GPU scheduler and 
the driver side code never got cleaned up.  I think we should probably just 
drop the driver messing with that field and leave it up to the drm scheduler.

Alex


> >
> > I could try myself, but first of course I'd like to raise the 
> > "temperature" on this topic and check if somebody is already working 
> > on that.
> >
> > Cheers,
> >
> > Guilherme
> >
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-

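Alex's suggestion, together with the ring flag proposed in the original mail, amounts to splitting ownership of the status. A minimal sketch under that assumption: the scheduler alone writes `ready`, and `hw_ready` is a hypothetical driver-side field, not an existing amdgpu member.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical split of ownership between scheduler and driver. */
struct split_sched {
	bool ready; /* written only by split_sched_init()/split_sched_fini() */
};

struct split_ring {
	bool hw_ready; /* hypothetical driver-owned ring status */
	struct split_sched sched;
};

static void split_ring_hw_init(struct split_ring *r)
{
	r->hw_ready = true; /* no longer touches sched.ready */
}

static void split_sched_init(struct split_sched *s)
{
	s->ready = true;
}

static void split_sched_fini(struct split_sched *s)
{
	s->ready = false;
}

/* With the split, sched.ready is again a reliable "init completed" test.
 * Returns true if a teardown was performed. */
static bool split_ring_sw_fini(struct split_ring *r)
{
	if (!r->sched.ready)
		return false; /* nothing to tear down */
	split_sched_fini(&r->sched);
	return true;
}
```

The design choice here is simply that each flag has one writer, so neither side has to guess what the other meant by it.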
RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Chen, Guchun
Hi Piccoli,

I agree with Alex's point: using ring->sched.name for such a check is not a good 
way. BTW, can you please attach a full dmesg log from the bad case to help me 
understand more?

Regards,
Guchun

-----Original Message-----
From: Alex Deucher  
Sent: Tuesday, January 31, 2023 6:30 AM
To: Guilherme G. Piccoli 
Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun 
; Pan, Xinhui ; 
dri-devel@lists.freedesktop.org; Tuikov, Luben ; 
Limonciello, Mario ; kernel-...@igalia.com; Deucher, 
Alexander ; Koenig, Christian 

Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched 
init/fini

On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli  
wrote:
>
> + Luben
>
> (sorry, missed that in the first submission).
>
> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> > Currently amdgpu calls drm_sched_fini() from the fence driver sw 
> > fini routine - such function is expected to be called only after the 
> > respective init function - drm_sched_init() - was executed successfully.
> >
> > It happens that we recently faced a driver probe failure on the Steam Deck, 
> > and drm_sched_fini() was called even though its counterpart had not been 
> > called previously, causing the following oops:
> >
> > amdgpu: probe of :04:00.0 failed with error -110
> > BUG: kernel NULL pointer dereference, address: 0090
> > PGD 0 P4D 0
> > Oops: 0002 [#1] PREEMPT SMP NOPTI
> > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
> > Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
> > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
> > [...]
> > Call Trace:
> >  
> >  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
> >  amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
> >  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
> >  devm_drm_dev_init_release+0x49/0x70
> >  [...]
> >
> > To prevent that, check if the drm_sched was properly initialized for 
> > a given ring before calling its fini counter-part.
> >
> > Notice that ideally we'd use sched.ready for that; this field is set as 
> > the last thing in drm_sched_init(). But amdgpu seems to "override" 
> > the meaning of this field - in the above oops, for example, it was a 
> > GFX ring causing the crash, and the sched.ready field was set to 
> > true in the ring init routine, regardless of the state of the DRM 
> > scheduler. Hence, we ended up using another sched field.
> > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
> > Cc: Andrey Grodzovsky 
> > Cc: Guchun Chen 
> > Cc: Mario Limonciello 
> > Signed-off-by: Guilherme G. Piccoli 
> > ---
> >
> >
> > Hi folks, first of all thanks in advance for reviews / comments!
> > Notice that I've used the Fixes tag mostly to get this into stable; I 
> > didn't find a good candidate patch that added the call to 
> > drm_sched_fini(), as it was reaching way too old commits... so
> > 067f44c8b459 seems a good candidate - or maybe not?
> >
> > Now, with regard to sched.ready, I spent a bit of time figuring out what 
> > was happening... would it be feasible to stop using it to mark 
> > some kind of ring status? I think it should be possible to add a flag 
> > to the ring structure for that, and free sched.ready from being 
> > manipulated by the amdgpu driver. What are your thoughts on that?

It's been a while, but IIRC, we used to have a ring->ready field in the driver 
which at some point got migrated out of the driver into the GPU scheduler and 
the driver side code never got cleaned up.  I think we should probably just 
drop the driver messing with that field and leave it up to the drm scheduler.

Alex


> >
> > I could try myself, but first of course I'd like to raise the 
> > "temperature" on this topic and check if somebody is already working 
> > on that.
> >
> > Cheers,
> >
> > Guilherme
> >
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 00444203220d..e154eb8241fb 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
> >   if (!ring || !ring->fence_drv.initialized)
> >   continue;
> >
> > - if (!ring->no_scheduler)
> > + /*
> > +  * Notice we check for sched.name since there's some
> > +  * override on the meaning of sched.ready by amdgpu.
> > +  * The natural check would be sched.ready, which is
> > +  * set as drm_sched_init() finishes...
> > +  */
> > + if (!ring->no_scheduler && ring->sched.name)
> >   drm_sched_fini(&ring->sched);
> >
> >   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)


RE: [pull] amdgpu, amdkfd drm-fixes-6.1

2022-10-26 Thread Chen, Guchun
Hello Alex,

Regarding the patch below, I guess we need to pick "8eb402f16d5b drm/amdgpu: Fix 
uninitialized warning in mmhub_v2_0_get_clockgating()" together with it; otherwise, 
the build will possibly fail. Is that true?

 " Lijo Lazar (1): 
  drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x"

Regards,
Guchun

-----Original Message-----
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, October 27, 2022 10:41 AM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
airl...@gmail.com; daniel.vet...@ffwll.ch
Cc: Deucher, Alexander 
Subject: [pull] amdgpu, amdkfd drm-fixes-6.1

Hi Dave, Daniel,

Fixes for 6.1.  Fixes for new IPs and misc other fixes.

The following changes since commit cbc543c59e8e7c8bc8604d6ac3e18a029e3d5118:

  Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes (2022-10-21 09:56:14 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-6.1-2022-10-26-1

for you to fetch changes up to d61e1d1d5225a9baeb995bcbdb904f66f70ed87e:

  drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume 
(2022-10-26 17:48:43 -0400)


amd-drm-fixes-6.1-2022-10-26-1:

amdgpu:
- Stable pstate fix
- SMU 13.x updates
- SR-IOV fixes
- PCI AER fix
- GC 11.x fixes
- Display fixes
- Expose IMU firmware version for debugging
- Plane modifier fix
- S0i3 fix

amdkfd:
- Fix possible memory leak
- Fix GC 10.x cache info reporting

UAPI:
- Expose IMU firmware version via existing INFO firmware query


Alvin Lee (1):
  drm/amd/display: Don't return false if no stream

Chengming Gui (1):
  drm/amdgpu: fix pstate setting issue

David Francis (1):
  drm/amd: Add IMU fw version to fw version queries

Jesse Zhang (1):
  drm/amdkfd: correct the cache info for gfx1036

Joaquín Ignacio Aramendía (1):
  drm/amd/display: Revert logic for plane modifiers

Kenneth Feng (2):
  drm/amd/pm: update driver-if header for smu_v13_0_10
  drm/amd/pm: allow gfxoff on gc_11_0_3

Lijo Lazar (1):
  drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x

Prike Liang (2):
  drm/amdkfd: update gfx1037 Lx cache setting
  drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume

Rafael Mendonca (1):
  drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr()

Rodrigo Siqueira (1):
  drm/amd/display: Remove wrong pipe control lock

Yiqing Yao (1):
  drm/amdgpu: Adjust MES polling timeout for sriov

YuBiao Wang (1):
  drm/amdgpu: skip mes self test for gc 11.0.3 in recover

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  18 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  13 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h|   1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c |   1 +
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c |   9 +-
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c|  28 ++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c  | 106 +++-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c|  50 ++
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c |  12 +--
 .../amd/display/dc/dcn32/dcn32_resource_helpers.c  |   2 +-
 .../pm/swsmu/inc/pmfw_if/smu13_driver_if_v13_0_0.h | 111 +++--
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_v13_0.h   |   2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c |   7 +-
 include/uapi/drm/amdgpu_drm.h  |   2 +
 18 files changed, 259 insertions(+), 119 deletions(-)


RE: [PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap

2022-09-21 Thread Chen, Guchun
Perhaps you need to update the prefix of the patch subject to 'drm/amd/pm: check 
return value ...'.

With the above addressed, it's: Acked-by: Guchun Chen 

Regards,
Guchun

-----Original Message-----
From: Li Zhong  
Sent: Thursday, September 22, 2022 9:27 AM
To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org
Cc: jiapeng.ch...@linux.alibaba.com; Powell, Darren ; 
Chen, Guchun ; Limonciello, Mario 
; Quan, Evan ; Lazar, Lijo 
; dan...@ffwll.ch; airl...@linux.ie; Pan, Xinhui 
; Koenig, Christian ; Deucher, 
Alexander ; Li Zhong 
Subject: [PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap

amdgpu_bo_kmap() returns an error when it fails to map the buffer object. Add the 
error check and propagate the error.

Signed-off-by: Li Zhong 
---
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 1eb4e613b27a..ec055858eb95 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
{
struct pp_hwmgr *hwmgr = handle;
struct amdgpu_device *adev = hwmgr->adev;
+   int err;
 
if (!addr || !size)
return -EINVAL;
@@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
*addr = NULL;
*size = 0;
if (adev->pm.smu_prv_buffer) {
-   amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   if (err)
+   return err;
*size = adev->pm.smu_prv_buffer_size;
}
 
--
2.25.1

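The error-propagation pattern this patch adds can be exercised outside the kernel with a small model. `model_bo_kmap()` is a hypothetical stand-in for `amdgpu_bo_kmap()`, and returning `-ENOMEM` on failure is an assumption made only for the demo; the control flow mirrors the patched `pp_get_prv_buffer_details()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PRV_BUFFER_SIZE 64 /* arbitrary size for the model */

/* Hypothetical stand-in for amdgpu_bo_kmap(): negative errno on failure
 * (-ENOMEM is an assumption), CPU pointer through *ptr on success. */
static int model_bo_kmap(int fail, void **ptr)
{
	static char backing[MODEL_PRV_BUFFER_SIZE];

	if (fail)
		return -ENOMEM;
	*ptr = backing;
	return 0;
}

/* Same control flow as the patched function: outputs are cleared up front,
 * and a mapping error is returned before *size is filled in. */
static int model_get_prv_buffer_details(int have_buffer, int kmap_fails,
					void **addr, size_t *size)
{
	int err;

	if (!addr || !size)
		return -EINVAL;

	*addr = NULL;
	*size = 0;
	if (have_buffer) {
		err = model_bo_kmap(kmap_fails, addr);
		if (err)
			return err;
		*size = MODEL_PRV_BUFFER_SIZE;
	}
	return 0;
}
```

In this model, because `*addr` and `*size` are cleared before the mapping attempt, a failed map leaves the caller with an empty result plus a negative return code instead of a size for a buffer that was never mapped.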


RE: [PATCH] drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl

2022-07-05 Thread Chen, Guchun
Hi Alex,

I think we need to revert this patch on the amd-staging-drm-next branch, as its 
base commit "drm/amdgpu: remove GTT accounting v2" is not present in 5.16. 
Instead, that series is part of the upcoming 5.18-based amd-staging-drm-next 
branch. Otherwise, the incorrect GTT size reporting (switched from pages to 
bytes) will crash several Vulkan apps.

Regards,
Guchun

-----Original Message-----
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Saturday, June 11, 2022 12:01 AM
To: Michel Dänzer 
Cc: Deucher, Alexander ; Pan, Xinhui 
; amd-gfx list ; Koenig, 
Christian ; Maling list - DRI developers 

Subject: Re: [PATCH] drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl

Applied.  Thanks!

Alex

On Fri, Jun 10, 2022 at 10:01 AM Michel Dänzer  wrote:
>
> From: Michel Dänzer 
>
> The commit below changed the TTM manager size unit from pages to 
> bytes, but failed to adjust the corresponding calculations in 
> amdgpu_ioctl.
>
> Fixes: dfa714b88eb0 ("drm/amdgpu: remove GTT accounting v2")
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1930
> Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6642
> Signed-off-by: Michel Dänzer 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 801f6fa692e9..6de63ea6687e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -642,7 +642,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> atomic64_read(&adev->visible_pin_size),
> vram_gtt.vram_size);
> vram_gtt.gtt_size = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT)->size;
> -   vram_gtt.gtt_size *= PAGE_SIZE;
> vram_gtt.gtt_size -= atomic64_read(&adev->gart_pin_size);
> return copy_to_user(out, &vram_gtt,
> min((size_t)size, sizeof(vram_gtt))) ? -EFAULT : 0;
> @@ -675,7 +674,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> mem.cpu_accessible_vram.usable_heap_size * 3 / 4;
>
> mem.gtt.total_heap_size = gtt_man->size;
> -   mem.gtt.total_heap_size *= PAGE_SIZE;
> mem.gtt.usable_heap_size = mem.gtt.total_heap_size -
> atomic64_read(&adev->gart_pin_size);
> mem.gtt.heap_usage = ttm_resource_manager_usage(gtt_man);
> --
> 2.36.1
>

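The unit change described in the commit message is plain arithmetic, so it can be checked in isolation. The sketch below is a model, not amdgpu code: `MODEL_PAGE_SIZE` assumes 4 KiB pages, and the function names are hypothetical labels for the three cases (old unit, stale scaling after the change, and the fix).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Before the unit change: the manager size is in pages, so the ioctl
 * has to scale it to bytes. */
static uint64_t gtt_report_pages_unit(uint64_t man_size_pages)
{
	return man_size_pages * MODEL_PAGE_SIZE;
}

/* After the unit change, keeping the old scaling multiplies by the page
 * size a second time (the bug this patch removes). */
static uint64_t gtt_report_stale_scaling(uint64_t man_size_bytes)
{
	return man_size_bytes * MODEL_PAGE_SIZE;
}

/* After the fix: the byte count is reported as-is. */
static uint64_t gtt_report_bytes_unit(uint64_t man_size_bytes)
{
	return man_size_bytes;
}
```

With 4 KiB pages, an 8 GiB GTT would be reported as 32 TiB under the stale scaling, which is the value user space would then see through the INFO ioctl.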

RE: [PATCH -next 1/2 v2] drm/amdgpu: remove unneeded semicolon

2022-01-13 Thread Chen, Guchun
Series is:
Reviewed-by: Guchun Chen 

Regards,
Guchun

-----Original Message-----
From: Yang Li  
Sent: Thursday, January 13, 2022 3:12 PM
To: airl...@linux.ie; Chen, Guchun 
Cc: dan...@ffwll.ch; Deucher, Alexander ; Koenig, 
Christian ; Pan, Xinhui ; 
amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; Yang Li ; Abaci Robot 

Subject: [PATCH -next 1/2 v2] drm/amdgpu: remove unneeded semicolon

Eliminate the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d4d9b9ea8bbd..ff9bd5a844fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2722,7 +2722,7 @@ struct amdgpu_ras* amdgpu_ras_get_context(struct amdgpu_device *adev)
int amdgpu_ras_set_context(struct amdgpu_device *adev, struct amdgpu_ras* ras_con)
{
if (!adev)
-   return -EINVAL;;
+   return -EINVAL;
 
adev->psp.ras_context.ras = ras_con;
return 0;
--
2.20.1.7.g153144c



RE: [PATCH -next 1/2] drm/amdgpu: remove unneeded semicolon

2022-01-12 Thread Chen, Guchun
Thanks for your patch, Yang. Can you pls also fix the original indentation 
problem as well?

if (!adev)
-   return -EINVAL;;
+   return -EINVAL;

Regards,
Guchun

-----Original Message-----
From: amd-gfx  On Behalf Of Yang Li
Sent: Thursday, January 13, 2022 9:22 AM
To: airl...@linux.ie
Cc: Pan, Xinhui ; Abaci Robot ; 
linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; Yang Li 
; amd-...@lists.freedesktop.org; dan...@ffwll.ch; 
Deucher, Alexander ; Koenig, Christian 

Subject: [PATCH -next 1/2] drm/amdgpu: remove unneeded semicolon

Eliminate the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d4d9b9ea8bbd..7d9d99e581da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2722,7 +2722,7 @@ struct amdgpu_ras* amdgpu_ras_get_context(struct amdgpu_device *adev)
int amdgpu_ras_set_context(struct amdgpu_device *adev, struct amdgpu_ras* ras_con)
{
if (!adev)
-   return -EINVAL;;
+   return -EINVAL;
 
adev->psp.ras_context.ras = ras_con;
return 0;
--
2.20.1.7.g153144c



RE: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2022-01-11 Thread Chen, Guchun
[Public]

Hi Christian,

My bad, I checked the discussion history of this just now. So if I read it 
correctly, the double check at a different place to skip eviction is "drm/ttm: 
Double check mem_type of BO while eviction"? It is in the 5.16 kernel.

Regards,
Guchun

-----Original Message-----
From: Christian König  
Sent: Tuesday, January 11, 2022 7:27 PM
To: Chen, Guchun ; Pan, Xinhui ; 
Koenig, Christian ; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list

IIRC we have completely dropped this patch in favor of a check at a different 
place.

Regards,
Christian.

On 11.01.22 at 09:47, Chen, Guchun wrote:
> [Public]
>
> Hi Christian,
>
> Looks like this patch is still missing in the 5.16 kernel. Is that intentional?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_bo.c?h=v5.16
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: amd-gfx  On Behalf Of 
> Pan, Xinhui
> Sent: Tuesday, November 9, 2021 9:16 PM
> To: Koenig, Christian ; 
> amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: RE: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> [AMD Official Use Only]
>
> Actually, this patch does not totally fix the mismatch between the lru list and 
> mem_type, as mem_type is changed in ->move() and the lru list is changed after 
> that.
>
> During this small window, another eviction could still happen and evict this 
> mismatched BO from sMem (say, its lru list is on the vram domain) to sMem.
> 
> From: Pan, Xinhui 
> Sent: November 9, 2021 21:05
> To: Koenig, Christian; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: RE: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> Yes, a stable tag is needed. The Vulkan folks say 5.14 hits this issue too.
>
> I think that amdgpu_bo_move() does support copy from sysMem to sysMem 
> correctly.
> Maybe something like the below is needed.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c83ef42ca702..aa63ae7ddf1e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -485,7 +485,8 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
>  }
>  if (old_mem->mem_type == TTM_PL_SYSTEM &&
>  (new_mem->mem_type == TTM_PL_TT ||
> -new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
> +new_mem->mem_type == AMDGPU_PL_PREEMPT ||
> +new_mem->mem_type == TTM_PL_SYSTEM)) {
>  ttm_bo_move_null(bo, new_mem);
>  goto out;
>  }
>
> Otherwise, amdgpu_move_blit() is called to do the system memory copy, which 
> uses a wrong address.
>   206 /* Map only what can't be accessed directly */
>   207 if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
>   208 *addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
>   209 mm_cur->start;
>   210 return 0;
>   211 }
>
> At line 208, *addr is zero, so when amdgpu_copy_buffer submits a job with such 
> an addr, a page fault happens.
>
>
> 
> From: Koenig, Christian 
> Sent: November 9, 2021 20:35
> To: Pan, Xinhui; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> Mhm, I'm not sure what the rationale behind that is.
>
> Not moving the BO would make things less efficient, but should never cause a 
> crash.
>
> Maybe we should add a CC: stable tag and push it to -fixes instead?
>
> Christian.
>
> On 09.11.21 at 13:28, Pan, Xinhui wrote:
>> [AMD Official Use Only]
>>
>> I hit vulkan cts test hang with navi23.
>>
>> dmesg shows gmc page faults at addresses 0x0, 0x1000, 0x2000.
>> And some debug logs also say amdgpu copies one BO from the system domain to the 
>> system domain, which is really weird.
>> 
>> From: Koenig, Christian 
>> Sent: November 9, 2021 20:20
>> To: Pan

RE: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2022-01-11 Thread Chen, Guchun
[Public]

Hi Christian,

Looks like this patch is still missing in the 5.16 kernel. Is that intentional?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_bo.c?h=v5.16

Regards,
Guchun

-----Original Message-----
From: amd-gfx  On Behalf Of Pan, Xinhui
Sent: Tuesday, November 9, 2021 9:16 PM
To: Koenig, Christian ; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: RE: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list

[AMD Official Use Only]

Actually, this patch does not totally fix the mismatch between the lru list and 
mem_type, as mem_type is changed in ->move() and the lru list is changed after that.

During this small window, another eviction could still happen and evict this 
mismatched BO from sMem (say, its lru list is on the vram domain) to sMem.

From: Pan, Xinhui 
Sent: November 9, 2021 21:05
To: Koenig, Christian; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: RE: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list

Yes, a stable tag is needed. The Vulkan folks say 5.14 hits this issue too.

I think that amdgpu_bo_move() does support copy from sysMem to sysMem correctly.
Maybe something like the below is needed.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c83ef42ca702..aa63ae7ddf1e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -485,7 +485,8 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
}
if (old_mem->mem_type == TTM_PL_SYSTEM &&
(new_mem->mem_type == TTM_PL_TT ||
-new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
+new_mem->mem_type == AMDGPU_PL_PREEMPT ||
+new_mem->mem_type == TTM_PL_SYSTEM)) {
ttm_bo_move_null(bo, new_mem);
goto out;
}

Otherwise, amdgpu_move_blit() is called to do the system memory copy, which uses 
a wrong address.
 206 /* Map only what can't be accessed directly */
 207 if (!tmz && mem->start != AMDGPU_BO_INVALID_OFFSET) {
 208 *addr = amdgpu_ttm_domain_start(adev, mem->mem_type) +
 209 mm_cur->start;
 210 return 0;
 211 }

At line 208, *addr is zero, so when amdgpu_copy_buffer submits a job with such an 
addr, a page fault happens.

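The condition in the proposed `amdgpu_bo_move()` change can be isolated as a pure predicate, which makes the extra case easy to see. The enum below is a hypothetical mirror of the placements named in this thread (`TTM_PL_SYSTEM`, `TTM_PL_TT`, `TTM_PL_VRAM`, `AMDGPU_PL_PREEMPT`), not the real constants; only the shape of the `if` is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the placement types mentioned in the thread. */
enum model_mem_type {
	MODEL_PL_SYSTEM,
	MODEL_PL_TT,
	MODEL_PL_VRAM,
	MODEL_PL_PREEMPT,
};

/* Shape of the proposed check: moves out of system memory into TT,
 * PREEMPT or (newly) system memory take the ttm_bo_move_null() path
 * instead of falling through to a blit. */
static bool model_takes_null_move(enum model_mem_type old_type,
				  enum model_mem_type new_type)
{
	return old_type == MODEL_PL_SYSTEM &&
	       (new_type == MODEL_PL_TT ||
		new_type == MODEL_PL_PREEMPT ||
		new_type == MODEL_PL_SYSTEM);
}
```

Per the analysis above, without the extra `TTM_PL_SYSTEM` case a system-to-system move falls through to `amdgpu_move_blit()`, where the computed GPU address is zero.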


From: Koenig, Christian 
Sent: November 9, 2021 20:35
To: Pan, Xinhui; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: RE: [PATCH] drm/ttm: Put BO in its memory manager's lru list

Mhm, I'm not sure what the rationale behind that is.

Not moving the BO would make things less efficient, but should never cause a 
crash.

Maybe we should add a CC: stable tag and push it to -fixes instead?

Christian.

Am 09.11.21 um 13:28 schrieb Pan, Xinhui:
> [AMD Official Use Only]
>
> I hit vulkan cts test hang with navi23.
>
> dmesg reports GMC page faults at addresses 0x0, 0x1000 and 0x2000.
> Some debug logs also say amdgpu copies a BO from the system domain to the
> system domain, which is really weird.
> 
> From: Koenig, Christian
> Sent: November 9, 2021 20:20
> To: Pan, Xinhui; amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> Am 09.11.21 um 12:19 schrieb xinhui pan:
>> After we move a BO to a new memory region, we should put it on the new
>> memory manager's lru list regardless of whether we unlock the resv or not.
>>
>> Signed-off-by: xinhui pan 
> Interesting find. Did you trigger that somehow, or did you just
> stumble over it while reading the code?
>
> Patch is Reviewed-by: Christian König , I 
> will pick that up for drm-misc-next.
>
> Thanks,
> Christian.
>
>> ---
>>drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
>>1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index f1367107925b..e307004f0b28 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -701,6 +701,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>ret = ttm_bo_evict(bo, ctx);
>>if (locked)
>>ttm_bo_unreserve(bo);
>> + else
>> + ttm_bo_move_to_lru_tail_unlocked(bo);
>>
>>ttm_bo_put(bo);
>>return ret;

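The ordering problem in this thread can be sketched with a toy model, assuming (as in the kernel versions discussed here) that ttm_bo_unreserve() relinks the BO on the LRU before dropping the reservation. All model_* names are hypothetical: mem_type is what ->move() updates, and lru_owner is the manager whose lru list the BO sits on. Between the two updates in model_evict() the values disagree, which is the small window described at the top of this thread.

```c
#include <stdbool.h>

/* Hypothetical placements; only their distinctness matters here. */
enum model_mem { MODEL_SYSTEM, MODEL_TT, MODEL_VRAM };

struct model_bo {
	enum model_mem mem_type;  /* what the driver's ->move() updates */
	enum model_mem lru_owner; /* manager whose lru list holds the BO */
	bool reserved;
};

/* Stand-in for ttm_bo_move_to_lru_tail_unlocked(): relink on the LRU of
 * the manager that matches the BO's current placement. */
static void model_move_to_lru_tail(struct model_bo *bo)
{
	bo->lru_owner = bo->mem_type;
}

/* Stand-in for ttm_bo_unreserve(), assumed to relink first and then drop
 * the reservation. */
static void model_unreserve(struct model_bo *bo)
{
	model_move_to_lru_tail(bo);
	bo->reserved = false;
}

/* Tail of the eviction path. `locked` means this function took the
 * reservation itself; `with_fix` applies the else branch from the patch. */
static void model_evict(struct model_bo *bo, enum model_mem target,
			bool locked, bool with_fix)
{
	bo->mem_type = target; /* ->move(): from here on lru_owner is stale */
	if (locked)
		model_unreserve(bo);
	else if (with_fix)
		model_move_to_lru_tail(bo); /* caller keeps the reservation */
}
```

Without the fix, an unlocked eviction leaves the BO on the VRAM manager's list although it now lives in system memory, which is exactly the state a later eviction can pick up.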

RE: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)

2021-09-12 Thread Chen, Guchun
[Public]

Thanks for your suggestion, Robin. Do you agree with this as well, Christian 
and Xinhui?

Regards,
Guchun

-Original Message-
From: Robin Murphy  
Sent: Saturday, September 11, 2021 2:25 AM
To: Chen, Guchun ; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; Koenig, Christian ; 
Pan, Xinhui ; Deucher, Alexander 
Cc: Shi, Leslie 
Subject: Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when 
array bounds (v2)

On 2021-09-10 11:09, Guchun Chen wrote:
> Vendors define their own memory types on top of TTM_PL_PRIV but call
> ttm_set_driver_manager() directly, without checking the mem_type value,
> when setting up a memory manager. So add such a check to catch
> out-of-bounds array accesses.
> 
> v2: lower check level to WARN_ON
> 
> Signed-off-by: Leslie Shi 
> Signed-off-by: Guchun Chen 
> ---
>   include/drm/ttm/ttm_device.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 07d722950d5b..aa79953c807c 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
>   static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
> struct ttm_resource_manager *manager)
>   {
> + WARN_ON(type >= TTM_NUM_MEM_TYPES);

Nit: I know nothing about this code, but from the context alone it would seem 
sensible to do

if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
	return;

to avoid making the subsequent assignment when we *know* it's invalid and 
likely to corrupt memory.

Robin.

>   bdev->man_drv[type] = manager;
>   }
>   
> 

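Robin's suggestion relies on WARN_ON() evaluating to its condition, so it can guard the assignment directly. A userspace sketch of that shape follows; MODEL_WARN_ON(), the model_* types and the array size of 8 are assumptions made for illustration, not the TTM definitions.

```c
#include <stdio.h>

#define MODEL_NUM_MEM_TYPES 8 /* assumed size, for illustration only */

static int model_warn(int cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}

/* Like the kernel macro, this evaluates to the (normalized) condition. */
#define MODEL_WARN_ON(cond) model_warn(!!(cond), #cond)

struct model_manager {
	int id;
};

struct model_device {
	struct model_manager *man_drv[MODEL_NUM_MEM_TYPES];
};

static void model_set_driver_manager(struct model_device *bdev, int type,
				     struct model_manager *manager)
{
	/* Warn and skip the store rather than write past man_drv[]. */
	if (MODEL_WARN_ON(type >= MODEL_NUM_MEM_TYPES))
		return;
	bdev->man_drv[type] = manager;
}
```

The early return keeps the warning's diagnostic value while making sure the out-of-bounds store never happens.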

RE: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds

2021-09-10 Thread Chen, Guchun
[Public]

Hi Christian and Xinhui,

Thanks for your suggestion. The reason is that I saw data corruption in several proprietary use cases. Won't BUILD_BUG_ON behave differently depending on the gcc version?

Anyway, WARN_ON is fine with me, and I will send a new patch set soon to address this.

Regards,
Guchun

From: Koenig, Christian 
Sent: Friday, September 10, 2021 2:37 PM
To: Pan, Xinhui ; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; Deucher, Alexander 
; Chen, Guchun 
Cc: Shi, Leslie 
Subject: Re: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array 
bounds

Yeah, that's a good point.

If BUILD_BUG_ON() doesn't work for some reason, then we at least need to lower this to a WARN_ON.

A BUG_ON() is only justified if it prevents severe data corruption, catches a NULL pointer earlier on, or similar.

Regards,
Christian.
Am 10.09.21 um 06:36 schrieb Pan, Xinhui:

[AMD Official Use Only]

Looks good to me.
But maybe BUILD_BUG_ON works too, and it would be a more reasonable way to detect such wrong usage.

From: Chen, Guchun <guchun.c...@amd.com>
Sent: Friday, September 10, 2021 12:30:14 PM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig, Christian <christian.koe...@amd.com>; Pan, Xinhui <xinhui@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>
Cc: Chen, Guchun <guchun.c...@amd.com>; Shi, Leslie <yuliang@amd.com>
Subject: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds

Vendors define their own memory types on top of TTM_PL_PRIV but call
ttm_set_driver_manager() directly, without checking the mem_type value,
when setting up a memory manager. So add such a check to catch
out-of-bounds array accesses.

Signed-off-by: Leslie Shi <yuliang@amd.com>
Signed-off-by: Guchun Chen <guchun.c...@amd.com>
---
 include/drm/ttm/ttm_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 7a0f561c57ee..24ad76ca8022 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -308,6 +308,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
 static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
   struct ttm_resource_manager *manager)
 {
+   BUG_ON(type >= TTM_NUM_MEM_TYPES);
 bdev->man_drv[type] = manager;
 }

--
2.17.1
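
On the BUILD_BUG_ON() question: a compile-time assertion can only check constant expressions, such as the highest driver-private placement a driver defines on top of the private base. Inside ttm_set_driver_manager() the type is a function argument, so a compile-time check there would presumably depend on the compiler inlining and constant-folding every call site, which could explain differences between gcc versions. The sketch uses C11 _Static_assert() as the userspace counterpart; the base value 3, the array size 8 and all model_* names are assumptions for illustration only.

```c
/* Assumed values for illustration only: the real TTM_PL_PRIV and
 * TTM_NUM_MEM_TYPES come from the TTM headers. */
#define MODEL_PL_PRIV 3
#define MODEL_NUM_MEM_TYPES 8

/* Hypothetical driver-private placements defined on top of PL_PRIV. */
enum model_driver_pl {
	MODEL_PL_GDS = MODEL_PL_PRIV,
	MODEL_PL_GWS,
	MODEL_PL_OA,
	MODEL_PL_PREEMPT,
	MODEL_PL_LAST = MODEL_PL_PREEMPT,
};

/* Works at build time because MODEL_PL_LAST is a constant expression. */
_Static_assert(MODEL_PL_LAST < MODEL_NUM_MEM_TYPES,
	       "driver placement does not fit into the manager array");

/* A value only known at run time needs a runtime check instead. */
static int model_type_in_range(int type)
{
	return type >= 0 && type < MODEL_NUM_MEM_TYPES;
}
```

Putting the static check next to the driver's placement definitions, and a WARN_ON in the setter, covers both cases.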



RE: [PATCH] drm/display: fix possible null-pointer dereference in dcn10_set_clock()

2021-08-10 Thread Chen, Guchun
[Public]

Thanks for your patch.

I suggest moving the check of the function pointer dc->clk_mgr->funcs->get_clock
earlier and returning early if it is NULL, since continuing with the clock
setting is meaningless in that case.


if (!dc->clk_mgr || !dc->clk_mgr->funcs->get_clock)
	return DC_FAIL_UNSUPPORTED_1;

dc->clk_mgr->funcs->get_clock(dc->clk_mgr,
		context, clock_type, &clock_cfg);


Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Tuo Li
Sent: Tuesday, August 10, 2021 5:20 PM
To: Wentland, Harry ; Li, Sun peng (Leo) 
; Deucher, Alexander ; Koenig, 
Christian ; Pan, Xinhui ; 
airl...@linux.ie; dan...@ffwll.ch; Cyr, Aric ; Lei, Jun 
; Zhuo, Qingqing ; Siqueira, Rodrigo 
; Lee, Alvin ; Stempen, Vladimir 
; isabel.zh...@amd.com; Lee, Sung ; 
Po-Yu Hsieh Paul ; Wood, Wyatt 
Cc: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; baijiaju1...@gmail.com; Tuo Li 
; TOTE Robot 
Subject: [PATCH] drm/display: fix possible null-pointer dereference in 
dcn10_set_clock()

The variable dc->clk_mgr is checked in:
  if (dc->clk_mgr && dc->clk_mgr->funcs->get_clock)

This indicates dc->clk_mgr can be NULL.
However, it is dereferenced in:
  if (!dc->clk_mgr->funcs->get_clock)

To fix this possible null-pointer dereference, check dc->clk_mgr before 
dereferencing it.

Reported-by: TOTE Robot 
Signed-off-by: Tuo Li 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index c545eddabdcc..3a7c7c7efa68 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -3635,7 +3635,7 @@ enum dc_status dcn10_set_clock(struct dc *dc,
dc->clk_mgr->funcs->get_clock(dc->clk_mgr,
context, clock_type, &clock_cfg);
 
-   if (!dc->clk_mgr->funcs->get_clock)
+   if (dc->clk_mgr && !dc->clk_mgr->funcs->get_clock)
return DC_FAIL_UNSUPPORTED_1;
 
if (clk_khz > clock_cfg.max_clock_khz)
--
2.25.1
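
A reduced userspace model of the early-return shape Guchun suggests: check both the manager and its get_clock hook once, then call the hook unconditionally. The model_* structures, the status values and the 600000 kHz limit reported by model_get_clock() are hypothetical; only the control flow mirrors the suggestion.

```c
#include <stddef.h>

/* Hypothetical status codes; only UNSUPPORTED mirrors the quoted code. */
enum model_status {
	MODEL_DC_OK,
	MODEL_DC_FAIL_UNSUPPORTED_1,
	MODEL_DC_FAIL_EXCEED_MAX,
};

struct model_clock_cfg {
	int max_clock_khz;
};

struct model_clk_mgr;

struct model_clk_mgr_funcs {
	void (*get_clock)(struct model_clk_mgr *mgr, struct model_clock_cfg *cfg);
};

struct model_clk_mgr {
	const struct model_clk_mgr_funcs *funcs;
};

struct model_dc {
	struct model_clk_mgr *clk_mgr;
};

/* Example hook reporting an arbitrary limit of 600000 kHz. */
static void model_get_clock(struct model_clk_mgr *mgr, struct model_clock_cfg *cfg)
{
	(void)mgr;
	cfg->max_clock_khz = 600000;
}

static enum model_status model_set_clock(struct model_dc *dc, int clk_khz)
{
	struct model_clock_cfg clock_cfg = { 0 };

	/* Bail out once, before any dereference of clk_mgr or its hook. */
	if (!dc->clk_mgr || !dc->clk_mgr->funcs->get_clock)
		return MODEL_DC_FAIL_UNSUPPORTED_1;

	dc->clk_mgr->funcs->get_clock(dc->clk_mgr, &clock_cfg);

	if (clk_khz > clock_cfg.max_clock_khz)
		return MODEL_DC_FAIL_EXCEED_MAX;
	return MODEL_DC_OK;
}
```

With a single check up front, the later code can use clock_cfg without wondering whether get_clock() actually filled it in.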


RE: [PATCH 1/3] drm/amdgpu: create amdgpu_vkms (v2)

2021-07-23 Thread Chen, Guchun
[Public]

It looks like the copyright statement is missing in both amdgpu_vkms.c and amdgpu_vkms.h.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, July 23, 2021 10:32 PM
To: Taylor, Ryan 
Cc: kernel test robot ; Daniel Vetter ; 
Siqueira, Rodrigo ; amd-gfx list 
; Melissa Wen ; Maling 
list - DRI developers 
Subject: Re: [PATCH 1/3] drm/amdgpu: create amdgpu_vkms (v2)

On Wed, Jul 21, 2021 at 1:07 PM Ryan Taylor  wrote:
>
> Modify the VKMS driver into an api that dce_virtual can use to create 
> virtual displays that obey drm's atomic modesetting api.
>
> v2: Made local functions static.
>
> Reported-by: kernel test robot 
> Signed-off-by: Ryan Taylor 
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile  |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h  |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 411 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h |  29 ++
>  drivers/gpu/drm/amd/amdgpu/dce_virtual.c |  23 +-
>  7 files changed, 458 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index f089794bbdd5..30cbcd5ce1cc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -120,6 +120,7 @@ amdgpu-y += \
>  amdgpu-y += \
> dce_v10_0.o \
> dce_v11_0.o \
> +   amdgpu_vkms.o \
> dce_virtual.o
>
>  # add GFX block
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 54cf647bd018..d0a2f2ed433d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -919,6 +919,7 @@ struct amdgpu_device {
>
> /* display */
> boolenable_virtual_display;
> +   struct amdgpu_vkms_output   *amdgpu_vkms_output;
> struct amdgpu_mode_info mode_info;
> /* For pre-DCE11. DCE11 and later are in "struct amdgpu_device->dm" */
> struct work_struct  hotplug_work;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d0c935cf4f0f..1b016e5bc75f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1230,7 +1230,7 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> int ret, retry = 0;
> bool supports_atomic = false;
>
> -   if (!amdgpu_virtual_display &&
> +   if (amdgpu_virtual_display ||
> amdgpu_device_asic_has_dc_support(flags & AMD_ASIC_MASK))
> supports_atomic = true;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> index 09b048647523..5a143ca02cf9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> @@ -344,7 +344,7 @@ int amdgpu_fbdev_init(struct amdgpu_device *adev)
> }
>
> /* disable all the possible outputs/crtcs before entering KMS mode */
> -   if (!amdgpu_device_has_dc_support(adev))
> +   if (!amdgpu_device_has_dc_support(adev) && !amdgpu_virtual_display)
> drm_helper_disable_unused_functions(adev_to_drm(adev));
>
> drm_fb_helper_initial_config(>helper, bpp_sel);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> new file mode 100644
> index ..d5c1f1c58f5f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> @@ -0,0 +1,411 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include 
> +#include  #include 
> +
> +#include "amdgpu.h"
> +#include "amdgpu_vkms.h"
> +#include "amdgpu_display.h"
> +
> +/**
> + * DOC: amdgpu_vkms
> + *
> + * The amdgpu vkms interface provides a virtual KMS interface for several use
> + * cases: devices without display hardware, platforms where the actual display
> + * hardware is not useful (e.g., servers), SR-IOV virtual functions, device
> + * emulation/simulation, and device bring up prior to display hardware being
> + * usable. We previously emulated a legacy KMS interface, but there was a desire
> + * to move to the atomic KMS interface. The vkms driver did everything we
> + * needed, but we wanted KMS support natively in the driver without buffer
> + * sharing and the ability to support an instance of VKMS per device. We first
> + * looked at splitting vkms into a stub driver and a helper module that other
> + * drivers could use to implement a virtual display, but this strategy ended up
> + * being messy due to driver specific callbacks needed for buffer management.
> + * Ultimately, it proved easier to import the vkms code as it mostly 
> 

RE: [PATCH -next] drm/amdgpu: Fix missing unlock on error in amdgpu_ras_debugfs_table_read()

2021-07-05 Thread Chen, Guchun
[Public]

Thank you for the patch, Yingliang.

A similar patch was sent out last Saturday and is under review. Please check it.

[PATCH 3/4] drm/amdgpu: unlock on error in amdgpu_ras_debugfs

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Yang 
Yingliang
Sent: Monday, July 5, 2021 9:40 AM
To: linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; 
amd-...@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH -next] drm/amdgpu: Fix missing unlock on error in 
amdgpu_ras_debugfs_table_read()

Add the missing unlock before return from function
amdgpu_ras_debugfs_table_read() in the error handling case.

Fixes: 9b790694a031 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index fc70620369e4..dbeeb4986ca6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -912,8 +912,10 @@ static ssize_t amdgpu_ras_debugfs_table_read(struct file *f, char __user *buf,
 record.retired_page);
 
data_len = min_t(size_t, rec_hdr_fmt_size - r, size);
-   if (copy_to_user(buf, [r], data_len))
-   return -EINVAL;
+   if (copy_to_user(buf, [r], data_len)) {
+   res = -EINVAL;
+   goto Out;
+   }
buf += data_len;
size -= data_len;
*pos += data_len;
--
2.25.1

___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7Cguchun.chen%40amd.com%7C895d0b06d5e54b3598cf08d93f83454a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637610655312805026%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=b8UQJCZDgKs7CkMFMMXtFUfGe%2FQA4Cnm%2FKJKOlvV1K0%3Dreserved=0
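
The fix follows the usual kernel pattern of a single exit label that releases the lock, so every error path after the lock is taken goes through it. A userspace sketch with a pthread mutex; model_table_read() and model_copy_out() are hypothetical, the latter standing in for copy_to_user() and, like it, returning the number of bytes not copied.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

static pthread_mutex_t model_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for copy_to_user(): returns the number of bytes
 * NOT copied. A NULL destination simulates a faulting user pointer. */
static size_t model_copy_out(char *dst, const char *src, size_t len)
{
	if (!dst)
		return len;
	memcpy(dst, src, len);
	return 0;
}

static ssize_t model_table_read(char *buf, size_t size)
{
	static const char record[] = "record";
	size_t len = size < sizeof(record) ? size : sizeof(record);
	ssize_t res;

	pthread_mutex_lock(&model_table_lock);
	if (model_copy_out(buf, record, len)) {
		res = -EINVAL; /* error code used by the quoted patch */
		goto out;
	}
	res = (ssize_t)len;
out:
	/* Single exit: the lock is released on success and on error. */
	pthread_mutex_unlock(&model_table_lock);
	return res;
}
```

A plain `return -EINVAL;` inside the locked region, as in the code being fixed, would leave the mutex held and deadlock the next reader.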


RE: [pull] amdgpu, radeon, ttm, sched drm-next-5.13

2021-04-07 Thread Chen, Guchun
[AMD Public Use]

Hi Felix and Christian,

If the regression you are talking about is the NULL pointer problem when
running KFD tests, it should be fixed by the following patch in this series:

drm/amdgpu: fix NULL pointer dereference

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Wednesday, April 7, 2021 2:57 PM
To: Kuehling, Felix ; Deucher, Alexander 
; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; airl...@gmail.com; daniel.vet...@ffwll.ch
Subject: Re: [pull] amdgpu, radeon, ttm, sched drm-next-5.13

Am 06.04.21 um 17:42 schrieb Felix Kuehling:
> Am 2021-04-01 um 6:29 p.m. schrieb Alex Deucher:
>> Hi Dave, Daniel,
>>
>> New stuff for 5.13.  There are two small patches for ttm and 
>> scheduler that were dependencies for amdgpu changes.
>>
>> The following changes since commit 2cbcb78c9ee5520c8d836c7ff57d1b60ebe8e9b7:
>>
>>Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2021-03-26 15:53:21 +0100)
>>
>> are available in the Git repository at:
>>
>>
>> https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.13-2021-04-01
>>
>> for you to fetch changes up to ef95d2a98d642a537190d73c45ae3c308afee890:
>>
>>drm/amdgpu/display: fix warning on 32 bit in dmub (2021-04-01 17:32:32 -0400)
>>
>> 
>> amd-drm-next-5.13-2021-04-01:
>>
>> amdgpu:
>> - Re-enable GPU reset on VanGogh
>> - Enable DPM flags for SMART_SUSPEND and MAY_SKIP_RESUME
>> - Disentangle HG from vga_switcheroo
>> - S0ix fixes
>> - W=1 fixes
>> - Resource iterator fixes
>> - DMCUB updates
>> - UBSAN fixes
>> - More PM API cleanup
>> - Aldebaran updates
>> - Modifier fixes
>> - Enable VCN load balancing with asymmetric engines
>> - Rework BO structs
>> - Aldebaran reset support
>> - Initial LTTPR display work
>> - Display MALL fixes
>> - Fall back to YCbCr420 when YCbCr444 fails
>> - SR-IOV fixes
>> - Misc cleanups and fixes
>>
>> radeon:
>> - Typo fixes
>>
>> ttm:
>> - Handle cached requests (required for Aldebaran)
>>
>> scheduler:
>> - Fix runqueue selection when changing priorities (required to fix VCN
>>load balancing)
>>
>> 
>> Alex Deucher (20):
>>drm/amdgpu/display/dm: add missing parameter documentation
>>drm/amdgpu: Add additional Sienna Cichlid PCI ID
>>drm/amdgpu: add a dev_pm_ops prepare callback (v2)
>>drm/amdgpu: enable DPM_FLAG_MAY_SKIP_RESUME and DPM_FLAG_SMART_SUSPEND flags (v2)
>>drm/amdgpu: disentangle HG systems from vgaswitcheroo
>>drm/amdgpu: rework S3/S4/S0ix state handling
>>drm/amdgpu: don't evict vram on APUs for suspend to ram (v4)
>>drm/amdgpu: clean up non-DC suspend/resume handling
>>drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3)
>>drm/amdgpu: re-enable suspend phase 2 for S0ix
>>drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend
>>drm/amdgpu: update comments about s0ix suspend/resume
>>drm/amdgpu: drop S0ix checks around CG/PG in suspend
>>drm/amdgpu: skip kfd suspend/resume for S0ix
>>drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
>>drm/amdgpu/display: fix memory leak for dimgrey cavefish
>>drm/amdgpu/pm: mark pcie link/speed arrays as const
>>drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend
>>drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend
>>drm/amdgpu/display: fix warning on 32 bit in dmub
>>
>> Alex Sierra (2):
>>drm/amdgpu: replace per_device_list by array
>>drm/amdgpu: ih reroute for newer asics than vega20
>>
>> Alvin Lee (1):
>>drm/amd/display: Change input parameter for set_drr
>>
>> Anson Jacob (2):
>>drm/amd/display: Fix UBSAN: shift-out-of-bounds warning
>>drm/amd/display: Removing unused code from dmub_cmd.h
>>
>> Anthony Koo (2):
>>drm/amd/display: [FW Promotion] Release 0.0.57
>>drm/amd/display: [FW Promotion] Release 0.0.58
>>
>> Aric Cyr (2):
>>drm/amd/display: 3.2.128
>>  

RE: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

2021-03-22 Thread Chen, Guchun
[AMD Public Use]

Thanks for your patch, Silva. The issue has already been fixed by commit
a5c6007e20e1 ("drm/amd/display: fix modprobe failure on vega series").

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Gustavo A. 
R. Silva
Sent: Monday, March 22, 2021 8:51 PM
To: Lee Jones ; Wentland, Harry ; 
Li, Sun peng (Leo) ; Deucher, Alexander 
; Koenig, Christian ; 
David Airlie ; Daniel Vetter 
Cc: Gustavo A. R. Silva ; 
dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

The wrong sizeof values are currently being used as arguments to kzalloc().

Fix this by using the right arguments *dceip and *vbios, correspondingly.

Addresses-Coverity-ID: 1502901 ("Wrong sizeof argument")
Fixes: fca1e079055e ("drm/amd/display/dc/calcs/dce_calcs: Remove some large variables from the stack")
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
index 556ecfabc8d2..1244fcb0f446 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -2051,11 +2051,11 @@ void bw_calcs_init(struct bw_calcs_dceip *bw_dceip,
 
enum bw_calcs_version version = bw_calcs_version_from_asic_id(asic_id);
 
-   dceip = kzalloc(sizeof(dceip), GFP_KERNEL);
+   dceip = kzalloc(sizeof(*dceip), GFP_KERNEL);
if (!dceip)
return;
 
-   vbios = kzalloc(sizeof(vbios), GFP_KERNEL);
+   vbios = kzalloc(sizeof(*vbios), GFP_KERNEL);
if (!vbios) {
kfree(dceip);
return;
--
2.27.0

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
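
The bug class is easy to reproduce in userspace: sizeof applied to a pointer variable yields the pointer size, not the size of the structure it points to. The struct below is a hypothetical stand-in for struct bw_calcs_dceip, and calloc() plays the role of kzalloc().

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct bw_calcs_dceip (the real one is much
 * larger); any struct bigger than a pointer shows the problem. */
struct model_dceip {
	int lb_size;
	int dmif_size;
	long long reserved[4];
};

/* Buggy form: sizeof applied to the pointer gives the pointer size. */
static size_t model_buggy_size(void)
{
	struct model_dceip *dceip = NULL;

	return sizeof(dceip);
}

/* Fixed form: sizeof(*dceip) gives the size of the pointed-to struct.
 * The operand is not evaluated, so this is safe even with dceip == NULL. */
static size_t model_fixed_size(void)
{
	struct model_dceip *dceip = NULL;

	return sizeof(*dceip);
}

/* Userspace analogue of the fixed kzalloc() call. */
static struct model_dceip *model_alloc_dceip(void)
{
	struct model_dceip *dceip = calloc(1, sizeof(*dceip));

	return dceip;
}
```

Writing sizeof(*ptr) rather than sizeof(struct type) also keeps the allocation correct if the pointer's type changes later.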


RE: [PATCH] drm/ttm: Do not add non-system domain BO into swap list

2021-02-23 Thread Chen, Guchun
[AMD Public Use]

Acked-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Wednesday, February 24, 2021 11:35 AM
To: Pan, Xinhui 
Cc: Deucher, Alexander ; Maling list - DRI 
developers ; Koenig, Christian 
; amd-gfx list 
Subject: Re: [PATCH] drm/ttm: Do not add non-system domain BO into swap list

On Tue, Feb 23, 2021 at 10:28 PM xinhui pan  wrote:
>
> A BO is added to the swap list if it is validated into the system domain.
> If the BO is later validated into a non-system domain, say the VRAM domain,
> it should not stay on the swap list.
>
> Signed-off-by: xinhui pan 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index a97d41f4ce3c..3a10bebb75d6 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -111,6 +111,8 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
>
> swap = &ttm_glob.swap_lru[bo->priority];
> list_move_tail(&bo->swap, swap);
> +   } else {
> +   list_del_init(&bo->swap);
> }
>
> if (bdev->funcs->del_from_lru_notify)
> --
> 2.25.1
>

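A self-contained sketch of what the new else branch does, using a minimal userspace copy of the kernel's list_head helpers. model_bo and its in_system flag are hypothetical simplifications of the real condition in ttm_bo_move_to_lru_tail(); the point is that list_del_init() both unlinks the BO and leaves its node self-linked, so a later list_move_tail() or list_del_init() on it stays safe.

```c
#include <stdbool.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Link prev and next to each other, dropping whatever sat between them. */
static void list_unlink(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	list_unlink(entry->prev, entry->next);
	list_add_tail(entry, head);
}

/* Unlink and re-initialise: the node is left pointing at itself. */
static void list_del_init(struct list_head *entry)
{
	list_unlink(entry->prev, entry->next);
	INIT_LIST_HEAD(entry);
}

/* Hypothetical BO reduced to its swap node and a "lives in system" flag. */
struct model_bo {
	struct list_head swap;
	bool in_system;
};

static void model_update_swap_lru(struct model_bo *bo, struct list_head *swap_lru)
{
	if (bo->in_system)
		list_move_tail(&bo->swap, swap_lru);
	else
		list_del_init(&bo->swap); /* the branch the patch adds */
}
```

Without the else branch, a BO validated back into VRAM would keep its node on the swap list, and swapout could later pick a BO that no longer lives in system memory.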

RE: [PATCH] drm/amd/display: use div_s64() for 64-bit division

2021-01-25 Thread Chen, Guchun
[AMD Public Use]

Hi Arnd Bergmann,

Thanks for your patch. This link error has already been fixed by the commit
below, which has been submitted to the drm-next branch.

5da047444e82 drm/amd/display: fix 64-bit division issue on 32-bit OS

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Arnd Bergmann
Sent: Monday, January 25, 2021 7:40 PM
To: Wentland, Harry ; Li, Sun peng (Leo) 
; Deucher, Alexander ; Koenig, 
Christian ; David Airlie ; Daniel 
Vetter ; Aberback, Joshua ; Lakha, 
Bhawanpreet ; Kazlauskas, Nicholas 

Cc: Arnd Bergmann ; Chalmers, Wesley ; 
Zhuo, Qingqing ; Siqueira, Rodrigo 
; linux-ker...@vger.kernel.org; 
amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Jacky Liao 
; Leung, Martin 
Subject: [PATCH] drm/amd/display: use div_s64() for 64-bit division

From: Arnd Bergmann 

The open-coded 64-bit division causes a link error on 32-bit
machines:

ERROR: modpost: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

Use the div_s64() to perform the division here. One of them was an unsigned 
division originally, but it looks like signed division was intended, so use 
that to consistently allow a negative delay.

Fixes: ea7154d8d9fb ("drm/amd/display: Update dcn30_apply_idle_power_optimizations() code")
Signed-off-by: Arnd Bergmann 
---
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
index dff83c6a142a..a133e399e76d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
@@ -772,8 +772,8 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable)
cursor_cache_enable ? _attr : NULL)) {
unsigned int v_total = stream->adjust.v_total_max ?
stream->adjust.v_total_max : stream->timing.v_total;
-   unsigned int refresh_hz = (unsigned long long) stream->timing.pix_clk_100hz *
-   100LL / (v_total * stream->timing.h_total);
+   unsigned int refresh_hz = div_s64((unsigned long long) stream->timing.pix_clk_100hz *
+   100LL, v_total * stream->timing.h_total);
 
/*
 * one frame time in microsec:
@@ -800,8 +800,8 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable)
unsigned int denom = refresh_hz * 6528;
unsigned int stutter_period = dc->current_state->perf_params.stutter_period_us;
 
-   tmr_delay = (((100LL + 2 * stutter_period * refresh_hz) *
-   (100LL + dc->debug.mall_additional_timer_percent) + denom - 1) /
+   tmr_delay = div_s64(((100LL + 2 * stutter_period * refresh_hz) *
+   (100LL + dc->debug.mall_additional_timer_percent) + denom - 1),
denom) - 64LL;
 
/* scale should be increased until it fits into 6 bits */
@@ -815,8 +815,8 @@ bool dcn30_apply_idle_power_optimizations(struct dc *dc, bool enable)
}
 
denom *= 2;
-   tmr_delay = (((100LL + 2 * stutter_period * refresh_hz) *
-   (100LL + dc->debug.mall_additional_timer_percent) + denom - 1) /
+   tmr_delay = div_s64(((100LL + 2 * stutter_period * refresh_hz) *
+   (100LL + dc->debug.mall_additional_timer_percent) + denom - 1),
denom) - 64LL;
}
 
--
2.29.2


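For reference, the refresh-rate expression from the patch can be checked in userspace. model_div_s64() is a hypothetical stand-in with the same shape as the kernel's div_s64() (s64 dividend, s32 divisor); the kernel needs the helper because on 32-bit targets a plain 64-bit '/' becomes a call to libgcc's __divdi3()/__udivdi3(), which the kernel does not link, hence the modpost errors above.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in with the same shape as the kernel's
 * s64 div_s64(s64 dividend, s32 divisor). In userspace plain '/' is fine;
 * the kernel helper exists so 32-bit builds avoid libgcc calls. */
static inline int64_t model_div_s64(int64_t dividend, int32_t divisor)
{
	return dividend / divisor;
}

/* refresh_hz as in the patched line: pix_clk_100hz is in units of 100 Hz,
 * so multiplying by 100 gives Hz; dividing by the pixels per frame
 * (v_total * h_total) gives frames per second. The product must fit in
 * the 32-bit signed divisor, which holds for realistic timings. */
static unsigned int model_refresh_hz(uint32_t pix_clk_100hz,
				     uint32_t v_total, uint32_t h_total)
{
	return (unsigned int)model_div_s64((int64_t)pix_clk_100hz * 100LL,
					   (int32_t)(v_total * h_total));
}
```

With the standard 1080p60 timing (pix_clk_100hz = 1485000, i.e. 148.5 MHz, v_total = 1125, h_total = 2200), model_refresh_hz() returns 60.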

RE: linux-next: Tree for Jan 22 (amdgpu)

2021-01-24 Thread Chen, Guchun
[AMD Public Use]

The link error has been fixed by:

5da047444e82 drm/amd/display: fix 64-bit division issue on 32-bit OS

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Randy Dunlap
Sent: Saturday, January 23, 2021 2:02 AM
To: Stephen Rothwell ; Linux Next Mailing List 

Cc: amd-...@lists.freedesktop.org; Linux Kernel Mailing List 
; dri-devel 
Subject: Re: linux-next: Tree for Jan 22 (amdgpu)

On 1/21/21 11:06 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20210121:
> 

on i386:

ERROR: modpost: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!



-- 
~Randy


RE: [PATCH] drm/amdgpu:Fixed the wrong macro definition in amdgpu_trace.h

2020-12-23 Thread Chen, Guchun
[AMD Public Use]

Nice catch and the patch is:

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Chenyang Li
Sent: Wednesday, December 23, 2020 9:19 AM
To: Deucher, Alexander ; 
amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu:Fixed the wrong macro definition in amdgpu_trace.h

In line 24 "_AMDGPU_TRACE_H" is missing an underscore.

Signed-off-by: Chenyang Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index ee9480d14cbc..86cfb3d55477 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -21,7 +21,7 @@
  *
  */
 
-#if !defined(_AMDGPU_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#if !defined(_AMDGPU_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _AMDGPU_TRACE_H_
 
 #include 
-- 
2.29.2


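Why the missing underscore matters: the guard tests one macro but defines another, so the test stays true on every inclusion and the guard never stops re-entry, independently of TRACE_HEADER_MULTI_READ. A preprocessor-only sketch with hypothetical macro names simulates two inclusions of a broken and a fixed header.

```c
/* Simulated two inclusions of a header whose guard tests DEMO_TRACE_H
 * but defines DEMO_TRACE_H_ (the same mismatch as in amdgpu_trace.h). */
#if !defined(DEMO_TRACE_H)
#define DEMO_TRACE_H_
#define DEMO_BROKEN_FIRST_PASS 1
#endif
#if !defined(DEMO_TRACE_H)
#define DEMO_BROKEN_SECOND_PASS 1 /* body entered again */
#endif

/* The fixed guard tests the same macro it defines. */
#if !defined(DEMO_FIXED_H_)
#define DEMO_FIXED_H_
#define DEMO_FIXED_FIRST_PASS 1
#endif
#if !defined(DEMO_FIXED_H_)
#define DEMO_FIXED_SECOND_PASS 1 /* never reached */
#endif

/* Report whether the second simulated inclusion re-entered the body. */
static int demo_broken_reentered(void)
{
#ifdef DEMO_BROKEN_SECOND_PASS
	return 1;
#else
	return 0;
#endif
}

static int demo_fixed_reentered(void)
{
#ifdef DEMO_FIXED_SECOND_PASS
	return 1;
#else
	return 0;
#endif
}
```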

RE: [radeon-alex:amd-20.45 2387/2427] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute

2020-12-14 Thread Chen, Guchun
[AMD Public Use]

Hi there,

I will fix this soon. The issue is reported on the amd-20.45 branch, which was
branched off before the fix landed on mainline.

Regards,
Guchun

-Original Message-
From: kernel test robot  
Sent: Tuesday, December 15, 2020 1:49 PM
To: Chen, Guchun 
Cc: kbuild-...@lists.01.org; clang-built-li...@googlegroups.com; 
dri-devel@lists.freedesktop.org; Li, Dennis 
Subject: [radeon-alex:amd-20.45 2387/2427] 
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value 
of function declared with 'warn_unused_result' attribute

Hi Guchun,

FYI, the error/warning still remains.

tree:   git://people.freedesktop.org/~agd5f/linux.git amd-20.45
head:   a3950d94b046fb206e58fd3ec717f071c0203ba3
commit: cf13e50dea28cde351fa32767e36135afb30386d [2387/2427] drm/amdgpu: clean up ras sysfs creation (v2)
config: x86_64-randconfig-a002-20201214 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project a29ecca7819a6ed4250d3689b12b1f664bb790d7)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
git remote add radeon-alex git://people.freedesktop.org/~agd5f/linux.git
git fetch --no-tags radeon-alex amd-20.45
git checkout cf13e50dea28cde351fa32767e36135afb30386d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:5: warning: no previous prototype for function 'amdgpu_ras_error_cure' [-Wmissing-prototypes]
   int amdgpu_ras_error_cure(struct amdgpu_device *adev,
   ^
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int amdgpu_ras_error_cure(struct amdgpu_device *adev,
   ^
   static
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
   sysfs_create_group(&adev->dev->kobj, &group);
   ^~
   2 warnings generated.

vim +/warn_unused_result +1284 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

  1249	
  1250	/* ras fs */
  1251	static BIN_ATTR(gpu_vram_bad_pages, S_IRUGO,
  1252			amdgpu_ras_sysfs_badpages_read, NULL, 0);
  1253	static DEVICE_ATTR(features, S_IRUGO,
  1254			amdgpu_ras_sysfs_features_read, NULL);
  1255	static int amdgpu_ras_fs_init(struct amdgpu_device *adev)
  1256	{
  1257		struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
  1258		struct attribute_group group = {
  1259			.name = RAS_FS_NAME,
  1260		};
  1261		struct attribute *attrs[] = {
  1262			&con->features_attr.attr,
  1263			NULL
  1264		};
  1265		struct bin_attribute *bin_attrs[] = {
  1266			NULL,
  1267			NULL,
  1268		};
  1269	
  1270		/* add features entry */
  1271		con->features_attr = dev_attr_features;
  1272		group.attrs = attrs;
  1273		sysfs_attr_init(attrs[0]);
  1274	
  1275		if (amdgpu_bad_page_threshold != 0) {
  1276			/* add bad_page_features entry */
  1277			bin_attr_gpu_vram_bad_pages.private = NULL;
  1278			con->badpages_attr = bin_attr_gpu_vram_bad_pages;
  1279			bin_attrs[0] = &con->badpages_attr;
  1280			group.bin_attrs = bin_attrs;
  1281			sysfs_bin_attr_init(bin_attrs[0]);
  1282		}
  1283	
> 1284		sysfs_create_group(&adev->dev->kobj, &group);
  1285	
  1286		return 0;
  1287	}
  1288	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://nam11.safelinks.protection.outlook.com/
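The warning fires because sysfs_create_group() is declared __must_check, the kernel's wrapper around __attribute__((warn_unused_result)), so a call whose result is discarded draws -Wunused-result. Below is a user-space sketch of the usual remedy: capture the return value and propagate it. fake_create_group() and fake_fs_init() are hypothetical stand-ins for illustration, not the code of the eventual fix.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for sysfs_create_group(): carries the same
 * warn_unused_result attribute and can be told to fail. */
__attribute__((warn_unused_result))
static int fake_create_group(int simulate_failure)
{
	return simulate_failure ? -ENOMEM : 0;
}

/* Hypothetical init path: using the result (store in r, then check it)
 * keeps -Wunused-result quiet, and it also lets the error reach the
 * caller instead of init returning 0 unconditionally. */
int fake_fs_init(int simulate_failure)
{
	int r;

	r = fake_create_group(simulate_failure);
	if (r) {
		fprintf(stderr, "failed to create sysfs group: %d\n", r);
		return r;
	}

	return 0;
}
```

Returning the error rather than only logging it is a design choice here. It lets the caller decide whether a missing sysfs node should abort initialization.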

RE: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Chen, Guchun
[AMD Public Use]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Colin King  
Sent: Wednesday, November 25, 2020 10:18 PM
To: Deucher, Alexander ; Koenig, Christian 
; David Airlie ; Daniel Vetter 
; Zhou1, Tao ; Chen, Guchun 
; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org
Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo 
kmalloc_array creation

From: Colin Ian King 

An incorrect sizeof() is being used: sizeof((*data)->bps_bo) is not correct; it
should be sizeof(*(*data)->bps_bo). It just so happens to work because the
sizes are the same (both are pointer types). Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev)
return -ENOMEM;
 
bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);
-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), 
+GFP_KERNEL);
 
if (!bps || !bps_bo) {
kfree(bps);
--
2.29.2
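A user-space sketch of the sizeof() idiom the patch applies. It assumes a hypothetical struct record element type and a hypothetical handler_data whose record_ptrs field is an array of pointers, like bps_bo. sizeof(p) is the size of the pointer variable itself, while sizeof(*p) is the size of one element. For a pointer-to-pointer both are pointer-sized, which is why the old code happened to work. For an array of structs the two differ, so the mismatched form would under-allocate.

```c
#include <stdlib.h>

/* Hypothetical element type, deliberately larger than a pointer. */
struct record {
	unsigned long long address;
	unsigned int flags;
	unsigned char payload[20];
};

/* Hypothetical layout: an array of structs and an array of pointers. */
struct handler_data {
	struct record *records;
	struct record **record_ptrs;
};

/* Allocate n pointer slots, sizing each from the pointee type:
 * sizeof(*data->record_ptrs) == sizeof(struct record *). */
int alloc_ptr_slots(struct handler_data *data, size_t n)
{
	data->record_ptrs = calloc(n, sizeof(*data->record_ptrs));
	return data->record_ptrs ? 0 : -1;
}

/* Allocate n records; sizeof(data->records) would only be the size of
 * a pointer, while sizeof(*data->records) is one full struct record. */
int alloc_records(struct handler_data *data, size_t n)
{
	data->records = calloc(n, sizeof(*data->records));
	return data->records ? 0 : -1;
}
```

Writing sizeof(*ptr) rather than naming the type keeps the allocation correct if the field's type changes later.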


RE: [radeon-alex:amd-20.45 2387/2417] drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute

2020-11-22 Thread Chen, Guchun
[AMD Public Use]

+Alex.

We have a follow-up patch that fixes this. Please check:

a069a9eb73f8 drm/amdgpu: fix a warning in amdgpu_ras.c (v2)

Regards,
Guchun

-Original Message-
From: kernel test robot  
Sent: Saturday, November 21, 2020 2:02 PM
To: Chen, Guchun 
Cc: kbuild-...@lists.01.org; clang-built-li...@googlegroups.com; 
dri-devel@lists.freedesktop.org; Li, Dennis 
Subject: [radeon-alex:amd-20.45 2387/2417] 
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value 
of function declared with 'warn_unused_result' attribute

tree:   git://people.freedesktop.org/~agd5f/linux.git amd-20.45
head:   1807abbb3a7f17fc931a15d7fd4365ea148c6bb1
commit: cf13e50dea28cde351fa32767e36135afb30386d [2387/2417] drm/amdgpu: clean up ras sysfs creation (v2)
config: x86_64-randconfig-a011-20201120 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 3ded927cf80ac519f9f9c4664fef08787f7c537d)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
git remote add radeon-alex git://people.freedesktop.org/~agd5f/linux.git
git fetch --no-tags radeon-alex amd-20.45
git checkout cf13e50dea28cde351fa32767e36135afb30386d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:5: warning: no previous prototype for function 'amdgpu_ras_error_cure' [-Wmissing-prototypes]
   int amdgpu_ras_error_cure(struct amdgpu_device *adev,
   ^
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:906:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int amdgpu_ras_error_cure(struct amdgpu_device *adev,
   ^
   static
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1284:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
   sysfs_create_group(&adev->dev->kobj, &group);
   ^~
   2 warnings generated.

vim +/warn_unused_result +1284 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

  1249	
  1250	/* ras fs */
  1251	static BIN_ATTR(gpu_vram_bad_pages, S_IRUGO,
  1252			amdgpu_ras_sysfs_badpages_read, NULL, 0);
  1253	static DEVICE_ATTR(features, S_IRUGO,
  1254			amdgpu_ras_sysfs_features_read, NULL);
  1255	static int amdgpu_ras_fs_init(struct amdgpu_device *adev)
  1256	{
  1257		struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
  1258		struct attribute_group group = {
  1259			.name = RAS_FS_NAME,
  1260		};
  1261		struct attribute *attrs[] = {
  1262			&con->features_attr.attr,
  1263			NULL
  1264		};
  1265		struct bin_attribute *bin_attrs[] = {
  1266			NULL,
  1267			NULL,
  1268		};
  1269	
  1270		/* add features entry */
  1271		con->features_attr = dev_attr_features;
  1272		group.attrs = attrs;
  1273		sysfs_attr_init(attrs[0]);
  1274	
  1275		if (amdgpu_bad_page_threshold != 0) {
  1276			/* add bad_page_features entry */
  1277			bin_attr_gpu_vram_bad_pages.private = NULL;
  1278			con->badpages_attr = bin_attr_gpu_vram_bad_pages;
  1279			bin_attrs[0] = &con->badpages_attr;
  1280			group.bin_attrs = bin_attrs;
  1281			sysfs_bin_attr_init(bin_attrs[0]);
  1282		}
  1283	
> 1284		sysfs_create_group(&adev->dev->kobj, &group);
  1285	
  1286		return 0;
  1287	}
  1288	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all
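In the quoted amdgpu_ras_fs_init(), attrs[] and bin_attrs[] are NULL-terminated pointer arrays, which is the form struct attribute_group expects for its .attrs and .bin_attrs lists. bin_attrs[] has two slots so that it stays terminated when only bin_attrs[0] is filled in. The following user-space sketch shows how such a list is walked. It assumes a hypothetical struct fake_attr in place of the kernel's struct attribute.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct attribute; only the name is kept. */
struct fake_attr {
	const char *name;
};

/* Count entries in a NULL-terminated array of attribute pointers,
 * stopping at the first NULL; a NULL list counts as empty. */
size_t count_attrs(struct fake_attr *const *list)
{
	size_t n = 0;

	if (!list)
		return 0;
	while (list[n])
		n++;
	return n;
}
```

If bin_attrs[] had only one slot, filling bin_attrs[0] would leave no terminator, and a walker like this would read past the end of the array.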