RE: [PATCH v2] drm/amd/display: enable more strict compile checks

2023-05-24 Thread Russell, Kent
[AMD Official Use Only - General]

(Adding Felix in CC)

I’m a fan of adding it to KFD as well. Felix, can you foresee any issues here?

Kent

From: amd-gfx  On Behalf Of Ho, Kenny
Sent: Wednesday, May 24, 2023 3:23 PM
To: Alex Deucher ; Mahfooz, Hamza 
Cc: Li, Sun peng (Leo) ; Wentland, Harry 
; Pan, Xinhui ; Siqueira, Rodrigo 
; linux-ker...@vger.kernel.org; 
dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Daniel Vetter 
; Deucher, Alexander ; David Airlie 
; Koenig, Christian 
Subject: Re: [PATCH v2] drm/amd/display: enable more strict compile checks


[AMD Official Use Only - General]


[AMD Official Use Only - General]

(+ Felix)

Should we do the same for other modules under amd (amdkfd)?  I was going to 
enable full kernel werror in the kconfig used by my CI but this is probably 
better.

Kenny

From: Alex Deucher mailto:alexdeuc...@gmail.com>>
Sent: Wednesday, May 24, 2023 3:22 PM
To: Mahfooz, Hamza mailto:hamza.mahf...@amd.com>>
Cc: amd-...@lists.freedesktop.org 
mailto:amd-...@lists.freedesktop.org>>; Li, Sun 
peng (Leo) mailto:sunpeng...@amd.com>>; Ho, Kenny 
mailto:kenny...@amd.com>>; Pan, Xinhui 
mailto:xinhui@amd.com>>; Siqueira, Rodrigo 
mailto:rodrigo.sique...@amd.com>>; 
linux-ker...@vger.kernel.org 
mailto:linux-ker...@vger.kernel.org>>; 
dri-devel@lists.freedesktop.org 
mailto:dri-devel@lists.freedesktop.org>>; 
Daniel Vetter mailto:dan...@ffwll.ch>>; Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; David Airlie 
mailto:airl...@gmail.com>>; Wentland, Harry 
mailto:harry.wentl...@amd.com>>; Koenig, Christian 
mailto:christian.koe...@amd.com>>
Subject: Re: [PATCH v2] drm/amd/display: enable more strict compile checks

On Wed, May 24, 2023 at 3:20 PM Hamza Mahfooz 
mailto:hamza.mahf...@amd.com>> wrote:
>
> Currently, there are quite a number of issues that are quite easy for
> the CI to catch, that slip through the cracks. Among them, there are
> unused variable and indentation issues. Also, we should consider all
> warnings to be compile errors, since the community will eventually end
> up complaining about them. So, enable -Werror, -Wunused and
> -Wmisleading-indentation for all kernel builds.
>
> Cc: Alex Deucher mailto:alexander.deuc...@amd.com>>
> Cc: Harry Wentland mailto:harry.wentl...@amd.com>>
> Cc: Kenny Ho mailto:kenny...@amd.com>>
> Signed-off-by: Hamza Mahfooz 
> mailto:hamza.mahf...@amd.com>>
> ---
> v2: fix grammatical error
> ---
>  drivers/gpu/drm/amd/display/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/Makefile 
> b/drivers/gpu/drm/amd/display/Makefile
> index 0d610cb376bb..3c44162ebe21 100644
> --- a/drivers/gpu/drm/amd/display/Makefile
> +++ b/drivers/gpu/drm/amd/display/Makefile
> @@ -26,6 +26,8 @@
>
>  AMDDALPATH = $(RELATIVE_AMD_DISPLAY_PATH)
>
> +subdir-ccflags-y += -Werror -Wunused -Wmisleading-indentation
> +

Care to enable this for the rest of amdgpu as well?  Or send out an
additional patch to do that?  Either way:
Reviewed-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>

Alex

>  subdir-ccflags-y += -I$(FULL_AMD_DISPLAY_PATH)/dc/inc/
>  subdir-ccflags-y += -I$(FULL_AMD_DISPLAY_PATH)/dc/inc/hw
>  subdir-ccflags-y += -I$(FULL_AMD_DISPLAY_PATH)/dc/clk_mgr
> --
> 2.40.1
>


RE: [PATCH 12/34] drm/amdgpu: add configurable grace period for unmap queues

2023-03-28 Thread Russell, Kent
[AMD Official Use Only - General]

3 tiny grammar/spelling things inline (not critical)

 Kent

> -Original Message-
> From: amd-gfx  On Behalf Of
> Jonathan Kim
> Sent: Monday, March 27, 2023 2:43 PM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Kuehling, Felix ; Kim, Jonathan
> 
> Subject: [PATCH 12/34] drm/amdgpu: add configurable grace period for unmap
> queues
> 
> The HWS schedule allows a grace period for wave completion prior to
> preemption for better performance by avoiding CWSR on waves that can
> potentially complete quickly. The debugger, on the other hand, will
> want to inspect wave status immediately after it actively triggers
> preemption (a suspend function to be provided).
> 
> To minimize latency between preemption and debugger wave inspection, allow
> immediate preemption by setting the grace period to 0.
> 
> Note that setting the preepmtion grace period to 0 will result in an
> infinite grace period being set due to a CP FW bug so set it to 1 for now.
> 
> v2: clarify purpose in the description of this patch
> 
> Signed-off-by: Jonathan Kim 
> Reviewed-by: Felix Kuehling 
> ---
>  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  2 +
>  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  2 +
>  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 
>  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  6 ++
>  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |  2 +
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 43 
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  9 ++-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 62 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +
>  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 32 +
>  .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 39 +++
>  .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h   | 65 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 ++
>  13 files changed, 291 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> index a6f98141c29c..b811a0985050 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> @@ -82,5 +82,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
>   .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
>   .enable_debug_trap = kgd_aldebaran_enable_debug_trap,
>   .disable_debug_trap = kgd_aldebaran_disable_debug_trap,
> + .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
> + .build_grace_period_packet_info =
> kgd_gfx_v9_build_grace_period_packet_info,
>   .program_trap_handler_settings =
> kgd_gfx_v9_program_trap_handler_settings,
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> index d2918e5c0dea..a62bd0068515 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> @@ -410,6 +410,8 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
> 
>   kgd_gfx_v9_set_vm_context_page_table_base,
>   .enable_debug_trap = kgd_arcturus_enable_debug_trap,
>   .disable_debug_trap = kgd_arcturus_disable_debug_trap,
> + .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
> + .build_grace_period_packet_info =
> kgd_gfx_v9_build_grace_period_packet_info,
>   .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
>   .program_trap_handler_settings =
> kgd_gfx_v9_program_trap_handler_settings
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 969015281510..605387e55d33 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -802,6 +802,47 @@ uint32_t kgd_gfx_v10_disable_debug_trap(struct
> amdgpu_device *adev,
>   return 0;
>  }
> 
> +/* kgd_gfx_v10_get_iq_wait_times: Returns the mmCP_IQ_WAIT_TIME1/2
> values
> + * The values read are:
> + * ib_offload_wait_time -- Wait Count for Indirect Buffer Offloads.
> + * atomic_offload_wait_time -- Wait Count for L2 and GDS Atomics
> Offloads.
> + * wrm_offload_wait_time-- Wait Count for WAIT_REG_MEM Offloads.
> + * gws_wait_time-- Wait Count for Global Wave Syncs.
> + * que_sleep_wait_time  -- Wait Count for Dequeue Retry.
> + * sch_wave_wait_time   -- Wait Count for Scheduling Wave Message.
> + * sem_rearm_wait_time  -- Wait Count for Semaphore re-arm.
> + * deq_retry_wait_time  -- Wait Count for Global Wave Syncs.
> + */
> +void kgd_gfx_v10_get_iq_wait_times(struct amdgpu_device *adev,
> + uint32_t *wait_times)
> +
> +{
> + *wait_times = RREG32(SOC15_REG_OFFSET(GC, 0,
> mmCP_IQ_WAIT_TIME2));
> +}
> +
> +void 

RE: BUG: kernel NULL pointer dereference, address: 0000000000000026 after switching to 5.7 kernel

2020-04-13 Thread Russell, Kent
[AMD Official Use Only - Internal Distribution Only]

You can add a Reviewed-By and Tested-By for me on this patch, unless you want 
me to send it out instead, then you can review it.

 Kent

> -Original Message-
> From: amd-gfx  On Behalf Of 
> Christian König
> Sent: Saturday, April 11, 2020 5:57 AM
> To: Mikhail Gavrilov ; amd-gfx list 
> ; dri-devel 
> ; Linux List Kernel Mailing 
> ; Grodzovsky, Andrey 
> ; Russell, Kent 
> Subject: Re: BUG: kernel NULL pointer dereference, address:
> 0026 after switching to 5.7 kernel
> 
> Yeah, that is a known issue.
> 
> You could try the attached patch, but please be aware that it is not 
> even compile tested because of the Easter holidays here.
> 
> Thanks,
> Christian.
> 
> Am 10.04.20 um 21:51 schrieb Mikhail Gavrilov:
> > Hi folks.
> > After upgrade kernel to 5.7 I see every boot in kernel log following 
> > error messages:
> >
> > [2.569513] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
> > [2.569538] [drm] PSP loading UVD firmware
> > [2.570038] BUG: kernel NULL pointer dereference, address:
> 0026
> > [2.570045] #PF: supervisor read access in kernel mode
> > [2.570050] #PF: error_code(0x) - not-present page
> > [2.570055] PGD 0 P4D 0
> > [2.570060] Oops:  [#1] SMP NOPTI
> > [2.570065] CPU: 5 PID: 667 Comm: uvd_enc_1.1 Not tainted
> > 5.7.0-0.rc0.git6.1.2.fc33.x86_64 #1
> > [2.570072] Hardware name: System manufacturer System Product
> > Name/ROG STRIX X570-I GAMING, BIOS 1405 11/19/2019
> > [2.570085] RIP: 0010:__kthread_should_park+0x5/0x30
> > [2.570090] Code: 00 e9 fe fe ff ff e8 ca 3a 08 00 e9 49 fe ff ff
> > 48 89 df e8 dd 38 08 00 84 c0 0f 84 6a ff ff ff e9 a6 fe ff ff 0f 1f
> > 44 00 00  47 26 20 74 12 48 8b 87 88 09 00 00 48 8b 00 48 c1 e8 
> > 02
> > 83 e0
> > [2.570103] RSP: 0018:ad8141723e50 EFLAGS: 00010246
> > [2.570107] RAX: 7fff RBX: 8a8d1d116ed8 RCX:
> 
> > [2.570112] RDX:  RSI:  RDI:
> 
> > [2.570116] RBP: 8a8d28c11300 R08:  R09:
> 
> > [2.570120] R10:  R11:  R12:
> 8a8d1d152e40
> > [2.570125] R13: 8a8d1d117280 R14: 8a8d1d116ed8 R15:
> 8a8d1ca68000
> > [2.570131] FS:  () GS:8a8d3aa0()
> > knlGS:
> > [2.570137] CS:  0010 DS:  ES:  CR0: 80050033
> > [2.570142] CR2: 0026 CR3: 0007e3dc6000 CR4:
> 003406e0
> > [2.570147] Call Trace:
> > [2.570157]  drm_sched_get_cleanup_job+0x42/0x130 [gpu_sched]
> > [2.570166]  drm_sched_main+0x6f/0x530 [gpu_sched]
> > [2.570173]  ? lockdep_hardirqs_on+0x11e/0x1b0
> > [2.570179]  ? drm_sched_get_cleanup_job+0x130/0x130 [gpu_sched]
> > [2.570185]  kthread+0x131/0x150
> > [2.570189]  ? __kthread_bind_mask+0x60/0x60
> > [2.570196]  ret_from_fork+0x27/0x50
> > [2.570203] Modules linked in: fjes(-) amdgpu(+) amd_iommu_v2
> > gpu_sched ttm drm_kms_helper drm crc32c_intel igb nvme nvme_core dca 
> > i2c_algo_bit wmi pinctrl_amd br_netfilter bridge stp llc fuse
> > [2.570223] CR2: 0026
> > [2.570228] ---[ end trace 80c25d326e1e0d7c ]---
> > [2.570233] RIP: 0010:__kthread_should_park+0x5/0x30
> > [2.570238] Code: 00 e9 fe fe ff ff e8 ca 3a 08 00 e9 49 fe ff ff
> > 48 89 df e8 dd 38 08 00 84 c0 0f 84 6a ff ff ff e9 a6 fe ff ff 0f 1f
> > 44 00 00  47 26 20 74 12 48 8b 87 88 09 00 00 48 8b 00 48 c1 e8 
> > 02
> > 83 e0
> > [2.570250] RSP: 0018:ad8141723e50 EFLAGS: 00010246
> > [2.570255] RAX: 7fff RBX: 8a8d1d116ed8 RCX:
> 
> > [2.570260] RDX:  RSI:  RDI:
> 
> > [2.570265] RBP: 8a8d28c11300 R08:  R09:
> 
> > [2.570271] R10:  R11:  R12:
> 8a8d1d152e40
> > [2.570276] R13: 8a8d1d117280 R14: 8a8d1d116ed8 R15:
> 8a8d1ca68000
> > [2.570281] FS:  () GS:8a8d3aa0()
> > knlGS:
> > [2.570287] CS:  0010 DS:  ES:  CR0: 80050033
> > [2.570292] CR2: 0026 CR3: 0007e3dc6000 CR4:
> 003406e0
> > [2.570299] BUG: sleeping function called from invalid context at
> > include/linux/percpu-rwsem.h:49
> > [2.570306] in_atomi

RE: [PATCH] drm/amdkfd: Use true and false for boolean values

2018-08-07 Thread Russell, Kent
Thanks for that!
Reviewed-by: Kent Russell 

-Original Message-
From: amd-gfx  On Behalf Of Huang Rui
Sent: Tuesday, August 07, 2018 1:28 AM
To: Gustavo A. R. Silva ; Kuehling, Felix 

Cc: Oded Gabbay ; Zhou, David(ChunMing) 
; David Airlie ; 
linux-ker...@vger.kernel.org; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; Deucher, Alexander 
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdkfd: Use true and false for boolean values

On Sat, Aug 04, 2018 at 07:27:02PM -0500, Gustavo A. R. Silva wrote:
> Return statements in functions returning bool should use true or false 
> instead of an integer value.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Looks good for me.
Reviewed-by: Huang Rui 

Add Felix for his awareness.

> ---
>  drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c 
> b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> index 5d2475d..16af9d1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> +++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> @@ -62,12 +62,12 @@ static bool cik_event_interrupt_isr(struct kfd_dev *dev,
>   vmid  = (ihre->ring_id & 0xff00) >> 8;
>   if (vmid < dev->vm_info.first_vmid_kfd ||
>   vmid > dev->vm_info.last_vmid_kfd)
> - return 0;
> + return false;
>  
>   /* If there is no valid PASID, it's likely a firmware bug */
>   pasid = (ihre->ring_id & 0x) >> 16;
>   if (WARN_ONCE(pasid == 0, "FW bug: No PASID in KFD interrupt"))
> - return 0;
> + return false;
>  
>   /* Interrupt types we care about: various signals and faults.
>* They will be forwarded to a work queue (see below).
> --
> 2.7.4
> 
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel