Re: [PATCH v2 18/21] drm/amd/display: Fallback to 2020_YCBCR if the pixel encoding is not RGB

2023-01-25 Thread Sebastian Wick
On Tue, Jan 24, 2023 at 7:57 PM Harry Wentland  wrote:
>
>
>
> On 1/24/23 10:37, Harry Wentland wrote:
> >
> >
> > On 1/23/23 15:30, Sebastian Wick wrote:
> >> A new property to control YCC and subsampling would be the more
> >> complete path here. If we actually want to fix this in the short-term
> >> though, we should handle the YCC and RGB Colorspace values as
> >> equivalent, everywhere. Technically we're breaking the user space API
> >> here so it should be documented on the KMS property and other drivers
> >> must be adjusted accordingly as well.
> >>
> >
> > Could someone point me to a userspace that uses this currently?
> >
>
> To elaborate a bit more...
>
> A driver has always had the ability to pick the wire format, whether
> it'd be RGB or YCbCr (444, or 420). In some cases that selection
> is required in order to satisfy bandwidth requirements. In others
> we follow a certain policy to ensure similar behaviors between our
> Windows and Linux drivers. I don't think it makes sense for userspace
> to control this.

I disagree. The subsampling is degrading the image considerably in
some cases. We need control over this.

It does mean that user space has to be smart and try to reduce the
bandwidth if a KMS commit fails, but the same is true for resolution
and refresh rate right now and will be true for a min bpc property as
well.

> Based on what I see I am not convinced the entirety of the
> colorspace definition has a corresponding implementation in an
> upstream, canonical userspace, hence my question. Not even an IGT
> test existed when I started looking at this. In the absence of a
> userspace implementation I am not convinced we're breaking
> anything. Even then, this was never implemented in amdgpu so
> there is no way this regresses any existing behavior.

I don't think this breaks anything in practice. The change would only
break use cases where you want to set the infoframe to a variant which
does not match the wire format and that would be broken. So yes, I
agree.

We can't just remove the enum values though. If we deprecate the YCC
variants that must be documented and user space has to understand that
choosing the RGB variant will result in the kernel just doing the
"right thing".

>
> Harry
>
> > Harry
> >
> >> On Fri, Jan 13, 2023 at 5:26 PM Harry Wentland  
> >> wrote:
> >>>
> >>> From: Joshua Ashton 
> >>>
> >>> Userspace might not be aware whether we're sending RGB or YCbCr
> >>> data to the display. If COLOR_SPACE_2020_RGB_FULLRANGE is
> >>> requested but the output encoding is YCbCr we should
> >>> send COLOR_SPACE_2020_YCBCR.
> >>>
> >>> Signed-off-by: Joshua Ashton 
> >>> Signed-off-by: Harry Wentland 
> >>> Cc: Pekka Paalanen 
> >>> Cc: Sebastian Wick 
> >>> Cc: vitaly.pros...@amd.com
> >>> Cc: Joshua Ashton 
> >>> Cc: dri-de...@lists.freedesktop.org
> >>> Cc: amd-gfx@lists.freedesktop.org
> >>> Reviewed-by: Harry Wentland 
> >>> ---
> >>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 -
> >>>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> >>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>> index f74b125af31f..16940ea61b59 100644
> >>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>> @@ -5184,7 +5184,10 @@ get_output_color_space(const struct dc_crtc_timing 
> >>> *dc_crtc_timing,
> >>> color_space = COLOR_SPACE_ADOBERGB;
> >>> break;
> >>> case DRM_MODE_COLORIMETRY_BT2020_RGB:
> >>> -   color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
> >>> +   if (dc_crtc_timing->pixel_encoding == PIXEL_ENCODING_RGB)
> >>> +   color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
> >>> +   else
> >>> +   color_space = COLOR_SPACE_2020_YCBCR;
> >>> break;
> >>> case DRM_MODE_COLORIMETRY_BT2020_YCC:
> >>> color_space = COLOR_SPACE_2020_YCBCR;
> >>> --
> >>> 2.39.0
> >>>
> >>
> >
>



Re: [PATCH v2 18/21] drm/amd/display: Fallback to 2020_YCBCR if the pixel encoding is not RGB

2023-01-25 Thread Sebastian Wick
On Wed, Jan 25, 2023 at 2:00 PM Joshua Ashton  wrote:
>
>
>
> On 1/23/23 20:30, Sebastian Wick wrote:
> > A new property to control YCC and subsampling would be the more
> > complete path here. If we actually want to fix this in the short-term
> > though, we should handle the YCC and RGB Colorspace values as
> > equivalent, everywhere. Technically we're breaking the user space API
> > here so it should be documented on the KMS property and other drivers
> > must be adjusted accordingly as well.
>
> I am happy with treating 2020_YCC and 2020_RGB as the same.
>
> I think having the YCC/RGB split here in Colorspace is a mistake.
> Pixel encoding should be completely separate from colorspace from a uAPI
> perspective when we want to expose that.
> It's just a design flaw from when it was added as it just mirrors the
> values in the colorimetry packets 1:1. I understand why this happened,
> and I don't think it's a big deal for us to correct ourselves now:
>
> I suggest we deprecate the _YCC variants, treat them the same as the RGB
> enum to avoid any potential uAPI breakage and key the split entirely off
> the pixel_encoding value.

Yeah, I agree. The kernel must know the wire encoding and can thus
always choose the correct variant. If anyone wants to provide a patch
I'll review it.
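
To make the idea concrete, here is a minimal sketch of keying the signalled variant off the wire encoding alone. The enum and function names are hypothetical stand-ins for illustration, not the amdgpu or DRM definitions.

```c
/* Hypothetical stand-ins for the driver enums; names are illustrative only. */
enum wire_encoding { WIRE_RGB, WIRE_YCBCR444, WIRE_YCBCR420 };
enum requested_colorimetry { REQ_BT2020_RGB, REQ_BT2020_YCC };
enum signalled_colorspace { SIG_2020_RGB_FULLRANGE, SIG_2020_YCBCR };

/*
 * Treat the RGB and YCC requests as equivalent and derive the signalled
 * variant from the wire encoding, which the kernel always knows.
 */
static enum signalled_colorspace
pick_bt2020_variant(enum requested_colorimetry req, enum wire_encoding enc)
{
	(void)req; /* both BT.2020 requests are handled identically */
	return enc == WIRE_RGB ? SIG_2020_RGB_FULLRANGE : SIG_2020_YCBCR;
}
```

With this shape, user space only ever has to pick "BT.2020" and the RGB/YCC split follows whatever ends up on the wire.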

> That way when we do want to plumb more explicit pixel encoding down the
> line, userspace only has to manage one thing. There's no advantage for
> anything more here.
>
> - Joshie ✨
>
> >
> > On Fri, Jan 13, 2023 at 5:26 PM Harry Wentland  
> > wrote:
> >>
> >> From: Joshua Ashton 
> >>
> >> Userspace might not be aware whether we're sending RGB or YCbCr
> >> data to the display. If COLOR_SPACE_2020_RGB_FULLRANGE is
> >> requested but the output encoding is YCbCr we should
> >> send COLOR_SPACE_2020_YCBCR.
> >>
> >> Signed-off-by: Joshua Ashton 
> >> Signed-off-by: Harry Wentland 
> >> Cc: Pekka Paalanen 
> >> Cc: Sebastian Wick 
> >> Cc: vitaly.pros...@amd.com
> >> Cc: Joshua Ashton 
> >> Cc: dri-de...@lists.freedesktop.org
> >> Cc: amd-gfx@lists.freedesktop.org
> >> Reviewed-by: Harry Wentland 
> >> ---
> >>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 -
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> index f74b125af31f..16940ea61b59 100644
> >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> @@ -5184,7 +5184,10 @@ get_output_color_space(const struct dc_crtc_timing 
> >> *dc_crtc_timing,
> >>  color_space = COLOR_SPACE_ADOBERGB;
> >>  break;
> >>  case DRM_MODE_COLORIMETRY_BT2020_RGB:
> >> -   color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
> >> +   if (dc_crtc_timing->pixel_encoding == PIXEL_ENCODING_RGB)
> >> +   color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
> >> +   else
> >> +   color_space = COLOR_SPACE_2020_YCBCR;
> >>  break;
> >>  case DRM_MODE_COLORIMETRY_BT2020_YCC:
> >>  color_space = COLOR_SPACE_2020_YCBCR;
> >> --
> >> 2.39.0
> >>
> >
>



[PATCH 16/16] drm/amd/display: 3.2.221

2023-01-25 Thread Alex Hung
From: Aric Cyr 

This version brings along the following fixes:
- fix linux dp link lost handled only one time
- Reset DMUB mailbox SW state after HW reset
- Unassign does_plane_fit_in_mall function from dcn3.2
- Add Function declaration in dc_link
- Fix crash when connecting 2 displays with video playback
- Adjust downscaling limits for dcn314
- fix FCLK pstate change underflow
- Fix only one ABM pipe enabled under ODM combined case
- Add missing brackets in calculation
- Correct bw_params population
- Fix Z8 support configurations
- Add Debug Log for MST and PCON
- fix MALL size hardcoded for DCN321
- add rc_params_override option in dc_dsc_config
- Enable Freesync over PCon

Acked-by: Alex Hung 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 42ce45306483..2e23fd8b4e9f 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.220"
+#define DC_VER "3.2.221"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.39.1



[PATCH 15/16] drm/amd/display: fix linux dp link lost handled only one time

2023-01-25 Thread Alex Hung
From: Hersen Wu 

[Why]
The Linux amdgpu driver defers handling of the link-loss IRQ. DM adds
a handling request to the IRQ work queue only for the first link-loss
IRQ. If link training fails while that request is being handled, the
link is never enabled again.

[How]
Allow a new link-loss handling request to be added to the work queue
before running DP link training for the lost link.
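
The ordering can be illustrated with a minimal, single-threaded sketch. The struct and function names here are hypothetical and the real driver takes a spinlock around the flag; this only models why the flag is cleared before the retraining decision.

```c
#include <stdbool.h>

/* Hypothetical model of the offload queue state; not the driver struct. */
struct offload_queue_model {
	bool is_handling_link_loss; /* true while a request is pending */
	int queued;                 /* number of requests accepted so far */
};

/* IRQ side: accept a request only if none is already being handled. */
static bool queue_link_loss(struct offload_queue_model *q)
{
	if (q->is_handling_link_loss)
		return false;
	q->is_handling_link_loss = true;
	q->queued++;
	return true;
}

/*
 * Worker side: clear the flag first so later IRQs can queue new work,
 * then report whether link training would run (link still lost).
 */
static bool handle_link_loss(struct offload_queue_model *q, bool link_still_lost)
{
	q->is_handling_link_loss = false;
	return link_still_lost;
}
```

If the flag were never cleared, the second call to queue_link_loss() would keep failing, which is the "handled only one time" symptom in the subject line.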

Signed-off-by: Hersen Wu 
Acked-by: Alex Hung 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 ---
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  3 +++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ceeab2cd8569..9e138d48cd5d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1301,10 +1301,28 @@ static void dm_handle_hpd_rx_offload_work(struct 
work_struct *work)
else if ((dc_link->connector_signal != SIGNAL_TYPE_EDP) &&
dc_link_check_link_loss_status(dc_link, &offload_work->data) &&
dc_link_dp_allow_hpd_rx_irq(dc_link)) {
-   dc_link_dp_handle_link_loss(dc_link);
+   /* offload_work->data is from handle_hpd_rx_irq->
+* schedule_hpd_rx_offload_work.this is defer handle
+* for hpd short pulse. upon here, link status may be
+* changed, need get latest link status from dpcd
+* registers. if link status is good, skip run link
+* training again.
+*/
+   union hpd_irq_data irq_data;
+
+   memset(&irq_data, 0, sizeof(irq_data));
+
+   /* before dc_link_dp_handle_link_loss, allow new link lost 
handle
+* request be added to work queue if link lost at end of 
dc_link_
+* dp_handle_link_loss
+*/
spin_lock_irqsave(&offload_work->offload_wq->offload_lock, flags);
offload_work->offload_wq->is_handling_link_loss = false;
spin_unlock_irqrestore(&offload_work->offload_wq->offload_lock, flags);
+
+   if ((dp_read_hpd_rx_irq_data(dc_link, &irq_data) == DC_OK) &&
+   dc_link_check_link_loss_status(dc_link, &irq_data))
+   dc_link_dp_handle_link_loss(dc_link);
}
mutex_unlock(&adev->dm.dc_lock);
 
@@ -3238,7 +3256,7 @@ static void handle_hpd_rx_irq(void *param)
union hpd_irq_data hpd_irq_data;
bool link_loss = false;
bool has_left_work = false;
-   int idx = aconnector->base.index;
+   int idx = dc_link->link_index;
struct hpd_rx_irq_offload_work_queue *offload_wq = &adev->dm.hpd_rx_offload_wq[idx];
 
memset(&hpd_irq_data, 0, sizeof(hpd_irq_data));
@@ -3380,7 +3398,7 @@ static void register_hpd_handlers(struct amdgpu_device 
*adev)
(void *) aconnector);
 
if (adev->dm.hpd_rx_offload_wq)
-   
adev->dm.hpd_rx_offload_wq[connector->index].aconnector =
+   
adev->dm.hpd_rx_offload_wq[dc_link->link_index].aconnector =
aconnector;
}
}
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h 
b/drivers/gpu/drm/amd/display/dc/dc_link.h
index 85b57848f5cb..64d5d9b28ca6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -433,6 +433,9 @@ void dc_link_dp_handle_link_loss(struct dc_link *link);
 bool dc_link_dp_allow_hpd_rx_irq(const struct dc_link *link);
 bool dc_link_check_link_loss_status(struct dc_link *link,
   union hpd_irq_data *hpd_irq_dpcd_data);
+enum dc_status dp_read_hpd_rx_irq_data(
+   struct dc_link *link,
+   union hpd_irq_data *irq_data);
 struct dc_sink_init_data;
 
 struct dc_sink *dc_link_add_remote_sink(
-- 
2.39.1



[PATCH 14/16] drm/amd/display: Reset DMUB mailbox SW state after HW reset

2023-01-25 Thread Alex Hung
From: Nicholas Kazlauskas 

[Why]
Otherwise we can be out of sync with what's in the hardware, leading
to us rerunning every command that's presently in the ringbuffer.

[How]
Reset software state for the mailboxes in hw_reset callback.
This is already done as part of the mailbox init in hw_init, but we
do need to remember to reset the last cached wptr value as well here.
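
A minimal model of the problem, assuming a mailbox with hardware pointers, a software copy, and a cached last write pointer. All names below are hypothetical and chosen for illustration; they are not the dmub_srv fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one mailbox; not the dmub_srv layout. */
struct mailbox_model {
	uint32_t hw_rptr;   /* read pointer held by hardware */
	uint32_t hw_wptr;   /* write pointer held by hardware */
	uint32_t sw_rptr;   /* software copy of the read pointer */
	uint32_t sw_wptr;   /* software copy of the write pointer */
	uint32_t last_wptr; /* cached write pointer last handed to hardware */
};

/* A hardware reset zeroes only the hardware side. */
static void model_hw_reset(struct mailbox_model *m)
{
	m->hw_rptr = 0;
	m->hw_wptr = 0;
}

/* The software state, including the cached wptr, must follow. */
static void model_sw_reset(struct mailbox_model *m)
{
	m->sw_rptr = 0;
	m->sw_wptr = 0;
	m->last_wptr = 0;
}

static bool model_in_sync(const struct mailbox_model *m)
{
	return m->sw_rptr == m->hw_rptr &&
	       m->sw_wptr == m->hw_wptr &&
	       m->last_wptr == m->hw_wptr;
}
```

Calling model_hw_reset() without model_sw_reset() leaves the two sides disagreeing, which is the out-of-sync state described above.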

Reviewed-by: Hansen Dsouza 
Acked-by: Alex Hung 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c 
b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
index 4a122925c3ae..92c18bfb98b3 100644
--- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
+++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c
@@ -532,6 +532,9 @@ enum dmub_status dmub_srv_hw_init(struct dmub_srv *dmub,
if (dmub->hw_funcs.reset)
dmub->hw_funcs.reset(dmub);
 
+   /* reset the cache of the last wptr as well now that hw is reset */
+   dmub->inbox1_last_wptr = 0;
+
cw0.offset.quad_part = inst_fb->gpu_addr;
cw0.region.base = DMUB_CW0_BASE;
cw0.region.top = cw0.region.base + inst_fb->size - 1;
@@ -649,6 +652,15 @@ enum dmub_status dmub_srv_hw_reset(struct dmub_srv *dmub)
if (dmub->hw_funcs.reset)
dmub->hw_funcs.reset(dmub);
 
+   /* mailboxes have been reset in hw, so reset the sw state as well */
+   dmub->inbox1_last_wptr = 0;
+   dmub->inbox1_rb.wrpt = 0;
+   dmub->inbox1_rb.rptr = 0;
+   dmub->outbox0_rb.wrpt = 0;
+   dmub->outbox0_rb.rptr = 0;
+   dmub->outbox1_rb.wrpt = 0;
+   dmub->outbox1_rb.rptr = 0;
+
dmub->hw_init = false;
 
return DMUB_STATUS_OK;
-- 
2.39.1



[PATCH 13/16] drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2

2023-01-25 Thread Alex Hung
From: George Shen 

[Why]
The hwss function does_plane_fit_in_mall is not applicable to dcn3.2 ASICs.
Using it with dcn3.2 can result in undefined behaviour.

[How]
Assign the function pointer to NULL.
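
A NULL hook is only safe if callers check it before calling. The sketch below shows that pattern with hypothetical names; whether every caller in the driver guards the hook this way is an assumption, not something this patch shows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook table; only the one hook of interest is modelled. */
struct hw_funcs_model {
	bool (*does_plane_fit_in_mall)(unsigned int plane_bytes);
};

/* Illustrative implementation for ASICs that do support the check. */
static bool fits_under_1k(unsigned int plane_bytes)
{
	return plane_bytes <= 1024;
}

/* Caller side: treat an unassigned hook as "does not fit". */
static bool plane_fits_in_mall(const struct hw_funcs_model *funcs,
			       unsigned int plane_bytes)
{
	if (!funcs->does_plane_fit_in_mall)
		return false;
	return funcs->does_plane_fit_in_mall(plane_bytes);
}
```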

Reviewed-by: Alvin Lee 
Acked-by: Alex Hung 
Signed-off-by: George Shen 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_init.c
index 330d7cbc7398..a02918eaa2c1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_init.c
@@ -94,7 +94,7 @@ static const struct hw_sequencer_funcs dcn32_funcs = {
.get_vupdate_offset_from_vsync = dcn10_get_vupdate_offset_from_vsync,
.calc_vupdate_position = dcn10_calc_vupdate_position,
.apply_idle_power_optimizations = dcn32_apply_idle_power_optimizations,
-   .does_plane_fit_in_mall = dcn30_does_plane_fit_in_mall,
+   .does_plane_fit_in_mall = NULL,
.set_backlight_level = dcn21_set_backlight_level,
.set_abm_immediate_disable = dcn21_set_abm_immediate_disable,
.hardware_release = dcn30_hardware_release,
-- 
2.39.1



[PATCH 12/16] drm/amd/display: Add Function declaration in dc_link

2023-01-25 Thread Alex Hung
From: Mustapha Ghaddar 

[WHY]
Housekeeping: clean up and add declarations for functions
to be called from the DM layer.

[HOW]
Add the public functions to dc_link.h.

Reviewed-by: Jun Lei 
Acked-by: Alex Hung 
Signed-off-by: Mustapha Ghaddar 
---
 drivers/gpu/drm/amd/display/dc/dc_link.h  | 27 +++
 .../dc/link/protocols/link_dp_dpia_bw.h   | 24 -
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h 
b/drivers/gpu/drm/amd/display/dc/dc_link.h
index 1927eacbfa71..85b57848f5cb 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -627,4 +627,31 @@ struct fixed31_32 calculate_sst_avg_time_slots_per_mtp(
 void setup_dp_hpo_stream(struct pipe_ctx *pipe_ctx, bool enable);
 void dp_source_sequence_trace(struct dc_link *link, uint8_t dp_test_mode);
 
+/*
+ *  USB4 DPIA BW ALLOCATION PUBLIC FUNCTIONS
+ */
+/*
+ * Send a request from DP-Tx requesting to allocate BW remotely after
+ * allocating it locally. This will get processed by CM and a CB function
+ * will be called.
+ *
+ * @link: pointer to the dc_link struct instance
+ * @req_bw: The requested bw in Kbyte to allocated
+ *
+ * return: none
+ */
+void dc_link_set_usb4_req_bw_req(struct dc_link *link, int req_bw);
+
+/*
+ * CB function for when the status of the Req above is complete. We will
+ * find out the result of allocating on CM and update structs accordingly
+ *
+ * @link: pointer to the dc_link struct instance
+ * @bw: Allocated or Estimated BW depending on the result
+ * @result: Response type
+ *
+ * return: none
+ */
+void dc_link_get_usb4_req_bw_resp(struct dc_link *link, uint8_t bw, uint8_t 
result);
+
 #endif /* DC_LINK_H_ */
diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.h 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.h
index 58eb7b581093..832a6dd2c5fa 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.h
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.h
@@ -44,30 +44,6 @@ enum bw_type {
  */
 bool set_dptx_usb4_bw_alloc_support(struct dc_link *link);
 
-/*
- * Send a request from DP-Tx requesting to allocate BW remotely after
- * allocating it locally. This will get processed by CM and a CB function
- * will be called.
- *
- * @link: pointer to the dc_link struct instance
- * @req_bw: The requested bw in Kbyte to allocated
- *
- * return: none
- */
-void set_usb4_req_bw_req(struct dc_link *link, int req_bw);
-
-/*
- * CB function for when the status of the Req above is complete. We will
- * find out the result of allocating on CM and update structs accordingly
- *
- * @link: pointer to the dc_link struct instance
- * @bw: Allocated or Estimated BW depending on the result
- * @result: Response type
- *
- * return: none
- */
-void get_usb4_req_bw_resp(struct dc_link *link, uint8_t bw, uint8_t result);
-
 /*
  * Return the response_ready flag from dc_link struct
  *
-- 
2.39.1



[PATCH 11/16] drm/amd/display: Revert "avoid disable otg when dig was disabled"

2023-01-25 Thread Alex Hung
From: Aric Cyr 

This reverts commit 82dca8576d14f3dcb775b3be5f1bbb5df9a682ac.

Acked-by: Alex Hung 
Signed-off-by: Aric Cyr 
---
 .../dc/clk_mgr/dcn315/dcn315_clk_mgr.c| 26 +--
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c
index 8c368bcc8e7e..43d1f38b94ce 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c
@@ -87,16 +87,6 @@ static int dcn315_get_active_display_cnt_wa(
return display_count;
 }
 
-bool should_disable_otg(struct pipe_ctx *pipe)
-{
-   bool ret = true;
-
-   if (pipe->stream->link->link_enc && 
pipe->stream->link->link_enc->funcs->is_dig_enabled &&
-   
pipe->stream->link->link_enc->funcs->is_dig_enabled(pipe->stream->link->link_enc))
-   ret = false;
-   return ret;
-}
-
 static void dcn315_disable_otg_wa(struct clk_mgr *clk_mgr_base, struct 
dc_state *context, bool disable)
 {
struct dc *dc = clk_mgr_base->ctx->dc;
@@ -108,16 +98,12 @@ static void dcn315_disable_otg_wa(struct clk_mgr 
*clk_mgr_base, struct dc_state
if (pipe->top_pipe || pipe->prev_odm_pipe)
continue;
if (pipe->stream && (pipe->stream->dpms_off || 
pipe->plane_state == NULL ||
-   
dc_is_virtual_signal(pipe->stream->signal))) {
-
-   /* This w/a should not trigger when we have a dig 
active */
-   if (should_disable_otg(pipe)) {
-   if (disable) {
-   
pipe->stream_res.tg->funcs->immediate_disable_crtc(pipe->stream_res.tg);
-   reset_sync_context_for_pipe(dc, 
context, i);
-   } else
-   
pipe->stream_res.tg->funcs->enable_crtc(pipe->stream_res.tg);
-   }
+
dc_is_virtual_signal(pipe->stream->signal))) {
+   if (disable) {
+   
pipe->stream_res.tg->funcs->immediate_disable_crtc(pipe->stream_res.tg);
+   reset_sync_context_for_pipe(dc, context, i);
+   } else
+   
pipe->stream_res.tg->funcs->enable_crtc(pipe->stream_res.tg);
}
}
 }
-- 
2.39.1



[PATCH 10/16] drm/amd/display: Adjust downscaling limits for dcn314

2023-01-25 Thread Alex Hung
From: Daniel Miess 

[Why]
Lower max_downscale_ratio and ARGB888 downscale factor
to prevent cases where underflow may occur on dcn314

[How]
Set max_downscale_ratio to 400 and ARGB downscale factor
to 250 for dcn314
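
The downscale factor values follow from the 1000/ratio arithmetic noted in the code comments of this patch. A minimal sketch of that arithmetic, with a hypothetical helper name and round-to-nearest chosen to reproduce the 167 used for 6:1:

```c
/*
 * Hypothetical helper: downscale factor for an N:1 ratio, computed as
 * 1000 / N and rounded to the nearest integer.
 */
static unsigned int downscale_factor(unsigned int ratio)
{
	return (1000 + ratio / 2) / ratio;
}
```

For 6:1 this gives 167 (from 166.666) and for 4:1 it gives 250, matching the old and new .argb values.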

Reviewed-by: Nicholas Kazlauskas 
Acked-by: Alex Hung 
Signed-off-by: Daniel Miess 
---
 drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
index f9ea1e86707f..79850a68f62a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
@@ -874,8 +874,9 @@ static const struct dc_plane_cap plane_cap = {
},
 
// 6:1 downscaling ratio: 1000/6 = 166.666
+   // 4:1 downscaling ratio for ARGB888 to prevent underflow during P010 
playback: 1000/4 = 250
.max_downscale_factor = {
-   .argb = 167,
+   .argb = 250,
.nv12 = 167,
.fp16 = 167
},
@@ -1763,7 +1764,7 @@ static bool dcn314_resource_construct(
pool->base.underlay_pipe_index = NO_UNDERLAY_PIPE;
pool->base.pipe_count = pool->base.res_cap->num_timing_generator;
pool->base.mpcc_count = pool->base.res_cap->num_timing_generator;
-   dc->caps.max_downscale_ratio = 600;
+   dc->caps.max_downscale_ratio = 400;
dc->caps.i2c_speed_in_khz = 100;
dc->caps.i2c_speed_in_khz_hdcp = 100;
dc->caps.max_cursor_size = 256;
-- 
2.39.1



[PATCH 09/16] drm/amd/display: fix FCLK pstate change underflow

2023-01-25 Thread Alex Hung
From: Vladimir Stempen 

[Why]
Currently, when UCLK p-state is not supported, we set the FCLK p-state
change watermark based on the dummy p-state latency.

[How]
When UCLK p-state is not supported, calculate the FCLK p-state change
watermark based on the FCLK p-state change latency.

Reviewed-by: Nevenko Stupar 
Acked-by: Alex Hung 
Signed-off-by: Vladimir Stempen 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index 0dc1a03999b6..28e9f3644bf4 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -2126,6 +2126,10 @@ void dcn32_calculate_wm_and_dlg_fpu(struct dc *dc, 
struct dc_state *context,
 */
context->bw_ctx.bw.dcn.watermarks.a = 
context->bw_ctx.bw.dcn.watermarks.c;

context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.pstate_change_ns = 0;
+   /* Calculate FCLK p-state change watermark based on FCLK pstate 
change latency in case
+* UCLK p-state is not supported, to avoid underflow in case 
FCLK pstate is supported
+*/
+   
context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.fclk_pstate_change_ns = 
get_fclk_watermark(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
} else {
/* Set A:
 * All clocks min.
-- 
2.39.1



[PATCH 08/16] drm/amd/display: Fix only one ABM pipe enabled under ODM combined case

2023-01-25 Thread Alex Hung
From: Leon Huang 

[Why]
ABM set pipe runs before the ODM status is updated, which leads to an
incorrect ABM pipe setting when enabling ODM combine.

[How]
Call ABM set pipe flow after ODM status update in program pipe sequence.

Reviewed-by: Chun-Liang Chang 
Reviewed-by: Nicholas Kazlauskas 
Acked-by: Alex Hung 
Signed-off-by: Leon Huang 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index 916dceecd3de..cb8edb14603a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1777,6 +1777,15 @@ static void dcn20_program_pipe(
&pipe_ctx->stream->bit_depth_params,
&pipe_ctx->stream->clamping);
}
+
+   /* Set ABM pipe after other pipe configurations done */
+   if (pipe_ctx->plane_state->visible) {
+   if (pipe_ctx->stream_res.abm) {
+   dc->hwss.set_pipe(pipe_ctx);
+   
pipe_ctx->stream_res.abm->funcs->set_abm_level(pipe_ctx->stream_res.abm,
+   pipe_ctx->stream->abm_level);
+   }
+   }
 }
 
 void dcn20_program_front_end_for_ctx(
-- 
2.39.1



[PATCH 07/16] drm/amd/display: Add missing brackets in calculation

2023-01-25 Thread Alex Hung
From: Daniel Miess 

[Why]
Brackets missing in the calculation for MIN_DST_Y_NEXT_START

[How]
Add missing brackets for this calculation
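
Division in C associates left to right, so a / b / c is (a / b) / c rather than a / (b / c). The sketch below isolates that difference; the function names are hypothetical and the dml_floor() wrapper from the real expression is left out.

```c
/* Without brackets: (4 * tsetup / htotal) / pixel_clock. */
static double setup_lines_without_brackets(double tsetup, double htotal,
					   double pixel_clock)
{
	return 4.0 * tsetup / htotal / pixel_clock;
}

/* With brackets: 4 * tsetup divided by the line time htotal / pixel_clock. */
static double setup_lines_with_brackets(double tsetup, double htotal,
					double pixel_clock)
{
	return 4.0 * tsetup / (htotal / pixel_clock);
}
```

With tsetup = 2, htotal = 8 and pixel_clock = 4, the first form yields 0.25 and the second yields 4, so the two differ by a factor of pixel_clock squared.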

Reviewed-by: Nicholas Kazlauskas 
Acked-by: Alex Hung 
Signed-off-by: Daniel Miess 
---
 .../gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
index 950669f2c10d..cb7c0c878423 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
@@ -3183,7 +3183,7 @@ static void 
DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
} else {
v->MIN_DST_Y_NEXT_START[k] = v->VTotal[k] - 
v->VFrontPorch[k] + v->VTotal[k] - v->VActive[k] - v->VStartup[k];
}
-   v->MIN_DST_Y_NEXT_START[k] += dml_floor(4.0 * v->TSetup[k] / 
(double)v->HTotal[k] / v->PixelClock[k], 1.0) / 4.0;
+   v->MIN_DST_Y_NEXT_START[k] += dml_floor(4.0 * v->TSetup[k] / 
((double)v->HTotal[k] / v->PixelClock[k]), 1.0) / 4.0;
if (((v->VUpdateOffsetPix[k] + v->VUpdateWidthPix[k] + 
v->VReadyOffsetPix[k]) / v->HTotal[k])
<= (isInterlaceTiming ?
dml_floor((v->VTotal[k] - 
v->VActive[k] - v->VFrontPorch[k] - v->VStartup[k]) / 2.0, 1.0) :
-- 
2.39.1



[PATCH 06/16] drm/amd/display: Correct bw_params population

2023-01-25 Thread Alex Hung
From: Daniel Miess 

[Why]
Underflow observed during P010 video playback on
dcn314 due to incorrectly populated bw_params

[How]
Populate fclk, memclk and voltage in bw_params with
values from max pstate rather than min pstate

Reviewed-by: Nicholas Kazlauskas 
Acked-by: Alex Hung 
Signed-off-by: Daniel Miess 
---
 .../dc/clk_mgr/dcn314/dcn314_clk_mgr.c| 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
index 89df7244b272..f5276bacfa4e 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
@@ -572,10 +572,11 @@ static void 
dcn314_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *cl
 {
struct clk_bw_params *bw_params = clk_mgr->base.bw_params;
struct clk_limit_table_entry def_max = 
bw_params->clk_table.entries[bw_params->clk_table.num_entries - 1];
-   uint32_t max_pstate = 0,  max_fclk = 0,  min_pstate = 0, max_dispclk = 
0, max_dppclk = 0;
+   uint32_t max_pstate = 0, max_fclk = 0, max_dispclk = 0, max_dppclk = 0;
+   uint32_t min_pstate = 0, min_fclk = clock_table->DfPstateTable[0].FClk;
int i;
 
-   /* Find highest valid fclk pstate */
+   /* Find highest and lowest valid fclk pstate */
for (i = 0; i < clock_table->NumDfPstatesEnabled; i++) {
if (is_valid_clock_value(clock_table->DfPstateTable[i].FClk) &&
clock_table->DfPstateTable[i].FClk > max_fclk) {
@@ -584,6 +585,14 @@ static void 
dcn314_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *cl
}
}
 
+   for (i = 0; i < clock_table->NumDfPstatesEnabled; i++) {
+   if (is_valid_clock_value(clock_table->DfPstateTable[i].FClk) &&
+   clock_table->DfPstateTable[i].FClk < min_fclk) {
+   min_fclk = clock_table->DfPstateTable[i].FClk;
+   min_pstate = i;
+   }
+   }
+
/* We expect the table to contain at least one valid fclk entry. */
ASSERT(is_valid_clock_value(max_fclk));
 
@@ -599,15 +608,17 @@ static void 
dcn314_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *cl
 
/* Base the clock table on dcfclk, need at least one entry regardless 
of pmfw table */
for (i = 0; i < clock_table->NumDcfClkLevelsEnabled; i++) {
-   uint32_t min_fclk = clock_table->DfPstateTable[0].FClk;
+   uint32_t max_level_fclk = clock_table->DfPstateTable[0].FClk;
+   uint32_t max_level_pstate = 0;
int j;
 
+   /* Look for the maximum supported FCLK for the current voltage. 
*/
for (j = 1; j < clock_table->NumDfPstatesEnabled; j++) {
if 
(is_valid_clock_value(clock_table->DfPstateTable[j].FClk) &&
-   clock_table->DfPstateTable[j].FClk < min_fclk &&
+   clock_table->DfPstateTable[j].FClk > max_level_fclk 
&&
clock_table->DfPstateTable[j].Voltage <= 
clock_table->SocVoltage[i]) {
-   min_fclk = clock_table->DfPstateTable[j].FClk;
-   min_pstate = j;
+   max_level_fclk = 
clock_table->DfPstateTable[j].FClk;
+   max_level_pstate = j;
}
}
 
@@ -621,15 +632,15 @@ static void 
dcn314_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *cl
bw_params->clk_table.entries[i].dtbclk_mhz = 
bw_params->clk_table.entries[j].dtbclk_mhz;
 
/* Now update clocks we do read */
-   bw_params->clk_table.entries[i].fclk_mhz = min_fclk;
-   bw_params->clk_table.entries[i].memclk_mhz = 
clock_table->DfPstateTable[min_pstate].MemClk;
-   bw_params->clk_table.entries[i].voltage = 
clock_table->DfPstateTable[min_pstate].Voltage;
+   bw_params->clk_table.entries[i].fclk_mhz = max_level_fclk;
+   bw_params->clk_table.entries[i].memclk_mhz = 
clock_table->DfPstateTable[max_level_pstate].MemClk;
+   bw_params->clk_table.entries[i].voltage = 
clock_table->DfPstateTable[max_level_pstate].Voltage;
bw_params->clk_table.entries[i].dcfclk_mhz = 
clock_table->DcfClocks[i];
bw_params->clk_table.entries[i].socclk_mhz = 
clock_table->SocClocks[i];
bw_params->clk_table.entries[i].dispclk_mhz = max_dispclk;
bw_params->clk_table.entries[i].dppclk_mhz = max_dppclk;
bw_params->clk_table.entries[i].wck_ratio = convert_wck_ratio(
-   clock_table->DfPstateTable[min_pstate].WckRatio);
+   clock_table->DfPstateTable[max_level_pstate].WckRatio);
}
 

[PATCH 05/16] drm/amd/display: Fix Z8 support configurations

2023-01-25 Thread Alex Hung
From: Nicholas Kazlauskas 

[Why]
Z8 is not supported in multi-display configurations, but it is
supported when only the second eDP screen is in use.

[How]
Remove multi display support, restrict number of planes for all
z-states support, but still allow Z8 if we're not using PWRSEQ0.
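
The resulting decision for the single-eDP-stream case can be sketched as below. The enum, struct, and field names are hypothetical; the thresholds and branch order are taken from the diff in this patch, and the other branches of decide_zstate_support are not modelled.

```c
#include <stdbool.h>

/* Hypothetical mirror of the z-state outcomes used in this branch. */
enum zstate_model {
	Z_DISALLOW,
	Z_ALLOW,
	Z_ALLOW_Z8_ONLY,
	Z_ALLOW_Z8_Z10_ONLY,
	Z_ALLOW_Z10_ONLY
};

/* Hypothetical inputs for a single eDP stream. */
struct edp_config_model {
	int plane_count;
	bool is_pwrseq0;    /* link_index == 0 in the patch */
	bool psr1_enabled;  /* PSR version 1 and not disabled */
	double stutter_period;
	unsigned int min_dst_y_next_start_us;
};

static enum zstate_model decide_single_edp(const struct edp_config_model *c)
{
	bool allow_z8 = c->stutter_period > 1000.0;

	/* Multi-plane configurations are not supported. */
	if (c->plane_count > 1)
		return Z_DISALLOW;

	if (c->is_pwrseq0 && (c->stutter_period > 5000.0 ||
			      c->min_dst_y_next_start_us > 5000))
		return Z_ALLOW;
	if (c->is_pwrseq0 && c->psr1_enabled)
		return allow_z8 ? Z_ALLOW_Z8_Z10_ONLY : Z_ALLOW_Z10_ONLY;

	/* Not PWRSEQ0 (for example a second eDP): Z8 at most. */
	return allow_z8 ? Z_ALLOW_Z8_ONLY : Z_DISALLOW;
}
```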

Reviewed-by: Charlene Liu 
Acked-by: Alex Hung 
Signed-off-by: Nicholas Kazlauskas 
---
 .../gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c   | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index 197df404761a..d3ba65efe1d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -949,7 +949,6 @@ static enum dcn_zstate_support_state  
decide_zstate_support(struct dc *dc, struc
int plane_count;
int i;
unsigned int optimized_min_dst_y_next_start_us;
-   bool allow_z8 = context->bw_ctx.dml.vba.StutterPeriod > 1000.0;
 
plane_count = 0;
optimized_min_dst_y_next_start_us = 0;
@@ -974,6 +973,8 @@ static enum dcn_zstate_support_state  
decide_zstate_support(struct dc *dc, struc
else if (context->stream_count == 1 &&  context->streams[0]->signal == 
SIGNAL_TYPE_EDP) {
struct dc_link *link = context->streams[0]->sink->link;
struct dc_stream_status *stream_status = &context->stream_status[0];
+   bool allow_z8 = context->bw_ctx.dml.vba.StutterPeriod > 1000.0;
+   bool is_pwrseq0 = link->link_index == 0;
 
if (dc_extended_blank_supported(dc)) {
for (i = 0; i < dc->res_pool->pipe_count; i++) {
@@ -986,18 +987,17 @@ static enum dcn_zstate_support_state  
decide_zstate_support(struct dc *dc, struc
}
}
}
-   /* zstate only supported on PWRSEQ0  and when there's <2 
planes*/
-   if (link->link_index != 0 || stream_status->plane_count > 1)
+
+   /* Don't support multi-plane configurations */
+   if (stream_status->plane_count > 1)
return DCN_ZSTATE_SUPPORT_DISALLOW;
 
-   if (context->bw_ctx.dml.vba.StutterPeriod > 5000.0 || 
optimized_min_dst_y_next_start_us > 5000)
+   if (is_pwrseq0 && (context->bw_ctx.dml.vba.StutterPeriod > 
5000.0 || optimized_min_dst_y_next_start_us > 5000))
return DCN_ZSTATE_SUPPORT_ALLOW;
-   else if (link->psr_settings.psr_version == DC_PSR_VERSION_1 && 
!link->panel_config.psr.disable_psr)
+   else if (is_pwrseq0 && link->psr_settings.psr_version == 
DC_PSR_VERSION_1 && !link->panel_config.psr.disable_psr)
return allow_z8 ? DCN_ZSTATE_SUPPORT_ALLOW_Z8_Z10_ONLY 
: DCN_ZSTATE_SUPPORT_ALLOW_Z10_ONLY;
else
return allow_z8 ? DCN_ZSTATE_SUPPORT_ALLOW_Z8_ONLY : 
DCN_ZSTATE_SUPPORT_DISALLOW;
-   } else if (allow_z8) {
-   return DCN_ZSTATE_SUPPORT_ALLOW_Z8_ONLY;
} else {
return DCN_ZSTATE_SUPPORT_DISALLOW;
}
-- 
2.39.1
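
The decision order in the single-eDP branch after this patch can be sketched as a standalone function. This is a minimal model under stated assumptions, not the driver code: the enum values, `struct edp_config`, and `decide_single_edp()` are hypothetical names, and `psr1_enabled` folds the two PSR conditions from the diff into one flag. Only the thresholds (1000 us and 5000 us) and the order of the checks are taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DCN_ZSTATE_SUPPORT_* values in the diff. */
enum zstate_support {
	ZSTATE_DISALLOW,
	ZSTATE_ALLOW,
	ZSTATE_ALLOW_Z8_ONLY,
	ZSTATE_ALLOW_Z8_Z10_ONLY,
	ZSTATE_ALLOW_Z10_ONLY,
};

/* Inputs the single-eDP branch looks at, flattened into one struct. */
struct edp_config {
	int link_index;                       /* 0 means the panel is on PWRSEQ0 */
	int plane_count;
	double stutter_period_us;
	unsigned int min_dst_y_next_start_us;
	bool psr1_enabled;                    /* PSR version 1 and not disabled */
};

static enum zstate_support decide_single_edp(const struct edp_config *c)
{
	bool allow_z8 = c->stutter_period_us > 1000.0;
	bool is_pwrseq0 = c->link_index == 0;

	/* Multi-plane configurations are rejected outright. */
	if (c->plane_count > 1)
		return ZSTATE_DISALLOW;

	/* Full z-state support and Z10 are kept for PWRSEQ0 only. */
	if (is_pwrseq0 && (c->stutter_period_us > 5000.0 ||
			   c->min_dst_y_next_start_us > 5000))
		return ZSTATE_ALLOW;
	else if (is_pwrseq0 && c->psr1_enabled)
		return allow_z8 ? ZSTATE_ALLOW_Z8_Z10_ONLY : ZSTATE_ALLOW_Z10_ONLY;
	else
		return allow_z8 ? ZSTATE_ALLOW_Z8_ONLY : ZSTATE_DISALLOW;
}
```

In this model a panel on a second eDP link (`link_index != 0`) can still reach Z8 through the final branch, which matches the "[How]" description of allowing Z8 when PWRSEQ0 is not in use.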



[PATCH 04/16] drm/amd/display: Add Debug Log for MST and PCON

2023-01-25 Thread Alex Hung
From: Fangzhi Zuo 

Add logs for MST/PCON-specific use cases:
1. A DP1.2 hub, which gives reduced link bw and no DSC support.
2. A configuration with fewer than 4 lanes, which gives reduced bw.
3. FRL PCON enabled for the ASIC.
4. Track MST sink count.

Reviewed-by: Hersen Wu 
Acked-by: Alex Hung 
Signed-off-by: Fangzhi Zuo 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c|  3 +++
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c| 12 +++-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c  | 12 
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 763bc92385da..ceeab2cd8569 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1622,6 +1622,9 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
/* TODO: Remove after DP2 receiver gets proper support of Cable ID 
feature */
adev->dm.dc->debug.ignore_cable_id = true;
 
+   if (adev->dm.dc->caps.dp_hdmi21_pcon_support)
+   DRM_INFO("DP-HDMI FRL PCON supported\n");
+
r = dm_dmub_hw_init(adev);
if (r) {
DRM_ERROR("DMUB interface failed to initialize: status=%d\n", 
r);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index 5c733d445fe9..c6794196a11d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -403,6 +403,7 @@ bool dm_helpers_dp_mst_start_top_mgr(
bool boot)
 {
struct amdgpu_dm_connector *aconnector = link->priv;
+   int ret;
 
if (!aconnector) {
DRM_ERROR("Failed to find connector for link!");
@@ -418,7 +419,16 @@ bool dm_helpers_dp_mst_start_top_mgr(
DRM_INFO("DM_MST: starting TM on aconnector: %p [id: %d]\n",
aconnector, aconnector->base.base.id);
 
-   return (drm_dp_mst_topology_mgr_set_mst(&aconnector->mst_mgr, true) == 
0);
+   ret = drm_dp_mst_topology_mgr_set_mst(&aconnector->mst_mgr, true);
+   if (ret < 0) {
+   DRM_ERROR("DM_MST: Failed to set the device into MST mode!");
+   return false;
+   }
+
+   DRM_INFO("DM_MST: DP%x, %d-lane link detected\n", 
aconnector->mst_mgr.dpcd[0],
+   aconnector->mst_mgr.dpcd[2] & DP_MAX_LANE_COUNT_MASK);
+
+   return true;
 }
 
 bool dm_helpers_dp_mst_stop_top_mgr(
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 0bff2cc20b02..33f53cae939d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -177,6 +177,9 @@ amdgpu_dm_mst_connector_early_unregister(struct 
drm_connector *connector)
if (dc_link->sink_count)
dc_link_remove_remote_sink(dc_link, dc_sink);
 
+   DC_LOG_MST("DM_MST: remove remote sink 0x%p, %d remaining\n",
+   dc_sink, dc_link->sink_count);
+
dc_sink_release(dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
@@ -308,6 +311,9 @@ static int dm_dp_mst_get_modes(struct drm_connector 
*connector)
return 0;
}
 
+   DC_LOG_MST("DM_MST: add remote sink 0x%p, %d 
remaining\n",
+   dc_sink, 
aconnector->dc_link->sink_count);
+
dc_sink->priv = aconnector;
aconnector->dc_sink = dc_sink;
}
@@ -341,6 +347,9 @@ static int dm_dp_mst_get_modes(struct drm_connector 
*connector)
return 0;
}
 
+   DC_LOG_MST("DM_MST: add remote sink 0x%p, %d remaining\n",
+   dc_sink, aconnector->dc_link->sink_count);
+
dc_sink->priv = aconnector;
/* dc_link_add_remote_sink returns a new reference */
aconnector->dc_sink = dc_sink;
@@ -458,6 +467,9 @@ dm_dp_mst_detect(struct drm_connector *connector,
if (aconnector->dc_link->sink_count)
dc_link_remove_remote_sink(aconnector->dc_link, 
aconnector->dc_sink);
 
+   DC_LOG_MST("DM_MST: remove remote sink 0x%p, %d remaining\n",
+   aconnector->dc_link, aconnector->dc_link->sink_count);
+
dc_sink_release(aconnector->dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
-- 
2.39.1
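
The new DRM_INFO line prints two DPCD receiver-capability bytes. A minimal sketch of that decoding follows, assuming the usual DPCD layout where byte 0 is the revision and the low five bits of byte 2 hold the maximum lane count. `format_link_info()` and `LANE_COUNT_MASK` are hypothetical names for illustration; the mask value is the same as the kernel's DP_MAX_LANE_COUNT_MASK.

```c
#include <stdint.h>
#include <stdio.h>

/* Same value as DP_MAX_LANE_COUNT_MASK in the kernel's DP helper header. */
#define LANE_COUNT_MASK 0x1f

/* Hypothetical helper: format the first DPCD receiver-capability bytes. */
static int format_link_info(const uint8_t *dpcd, char *buf, size_t len)
{
	/*
	 * dpcd[0] is the DPCD revision (0x12 prints as "12" with %x),
	 * dpcd[2] carries the maximum lane count in its low five bits.
	 */
	return snprintf(buf, len, "DM_MST: DP%x, %d-lane link detected",
			(unsigned int)dpcd[0], dpcd[2] & LANE_COUNT_MASK);
}
```

Masking matters because the upper bits of byte 2 carry unrelated capability flags, so printing the raw byte would not give a lane count.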



[PATCH 03/16] drm/amd/display: fix MALL size hardcoded for DCN321

2023-01-25 Thread Alex Hung
From: Samson Tam 

[Why]
MALL size available can vary for different SKUs
MALL size was still hardcoded for DCN321

[How]
Remove hardcoding MALL size for DCN321

Reviewed-by: Alvin Lee 
Acked-by: Alex Hung 
Signed-off-by: Samson Tam 
---
 drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
index fd57e0167737..55f918b44077 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
@@ -1714,7 +1714,6 @@ static bool dcn321_resource_construct(
dc->caps.mall_size_per_mem_channel * 1024 * 1024;
dc->caps.mall_size_total = dc->caps.max_cab_allocation_bytes;
 
-   dc->caps.max_cab_allocation_bytes = 33554432; // 32MB = 1024 * 1024 * 32
dc->caps.subvp_fw_processing_delay_us = 15;
dc->caps.subvp_drr_max_vblank_margin_us = 40;
dc->caps.subvp_prefetch_end_to_mall_start_us = 15;
-- 
2.39.1
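
Why the removed line mattered can be shown with a small sketch. It assumes the MALL size is the per-channel size in MiB multiplied by a per-SKU channel count; `struct mall_caps`, `compute_mall_size()`, and the `num_channels` parameter are hypothetical, since the first factor of the multiplication is not visible in the hunk.

```c
#include <stdint.h>

/* Hypothetical mirror of the caps fields touched by the diff. */
struct mall_caps {
	uint32_t mall_size_per_mem_channel; /* MiB per channel */
	uint32_t max_cab_allocation_bytes;
	uint32_t mall_size_total;
};

/* num_channels is an assumed input standing in for the per-SKU count. */
static void compute_mall_size(struct mall_caps *caps, uint32_t num_channels)
{
	caps->max_cab_allocation_bytes =
		num_channels * caps->mall_size_per_mem_channel * 1024 * 1024;
	caps->mall_size_total = caps->max_cab_allocation_bytes;
}
```

With a hardcoded 33554432 assigned afterwards, every SKU would report 32 MiB regardless of the computed value, and `mall_size_total` would no longer agree with `max_cab_allocation_bytes`.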



[PATCH 02/16] drm/amd/display: add rc_params_override option in dc_dsc_config

2023-01-25 Thread Alex Hung
From: Wenjing Liu 

[why]
Current RC params are based on VESA recommended configurations.
Some DSC sinks may prefer non-standard RC param values due to
hardware limitations. To support those DSC sinks, we allow DM to
optionally pass rc_params_ovrd in dc_dsc_config so that DC overrides
the default VESA recommended configurations.
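
The optional-override pattern described here can be sketched in a few lines. This is an illustration only: `struct rc_params_sketch` keeps three of the fields, the default numbers are illustrative placeholders, and `calc_defaults()` and `prepare_rc()` are hypothetical names rather than the DC functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical subset of the rate-control parameters. */
struct rc_params_sketch {
	int32_t rc_model_size;
	int32_t initial_xmit_delay;
	int32_t flatness_min_qp;
};

/* Placeholder defaults; the numbers are illustrative only. */
static void calc_defaults(struct rc_params_sketch *rc)
{
	rc->rc_model_size = 8192;
	rc->initial_xmit_delay = 512;
	rc->flatness_min_qp = 3;
}

/* A NULL override keeps the defaults; otherwise every field is replaced. */
static void prepare_rc(struct rc_params_sketch *rc,
		       const struct rc_params_sketch *override)
{
	calc_defaults(rc);
	if (override)
		*rc = *override;
}
```

Because the override is passed as a const pointer that may be NULL, the caller keeps ownership of the memory and the default path stays unchanged for sinks that do not need custom values.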

Reviewed-by: Martin Leung 
Acked-by: Alex Hung 
Signed-off-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h  | 24 +
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c  | 36 ++-
 .../gpu/drm/amd/display/dc/dsc/dscc_types.h   |  5 ++-
 .../gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c  | 10 +++---
 4 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h 
b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
index 848db8676adf..cc3d6fb39364 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
@@ -797,6 +797,29 @@ enum dc_timing_3d_format {
TIMING_3D_FORMAT_MAX,
 };
 
+#define DC_DSC_QP_SET_SIZE 15
+#define DC_DSC_RC_BUF_THRESH_SIZE 14
+struct dc_dsc_rc_params_override {
+   int32_t rc_model_size;
+   int32_t rc_buf_thresh[DC_DSC_RC_BUF_THRESH_SIZE];
+   int32_t rc_minqp[DC_DSC_QP_SET_SIZE];
+   int32_t rc_maxqp[DC_DSC_QP_SET_SIZE];
+   int32_t rc_offset[DC_DSC_QP_SET_SIZE];
+
+   int32_t rc_tgt_offset_hi;
+   int32_t rc_tgt_offset_lo;
+   int32_t rc_edge_factor;
+   int32_t rc_quant_incr_limit0;
+   int32_t rc_quant_incr_limit1;
+
+   int32_t initial_fullness_offset;
+   int32_t initial_delay;
+
+   int32_t flatness_min_qp;
+   int32_t flatness_max_qp;
+   int32_t flatness_det_thresh;
+};
+
 struct dc_dsc_config {
uint32_t num_slices_h; /* Number of DSC slices - horizontal */
uint32_t num_slices_v; /* Number of DSC slices - vertical */
@@ -811,6 +834,7 @@ struct dc_dsc_config {
 #endif
bool is_dp; /* indicate if DSC is applied based on DP's capability */
uint32_t mst_pbn; /* pbn of display on dsc mst hub */
+   const struct dc_dsc_rc_params_override *rc_params_ovrd; /* DM owned 
memory. If not NULL, apply custom dsc rc params */
 };
 
 /**
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c
index c08c01e05dcf..42344aec60d6 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c
@@ -28,6 +28,7 @@
 #include "reg_helper.h"
 #include "dcn20_dsc.h"
 #include "dsc/dscc_types.h"
+#include "dsc/rc_calc.h"
 
 static void dsc_log_pps(struct display_stream_compressor *dsc, struct 
drm_dsc_config *pps);
 static bool dsc_prepare_config(const struct dsc_config *dsc_cfg, struct 
dsc_reg_values *dsc_reg_vals,
@@ -344,10 +345,38 @@ static void dsc_log_pps(struct display_stream_compressor 
*dsc, struct drm_dsc_co
}
 }
 
+static void dsc_override_rc_params(struct rc_params *rc, const struct 
dc_dsc_rc_params_override *override)
+{
+   uint8_t i;
+
+   rc->rc_model_size = override->rc_model_size;
+   for (i = 0; i < DC_DSC_RC_BUF_THRESH_SIZE; i++)
+   rc->rc_buf_thresh[i] = override->rc_buf_thresh[i];
+   for (i = 0; i < DC_DSC_QP_SET_SIZE; i++) {
+   rc->qp_min[i] = override->rc_minqp[i];
+   rc->qp_max[i] = override->rc_maxqp[i];
+   rc->ofs[i] = override->rc_offset[i];
+   }
+
+   rc->rc_tgt_offset_hi = override->rc_tgt_offset_hi;
+   rc->rc_tgt_offset_lo = override->rc_tgt_offset_lo;
+   rc->rc_edge_factor = override->rc_edge_factor;
+   rc->rc_quant_incr_limit0 = override->rc_quant_incr_limit0;
+   rc->rc_quant_incr_limit1 = override->rc_quant_incr_limit1;
+
+   rc->initial_fullness_offset = override->initial_fullness_offset;
+   rc->initial_xmit_delay = override->initial_delay;
+
+   rc->flatness_min_qp = override->flatness_min_qp;
+   rc->flatness_max_qp = override->flatness_max_qp;
+   rc->flatness_det_thresh = override->flatness_det_thresh;
+}
+
 static bool dsc_prepare_config(const struct dsc_config *dsc_cfg, struct 
dsc_reg_values *dsc_reg_vals,
struct dsc_optc_config *dsc_optc_cfg)
 {
struct dsc_parameters dsc_params;
+   struct rc_params rc;
 
/* Validate input parameters */
ASSERT(dsc_cfg->dc_dsc_cfg.num_slices_h);
@@ -412,7 +441,12 @@ static bool dsc_prepare_config(const struct dsc_config 
*dsc_cfg, struct dsc_reg_
dsc_reg_vals->pps.native_420 = (dsc_reg_vals->pixel_format == 
DSC_PIXFMT_NATIVE_YCBCR420);
dsc_reg_vals->pps.simple_422 = (dsc_reg_vals->pixel_format == 
DSC_PIXFMT_SIMPLE_YCBCR422);
 
-   if (dscc_compute_dsc_parameters(&dsc_reg_vals->pps, &dsc_params)) {
+   calc_rc_params(&rc, &dsc_reg_vals->pps);
+
+   if (dsc_cfg->dc_dsc_cfg.rc_params_ovrd)
+   dsc_override_rc_params(&rc, dsc_cfg->dc_dsc_cfg.rc_params_ovrd);
+
+   if 

[PATCH 01/16] drm/amd/display: Enable Freesync over PCon

2023-01-25 Thread Alex Hung
From: Sung Joon Kim 

[why]
Enable Freesync over PCon on Linux environment.

[how]
Adding Freesync over PCon support in amdgpu_dm
- Read DPCD for Freesync over PCon capability
- Add whitelist for compatible branch devices
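
Two small decisions in this patch lend themselves to a sketch: how the AMD VSDB version selects the infopacket type, and when the refresh range counts as FreeSync capable. The enum and both function names are hypothetical stand-ins; the version numbers and the 10 Hz threshold are the ones used in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PACKET_TYPE_* values used in the patch. */
enum packet_type_sketch { PKT_VRR, PKT_FS_V1, PKT_FS_V2, PKT_FS_V3 };

/* Versions 1..3 select a FreeSync packet; anything else keeps the default. */
static enum packet_type_sketch packet_type_for_vsdb(int amd_vsdb_version)
{
	switch (amd_vsdb_version) {
	case 1:
		return PKT_FS_V1;
	case 2:
		return PKT_FS_V2;
	case 3:
		return PKT_FS_V3;
	default:
		return PKT_VRR;
	}
}

/* The sink is marked capable only when the range exceeds 10 Hz. */
static bool freesync_range_ok(int min_vfreq, int max_vfreq)
{
	return max_vfreq - min_vfreq > 10;
}
```

Note that the comparison is strict, so a 50-60 Hz panel would not be treated as FreeSync capable under this check.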

Reviewed-by: Chao-kai Wang 
Acked-by: Alex Hung 
Signed-off-by: Sung Joon Kim 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 45 -
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 67 ++-
 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 33 +
 drivers/gpu/drm/amd/display/dc/dm_helpers.h   |  1 +
 .../amd/display/include/ddc_service_types.h   |  1 +
 .../amd/display/modules/inc/mod_info_packet.h |  4 +-
 .../display/modules/info_packet/info_packet.c |  4 +-
 7 files changed, 118 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 86a2f7f58550..763bc92385da 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -106,7 +106,6 @@
 
 #include "modules/inc/mod_freesync.h"
 #include "modules/power/power_helpers.h"
-#include "modules/inc/mod_info_packet.h"
 
 #define FIRMWARE_RENOIR_DMUB "amdgpu/renoir_dmcub.bin"
 MODULE_FIRMWARE(FIRMWARE_RENOIR_DMUB);
@@ -7115,6 +7114,9 @@ void amdgpu_dm_connector_init_helper(struct 
amdgpu_display_manager *dm,
aconnector->base.dpms = DRM_MODE_DPMS_OFF;
aconnector->hpd.hpd = AMDGPU_HPD_NONE; /* not used */
aconnector->audio_inst = -1;
+   aconnector->pack_sdp_v1_3 = false;
+   aconnector->as_type = ADAPTIVE_SYNC_TYPE_NONE;
+   memset(&aconnector->vsdb_info, 0, sizeof(aconnector->vsdb_info));
mutex_init(&aconnector->hpd_lock);
 
/*
@@ -7605,6 +7607,8 @@ static void update_freesync_state_on_stream(
struct amdgpu_crtc *acrtc = to_amdgpu_crtc(new_crtc_state->base.crtc);
unsigned long flags;
bool pack_sdp_v1_3 = false;
+   struct amdgpu_dm_connector *aconn;
+   enum vrr_packet_type packet_type = PACKET_TYPE_VRR;
 
if (!new_stream)
return;
@@ -7640,11 +7644,27 @@ static void update_freesync_state_on_stream(
}
}
 
+   aconn = (struct amdgpu_dm_connector *)new_stream->dm_stream_context;
+
+   if (aconn && aconn->as_type == FREESYNC_TYPE_PCON_IN_WHITELIST) {
+   pack_sdp_v1_3 = aconn->pack_sdp_v1_3;
+
+   if (aconn->vsdb_info.amd_vsdb_version == 1)
+   packet_type = PACKET_TYPE_FS_V1;
+   else if (aconn->vsdb_info.amd_vsdb_version == 2)
+   packet_type = PACKET_TYPE_FS_V2;
+   else if (aconn->vsdb_info.amd_vsdb_version == 3)
+   packet_type = PACKET_TYPE_FS_V3;
+
+   mod_build_adaptive_sync_infopacket(new_stream, aconn->as_type, 
NULL,
+   &new_stream->adaptive_sync_infopacket);
+   }
+
mod_freesync_build_vrr_infopacket(
dm->freesync_module,
new_stream,
&vrr_params,
-   PACKET_TYPE_VRR,
+   packet_type,
TRANSFER_FUNC_UNKNOWN,
&vrr_infopacket,
pack_sdp_v1_3);
@@ -10313,6 +10333,7 @@ void amdgpu_dm_update_freesync_caps(struct 
drm_connector *connector,
struct amdgpu_device *adev = drm_to_adev(dev);
struct amdgpu_hdmi_vsdb_info vsdb_info = {0};
bool freesync_capable = false;
+   enum adaptive_sync_type as_type = ADAPTIVE_SYNC_TYPE_NONE;
 
if (!connector->state) {
DRM_ERROR("%s - Connector has no state", __func__);
@@ -10405,6 +10426,26 @@ void amdgpu_dm_update_freesync_caps(struct 
drm_connector *connector,
}
}
 
+   as_type = 
dm_get_adaptive_sync_support_type(amdgpu_dm_connector->dc_link);
+
+   if (as_type == FREESYNC_TYPE_PCON_IN_WHITELIST) {
+   i = parse_hdmi_amd_vsdb(amdgpu_dm_connector, edid, &vsdb_info);
+   if (i >= 0 && vsdb_info.freesync_supported && 
vsdb_info.amd_vsdb_version > 0) {
+
+   amdgpu_dm_connector->pack_sdp_v1_3 = true;
+   amdgpu_dm_connector->as_type = as_type;
+   amdgpu_dm_connector->vsdb_info = vsdb_info;
+
+   amdgpu_dm_connector->min_vfreq = 
vsdb_info.min_refresh_rate_hz;
+   amdgpu_dm_connector->max_vfreq = 
vsdb_info.max_refresh_rate_hz;
+   if (amdgpu_dm_connector->max_vfreq - 
amdgpu_dm_connector->min_vfreq > 10)
+   freesync_capable = true;
+
+   connector->display_info.monitor_range.min_vfreq = 
vsdb_info.min_refresh_rate_hz;
+   connector->display_info.monitor_range.max_vfreq = 
vsdb_info.max_refresh_rate_hz;
+   }
+   }
+
 update:
if (dm_con_state)
dm_con_state->freesync_capable = freesync_capable;
diff --git 

[PATCH 00/16] DC Patches Jan 25, 2023

2023-01-25 Thread Alex Hung
This DC patchset brings improvements in multiple areas. In summary, we 
highlight:

- Fix linux dp link lost handled only one time
- Reset DMUB mailbox SW state after HW reset
- Unassign does_plane_fit_in_mall function from dcn3.2
- Add Function delaration in dc_link
- Fix crash when connecting 2 displays with video playback
- Adjust downscaling limits for dcn314
- Fix FCLK pstate change underflow
- Fix only one ABM pipe enabled under ODM combined case
- Add missing brackets in calculation
- Correct bw_params population
- Fix Z8 support configurations
- Add Debug Log for MST and PCON
- Fix MALL size hardcoded for DCN321
- Add rc_params_override option in dc_dsc_config
- Enable Freesync over PCon

Cc: Daniel Wheeler 

Aric Cyr (2):
  drm/amd/display: Revert "avoid disable otg when dig was disabled"
  drm/amd/display: 3.2.221

Daniel Miess (3):
  drm/amd/display: Correct bw_params population
  drm/amd/display: Add missing brackets in calculation
  drm/amd/display: Adjust downscaling limits for dcn314

Fangzhi Zuo (1):
  drm/amd/display: Add Debug Log for MST and PCON

George Shen (1):
  drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2

Hersen Wu (1):
  drm/amd/display: fix linux dp link lost handled only one time

Leon Huang (1):
  drm/amd/display: Fix only one ABM pipe enabled under ODM combined case

Mustapha Ghaddar (1):
  drm/amd/display: Add Function delaration in dc_link

Nicholas Kazlauskas (2):
  drm/amd/display: Fix Z8 support configurations
  drm/amd/display: Reset DMUB mailbox SW state after HW reset

Samson Tam (1):
  drm/amd/display: fix MALL size hardcoded for DCN321

Sung Joon Kim (1):
  drm/amd/display: Enable Freesync over PCon

Vladimir Stempen (1):
  drm/amd/display: fix FCLK pstate change underflow

Wenjing Liu (1):
  drm/amd/display: add rc_params_override option in dc_dsc_config

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 72 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 67 +
 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 45 +++-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 12 
 .../dc/clk_mgr/dcn314/dcn314_clk_mgr.c| 31 +---
 .../dc/clk_mgr/dcn315/dcn315_clk_mgr.c| 26 ++-
 drivers/gpu/drm/amd/display/dc/dc.h   |  2 +-
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h  | 24 +++
 drivers/gpu/drm/amd/display/dc/dc_link.h  | 30 
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c  | 36 +-
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|  9 +++
 .../amd/display/dc/dcn314/dcn314_resource.c   |  5 +-
 .../gpu/drm/amd/display/dc/dcn32/dcn32_init.c |  2 +-
 .../amd/display/dc/dcn321/dcn321_resource.c   |  1 -
 drivers/gpu/drm/amd/display/dc/dm_helpers.h   |  1 +
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 14 ++--
 .../dc/dml/dcn314/display_mode_vba_314.c  |  2 +-
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |  4 ++
 .../gpu/drm/amd/display/dc/dsc/dscc_types.h   |  5 +-
 .../gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c  | 10 +--
 .../dc/link/protocols/link_dp_dpia_bw.h   | 24 ---
 .../gpu/drm/amd/display/dmub/src/dmub_srv.c   | 12 
 .../amd/display/include/ddc_service_types.h   |  1 +
 .../amd/display/modules/inc/mod_info_packet.h |  4 +-
 .../display/modules/info_packet/info_packet.c |  4 +-
 25 files changed, 328 insertions(+), 115 deletions(-)

-- 
2.39.1



Re: [PATCH 8/8] drm/amd/pm: drop the support for manual fan speed setting on SMU13.0.7

2023-01-25 Thread Matt Coffin
On Wed Jan 11, 2023 at 7:47 AM MST, Alex Deucher wrote:
> On Wed, Jan 11, 2023 at 8:23 AM Quan, Evan  wrote:
> >
> > [AMD Official Use Only - General]
> >
> > Regarding the manual fan speed setting issue targeted by this patch, the 
> > SCPM feature of the new SMU13 asics prevents us from toggling the fan 
> > control feature from auto to manual.

This makes sense as a move towards using the same interface that other
platforms are most likely using.

> > About the capability in the OD table you mentioned, it might be a different 
> > issue.

I included some info/questions below; any hints you could give to point
me in the right way to keep learning would be appreciated.

>
> Right.  Manual fan control is no longer possible.  As Evan said, you
> can adjust the automatic fan curve using the OD interface, but that is
> it.

Sorry for the late reply; I got busy with my day job. I've been working
on implementing OD support (and a sysfs interface to set *any* OD
setting by number, in contrast with pp_od_clk_voltage's pigeonholing
into supporting only PP_OD_DPM_TABLE_COMMAND commands), at the very
least for my own experimentation.

The following is what I see when I read the OD table out from the SMU
(assuming that the inclusion of another VF curve setting at index 4 in
the header was a mistake, based on the values returned by the SMU).

It seems that, at least in my case, my hardware is running in some kind
of mode that would *not* allow changing the fan curve? Is it possible
that the header information in pm/inc/smu_v13_0_0_pptable.h is incorrect
even beyond the potential idx 4 of ODSETTINGs?

It also appears that transferring the OD table *back* to the SMU results
in no error, but also no action taken, as subsequent reads do not
reflect any changes. I'm thinking this is due to some values read in on
the initial read of the table being invalid, but seemingly irrelevant
given what is reported by the capabilities (see: FAN_CURVE[*]).

Are there any hints you could offer on

1. What might be misaligned or mislabeled in the smu_v13 pptable
header above?
2. What prerequisites I might be missing to allow support for
ODCAP_FAN_CURVE?
3. Why the apparent values for some settings in the boot table are
seemingly wildly invalid? Will those somehow become valid once the
prerequisites for OD operation are met?

I also feel like I've strayed from the original topic of the proposed
patch, and this probably belongs in its own thread... but I don't quite
know how to preserve any context there (sorry).

Thanks in advance for helping out an eager outsider,
Matt

Capabilities:
SMU_13_0_0_ODCAP_GFXCLK_LIMITS[0] true
SMU_13_0_0_ODCAP_GFXCLK_CURVE[1] true
SMU_13_0_0_ODCAP_UCLK_LIMITS[2] true
SMU_13_0_0_ODCAP_POWER_LIMIT[3] true
SMU_13_0_0_ODCAP_FAN_ACOUSTIC_LIMIT[4] true
SMU_13_0_0_ODCAP_FAN_SPEED_MIN[5] true
SMU_13_0_0_ODCAP_TEMPERATURE_FAN[6] true
SMU_13_0_0_ODCAP_TEMPERATURE_SYSTEM[7] true
SMU_13_0_0_ODCAP_MEMORY_TIMING_TUNE[8] true
SMU_13_0_0_ODCAP_FAN_ZERO_RPM_CONTROL[9] true
SMU_13_0_0_ODCAP_AUTO_UV_ENGINE[10] true
SMU_13_0_0_ODCAP_AUTO_OC_ENGINE[11] true
SMU_13_0_0_ODCAP_AUTO_OC_MEMORY[12] true
SMU_13_0_0_ODCAP_FAN_CURVE[13] false
SMU_13_0_0_ODCAP_AUTO_FAN_ACOUSTIC_LIMIT[14] true
SMU_13_0_0_ODCAP_POWER_MODE[15] false

Limits:
SMU_13_0_0_ODSETTING_GFXCLKFMAX[0] - [500,5000]
SMU_13_0_0_ODSETTING_GFXCLKFMIN[1] - [500,5000]
SMU_13_0_0_ODSETTING_CUSTOM_GFX_VF_CURVE_A[2] - [97,1500]
SMU_13_0_0_ODSETTING_CUSTOM_GFX_VF_CURVE_B[3] - [97,1500]
SMU_13_0_0_ODSETTING_CUSTOM_CURVE_VFT_FMIN[4] - [10,15]
SMU_13_0_0_ODSETTING_UCLKFMIN[5] - [500,3200]
SMU_13_0_0_ODSETTING_UCLKFMAX[6] - [500,3200]
SMU_13_0_0_ODSETTING_POWERPERCENTAGE[7] - [25,105]
SMU_13_0_0_ODSETTING_FANRPMMIN[8] - [50,110]
SMU_13_0_0_ODSETTING_FANRPMACOUSTICLIMIT[9] - [0,1]
SMU_13_0_0_ODSETTING_FANTARGETTEMPERATURE[10] - [0,1]
SMU_13_0_0_ODSETTING_OPERATINGTEMPMAX[11] - [0,1]
SMU_13_0_0_ODSETTING_ACTIMING[12] - [0,1]
SMU_13_0_0_ODSETTING_FAN_ZERO_RPM_CONTROL[13] - [0,1]
SMU_13_0_0_ODSETTING_AUTOUVENGINE[14] - [25,100]
SMU_13_0_0_ODSETTING_AUTOOCENGINE[15] - [23,100]
SMU_13_0_0_ODSETTING_AUTOOCMEMORY[16] - [25,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_TEMPERATURE_1[17] - [23,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_SPEED_1[18] - [25,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_TEMPERATURE_2[19] - [23,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_SPEED_2[20] - [25,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_TEMPERATURE_3[21] - [23,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_SPEED_3[22] - [25,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_TEMPERATURE_4[23] - [23,100]
SMU_13_0_0_ODSETTING_FAN_CURVE_SPEED_4[24] - [0,0]
SMU_13_0_0_ODSETTING_FAN_CURVE_TEMPERATURE_5[25] - [0,1]
SMU_13_0_0_ODSETTING_FAN_CURVE_SPEED_5[26] - [0,0]
SMU_13_0_0_ODSETTING_AUTO_FAN_ACOUSTIC_LIMIT[27] - [0,0]
SMU_13_0_0_ODSETTING_POWER_MODE[28] - [0,0]
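
One way to read the two dumps together is as a gate followed by a range check. The sketch below assumes that a setting is only writable when its capability is reported true and the value falls inside the reported limits; that pairing, `struct od_limit`, and `od_value_allowed()` are hypothetical and not taken from the SMU interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical [min, max] pair, as printed in the "Limits" dump. */
struct od_limit {
	uint32_t min;
	uint32_t max;
};

/*
 * Assumed rule: a setting is writable only if its capability is
 * reported true and the requested value lies inside the range.
 */
static bool od_value_allowed(bool cap_enabled, struct od_limit lim,
			     uint32_t value)
{
	if (!cap_enabled)
		return false;
	return value >= lim.min && value <= lim.max;
}
```

Under that assumption, the fan curve entries would be rejected on this board no matter what limits are reported, because ODCAP_FAN_CURVE reads false.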

Boot OD Table:
GFXFLK: [600, 2945]
UCLK: [97, 1249]
FAN_CURVE[0]: 0 @ 0
FAN_CURVE[1]: 0 @ 0
FAN_CURVE[2]: 0 @ 0
FAN_CURVE[3]: 0 @ 0
FAN_CURVE[4]: 0 @ 0
FAN_CURVE[5]: 0 @ 0
FAN_MIN_PWM: 35
FAN_TARGET_TEMP: 

[PULL] amdgpu, drm drm-fixes-6.2

2023-01-25 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 6.2.  This contains a fix for DP MST that avoids the big revert.
There are still some corner cases, but it fixes things for most users.

The following changes since commit 3f30a6e67ce49c0068f8058893326db46b6db11f:

  Merge tag 'amd-drm-fixes-6.2-2023-01-19' of 
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes (2023-01-20 11:21:20 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-6.2-2023-01-25

for you to fetch changes up to 4b069553246f993c4221e382d0d0ae34f5ba730e:

  drm/amd/display: Fix timing not changning when freesync video is enabled 
(2023-01-25 14:50:18 -0500)


amd-drm-fixes-6.2-2023-01-25:

amdgpu:
- GC11.x fixes
- SMU13.0.0 fix
- Freesync video fix
- DP MST fixes

drm:
- DP MST kref fix


Aurabindo Pillai (1):
  drm/amd/display: Fix timing not changning when freesync video is enabled

Evan Quan (1):
  drm/amd/pm: add missing AllowIHInterrupt message mapping for SMU13.0.0

Jonathan Kim (1):
  drm/amdgpu: remove unconditional trap enable on add gfx11 queues

Li Ma (2):
  drm/amdgpu: enable imu firmware for GC 11.0.4
  drm/amdgpu: declare firmware for new MES 11.0.4

Lyude Paul (1):
  drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count assignments

Wayne Lin (3):
  drm/amdgpu/display/mst: limit payload to be updated one by one
  drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPD
  drm/display/dp_mst: Correct the kref of port.

 drivers/gpu/drm/amd/amdgpu/imu_v11_0.c |  1 +
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c |  3 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 31 +
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c  | 51 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c|  5 ---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c  | 14 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c   |  1 +
 drivers/gpu/drm/display/drm_dp_mst_topology.c  |  4 +-
 8 files changed, 89 insertions(+), 21 deletions(-)


Re: [PATCH v3 04/10] drm/fbdev-generic: Initialize fb-helper structure in generic setup

2023-01-25 Thread Sam Ravnborg
Hi Thomas,

On Wed, Jan 25, 2023 at 09:04:09PM +0100, Thomas Zimmermann wrote:
> Initialize the fb-helper structure immediately after its allocation
> in drm_fbdev_generic_setup(). That will make it easier to fill it with
> driver-specific values, such as the preferred BPP.
> 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Javier Martinez Canillas 
> ---
>  drivers/gpu/drm/drm_fbdev_generic.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
> b/drivers/gpu/drm/drm_fbdev_generic.c
> index 135d58b8007b..63f66325a8a5 100644
> --- a/drivers/gpu/drm/drm_fbdev_generic.c
> +++ b/drivers/gpu/drm/drm_fbdev_generic.c
> @@ -385,8 +385,6 @@ static int drm_fbdev_client_hotplug(struct drm_client_dev 
> *client)
>   if (dev->fb_helper)
>   return drm_fb_helper_hotplug_event(dev->fb_helper);
>  
> - drm_fb_helper_prepare(dev, fb_helper, &drm_fb_helper_generic_funcs);
> -
>   ret = drm_fb_helper_init(dev, fb_helper);
>   if (ret)
>   goto err;

From the documentation:
The drm_fb_helper_prepare()
helper must be called first to initialize the minimum required to make
hotplug detection work.
...
To finish up the fbdev helper initialization, the
drm_fb_helper_init() function is called.

So this change does not follow the documentation, as drm_fb_helper_init()
is now called before drm_fb_helper_prepare().

I did not follow all the code - but my gut feeling is that the
documentation is right.

Sam


> @@ -456,12 +454,12 @@ void drm_fbdev_generic_setup(struct drm_device *dev,
>   fb_helper = kzalloc(sizeof(*fb_helper), GFP_KERNEL);
>   if (!fb_helper)
>   return;
> + drm_fb_helper_prepare(dev, fb_helper, &drm_fb_helper_generic_funcs);
>  
>   ret = drm_client_init(dev, &fb_helper->client, "fbdev", 
> &drm_fbdev_client_funcs);
>   if (ret) {
> - kfree(fb_helper);
>   drm_err(dev, "Failed to register client: %d\n", ret);
> - return;
> + goto err_drm_client_init;
>   }
>  
>   /*
> @@ -484,5 +482,12 @@ void drm_fbdev_generic_setup(struct drm_device *dev,
>   drm_dbg_kms(dev, "client hotplug ret=%d\n", ret);
>  
>   drm_client_register(&fb_helper->client);
> +
> + return;
> +
> +err_drm_client_init:
> + drm_fb_helper_unprepare(fb_helper);
> + kfree(fb_helper);
> + return;
>  }
>  EXPORT_SYMBOL(drm_fbdev_generic_setup);
> -- 
> 2.39.0


Re: [PATCH v3 02/10] drm/client: Add hotplug_failed flag

2023-01-25 Thread Sam Ravnborg
Hi Thomas,

On Wed, Jan 25, 2023 at 09:04:07PM +0100, Thomas Zimmermann wrote:
> Signal failed hotplugging with a flag in struct drm_client_dev. If set,
> the client helpers will not further try to set up the fbdev display.
> 
> This used to be signalled with a combination of cleared pointers in
> struct drm_fb_helper,
I failed to find where we clear the pointers. What am I missing?
(I had assumed we would stop clearing the pointers after this change.)

Sam

> which prevents us from initializing these pointers
> early after allocation.
> 
> The change also harmonizes behavior among DRM clients. Additional DRM
> clients will now handle failed hotplugging like fbdev does.
> 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Javier Martinez Canillas 
> ---
>  drivers/gpu/drm/drm_client.c| 5 +
>  drivers/gpu/drm/drm_fbdev_generic.c | 4 
>  include/drm/drm_client.h| 8 
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
> index 09ac191c202d..009e7b10455c 100644
> --- a/drivers/gpu/drm/drm_client.c
> +++ b/drivers/gpu/drm/drm_client.c
> @@ -208,8 +208,13 @@ void drm_client_dev_hotplug(struct drm_device *dev)
>   if (!client->funcs || !client->funcs->hotplug)
>   continue;
>  
> + if (client->hotplug_failed)
> + continue;
> +
>   ret = client->funcs->hotplug(client);
>   drm_dbg_kms(dev, "%s: ret=%d\n", client->name, ret);
> + if (ret)
> + client->hotplug_failed = true;
>   }
>   mutex_unlock(&dev->clientlist_mutex);
>  }
> diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
> b/drivers/gpu/drm/drm_fbdev_generic.c
> index 3d455a2e3fb5..135d58b8007b 100644
> --- a/drivers/gpu/drm/drm_fbdev_generic.c
> +++ b/drivers/gpu/drm/drm_fbdev_generic.c
> @@ -382,10 +382,6 @@ static int drm_fbdev_client_hotplug(struct 
> drm_client_dev *client)
>   struct drm_device *dev = client->dev;
>   int ret;
>  
> - /* Setup is not retried if it has failed */
> - if (!fb_helper->dev && fb_helper->funcs)
> - return 0;
> -
>   if (dev->fb_helper)
>   return drm_fb_helper_hotplug_event(dev->fb_helper);
>  
> diff --git a/include/drm/drm_client.h b/include/drm/drm_client.h
> index 4fc8018eddda..39482527a775 100644
> --- a/include/drm/drm_client.h
> +++ b/include/drm/drm_client.h
> @@ -106,6 +106,14 @@ struct drm_client_dev {
>* @modesets: CRTC configurations
>*/
>   struct drm_mode_set *modesets;
> +
> + /**
>  * @hotplug_failed:
> +  *
> +  * Set by client hotplug helpers if the hotplugging failed
> +  * before. It is usually not tried again.
> +  */
> + bool hotplug_failed;
>  };
>  
>  int drm_client_init(struct drm_device *dev, struct drm_client_dev *client,
> -- 
> 2.39.0


Re: [PATCH v3 01/10] drm/client: Test for connectors before sending hotplug event

2023-01-25 Thread Sam Ravnborg
Hi Thomas,

On Wed, Jan 25, 2023 at 09:04:06PM +0100, Thomas Zimmermann wrote:
> Test for connectors in the client code and remove a similar test
> from the generic fbdev emulation. Do nothing if the test fails.
> Not having connectors indicates a driver bug.
> 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Javier Martinez Canillas 
> ---
>  drivers/gpu/drm/drm_client.c| 5 +
>  drivers/gpu/drm/drm_fbdev_generic.c | 5 -
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
> index 262ec64d4397..09ac191c202d 100644
> --- a/drivers/gpu/drm/drm_client.c
> +++ b/drivers/gpu/drm/drm_client.c
> @@ -198,6 +198,11 @@ void drm_client_dev_hotplug(struct drm_device *dev)
>   if (!drm_core_check_feature(dev, DRIVER_MODESET))
>   return;
>  
> + if (!dev->mode_config.num_connector) {
> + drm_dbg_kms(dev, "No connectors found, will not send hotplug 
> events!\n");
> + return;
This deserves more visible logging: if a driver fails here, it would
be good to spot it in the normal kernel log.
drm_info or drm_notice?

The original code logged this at the debug level, but since the check is
being moved anyway, the log level could be raised as well.

Sam

> + }
> +
>   mutex_lock(&dev->clientlist_mutex);
>   list_for_each_entry(client, &dev->clientlist, list) {
>   if (!client->funcs || !client->funcs->hotplug)
> diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
> b/drivers/gpu/drm/drm_fbdev_generic.c
> index 0a4c160e0e58..3d455a2e3fb5 100644
> --- a/drivers/gpu/drm/drm_fbdev_generic.c
> +++ b/drivers/gpu/drm/drm_fbdev_generic.c
> @@ -389,11 +389,6 @@ static int drm_fbdev_client_hotplug(struct 
> drm_client_dev *client)
>   if (dev->fb_helper)
>   return drm_fb_helper_hotplug_event(dev->fb_helper);
>  
> - if (!dev->mode_config.num_connector) {
> - drm_dbg_kms(dev, "No connectors found, will not create 
> framebuffer!\n");
> - return 0;
> - }
> -
>   drm_fb_helper_prepare(dev, fb_helper, &drm_fb_helper_generic_funcs);
>  
>   ret = drm_fb_helper_init(dev, fb_helper);
> -- 
> 2.39.0


[PATCH v3 19/19] jump_label: RFC / temporary for CI - tolerate toggled state

2023-01-25 Thread Jim Cromie
__jump_label_patch currently will "crash the box" if it finds a
jump_entry not as expected; it makes no allowances for the well-formed
but incorrect "toggled" state.  This patch changes panic-on-toggled
into a warning, so that the problem can be reduced to a repeatable script.

note: this patch is arch/x86 only, so might not help CI at all.

submod () {
# set drm.debug analogues
echo  MP test_dynamic_debug p_disjoint_bits=${1:-0} p_level_num=${2:-0}
modprobe test_dynamic_debug p_disjoint_bits=${1:-0} p_level_num=${2:-0} dyndbg=+p
# _submod should pick up kparams
echo  MP test_dynamic_debug_submod dyndbg=+pmf
modprobe test_dynamic_debug_submod dyndbg=+pmf
}
unmod () {
rmmod test_dynamic_debug_submod
rmmod test_dynamic_debug
}
note () {
echo NOTE: $* >&2
sleep 0.1
}

submod_probe () {

echo 4 > /sys/module/dynamic_debug/parameters/verbose
unmod
submod $*

note submod prdbgs are supposedly enabled
grep test_dynamic_debug /proc/dynamic_debug/control
cat /sys/module/test_dynamic_debug/parameters/do_prints

note but they dont print here
cat /sys/module/test_dynamic_debug_submod/parameters/do_prints

note if D2_CORE in $1, trigger toggled warning
note echo class D2_CORE -p
echo class D2_CORE -p > /proc/dynamic_debug/control
}

Here are the repeatable results:

[   17.023758] virtme-init: triggering udev coldplug
[   18.285949] virtme-init: waiting for udev to settle
[   22.550866] i2c_piix4: module verification failed: signature and/or required 
key missing - tainting kernel
[   22.551945] dyndbg: add-module: i2c_piix4 12 sites 0.0
[   22.552277] dyndbg:  12 debug prints in module i2c_piix4
[   22.555099] piix4_smbus :00:01.3: SMBus Host Controller at 0x700, 
revision 0
[   22.597344] dyndbg: add-module: serio_raw 2 sites 0.0
[   22.597633] dyndbg:   2 debug prints in module serio_raw
[   22.603506] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   23.556657] dyndbg: add-module: intel_rapl_common 12 sites 0.0
[   23.557000] dyndbg:  12 debug prints in module intel_rapl_common
[   23.759499] dyndbg: add-module: intel_rapl_msr 2 sites 0.0
[   23.759928] dyndbg:   2 debug prints in module intel_rapl_msr
[   26.081050] virtme-init: udev is done
virtme-init: console is ttyS0
bash-5.2# . test-funcs.rc
:#> submod_probe 1 0
rmmod: ERROR: Module test_dynamic_debug_submod is not currently loaded
rmmod: ERROR: Module test_dynamic_debug is not currently loaded
MP test_dynamic_debug p_disjoint_bits=1 p_level_num=0 dyndbg=+pm
[   61.712445] dyndbg: add-module: test_dynamic_debug 33 sites 4.0
[   61.712789] dyndbg: classes[0..]: module:test_dynamic_debug base:22 len:8 
ty:3
[   61.713144] dyndbg: module:test_dynamic_debug attached 4 classes
[   61.713894] dyndbg:  33 debug prints in module test_dynamic_debug
[   61.715486] dyndbg: bits:0x1 > *.p_disjoint_bits
[   61.715732] dyndbg: apply bitmap: 0x1 to: 0x0 for '*'
[   61.715983] dyndbg: query 0: "class D2_CORE +p" mod:*
[   61.716253] dyndbg: split into words: "class" "D2_CORE" "+p"
[   61.716539] dyndbg: op='+' flags=0x1 maskp=0x
[   61.716794] dyndbg: parsed: func="" file="" module="" format="" lineno=0-0 
class=D2_CORE
[   61.717232] dyndbg: good-class: test_dynamic_debug.D2_CORE  
module:test_dynamic_debug nd:33 nc:4 nu:0
[   61.717690] dyndbg: processed 1 queries, with 1 matches, 0 errs
[   61.717982] dyndbg: bit_0: 1 matches on class: D2_CORE -> 0x1
[   61.718283] dyndbg: applied bitmap: 0x1 to: 0x0 for '*'
[   61.718542] dyndbg: p_disjoint_bits: total matches: 1
[   61.718799] dyndbg: lvl:0 bits:0x0 > p_level_num
[   61.719029] dyndbg: p_level_num: total matches: 0
[   61.719279] dyndbg: module: test_dynamic_debug dyndbg="+pm"
[   61.719554] dyndbg: query 0: "+pm" mod:test_dynamic_debug
[   61.719824] dyndbg: split into words: "+pm"
[   61.720034] dyndbg: op='+' flags=0x3 maskp=0x
[   61.720299] dyndbg: parsed: func="" file="" module="test_dynamic_debug" 
format="" lineno=0-0 class=(null)
[   61.720786] dyndbg: changed lib/test_dynamic_debug.c:206 
[test_dynamic_debug]test_dynamic_debug_exit p => pm
[   61.721289] dyndbg: changed lib/test_dynamic_debug.c:200 
[test_dynamic_debug]test_dynamic_debug_init p => pm
[   61.721778] dyndbg: changed lib/test_dynamic_debug.c:198 
[test_dynamic_debug]test_dynamic_debug_init p => pm
[   61.722283] dyndbg: changed lib/test_dynamic_debug.c:191 
[test_dynamic_debug]do_prints p => pm
[   61.722711] dyndbg: changed lib/test_dynamic_debug.c:170 
[test_dynamic_debug]do_levels p => pm
[   61.723128] dyndbg: changed lib/test_dynamic_debug.c:150 
[test_dynamic_debug]do_cats p => pm
[   61.723554] dyndbg: processed 1 queries, with 6 matches, 0 errs
[   61.725233] test_dynamic_debug: test_dd: init start
[   61.725487] test_dynamic_debug: test_dd: do_prints:
[   61.725745] test_dynamic_debug: test_dd: doing categories
[   61.726011] test_dd: LOW msg
[   61.726176] test_dd: MID msg
[   61.726328] test_dd: HI msg
[   61.726470] test_dd: 

[PATCH v3 15/19] test-dyndbg: build test_dynamic_debug_submod

2023-01-25 Thread Jim Cromie
CONFIG_DRM_USE_DYNAMIC_DEBUG=y has a regression; drm subsystem
modules, which depend upon drm.ko and use the drm.debug API, do not
get enabled when __drm_debug is set by `modprobe drm debug=0x1f`.

With =N, __drm_debug is checked before logging the msg, so the
end-of-modprobe debug=$init affected all later checks.  But with =y,
each run-time check is replaced by a static-key that is set at
end-of-modprobe.

This creates a chicken-and-egg dependency: i915 must be modprobed before
its drm.debugs can be enabled, but drm.ko (and __drm_debug=$init) must be
loaded before modprobe i915, at which point i915's callsites aren't there
yet to be enabled.

The fix is to split DECLARE_DYNDBG_CLASSMAP to:

DYNDBG_CLASSMAP_DEFINE - invoked in 'parent'
DYNDBG_CLASSMAP_USE - invoked in dependent, to USE the exported definition

To prove the fix w/o involving DRM, we need 2 modules, one dependent
on the other.  Add test_dynamic_debug_submod.ko, which _USEs the
classmaps _exported by test_dynamic_debug.ko

To keep code to a minimum, test_dynamic_debug.c ifdefs on
TEST_DYNAMIC_DEBUG_SUBMOD to build either parent or dependent, with
either DYNDBG_CLASSMAP_DEFINE or DYNDBG_CLASSMAP_USE invocations.

So test_dynamic_debug_submod.c is just 2 lines: include the .c after
defining SUBMOD.  This also gives the 2 modules identical prdbg
callsites, only differing by enablement/configuration.

Signed-off-by: Jim Cromie 
---
 lib/Makefile|  3 +-
 lib/test_dynamic_debug.c| 52 -
 lib/test_dynamic_debug_submod.c | 10 +++
 3 files changed, 57 insertions(+), 8 deletions(-)
 create mode 100644 lib/test_dynamic_debug_submod.c

diff --git a/lib/Makefile b/lib/Makefile
index 4d9461bfea42..7f7e75f44cd7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -77,7 +77,7 @@ obj-$(CONFIG_TEST_SORT) += test_sort.o
 obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
-obj-$(CONFIG_TEST_DYNAMIC_DEBUG) += test_dynamic_debug.o
+obj-$(CONFIG_TEST_DYNAMIC_DEBUG) += test_dynamic_debug.o 
test_dynamic_debug_submod.o
 obj-$(CONFIG_TEST_PRINTF) += test_printf.o
 obj-$(CONFIG_TEST_SCANF) += test_scanf.o
 obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
@@ -98,6 +98,7 @@ obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
 CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
+
 #
 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile 
turns
 # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
diff --git a/lib/test_dynamic_debug.c b/lib/test_dynamic_debug.c
index e678884066bf..8c005c17f2db 100644
--- a/lib/test_dynamic_debug.c
+++ b/lib/test_dynamic_debug.c
@@ -6,7 +6,11 @@
  *  Jim Cromie 
  */
 
-#define pr_fmt(fmt) "test_dd: " fmt
+#if defined(TEST_DYNAMIC_DEBUG_SUBMOD)
+  #define pr_fmt(fmt) "test_dd_submod: " fmt
+#else
+  #define pr_fmt(fmt) "test_dd: " fmt
+#endif
 
 #define DEBUG /* enable all prdbgs (plain & class'd) at compiletime */
 
@@ -49,6 +53,14 @@ module_param_cb(do_prints, _ops_do_prints, NULL, 0600);
};  \
module_param_cb(_flags##_##_model, _ops_dyndbg_classes, 
&_flags##_model, 0600)
 
+/*
+ * dynamic-debug imitates drm.debug's use of enums (DRM_UT_CORE etc)
+ * to define its classes/categories.  dyndbg allows class-id's 0..62,
+ * reserving 63 for plain old (non-class'd) prdbgs.  A module can
+ * define multiple classmaps, as long as they claim non-overlapping
+ * subranges.
+ */
+
 /* numeric input, independent bits */
 enum cat_disjoint_bits {
D2_CORE = 0,
@@ -61,7 +73,36 @@ enum cat_disjoint_bits {
D2_LEASE,
D2_DP,
D2_DRMRES };
+
+/* symbolic input, independent bits */
+enum cat_disjoint_names { LOW = 10, MID, HI };
+
+/* numeric verbosity, V2 > V1 related */
+enum cat_level_num { V0 = 14, V1, V2, V3, V4, V5, V6, V7 };
+
+/* symbolic verbosity */
+enum cat_level_names { L0 = 22, L1, L2, L3, L4, L5, L6, L7 };
+
+#if defined(TEST_DYNAMIC_DEBUG_SUBMOD)
+
+/* use the classmaps defined in 'parent' module below */
+DYNDBG_CLASSMAP_USE(map_disjoint_bits);
+DYNDBG_CLASSMAP_USE(map_disjoint_names);
+DYNDBG_CLASSMAP_USE(map_level_num);
+DYNDBG_CLASSMAP_USE(map_level_names);
+
+#else
+
+/*
+ * parent module, define a classmap of each of 4 types.
+ * enum values are class-ids
+ * enum symbols are stringified, used as classnames
+ * param bits are mapped in order: 0..N
+ * (a straight, obvious, linear map is encouraged)
+ */
+
 DYNDBG_CLASSMAP_DEFINE(map_disjoint_bits, DD_CLASS_TYPE_DISJOINT_BITS,
+  /* bits 0..N of param are mapped to these class-ids */
   D2_CORE,
   D2_DRIVER,
   D2_KMS,
@@ -75,27 +116,23 @@ DYNDBG_CLASSMAP_DEFINE(map_disjoint_bits, 
DD_CLASS_TYPE_DISJOINT_BITS,
 

[PATCH v3 10/19] dyndbg-API: split DECLARE_(DYNDBG_CLASSMAP) to $1(_DEFINE|_USE)

2023-01-25 Thread Jim Cromie
DECLARE_DYNDBG_CLASSMAP's job was to allow modules to declare the debug
classes/categories they want dyndbg to >control on their behalf.  Its
args give the class-names, their mapping to class_ids, and the sysfs
interface style (usually a class-bitmap).  Modules wanting a drm.debug
style knob need to create the kparam, and call module_param_cb() to
wire the sysfs node to the classmap.  DRM does this in drm_print.c.

In DRM, multiple modules declare identical DRM_UT_* classmaps, so that
the class'd prdbgs are modified across those modules in a coordinated
way across the subsystem, by either explicit class DRM_UT_* queries to
>control, or by writes to /sys/module/drm/parameters/debug (drm.debug)

This coordination-by-identical-declarations is weird, so this patch
splits the macro into _DEFINE and _USE flavors.  This distinction
follows the definition vs declaration that K&R gave us, improving the
api; _DEFINE is used once to specify the classmap, and multiple users
_USE the single definition explicitly.

Currently the latter just reuses the former, and still needs all the
same args, but that can be tuned later; the _DEFINE can initialize an
(extern/global) struct classmap, and _USE can, well, use/reference
that struct.

Also wrap DYNDBG_CLASSMAP_USEs with ifdef DRM_USE_DYNAMIC_DEBUG to
balance with the one around drm_print's use of DYNDBG_CLASSMAP_DEFINE.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 +++-
 drivers/gpu/drm/display/drm_dp_helper.c |  4 +++-
 drivers/gpu/drm/drm_crtc_helper.c   |  4 +++-
 drivers/gpu/drm/drm_print.c |  2 +-
 drivers/gpu/drm/i915/i915_params.c  |  4 +++-
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  4 +++-
 include/linux/dynamic_debug.h   | 20 
 lib/test_dynamic_debug.c| 32 -
 8 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index cd4caaa29528..a7a3a382c4a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -189,7 +189,8 @@ int amdgpu_vcnfw_log;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
-DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
+#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
"DRM_UT_CORE",
"DRM_UT_DRIVER",
"DRM_UT_KMS",
@@ -200,6 +201,7 @@ DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, 
DD_CLASS_TYPE_DISJOINT_BITS, 0,
"DRM_UT_LEASE",
"DRM_UT_DP",
"DRM_UT_DRMRES");
+#endif
 
 struct amdgpu_mgpu_info mgpu_info = {
.mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index 16565a0a5da6..8fa7a88299e7 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -41,7 +41,8 @@
 
 #include "drm_dp_helper_internal.h"
 
-DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
+#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
"DRM_UT_CORE",
"DRM_UT_DRIVER",
"DRM_UT_KMS",
@@ -52,6 +53,7 @@ DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, 
DD_CLASS_TYPE_DISJOINT_BITS, 0,
"DRM_UT_LEASE",
"DRM_UT_DP",
"DRM_UT_DRMRES");
+#endif
 
 struct dp_aux_backlight {
struct backlight_device *base;
diff --git a/drivers/gpu/drm/drm_crtc_helper.c 
b/drivers/gpu/drm/drm_crtc_helper.c
index a209659a996c..7e6b25446303 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -50,7 +50,8 @@
 
 #include "drm_crtc_helper_internal.h"
 
-DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
+#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
"DRM_UT_CORE",
"DRM_UT_DRIVER",
"DRM_UT_KMS",
@@ -61,6 +62,7 @@ DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, 
DD_CLASS_TYPE_DISJOINT_BITS, 0,
"DRM_UT_LEASE",
"DRM_UT_DP",
"DRM_UT_DRMRES");
+#endif
 
 /**
  * DOC: overview
diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index 5b93c11895bb..4b697e18238d 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -56,7 +56,7 @@ MODULE_PARM_DESC(debug, "Enable debug output, where each bit 
enables a debug cat
 module_param_named(debug, __drm_debug, ulong, 0600);
 #else
 /* classnames must match vals of enum drm_debug_category */

[PATCH v3 18/19] test-dyndbg: tune sub-module behavior

2023-01-25 Thread Jim Cromie
lib/test_dynamic_debug.c is used to build 2 modules:
test_dynamic_debug.ko and test_dynamic_debug_submod.ko

Define DEBUG only in the main module, not in the submod.  Its purpose
is to ensure that prdbgs are enabled by default, so that a modprobe
without params actually logs something, showing that compile-time
enablement works.  This doesn't need to be repeated in the submodule.

Rather, the submodule's purpose is to prove that classmaps defined and
exported from a parent module are propagated to submodules, setting
their class'd debugs accordingly.

Signed-off-by: Jim Cromie 
---
 lib/test_dynamic_debug.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/test_dynamic_debug.c b/lib/test_dynamic_debug.c
index 6b7bd35c3e15..70a1e8955ad0 100644
--- a/lib/test_dynamic_debug.c
+++ b/lib/test_dynamic_debug.c
@@ -10,10 +10,9 @@
   #define pr_fmt(fmt) "test_dd_submod: " fmt
 #else
   #define pr_fmt(fmt) "test_dd: " fmt
+  #define DEBUG/* enable all prdbgs (plain & class'd), to log by 
default */
 #endif
 
-#define DEBUG /* enable all prdbgs (plain & class'd) at compiletime */
-
 #include 
 
 /* run tests by reading or writing sysfs node: do_prints */
-- 
2.39.1



[PATCH v3 08/19] dyndbg: tighten ddebug_class_name() 1st arg

2023-01-25 Thread Jim Cromie
Change function's 1st arg-type, by derefing in the caller.
The fn doesn't need any other fields in the struct.

no functional change.

Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 2d4640479e5b..10c29bc19901 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -1110,12 +1110,12 @@ static void *ddebug_proc_next(struct seq_file *m, void 
*p, loff_t *pos)
 #define class_in_range(class_id, map)  \
(class_id >= map->base && class_id < map->base + map->length)
 
-static const char *ddebug_class_name(struct ddebug_iter *iter, struct _ddebug 
*dp)
+static const char *ddebug_class_name(struct ddebug_table *dt, struct _ddebug 
*dp)
 {
-   struct ddebug_class_map *map = iter->table->classes;
-   int i, nc = iter->table->num_classes;
+   struct ddebug_class_map *map = dt->classes;
+   int i;
 
-   for (i = 0; i < nc; i++, map++)
+   for (i = 0; i < dt->num_classes; i++, map++)
if (class_in_range(dp->class_id, map))
return map->class_names[dp->class_id - map->base];
 
@@ -1149,7 +1149,7 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
seq_puts(m, "\"");
 
if (dp->class_id != _DPRINTK_CLASS_DFLT) {
-   class = ddebug_class_name(iter, dp);
+   class = ddebug_class_name(iter->table, dp);
if (class)
seq_printf(m, " class:%s", class);
else
-- 
2.39.1
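
The lookup that this patch narrows can be modelled outside the kernel.
What follows is a minimal user-space sketch, not the kernel code: every
demo_* name, the struct layouts and the sample class names are
assumptions made for illustration.  Only the range test and the idea of
passing the table directly (instead of the iterator wrapping it) follow
the diff above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct ddebug_class_map / ddebug_table. */
struct demo_class_map {
	int base;			/* first class-id covered by this map */
	int length;			/* number of class-ids in the map */
	const char **class_names;	/* indexed by (class_id - base) */
};

struct demo_table {
	const struct demo_class_map *classes;
	int num_classes;
};

/* Same shape as class_in_range() in the diff, with parenthesised args. */
#define demo_class_in_range(class_id, map) \
	((class_id) >= (map)->base && (class_id) < (map)->base + (map)->length)

/* Takes the table itself; no other iterator state is needed. */
static const char *demo_class_name(const struct demo_table *dt, int class_id)
{
	const struct demo_class_map *map = dt->classes;
	int i;

	for (i = 0; i < dt->num_classes; i++, map++)
		if (demo_class_in_range(class_id, map))
			return map->class_names[class_id - map->base];

	return NULL;	/* class-id not claimed by any map */
}

/* Sample data: two maps claiming the subranges 0..2 and 10..12. */
static const char *demo_bits_names[] = { "D2_CORE", "D2_DRIVER", "D2_KMS" };
static const char *demo_sym_names[] = { "LOW", "MID", "HI" };
static const struct demo_class_map demo_maps[] = {
	{ .base = 0, .length = 3, .class_names = demo_bits_names },
	{ .base = 10, .length = 3, .class_names = demo_sym_names },
};
static const struct demo_table demo_dt = {
	.classes = demo_maps,
	.num_classes = 2,
};
```

Calling demo_class_name(&demo_dt, 11) walks both maps and resolves the
id through the second one; an id that falls in the gap between the two
subranges yields NULL.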



[PATCH v3 11/19] dyndbg-API: specialize DYNDBG_CLASSMAP_(DEFINE|USE)

2023-01-25 Thread Jim Cromie
Now that the DECLARE_DYNDBG_CLASSMAP macro has been split into
DYNDBG_CLASSMAP_DEFINE and DYNDBG_CLASSMAP_USE, lets differentiate
them according to their separate jobs.

Dyndbg's existing __dyndbg_classes[] section does:

. catalogs the classmaps defined by the module (or builtin modules)
. authorizes dyndbg to >control those class'd prdbgs for the module.

This patch adds __dyndbg_class_refs[] section:

. catalogs refs/uses of the classmap definitions.
. authorizes dyndbg to >control those class'd prdbgs in ref'g module.
. maps the client module to classmap definitions
  drm-drivers and helpers are clients.
  this allows dyndbg to apply drm.debug to the client module, when added.

The distinction of the 2 roles yields these gains:

It follows the define-once-declare-elsewhere pattern that K&R gave us,
dumping the weird coordinated-changes-by-identical-classmaps API.

It allows separate handling of class-refs, to find the classed
kernel-params (if any) using this classmap, and propagate their
settings to the class'd prdbgs in the client module.

It fixes the chicken-egg problem that DRM_USE_DYNAMIC_DEBUG=y has; the
new type allows dyndbg to handle class-refs found while adding the
module.

The new DYNDBG_CLASSMAP_* macros add records to the sections:

DYNDBG_CLASSMAP_DEFINE:
  invoked just once per sub-system.
  for drm, its drm_print, where drm.debug is exposed.
  defines the classmap, names "DRM_UT_*", maps to class_id's
  authorizes dyndbg to exert >control
  populates __dyndbg_classes[] "section", __used.
  exports the classmap.

DYNDBG_CLASSMAP_USE:
  invoked by modules using classmaps defined & exported elsewhere
  populates __dyndbg_class_refs[] "section", __used.
  maps client-module name to the extern'd classmap.
  has client-name, so dyndbg can recognize loading client modules.
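
The division of labour between the two macros can be sketched in plain
C.  This is a simplified illustration under stated assumptions, not the
macros from this patch: the DEMO_* macros, both structs and the module
names passed in are hypothetical, and the placement of the records into
the __dyndbg_classes[] / __dyndbg_class_refs[] linker sections is not
modelled at all.

```c
#include <stddef.h>

/* Hypothetical, simplified classmap; the real struct has more fields. */
struct demo_classmap {
	const char *mod_name;		/* module that defines the map */
	int base;			/* class-id of the first name */
	int length;			/* number of class names */
	const char **class_names;
};

/* Hypothetical analogue of struct ddebug_class_user. */
struct demo_class_user {
	const char *user_mod_name;		/* client module's name */
	const struct demo_classmap *map;	/* ref to the one definition */
};

/* Invoked exactly once, by the "parent" (drm_print in the DRM case). */
#define DEMO_CLASSMAP_DEFINE(var, mod, first, ...)			\
	static const char *var##_names[] = { __VA_ARGS__ };		\
	const struct demo_classmap var = {				\
		.mod_name = mod,					\
		.base = first,						\
		.length = sizeof(var##_names) / sizeof(var##_names[0]),	\
		.class_names = var##_names,				\
	}

/* Invoked by each client: names the map, repeats none of its contents. */
#define DEMO_CLASSMAP_USE(var, user)					\
	extern const struct demo_classmap var;				\
	static const struct demo_class_user var##_user_##user = {	\
		.user_mod_name = #user,					\
		.map = &var,						\
	}

DEMO_CLASSMAP_DEFINE(demo_drm_classes, "drm", 0,
		     "DRM_UT_CORE", "DRM_UT_DRIVER", "DRM_UT_KMS");

DEMO_CLASSMAP_USE(demo_drm_classes, i915);
DEMO_CLASSMAP_USE(demo_drm_classes, amdgpu);
```

Both user records point at the same struct, so a change to the
definition cannot drift out of sync with its users, which is the
weakness of the identical-declarations scheme.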

To support this, a few data changes:

. struct ddebug_class_user
  contains: user-module-name, ref to classmap-defn
  encodes drm-driver's use of a classmap, allowing lookup

struct ddebug_info gets 2 new fields to encapsulate the new section:
  class_refs, num_class_refs.
  set by dynamic_debug_init() for builtins.
  or by kernel/module/main:load_info() for loadable modules.

vmlinux.lds.h: new BOUNDED_SECTION for class-refs, with linker symbols

dynamic_debug.c: changes to:
  setup - in/under ddebug_add_module(), immediately following
  prdbg enables - in/under ddebug_change(), further below

ddebug_attach_module_classes() - largely unchanged:
  called from ddebug_add_module
  finds classmaps whose .mod_name matches module being added.
  attaches them to the module's ddebug_table.
  minor tweaks for code regularity, debug output

ddebug_attach_client_module_classes() - new fn:
. like above, but works class-refs, not classes.
. called from ddebug_add_module, after list-add to ddebug-tables.
  this lets ddebug_change find it to apply the settings
. scans class-refs, for the block "owned" by the module being added.
  for builtins, its N consecutive of many. for loadables, N of N
. calls ddebug_apply_parents_params() for each.

ddebug_apply_parents_params(new fn)

scans module's/builtin kernel-params, calls ddebug_match_attach_kparam
for each to find the params/sysfs-nodes using a classmap.

ddebug_match_apply_kparam(new fn):

1st, it tests the kernel-param.ops is dyndbg's; this guarantees that
the attached arg is a struct ddebug_class_param, which has a ref to
the param's state, and to the classmap.

2nd, it requires that the classmap attached to the kparam is the one
we're called for; modules can use many separate classmaps (as
test_dynamic_debug does).

Then apply the "parent" kparam's setting to the client.
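
The final step, applying the parent's setting to a client that is
loaded later, can be illustrated with a small user-space model.  This is
a sketch only: the demo_* structs and the function are hypothetical, it
assumes the disjoint-bits map type (bit N of the param controls class-id
base + N), and it flips a plain int where the kernel would toggle a
static key.

```c
/* Hypothetical model of one class'd pr_debug callsite in a client. */
struct demo_callsite {
	int class_id;
	int enabled;
};

/* Hypothetical model of the parent's kparam state for one classmap. */
struct demo_class_param {
	unsigned long bits;	/* current value of the sysfs knob */
	int base;		/* class-id mapped to bit 0 */
	int length;		/* number of classes in the map */
};

/*
 * Apply the parent's current bitmap to a newly added client's
 * callsites.  Returns the number of callsites whose state changed.
 */
static int demo_apply_parent_param(const struct demo_class_param *kp,
				   struct demo_callsite *sites, int nsites)
{
	int i, changed = 0;

	for (i = 0; i < nsites; i++) {
		int bit = sites[i].class_id - kp->base;
		int want;

		if (bit < 0 || bit >= kp->length)
			continue;	/* callsite is not in this classmap */

		want = ((kp->bits >> bit) & 1UL) ? 1 : 0;
		if (sites[i].enabled != want) {
			sites[i].enabled = want;
			changed++;
		}
	}
	return changed;
}
```

Because the function reads the parent's state at the time the client is
added, the order of "set drm.debug" and "modprobe client" stops
mattering in this model, which is the property the patch is after.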

The ddebug_change() support:

ddebug_find_valid_class(): This does the search over classmaps,
looking for the class FOO echo'd to >control.  So now it searches over
__dyndbg_class_refs[] after __dyndbg_classes[].

ddebug_apply_class_bitmap(): now quieter when not changing things.

ddebug_class_name(): return class-names for defined AND used classes.

Signed-off-by: Jim Cromie 
--
v3 - s/BUG_ON/WARN_ON/ in __dyndbg_class_refs handling
 simpler args in callchain
v2 - rebase past merge conflicts
---
 include/asm-generic/vmlinux.lds.h |   1 +
 include/linux/dynamic_debug.h |  39 ---
 kernel/module/main.c  |   3 +
 lib/dynamic_debug.c   | 170 ++
 4 files changed, 179 insertions(+), 34 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 659bf3b31c91..5beb0321613e 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -373,6 +373,7 @@
/* implement dynamic printk debug */\
. = ALIGN(8);   \
BOUNDED_SECTION_BY(__dyndbg_classes, ___dyndbg_classes) \
+   BOUNDED_SECTION_BY(__dyndbg_class_refs, ___dyndbg_class_refs)   \
BOUNDED_SECTION_BY(__dyndbg, ___dyndbg)  

[PATCH v3 16/19] test-dyndbg: rename DD_SYS_WRAP to DYNDBG_CLASSMAP_PARAM

2023-01-25 Thread Jim Cromie
Original name was a punt; but the macro is maybe general enough to put
in the API later.  Rename and improve the macro towards that end.

Also tweak internal name constructed in the macro, to add a '_'
between the name components.  This changes the .i file only.
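
The effect of the extra '_' on the constructed identifier can be seen
with a stand-alone sketch.  The DEMO_* macros and the two helper
functions are hypothetical and exist only to make the pasted names
visible as strings; the (model, flags) argument order mirrors the macro
in the diff below.

```c
/* Two-step stringify: the pasted identifier is formed before quoting. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

/* Old construction: flag and model are pasted together directly. */
#define DEMO_NAME_OLD(model, flags) flags##model

/* New construction: an explicit '_' token is pasted between them. */
#define DEMO_NAME_NEW(model, flags) flags##_##model

/* Name the old macro would give the struct for (disjoint_bits, p). */
static const char *demo_old_name(void)
{
	return DEMO_STR(DEMO_NAME_OLD(disjoint_bits, p));
}

/* Name the renamed macro gives the same struct. */
static const char *demo_new_name(void)
{
	return DEMO_STR(DEMO_NAME_NEW(disjoint_bits, p));
}
```

With the separator, the struct's name matches the name already passed
as the first argument of module_param_cb(), so the two no longer differ
by a missing underscore.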

no functional change.

Signed-off-by: Jim Cromie 
---
 lib/test_dynamic_debug.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/lib/test_dynamic_debug.c b/lib/test_dynamic_debug.c
index 8c005c17f2db..ff1e70ae060e 100644
--- a/lib/test_dynamic_debug.c
+++ b/lib/test_dynamic_debug.c
@@ -44,14 +44,15 @@ module_param_cb(do_prints, _ops_do_prints, NULL, 
0600);
  * Additionally, here:
  * - tie together sysname, mapname, bitsname, flagsname
  */
-#define DD_SYS_WRAP(_model, _flags)\
+#define DYNDBG_CLASSMAP_PARAM(_model, _flags)  \
static unsigned long bits_##_model; \
-   static struct ddebug_class_param _flags##_model = { \
+   static struct ddebug_class_param _flags##_##_model = {  \
.bits = &bits_##_model, \
.flags = #_flags,   \
.map = &map_##_model,   \
};  \
-   module_param_cb(_flags##_##_model, &param_ops_dyndbg_classes, 
&_flags##_model, 0600)
+   module_param_cb(_flags##_##_model, &param_ops_dyndbg_classes,   \
+   &_flags##_##_model, 0600)
 
 /*
  * dynamic-debug imitates drm.debug's use of enums (DRM_UT_CORE etc)
@@ -113,23 +114,23 @@ DYNDBG_CLASSMAP_DEFINE(map_disjoint_bits, 
DD_CLASS_TYPE_DISJOINT_BITS,
   D2_LEASE,
   D2_DP,
   D2_DRMRES);
-DD_SYS_WRAP(disjoint_bits, p);
-DD_SYS_WRAP(disjoint_bits, T);
+DYNDBG_CLASSMAP_PARAM(disjoint_bits, p);
+DYNDBG_CLASSMAP_PARAM(disjoint_bits, T);
 
 DYNDBG_CLASSMAP_DEFINE(map_disjoint_names, DD_CLASS_TYPE_DISJOINT_NAMES,
   LOW, MID, HI);
-DD_SYS_WRAP(disjoint_names, p);
-DD_SYS_WRAP(disjoint_names, T);
+DYNDBG_CLASSMAP_PARAM(disjoint_names, p);
+DYNDBG_CLASSMAP_PARAM(disjoint_names, T);
 
 DYNDBG_CLASSMAP_DEFINE(map_level_num, DD_CLASS_TYPE_LEVEL_NUM,
   V0, V1, V2, V3, V4, V5, V6, V7);
-DD_SYS_WRAP(level_num, p);
-DD_SYS_WRAP(level_num, T);
+DYNDBG_CLASSMAP_PARAM(level_num, p);
+DYNDBG_CLASSMAP_PARAM(level_num, T);
 
 DYNDBG_CLASSMAP_DEFINE(map_level_names, DD_CLASS_TYPE_LEVEL_NAMES,
   L0, L1, L2, L3, L4, L5, L6, L7);
-DD_SYS_WRAP(level_names, p);
-DD_SYS_WRAP(level_names, T);
+DYNDBG_CLASSMAP_PARAM(level_names, p);
+DYNDBG_CLASSMAP_PARAM(level_names, T);
 
 #endif /* TEST_DYNAMIC_DEBUG_SUBMOD */
 
-- 
2.39.1



[PATCH v3 17/19] test-dyndbg: disable WIP dyndbg-trace params

2023-01-25 Thread Jim Cromie
The dyndbg-to-trace feature is WIP, and not in mainline, so the
presence of the interface to use/test it is unhelpful/confusing.

So define DYNDBG_CLASSMAP_PARAM_T() as DYNDBG_CLASSMAP_PARAM() or
blank, depending upon ifdef DYNDBG_TRACE, and update the 4 params
controlling the T-flag to use it.

Signed-off-by: Jim Cromie 
---
 lib/test_dynamic_debug.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/test_dynamic_debug.c b/lib/test_dynamic_debug.c
index ff1e70ae060e..6b7bd35c3e15 100644
--- a/lib/test_dynamic_debug.c
+++ b/lib/test_dynamic_debug.c
@@ -54,6 +54,14 @@ module_param_cb(do_prints, _ops_do_prints, NULL, 0600);
module_param_cb(_flags##_##_model, _ops_dyndbg_classes,   \
&_flags##_##_model, 0600)
 
+/* TBD */
+#ifdef DYNDBG_TRACE
+#define DYNDBG_CLASSMAP_PARAM_T(_model, _flags)\
+   DYNDBG_CLASSMAP_PARAM(_model, _flags)
+#else
+#define DYNDBG_CLASSMAP_PARAM_T(_model, _flags)
+#endif
+
 /*
  * dynamic-debug imitates drm.debug's use of enums (DRM_UT_CORE etc)
  * to define its classes/categories.  dyndbg allows class-id's 0..62,
@@ -115,22 +123,22 @@ DYNDBG_CLASSMAP_DEFINE(map_disjoint_bits, 
DD_CLASS_TYPE_DISJOINT_BITS,
   D2_DP,
   D2_DRMRES);
 DYNDBG_CLASSMAP_PARAM(disjoint_bits, p);
-DYNDBG_CLASSMAP_PARAM(disjoint_bits, T);
+DYNDBG_CLASSMAP_PARAM_T(disjoint_bits, T);
 
 DYNDBG_CLASSMAP_DEFINE(map_disjoint_names, DD_CLASS_TYPE_DISJOINT_NAMES,
   LOW, MID, HI);
 DYNDBG_CLASSMAP_PARAM(disjoint_names, p);
-DYNDBG_CLASSMAP_PARAM(disjoint_names, T);
+DYNDBG_CLASSMAP_PARAM_T(disjoint_names, T);
 
 DYNDBG_CLASSMAP_DEFINE(map_level_num, DD_CLASS_TYPE_LEVEL_NUM,
   V0, V1, V2, V3, V4, V5, V6, V7);
 DYNDBG_CLASSMAP_PARAM(level_num, p);
-DYNDBG_CLASSMAP_PARAM(level_num, T);
+DYNDBG_CLASSMAP_PARAM_T(level_num, T);
 
 DYNDBG_CLASSMAP_DEFINE(map_level_names, DD_CLASS_TYPE_LEVEL_NAMES,
   L0, L1, L2, L3, L4, L5, L6, L7);
 DYNDBG_CLASSMAP_PARAM(level_names, p);
-DYNDBG_CLASSMAP_PARAM(level_names, T);
+DYNDBG_CLASSMAP_PARAM_T(level_names, T);
 
 #endif /* TEST_DYNAMIC_DEBUG_SUBMOD */
 
-- 
2.39.1



[PATCH v3 13/19] dyndbg-API: DYNDBG_CLASSMAP_DEFINE() improvements

2023-01-25 Thread Jim Cromie
Patch 1 in this series fixed a CLASSMAP usage error; this patch improves
the API so that misuse is less likely.

changes here:

0- Add William Swanson's public domain map macro:
   https://github.com/swansontec/map-macro/blob/master/map.h
   this makes 1 possible.

1- classname args to CLASSMAP macros were given as strings: "DRM_UT_CORE".
   Now they are the actual enum const symbols: DRM_UT_CORE.
   Direct use of symbols is tighter, more comprehensible by tools, grep

2- drop _base arg.
   _base was the value of the 1st classname
   that is now available due to 1, no need to require it 2x

So take _base out of the API/kdoc.  Note that the macro impl keeps the
_base arg so that it can be used to set classmap.base, but reuses it
in the MAP-stringify _base, __VA_ARGS__ expression.
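
How enum symbols can serve as both the class-ids and the class-names is
sketched below.  Note the assumptions: this does not use the map.h macro
referenced in item 0.  It substitutes a much simpler fixed-arity helper
(1 to 4 arguments) that is enough to show the stringify step, and every
DEMO_* / demo_* name is hypothetical.

```c
/* Hypothetical, simplified classmap record. */
struct demo_map {
	int base;
	int length;
	const char **class_names;
};

#define DEMO_STR(x) #x

/* Fixed-arity stand-in for MAP(): applies f to each of 1..4 args. */
#define DEMO_APPLY_1(f, a)	f(a)
#define DEMO_APPLY_2(f, a, ...)	f(a), DEMO_APPLY_1(f, __VA_ARGS__)
#define DEMO_APPLY_3(f, a, ...)	f(a), DEMO_APPLY_2(f, __VA_ARGS__)
#define DEMO_APPLY_4(f, a, ...)	f(a), DEMO_APPLY_3(f, __VA_ARGS__)

/* Count 1..4 arguments by shifting a descending list into position. */
#define DEMO_NARGS_(a, b, c, d, n, ...) n
#define DEMO_NARGS(...) DEMO_NARGS_(__VA_ARGS__, 4, 3, 2, 1, 0)

#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)
#define DEMO_MAP_LIST(f, ...) \
	DEMO_CAT(DEMO_APPLY_, DEMO_NARGS(__VA_ARGS__))(f, __VA_ARGS__)

#define DEMO_FIRST(a, ...) a

/* No _base arg: the first symbol's value is the base. */
#define DEMO_CLASSMAP_DEFINE(var, ...)					\
	static const char *var##_names[] = {				\
		DEMO_MAP_LIST(DEMO_STR, __VA_ARGS__)			\
	};								\
	static const struct demo_map var = {				\
		.base = DEMO_FIRST(__VA_ARGS__, 0),			\
		.length = DEMO_NARGS(__VA_ARGS__),			\
		.class_names = var##_names,				\
	}

enum demo_cat_names { DEMO_LOW = 10, DEMO_MID, DEMO_HI };

DEMO_CLASSMAP_DEFINE(demo_map_names, DEMO_LOW, DEMO_MID, DEMO_HI);
```

A misspelled class symbol now fails at compile time, because it is also
used as an enum constant for .base; with quoted strings the same typo
would only surface at run time.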

Also cleanup the API usage comment in test_dynamic_debug.c, and since
comments in test-code might not be noticed, restate that here.

Using the CLASSMAP api:

  - class-specifications are enum consts/symbols,
like DRM_UT_CORE, DRM_UT_KMS, etc.
their values define bits in the sysfs-node (like drm.debug)

  - they are stringified and accepted at >control
echo class DRM_UT_CORE +p >control

  - multiple class-maps must share the per-module: 0-62 class_id space
(by setting initial enum values to non-overlapping subranges)
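
The non-overlap rule in the last point lends itself to a mechanical
check.  The helper below is a hypothetical user-space sketch (dyndbg
itself is not claimed to expose such a function); it assumes each
classmap can be summarized by its first class-id and its length, and
that ids 0..62 are usable as stated above.

```c
/* Hypothetical summary of one classmap's claim on the class-id space. */
struct demo_range {
	int base;	/* value of the first enum symbol */
	int length;	/* number of classes in the map */
};

#define DEMO_CLASS_ID_MAX 62	/* highest id available to classmaps */

/* Returns 1 if every map fits in 0..62 and no two maps overlap, else 0. */
static int demo_ranges_ok(const struct demo_range *r, int n)
{
	int i, j;

	for (i = 0; i < n; i++) {
		if (r[i].base < 0 || r[i].length < 1 ||
		    r[i].base + r[i].length - 1 > DEMO_CLASS_ID_MAX)
			return 0;

		/* compare against every range accepted so far */
		for (j = 0; j < i; j++)
			if (r[i].base < r[j].base + r[j].length &&
			    r[j].base < r[i].base + r[i].length)
				return 0;
	}
	return 1;
}
```

Fed with bases and lengths such as {0, 10}, {10, 3}, {14, 8} and
{22, 8}, which is how the enums in test_dynamic_debug.c appear to be
laid out, the check passes; moving the second map's base down to 9 makes
it fail.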

todo: fixup the 'i' prefix, a quick/dirty avoidance of MAP.

NOTE: test_dynamic_debug.c also has this helper macro to wire a
classmap to a drm.debug style parameter; its easier to just use it as
a model/template as needed, rather than try to make it general enough
to be an official API helper.

 define DD_SYS_WRAP(_model, _flags) \
static unsigned long bits_##_model; \
static struct ddebug_class_param _flags##_model = { \
.bits = &bits_##_model, \
.flags = #_flags,   \
.map = &map_##_model,   \
};  \
module_param_cb(_flags##_##_model, &param_ops_dyndbg_classes,   \
&_flags##_model, 0600)

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c   | 22 +++---
 include/drm/drm_print.h   |  1 +
 include/linux/dynamic_debug.h | 17 +--
 include/linux/map.h   | 55 +++
 lib/test_dynamic_debug.c  | 43 +--
 5 files changed, 96 insertions(+), 42 deletions(-)
 create mode 100644 include/linux/map.h

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index 4b697e18238d..07c25241e8cc 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -56,17 +56,17 @@ MODULE_PARM_DESC(debug, "Enable debug output, where each 
bit enables a debug cat
 module_param_named(debug, __drm_debug, ulong, 0600);
 #else
 /* classnames must match vals of enum drm_debug_category */
-DYNDBG_CLASSMAP_DEFINE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
-   "DRM_UT_CORE",
-   "DRM_UT_DRIVER",
-   "DRM_UT_KMS",
-   "DRM_UT_PRIME",
-   "DRM_UT_ATOMIC",
-   "DRM_UT_VBL",
-   "DRM_UT_STATE",
-   "DRM_UT_LEASE",
-   "DRM_UT_DP",
-   "DRM_UT_DRMRES");
+DYNDBG_CLASSMAP_DEFINE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS,
+  DRM_UT_CORE,
+  DRM_UT_DRIVER,
+  DRM_UT_KMS,
+  DRM_UT_PRIME,
+  DRM_UT_ATOMIC,
+  DRM_UT_VBL,
+  DRM_UT_STATE,
+  DRM_UT_LEASE,
+  DRM_UT_DP,
+  DRM_UT_DRMRES);
 
 static struct ddebug_class_param drm_debug_bitmap = {
.bits = &__drm_debug,
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index a44fb7ef257f..6a27e8f26770 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -333,6 +333,7 @@ static inline bool drm_debug_enabled_raw(enum 
drm_debug_category category)
})
 
 #if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+//extern struct ddebug_class_map drm_debug_classes[];
 /*
  * the drm.debug API uses dyndbg, so each drm_*dbg macro/callsite gets
  * a descriptor, and only enabled callsites are reachable.  They use
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 91015d1a04e0..7cdfc4b533ae 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -7,6 +7,7 @@
 #endif
 
 #include 
+#include 
 
 /*
  * An instance of this structure is created in a special
@@ -90,18 +91,16 @@ struct ddebug_class_map {
 };
 
 /**
- * DYNDBG_CLASSMAP_DEFINE - define debug-classes used by a module.
- * @_var:   name of the classmap, 

[PATCH v3 12/19] dyndbg-API: DYNDBG_CLASSMAP_USE drop extra args

2023-01-25 Thread Jim Cromie
Drop macro args after _var.  Since DYNDBG_CLASSMAP_USE no longer
forwards to DYNDBG_CLASSMAP_DEFINE, it doesn't need those args to
forward.  Keep only the _var arg, which is the extern'd struct
classmap with all the class info.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +-
 drivers/gpu/drm/display/drm_dp_helper.c | 12 +-
 drivers/gpu/drm/drm_crtc_helper.c   | 12 +-
 drivers/gpu/drm/i915/i915_params.c  | 12 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c   | 12 +-
 include/linux/dynamic_debug.h   | 30 ++---
 6 files changed, 22 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index a7a3a382c4a6..6c57e598b7d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -190,17 +190,7 @@ int amdgpu_vcnfw_log;
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
 #if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
-   "DRM_UT_CORE",
-   "DRM_UT_DRIVER",
-   "DRM_UT_KMS",
-   "DRM_UT_PRIME",
-   "DRM_UT_ATOMIC",
-   "DRM_UT_VBL",
-   "DRM_UT_STATE",
-   "DRM_UT_LEASE",
-   "DRM_UT_DP",
-   "DRM_UT_DRMRES");
+DYNDBG_CLASSMAP_USE(drm_debug_classes);
 #endif
 
 struct amdgpu_mgpu_info mgpu_info = {
diff --git a/drivers/gpu/drm/display/drm_dp_helper.c b/drivers/gpu/drm/display/drm_dp_helper.c
index 8fa7a88299e7..3bc188cb1116 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -42,17 +42,7 @@
 #include "drm_dp_helper_internal.h"
 
 #if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
-   "DRM_UT_CORE",
-   "DRM_UT_DRIVER",
-   "DRM_UT_KMS",
-   "DRM_UT_PRIME",
-   "DRM_UT_ATOMIC",
-   "DRM_UT_VBL",
-   "DRM_UT_STATE",
-   "DRM_UT_LEASE",
-   "DRM_UT_DP",
-   "DRM_UT_DRMRES");
+DYNDBG_CLASSMAP_USE(drm_debug_classes);
 #endif
 
 struct dp_aux_backlight {
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 7e6b25446303..1780db9de069 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -51,17 +51,7 @@
 #include "drm_crtc_helper_internal.h"
 
 #if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
-   "DRM_UT_CORE",
-   "DRM_UT_DRIVER",
-   "DRM_UT_KMS",
-   "DRM_UT_PRIME",
-   "DRM_UT_ATOMIC",
-   "DRM_UT_VBL",
-   "DRM_UT_STATE",
-   "DRM_UT_LEASE",
-   "DRM_UT_DP",
-   "DRM_UT_DRMRES");
+DYNDBG_CLASSMAP_USE(drm_debug_classes);
 #endif
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index b5b2542ae364..e959d0384ead 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -30,17 +30,7 @@
 #include "i915_drv.h"
 
 #if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
-   "DRM_UT_CORE",
-   "DRM_UT_DRIVER",
-   "DRM_UT_KMS",
-   "DRM_UT_PRIME",
-   "DRM_UT_ATOMIC",
-   "DRM_UT_VBL",
-   "DRM_UT_STATE",
-   "DRM_UT_LEASE",
-   "DRM_UT_DP",
-   "DRM_UT_DRMRES");
+DYNDBG_CLASSMAP_USE(drm_debug_classes);
 #endif
 
 #define i915_param_named(name, T, perm, desc) \
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index e4146b9af357..ad341411687f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -72,17 +72,7 @@
 #include "nouveau_dmem.h"
 
 #if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-DYNDBG_CLASSMAP_USE(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
-   "DRM_UT_CORE",
-   "DRM_UT_DRIVER",
-   "DRM_UT_KMS",
-   "DRM_UT_PRIME",
-   "DRM_UT_ATOMIC",
-   "DRM_UT_VBL",
-   "DRM_UT_STATE",
-   "DRM_UT_LEASE",
-   "DRM_UT_DP",
-   "DRM_UT_DRMRES");

[PATCH v3 14/19] drm_print: fix stale macro-name in comment

2023-01-25 Thread Jim Cromie
Cited commit uses stale macro name, fix this, and explain better.

When DRM_USE_DYNAMIC_DEBUG=y, DYNDBG_CLASSMAP_DEFINE() maps DRM_UT_*
onto BITs in drm.debug.  This still uses enum drm_debug_category, but
it is somewhat indirect, with the ordered set of DRM_UT_* enum-vals.
This requires that the macro args: DRM_UT_* list must be kept in sync
and in order.
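
To illustrate the coupling described above, here is a minimal user-space
sketch, not kernel code.  The demo_* enum, name table and helpers are
hypothetical stand-ins for enum drm_debug_category and the classmap's
name list.  The static assertion only catches a length mismatch; it
cannot detect names listed in the wrong order.

```c
#include <stddef.h>

/* Hypothetical stand-in for enum drm_debug_category: values are bit positions. */
enum demo_category {
	DEMO_UT_CORE,
	DEMO_UT_DRIVER,
	DEMO_UT_KMS,
	DEMO_UT_COUNT	/* sentinel, not a category */
};

/* Class names listed in enum order; index i names bit i. */
static const char *const demo_class_names[] = {
	"DEMO_UT_CORE",
	"DEMO_UT_DRIVER",
	"DEMO_UT_KMS",
};

/* Compile-time check that the name list tracks the enum's length. */
_Static_assert(sizeof(demo_class_names) / sizeof(demo_class_names[0]) ==
	       DEMO_UT_COUNT, "class names out of sync with enum");

/* Bit in the debug mask that a category controls. */
static unsigned long demo_category_bit(enum demo_category cat)
{
	return 1UL << cat;
}

/* Name for a bit position, or NULL if the bit is not mapped. */
static const char *demo_bit_name(unsigned int bit)
{
	return bit < DEMO_UT_COUNT ? demo_class_names[bit] : NULL;
}
```

Reordering or renumbering the enum changes which bit a given name
toggles, which is why the enum values are treated as ABI.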

Fixes: f158936b60a7 ("drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.")
Signed-off-by: Jim Cromie 
---
. emphasize ABI non-change despite enum val change - Jani Nikula
. reorder to back of patchset to follow API name changes.
---
 include/drm/drm_print.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 6a27e8f26770..7695ba31b3a4 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -276,7 +276,10 @@ static inline struct drm_printer drm_err_printer(const 
char *prefix)
  *
  */
 enum drm_debug_category {
-   /* These names must match those in DYNAMIC_DEBUG_CLASSBITS */
+   /*
+* Keep DYNDBG_CLASSMAP_DEFINE args in sync with changes here,
+* the enum-values define BIT()s in drm.debug, so are ABI.
+*/
/**
 * @DRM_UT_CORE: Used in the generic drm code: drm_ioctl.c, drm_mm.c,
 * drm_memory.c, ...
-- 
2.39.1



[PATCH v3 09/19] dyndbg: constify ddebug_apply_class_bitmap args

2023-01-25 Thread Jim Cromie
ddebug_apply_class_bitmap() does not alter its 2 bitmap args, make
this guarantee in the interface.

NOTE: the bitmap is also available in the dcp arg, but the 2 vars
serve a 2nd purpose; the CLASS_TYPE callers use them to translate
levels into their underlying disjoint representation.

no functional change

Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 10c29bc19901..b51f4bde6198 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -592,7 +592,7 @@ static int ddebug_exec_queries(char *query, const char 
*modname)
 
 /* apply a new bitmap to the sys-knob's current bit-state */
 static int ddebug_apply_class_bitmap(const struct ddebug_class_param *dcp,
-unsigned long *new_bits, unsigned long *old_bits,
+const unsigned long *new_bits, const unsigned long *old_bits,
 const char *query_modname)
 {
 #define QUERY_SIZE 128
-- 
2.39.1



[PATCH v3 05/19] dyndbg: split param_set_dyndbg_classes to inner/outer fns

2023-01-25 Thread Jim Cromie
Inner fn adds mod_name param, allowing caller to guarantee that only
one module is affected by a prdbgs update.  Outer fn preserves
kernel_param interface, passing NULL to inner fn.
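
The inner/outer split can be sketched in plain user-space C as follows.
All demo_* names and the module table are hypothetical.  Unlike the
kernel function, which returns 0 or a negative error, this sketch
returns the number of modules the update would touch so the scoping is
visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kernel_param. */
struct demo_kparam {
	const char *name;
};

/* Hypothetical set of loaded modules. */
static const char *const demo_modules[] = { "drm", "drm_kms_helper", "i915" };

/*
 * Inner function: modname == NULL means "every module", otherwise
 * only the named module is affected.
 */
static int demo_set_module_classes(const char *instr,
				   const struct demo_kparam *kp,
				   const char *modname)
{
	size_t i;
	int affected = 0;

	if (!instr || !kp)
		return -1;

	for (i = 0; i < sizeof(demo_modules) / sizeof(demo_modules[0]); i++) {
		if (modname && strcmp(modname, demo_modules[i]) != 0)
			continue;
		affected++;
	}
	return affected;
}

/* Outer function: keeps the two-argument interface, passes NULL inward. */
int demo_set_classes(const char *instr, const struct demo_kparam *kp)
{
	return demo_set_module_classes(instr, kp, NULL);
}
```

Existing callers of the two-argument form see no change, while a new
caller can restrict the update to a single module.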

no functional change.

Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 943e0597ecd4..0a5efc735b36 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -707,18 +707,9 @@ static int param_set_dyndbg_classnames(const char *instr, 
const struct kernel_pa
return 0;
 }
 
-/**
- * param_set_dyndbg_classes - class FOO >control
- * @instr: string echo>d to sysfs, input depends on map_type
- * @kp:kp->arg has state: bits/lvl, map, map_type
- *
- * Enable/disable prdbgs by their class, as given in the arguments to
- * DECLARE_DYNDBG_CLASSMAP.  For LEVEL map-types, enforce relative
- * levels by bitpos.
- *
- * Returns: 0 or <0 if error.
- */
-int param_set_dyndbg_classes(const char *instr, const struct kernel_param *kp)
+static int param_set_dyndbg_module_classes(const char *instr,
+  const struct kernel_param *kp,
+  const char *modnm)
 {
const struct ddebug_class_param *dcp = kp->arg;
const struct ddebug_class_map *map = dcp->map;
@@ -755,8 +746,8 @@ int param_set_dyndbg_classes(const char *instr, const 
struct kernel_param *kp)
KP_NAME(kp), inrep, 
CLASSMAP_BITMASK(map->length));
inrep &= CLASSMAP_BITMASK(map->length);
}
-   v2pr_info("bits:%lx > %s\n", inrep, KP_NAME(kp));
-   totct += ddebug_apply_class_bitmap(dcp, &inrep, dcp->bits, NULL);
+   v2pr_info("bits:0x%lx > %s.%s\n", inrep, modnm ?: "*", KP_NAME(kp));
+   totct += ddebug_apply_class_bitmap(dcp, &inrep, dcp->bits, modnm);
*dcp->bits = inrep;
break;
case DD_CLASS_TYPE_LEVEL_NUM:
@@ -769,7 +760,7 @@ int param_set_dyndbg_classes(const char *instr, const 
struct kernel_param *kp)
old_bits = CLASSMAP_BITMASK(*dcp->lvl);
new_bits = CLASSMAP_BITMASK(inrep);
v2pr_info("lvl:%ld bits:0x%lx > %s\n", inrep, new_bits, 
KP_NAME(kp));
-   totct += ddebug_apply_class_bitmap(dcp, &new_bits, &old_bits, NULL);
+   totct += ddebug_apply_class_bitmap(dcp, &new_bits, &old_bits, modnm);
*dcp->lvl = inrep;
break;
default:
@@ -778,6 +769,21 @@ int param_set_dyndbg_classes(const char *instr, const 
struct kernel_param *kp)
vpr_info("%s: total matches: %d\n", KP_NAME(kp), totct);
return 0;
 }
+/**
+ * param_set_dyndbg_classes - class FOO >control
+ * @instr: string echo>d to sysfs, input depends on map_type
+ * @kp:kp->arg has state: bits/lvl, map, map_type
+ *
+ * Enable/disable prdbgs by their class, as given in the arguments to
+ * DECLARE_DYNDBG_CLASSMAP.  For LEVEL map-types, enforce relative
+ * levels by bitpos.
+ *
+ * Returns: 0 or <0 if error.
+ */
+int param_set_dyndbg_classes(const char *instr, const struct kernel_param *kp)
+{
+   return param_set_dyndbg_module_classes(instr, kp, NULL);
+}
 EXPORT_SYMBOL(param_set_dyndbg_classes);
 
 /**
-- 
2.39.1



[PATCH v3 06/19] dyndbg: drop NUM_TYPE_ARRAY

2023-01-25 Thread Jim Cromie
ARRAY_SIZE works here, since array decl is complete.
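
As a quick check of the equivalence, the user-space sketch below counts
the same name list both ways.  DEMO_ARRAY_SIZE is a simplified stand-in
for the kernel's ARRAY_SIZE (it omits the must-be-array check), and the
DEMO_* names are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE(). */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Same shape as the NUM_TYPE_ARGS macro being removed. */
#define DEMO_NUM_TYPE_ARGS(eltype, ...) \
	(sizeof((eltype[]){__VA_ARGS__}) / sizeof(eltype))

#define DEMO_CLASS_NAMES "LOW", "MID", "HI"

/* The array declaration is complete here, so sizeof sees its full size. */
static const char *demo_classnames[] = { DEMO_CLASS_NAMES };

/* Length taken from the declared array. */
static size_t demo_len_by_array(void)
{
	return DEMO_ARRAY_SIZE(demo_classnames);
}

/* Length taken by re-expanding the argument list into a compound literal. */
static size_t demo_len_by_args(void)
{
	return DEMO_NUM_TYPE_ARGS(const char *, DEMO_CLASS_NAMES);
}
```

Both helpers return the same count; the array-based form avoids
expanding the argument list a second time.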

no functional change

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index bf47bcfad8e6..81b643ab7f6e 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -104,11 +104,9 @@ struct ddebug_class_map {
.mod_name = KBUILD_MODNAME, \
.base = _base,  \
.map_type = _maptype,   \
-   .length = NUM_TYPE_ARGS(char*, __VA_ARGS__),\
+   .length = ARRAY_SIZE(_var##_classnames),\
.class_names = _var##_classnames,   \
}
-#define NUM_TYPE_ARGS(eltype, ...) \
-(sizeof((eltype[]){__VA_ARGS__}) / sizeof(eltype))
 
 /* encapsulate linker provided built-in (or module) dyndbg data */
 struct _ddebug_info {
-- 
2.39.1



[PATCH v3 07/19] dyndbg: reduce verbose/debug clutter

2023-01-25 Thread Jim Cromie
currently, for verbose=3, this is logged:

 dyndbg: query 0: "class DRM_UT_CORE +p" mod:*
 dyndbg: split into words: "class" "DRM_UT_CORE" "+p"

 dyndbg: op='+'
 dyndbg: flags=0x1
 dyndbg: *flagsp=0x1 *maskp=0x

 dyndbg: parsed: func="" file="" module="" format="" lineno=0-0 
class=DRM_UT_CORE
 dyndbg: no matches for query
 dyndbg: no-match: func="" file="" module="" format="" lineno=0-0 
class=DRM_UT_CORE
 dyndbg: processed 1 queries, with 0 matches, 0 errs

This patch:

shrinks 3 lines of 2nd stanza to single line

drops 2 middle lines of 3rd stanza
 3 differs from 1 only by status
 2 is just status, retold in 4, with more info.

Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 0a5efc735b36..2d4640479e5b 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -265,9 +265,6 @@ static int ddebug_change(const struct ddebug_query *query,
}
mutex_unlock(&ddebug_lock);
 
-   if (!nfound && verbose)
-   pr_info("no matches for query\n");
-
return nfound;
 }
 
@@ -496,7 +493,6 @@ static int ddebug_parse_flags(const char *str, struct 
flag_settings *modifiers)
pr_err("bad flag-op %c, at start of %s\n", *str, str);
return -EINVAL;
}
-   v3pr_info("op='%c'\n", op);
 
for (; *str ; ++str) {
for (i = ARRAY_SIZE(opt_array) - 1; i >= 0; i--) {
@@ -510,7 +506,6 @@ static int ddebug_parse_flags(const char *str, struct 
flag_settings *modifiers)
return -EINVAL;
}
}
-   v3pr_info("flags=0x%x\n", modifiers->flags);
 
/* calculate final flags, mask based upon op */
switch (op) {
@@ -526,7 +521,7 @@ static int ddebug_parse_flags(const char *str, struct 
flag_settings *modifiers)
modifiers->flags = 0;
break;
}
-   v3pr_info("*flagsp=0x%x *maskp=0x%x\n", modifiers->flags, 
modifiers->mask);
+   v3pr_info("op='%c' flags=0x%x maskp=0x%x\n", op, modifiers->flags, 
modifiers->mask);
 
return 0;
 }
@@ -536,7 +531,7 @@ static int ddebug_exec_query(char *query_string, const char 
*modname)
struct flag_settings modifiers = {};
struct ddebug_query query = {};
 #define MAXWORDS 9
-   int nwords, nfound;
+   int nwords;
char *words[MAXWORDS];
 
nwords = ddebug_tokenize(query_string, words, MAXWORDS);
@@ -554,10 +549,7 @@ static int ddebug_exec_query(char *query_string, const 
char *modname)
return -EINVAL;
}
/* actually go and implement the change */
-   nfound = ddebug_change(&query, &modifiers);
-   vpr_info_dq(&query, nfound ? "applied" : "no-match");
-
-   return nfound;
+   return ddebug_change(&query, &modifiers);
 }
 
 /* handle multiple queries in query string, continue on error, return
-- 
2.39.1



[PATCH v3 03/19] dyndbg: replace classmap list with a vector

2023-01-25 Thread Jim Cromie
Classmaps are stored/linked in a section/array, but are each added to
the module's ddebug_table.maps list-head.

This is unnecessary; even when ddebug_attach_classmap() is handling
the builtin section (with classmaps for multiple builtin modules), its
contents are ordered, so a module's possibly multiple classmaps will
be consecutive in the section, and could be treated as a vector/block,
since both start-addy and subrange length are in the ddebug_info arg.

So this changes:

struct ddebug_class_map drops list-head link.

struct ddebug_table drops the list-head maps, and gets: classes &
num_classes for the start-addy and num_classes, placed to improve
struct packing.

The loading: in ddebug_attach_module_classes(), replace the
for-the-modname list-add loop, with a for-loop that finds the module's
subrange (start,length) of matching classmaps within the possibly
builtin classmaps vector, and saves those to the ddebug_table.

The reading/using: change list-foreach loops in ddebug_class_name() &
ddebug_find_valid_class() to walk the array from start to length.

Also:
Move #define __outvar up, above an added use in a fn-prototype.
Simplify ddebug_attach_module_classes args, ref has both addy,len.

This isn't technically a bugfix, but IMO simplifies later fixes for
the chicken-egg post-init enablement regression.
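
The subrange lookup can be sketched in user space as below.  This is an
illustration only: struct demo_classmap and demo_find_module_maps() are
hypothetical, and the sketch assumes, as the message above states, that
a module's classmaps are consecutive in the array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down classmap record. */
struct demo_classmap {
	const char *mod_name;
	int base;
	int length;
};

/*
 * Find the contiguous run of classmaps owned by modname.
 * Returns the run length and stores its first element in *start
 * (NULL when the module has no classmaps).
 */
static size_t demo_find_module_maps(const struct demo_classmap *maps,
				    size_t num, const char *modname,
				    const struct demo_classmap **start)
{
	size_t i, nc = 0;

	*start = NULL;
	for (i = 0; i < num; i++) {
		if (strcmp(maps[i].mod_name, modname) != 0) {
			if (nc)
				break;	/* walked past the module's run */
			continue;
		}
		if (!nc)
			*start = &maps[i];
		nc++;
	}
	return nc;
}
```

Storing only (start, length) lets readers walk the run with a plain
indexed loop instead of following list links.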

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h |  1 -
 lib/dynamic_debug.c   | 61 ++-
 2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 41682278d2e8..bf47bcfad8e6 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -81,7 +81,6 @@ enum class_map_type {
 };
 
 struct ddebug_class_map {
-   struct list_head link;
struct module *mod;
const char *mod_name;   /* needed for builtins */
const char **class_names;
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 009f2ead09c1..823190094350 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -45,10 +45,11 @@ extern struct ddebug_class_map __start___dyndbg_classes[];
 extern struct ddebug_class_map __stop___dyndbg_classes[];
 
 struct ddebug_table {
-   struct list_head link, maps;
+   struct list_head link;
const char *mod_name;
-   unsigned int num_ddebugs;
struct _ddebug *ddebugs;
+   struct ddebug_class_map *classes;
+   unsigned int num_ddebugs, num_classes;
 };
 
 struct ddebug_query {
@@ -146,13 +147,15 @@ static void vpr_info_dq(const struct ddebug_query *query, 
const char *msg)
  query->first_lineno, query->last_lineno, query->class_string);
 }
 
+#define __outvar /* filled by callee */
 static struct ddebug_class_map *ddebug_find_valid_class(struct ddebug_table 
const *dt,
- const char 
*class_string, int *class_id)
+   const char 
*class_string,
+   __outvar int *class_id)
 {
struct ddebug_class_map *map;
-   int idx;
+   int i, idx;
 
-   list_for_each_entry(map, &dt->maps, link) {
+   for (map = dt->classes, i = 0; i < dt->num_classes; i++, map++) {
idx = match_string(map->class_names, map->length, class_string);
if (idx >= 0) {
*class_id = idx + map->base;
@@ -163,7 +166,6 @@ static struct ddebug_class_map 
*ddebug_find_valid_class(struct ddebug_table cons
return NULL;
 }
 
-#define __outvar /* filled by callee */
 /*
  * Search the tables for _ddebug's which match the given `query' and
  * apply the `flags' and `mask' to them.  Returns number of matching
@@ -1107,9 +1109,10 @@ static void *ddebug_proc_next(struct seq_file *m, void 
*p, loff_t *pos)
 
 static const char *ddebug_class_name(struct ddebug_iter *iter, struct _ddebug 
*dp)
 {
-   struct ddebug_class_map *map;
+   struct ddebug_class_map *map = iter->table->classes;
+   int i, nc = iter->table->num_classes;
 
-   list_for_each_entry(map, &iter->table->maps, link)
+   for (i = 0; i < nc; i++, map++)
if (class_in_range(dp->class_id, map))
return map->class_names[dp->class_id - map->base];
 
@@ -1193,30 +1196,31 @@ static const struct proc_ops proc_fops = {
.proc_write = ddebug_proc_write
 };
 
-static void ddebug_attach_module_classes(struct ddebug_table *dt,
-struct ddebug_class_map *classes,
-int num_classes)
+static void ddebug_attach_module_classes(struct ddebug_table *dt, struct 
_ddebug_info *di)
 {
struct ddebug_class_map *cm;
-   int i, j, ct = 0;
+   int i, nc = 0;
 
-   for (cm = classes, i = 0; i < num_classes; i++, cm++) {
+   /*
+* Find this module's classmaps in a subrange/wholerange of
+* the 

[PATCH v3 04/19] dyndbg: make ddebug_apply_class_bitmap more selective

2023-01-25 Thread Jim Cromie
Add query_module param to ddebug_apply_class_bitmap().  This allows
its caller to update just one module, or all (as currently).  We'll
use this later to propagate drm.debug to each USEr as they're
modprobed.
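
For orientation, the per-bit step that this function performs can be
sketched in user space as below.  demo_format_bit_query() is a
hypothetical helper, not the kernel function: it only builds the query
text for a bit whose state changed.  In the patch the module scope is
not part of the query string; it is passed separately as query_modname.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Write "class <name> +<flags>" or "class <name> -<flags>" into buf when
 * the given bit differs between new_bits and old_bits.
 * Returns 1 if a query was written, 0 if the bit is unchanged.
 */
static int demo_format_bit_query(char *buf, size_t len,
				 const char *const *class_names,
				 unsigned int bit,
				 unsigned long new_bits,
				 unsigned long old_bits,
				 const char *flags)
{
	int is_set = (int)((new_bits >> bit) & 1UL);
	int was_set = (int)((old_bits >> bit) & 1UL);

	if (is_set == was_set)
		return 0;

	snprintf(buf, len, "class %s %c%s", class_names[bit],
		 is_set ? '+' : '-', flags);
	return 1;
}
```

Only changed bits produce a query, so re-applying an identical bitmap
issues no queries at all.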

No functional change.

Signed-off-by: Jim Cromie 
---

after `modprobe i915`, here are the module dependencies,
though not all on drm.debug.

bash-5.2# lsmod
Module  Size  Used by
i915 3133440  0
drm_buddy  20480  1 i915
ttm90112  1 i915
i2c_algo_bit   16384  1 i915
video  61440  1 i915
wmi32768  1 video
drm_display_helper200704  1 i915
drm_kms_helper208896  2 drm_display_helper,i915
drm   606208  5 
drm_kms_helper,drm_display_helper,drm_buddy,i915,ttm
cec57344  2 drm_display_helper,i915
---
 lib/dynamic_debug.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 823190094350..943e0597ecd4 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -600,7 +600,8 @@ static int ddebug_exec_queries(char *query, const char 
*modname)
 
 /* apply a new bitmap to the sys-knob's current bit-state */
 static int ddebug_apply_class_bitmap(const struct ddebug_class_param *dcp,
-unsigned long *new_bits, unsigned long *old_bits)
+unsigned long *new_bits, unsigned long *old_bits,
+const char *query_modname)
 {
 #define QUERY_SIZE 128
char query[QUERY_SIZE];
@@ -608,7 +609,8 @@ static int ddebug_apply_class_bitmap(const struct 
ddebug_class_param *dcp,
int matches = 0;
int bi, ct;
 
-   v2pr_info("apply: 0x%lx to: 0x%lx\n", *new_bits, *old_bits);
+   v2pr_info("apply bitmap: 0x%lx to: 0x%lx for %s\n", *new_bits, 
*old_bits,
+ query_modname ?: "");
 
for (bi = 0; bi < map->length; bi++) {
if (test_bit(bi, new_bits) == test_bit(bi, old_bits))
@@ -617,12 +619,15 @@ static int ddebug_apply_class_bitmap(const struct 
ddebug_class_param *dcp,
snprintf(query, QUERY_SIZE, "class %s %c%s", 
map->class_names[bi],
 test_bit(bi, new_bits) ? '+' : '-', dcp->flags);
 
-   ct = ddebug_exec_queries(query, NULL);
+   ct = ddebug_exec_queries(query, query_modname);
matches += ct;
 
v2pr_info("bit_%d: %d matches on class: %s -> 0x%lx\n", bi,
  ct, map->class_names[bi], *new_bits);
}
+   v2pr_info("applied bitmap: 0x%lx to: 0x%lx for %s\n", *new_bits, 
*old_bits,
+ query_modname ?: "");
+
return matches;
 }
 
@@ -678,7 +683,7 @@ static int param_set_dyndbg_classnames(const char *instr, 
const struct kernel_pa
continue;
}
curr_bits ^= BIT(cls_id);
-   totct += ddebug_apply_class_bitmap(dcp, &curr_bits, dcp->bits);
+   totct += ddebug_apply_class_bitmap(dcp, &curr_bits, dcp->bits, NULL);
*dcp->bits = curr_bits;
v2pr_info("%s: changed bit %d:%s\n", KP_NAME(kp), 
cls_id,
  map->class_names[cls_id]);
@@ -688,7 +693,7 @@ static int param_set_dyndbg_classnames(const char *instr, 
const struct kernel_pa
old_bits = CLASSMAP_BITMASK(*dcp->lvl);
curr_bits = CLASSMAP_BITMASK(cls_id + (wanted ? 1 : 0 
));
 
-   totct += ddebug_apply_class_bitmap(dcp, &curr_bits, &old_bits);
+   totct += ddebug_apply_class_bitmap(dcp, &curr_bits, &old_bits, NULL);
*dcp->lvl = (cls_id + (wanted ? 1 : 0));
v2pr_info("%s: changed bit-%d: \"%s\" %lx->%lx\n", 
KP_NAME(kp), cls_id,
  map->class_names[cls_id], old_bits, 
curr_bits);
@@ -751,7 +756,7 @@ int param_set_dyndbg_classes(const char *instr, const 
struct kernel_param *kp)
inrep &= CLASSMAP_BITMASK(map->length);
}
v2pr_info("bits:%lx > %s\n", inrep, KP_NAME(kp));
-   totct += ddebug_apply_class_bitmap(dcp, &inrep, dcp->bits);
+   totct += ddebug_apply_class_bitmap(dcp, &inrep, dcp->bits, NULL);
*dcp->bits = inrep;
break;
case DD_CLASS_TYPE_LEVEL_NUM:
@@ -764,7 +769,7 @@ int param_set_dyndbg_classes(const char *instr, const 
struct kernel_param *kp)
old_bits = CLASSMAP_BITMASK(*dcp->lvl);
new_bits = CLASSMAP_BITMASK(inrep);
v2pr_info("lvl:%ld bits:0x%lx > %s\n", inrep, new_bits, 
KP_NAME(kp));
-   totct += ddebug_apply_class_bitmap(dcp, &new_bits, &old_bits);
+   totct += ddebug_apply_class_bitmap(dcp, &new_bits, &old_bits, NULL);

[PATCH v3 02/19] test-dyndbg: show that DEBUG enables prdbgs at compiletime

2023-01-25 Thread Jim Cromie
Dyndbg is required to enable prdbgs at compile-time if DEBUG is
defined.  Show this works; add the defn to test_dynamic_debug.c,
and manually inspect/verify its effect at module load:

[   15.292810] dyndbg: module:test_dynamic_debug attached 4 classes
[   15.293189] dyndbg:  32 debug prints in module test_dynamic_debug
[   15.293715] test_dd: init start
[   15.293716] test_dd: doing categories
[   15.293716] test_dd: LOW msg
...
[   15.293733] test_dd: L6 msg
[   15.293733] test_dd: L7 msg
[   15.293733] test_dd: init done

NOTES:

As is observable above, define DEBUG enables all prdbgs, including
those in mod_init-fn, and more notably, the class'd ones (callsites
with non-default class_ids).

This differs from the >control interface, which in order to properly
protect a client's class'd prdbgs, requires a "class FOO" in queries
to change them.  If this sounds wrong, note that the DEBUG is in the
module source file, and is thus privileged.

This yields an occasional surprise; the following disables all the
compile-time enabled plain prdbgs, but leaves the class'd ones
enabled.

 :#> modprobe test_dynamic_debug dyndbg==_

Signed-off-by: Jim Cromie 
---
 lib/test_dynamic_debug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/test_dynamic_debug.c b/lib/test_dynamic_debug.c
index a01f0193a419..89dd7f285e31 100644
--- a/lib/test_dynamic_debug.c
+++ b/lib/test_dynamic_debug.c
@@ -8,6 +8,8 @@
 
 #define pr_fmt(fmt) "test_dd: " fmt
 
+#define DEBUG /* enable all prdbgs (plain & class'd) at compiletime */
+
 #include 
 
 /* run tests by reading or writing sysfs node: do_prints */
-- 
2.39.1



[PATCH v3 01/19] test-dyndbg: fixup CLASSMAP usage error

2023-01-25 Thread Jim Cromie
more careful reading of test output reveals:

lib/test_dynamic_debug.c:103 [test_dynamic_debug]do_cats =pmf "doing 
categories\n"
lib/test_dynamic_debug.c:105 [test_dynamic_debug]do_cats =p "LOW msg\n" 
class:MID
lib/test_dynamic_debug.c:106 [test_dynamic_debug]do_cats =p "MID msg\n" class:HI
lib/test_dynamic_debug.c:107 [test_dynamic_debug]do_cats =_ "HI msg\n" class 
unknown, _id:13

That last line is wrong, the HI class is declared.

But the enum's 1st val (explicitly initialized) was wrong; it must be
_base, not _base+1 (a DECLARE_DYNDBG_CLASSMAP param).  So the last
enumeration exceeded the range of mapped class-id's, which triggered
the "class unknown" report.  Basically, I coded in an error, and
forgot to verify it and remove it.
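
The off-by-one can be reproduced in a small user-space sketch.  The
demo_* names are hypothetical; the range test assumes a class_id is
mapped when it lies in [base, base + length).

```c
/* Hypothetical, trimmed-down classmap: ids base .. base+length-1 are mapped. */
struct demo_map {
	int base;
	int length;
};

/* Enum as originally written: first value is base+1. */
enum demo_bad_names { BAD_LOW = 11, BAD_MID, BAD_HI };

/* Enum after the fix: first value equals base. */
enum demo_good_names { GOOD_LOW = 10, GOOD_MID, GOOD_HI };

/* Nonzero when class_id falls inside the map's id range. */
static int demo_class_in_range(int class_id, const struct demo_map *map)
{
	return class_id >= map->base && class_id < map->base + map->length;
}
```

With base 10 and three names, ids 10..12 are mapped, so BAD_HI (13)
falls outside the range while GOOD_HI (12) is inside it.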

RFC:

This patch fixes a bad usage of DECLARE_DYNDBG_CLASSMAP([1]), showing that
it is too error-prone.  As noted in test-dynamic-debug.c comments:

 * Using the CLASSMAP api:
 * - classmaps must have corresponding enum
 * - enum symbols must match/correlate with class-name strings in the map.
 * - base must equal enum's 1st value
 * - multiple maps must set their base to share the 0-62 class_id space !!
 *   (build-bug-on tips welcome)

Those shortcomings could largely be fixed with a __stringify_list
(which doesn't exist) used in DEFINE_DYNAMIC_DEBUG_CLASSMAP(), on
__VA_ARGS__ a 2nd time.  Then, DRM would pass DRM_UT_* ; all the
categories, in order, and not their stringifications, which created
all the usage complications above.

[1] name changed later to DYNDBG_CLASSMAP_DEFINE

Signed-off-by: Jim Cromie 
---
 lib/test_dynamic_debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_dynamic_debug.c b/lib/test_dynamic_debug.c
index 8dd250ad022b..a01f0193a419 100644
--- a/lib/test_dynamic_debug.c
+++ b/lib/test_dynamic_debug.c
@@ -75,7 +75,7 @@ DD_SYS_WRAP(disjoint_bits, p);
 DD_SYS_WRAP(disjoint_bits, T);
 
 /* symbolic input, independent bits */
-enum cat_disjoint_names { LOW = 11, MID, HI };
+enum cat_disjoint_names { LOW = 10, MID, HI };
 DECLARE_DYNDBG_CLASSMAP(map_disjoint_names, DD_CLASS_TYPE_DISJOINT_NAMES, 10,
"LOW", "MID", "HI");
 DD_SYS_WRAP(disjoint_names, p);
-- 
2.39.1



[PATCH v3 00/19] fix DRM_USE_DYNAMIC_DEBUG regression

2023-01-25 Thread Jim Cromie
Hi everyone,

In v6.1 DRM_USE_DYNAMIC_DEBUG=y has a regression enabling drm.debug in
drivers at modprobe.

It is due to a chicken-egg problem loading modules; on `modprobe
i915`, drm is loaded 1st, and drm/parameters/debug is set.  When
drm_debug_enabled() tested __drm_debug at runtime, this just worked.

But with DRM_USE_DYNAMIC_DEBUG=y, the runtime test is replaced with a
static_key for each drm_dbg/dyndbg callsite, enabled by dyndbg's
kparam callback on __drm_debug.  So with drm.ko loaded and initialized
before the dependent modules, their debug callsites aren't yet present
to be enabled.

STATUS - v3

not quite ready.
rebased on -rc5, hopefully applies to patchwork head 
still has RFC patch -> CI_ONLY temporary, to avoid panics
boots on my amdgpu box, drm.debug=0x3ff works at boot-time
the "toggled" warning is repeatable with test_dynamic_debug*.ko
it also occurs on amdgpu, so not just artificial.
v2 is https://lore.kernel.org/lkml/20230113193016.749791-1-jim.cro...@gmail.com/

OVERVIEW

As Jani Nikula noted rather more gently, DECLARE_DYNDBG_CLASSMAP is
error-prone enough to call broken: sharing of a common classmap
required identical classmap definitions in all modules using DRM_UT_*,
which is inherently error-prone.  IOW, it muddled the K&R distinction
between a (single) definition, and multiple references.

So patches 10-13 split it into:

DYNDBG_CLASSMAP_DEFINE  used once per subsystem to define each classmap.
DYNDBG_CLASSMAP_USE declare dependence on a DEFINEd classmap.

DYNDBG_CLASSMAP_DEFINE initializes the classmap, stores it into the
(existing) __dyndbg_classes section, and exports the struct var
(unlike DECLARE_DYNDBG_CLASSMAP).

DYNDBG_CLASSMAP_USE initializes a class-ref struct, containing the
user-module-name, and a ref to the exported classmap var.
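
The define-once, reference-many shape can be sketched in user-space C as
below.  The DEMO_* macros and structs are hypothetical simplifications;
they do not place anything in a linker section and are not the kernel
macros.

```c
#include <stddef.h>

/* Hypothetical classmap: owned by exactly one module. */
struct demo_classmap {
	const char *mod_name;
	const char *const *class_names;
	size_t length;
};

/* Hypothetical class-ref: names the user and points at the one definition. */
struct demo_classmap_user {
	const char *user_mod_name;
	const struct demo_classmap *map;
};

/* Single definition: builds the name table and the classmap itself. */
#define DEMO_CLASSMAP_DEFINE(_var, _owner, ...)				\
	static const char *const _var##_classnames[] = { __VA_ARGS__ };	\
	const struct demo_classmap _var = {				\
		.mod_name = _owner,					\
		.class_names = _var##_classnames,			\
		.length = sizeof(_var##_classnames) /			\
			  sizeof(_var##_classnames[0]),			\
	}

/* Reference: carries no class names, only a pointer to the definition. */
#define DEMO_CLASSMAP_USE(_var, _modname)				\
	const struct demo_classmap_user _var##_ref = {			\
		.user_mod_name = _modname,				\
		.map = &_var,						\
	}

DEMO_CLASSMAP_DEFINE(demo_classes, "drm", "CORE", "DRIVER", "KMS");
DEMO_CLASSMAP_USE(demo_classes, "i915");
```

Because the user side holds only a pointer, the class list cannot drift
out of sync between the defining module and its users.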

The distinction allows separate treatment of classmaps and
classmap-refs, the latter getting additional behavior to propagate
parent's kparam settings to USEr (for example, drm.debug to drm drivers):

. lookup the classmap defn being referenced, and its module
. find the module's kernel-params using the classmap
. propagate kparam vals into the prdbgs in module being added.

It also makes the weird coordinated-changes-by-identical-classmaps
"feature" unnecessary.

Patch-10 splits the DECLARE macro into DEFINE & USE, and updates uses.

Patch-11 is the core of it; the separate treatment begins in
ddebug_add_module().  It calls ddebug_attach_module_classes(1) to
handle class-defns; this adds ddebug_attach_client_module_classes(2)
to handle class-refs, as they are found while modprobing drm
drivers. (2) calls ddebug_apply_parents_params(3) on each USEr's
referred classmap definition.

(3) scans kernel-params owned by the module DEFINEing the classmap,
either builtin or loadable, calls ddebug_match_apply_kparam(4) on each.

(4) looks for kparams which are wired to dyndbg's param-ops.  Those
params have a struct ddebug_class_param attached, which has a classmap
and a ref to a state-var (__drm_debug for DRM case).  If the kparam's
classmap is the same as from (2), then apply its state-var to the
client module by calling ddebug_apply_class_bitmap().
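
Steps (3) and (4) can be sketched in user space as follows.  Everything
here is a hypothetical simplification: the demo_* structs stand in for
the classmap, the dyndbg-wired kernel-param and the module being added,
and "applying" the state just copies the bits.

```c
#include <stddef.h>

/* Hypothetical classmap identity; only its address matters here. */
struct demo_classmap {
	const char *owner;
};

/* Hypothetical kparam wired to a classmap, with its current state. */
struct demo_kparam {
	const struct demo_classmap *map;
	unsigned long bits;
};

/* Hypothetical client module that references a classmap. */
struct demo_module {
	const char *name;
	const struct demo_classmap *used_map;
	unsigned long enabled_bits;
};

/*
 * Scan the owner's params; for each one attached to the classmap the
 * user references, copy its state into the user module.
 * Returns the number of params applied.
 */
static int demo_apply_parent_params(struct demo_module *user,
				    const struct demo_kparam *params,
				    size_t nparams)
{
	size_t i;
	int applied = 0;

	for (i = 0; i < nparams; i++) {
		if (params[i].map != user->used_map)
			continue;	/* param belongs to another classmap */
		user->enabled_bits = params[i].bits;
		applied++;
	}
	return applied;
}
```

Running this when the client module is added is what closes the
chicken-egg gap: the parent's setting already exists, and the late
arrival picks it up.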

Patch-12 cleans up DYNDBG_CLASSMAP_USE, dropping now unneeded args.

Patch-13 improves DYNDBG_CLASSMAP_DEFINE, by accepting DRM_UT_*
symbols directly, not "DRM_UT_*" (their strings).  It adds new
include/linux/map.h to support this.

Patches 1-9 are prep, refactor, cleanup, tighten interfaces

Patches 15-18 extend test_dynamic_debug to recreate DRM's multi-module
regression; it builds both test_dynamic_debug.ko and _submod.ko, with
an ifdef to _DEFINE in the main module, and _USE in the submod.  This
gives both modules identical set of prdbgs, which is helpful for
comparing results.

here it is, working properly:

doing class DRM_UT_CORE -p
[ 9904.961750] dyndbg: read 21 bytes from userspace
[ 9904.962286] dyndbg: query 0: "class DRM_UT_CORE -p" mod:*
[ 9904.962848] dyndbg: split into words: "class" "DRM_UT_CORE" "-p"
[ 9904.963444] dyndbg: op='-' flags=0x0 maskp=0xfffe
[ 9904.963945] dyndbg: parsed: func="" file="" module="" format="" lineno=0-0 
class=DRM_UT_CORE
[ 9904.964781] dyndbg: good-class: drm.DRM_UT_CORE  module:drm nd:302 nc:1 nu:0
[ 9904.966411] dyndbg: class-ref: drm_kms_helper.DRM_UT_CORE  
module:drm_kms_helper nd:95 nc:0 nu:1
[ 9904.967265] dyndbg: class-ref: drm_display_helper.DRM_UT_CORE  
module:drm_display_helper nd:150 nc:0 nu:1
[ 9904.968349] dyndbg: class-ref: i915.DRM_UT_CORE  module:i915 nd:1659 nc:0 
nu:1
[ 9904.969801] dyndbg: class-ref: amdgpu.DRM_UT_CORE  module:amdgpu nd:4425 
nc:0 nu:1
[ 9904.977079] dyndbg: class-ref: nouveau.DRM_UT_CORE  module:nouveau nd:103 
nc:0 nu:1
[ 9904.977830] dyndbg: processed 1 queries, with 507 matches, 0 errs
doing class DRM_UT_DRIVER +p
[ 9906.151761] dyndbg: read 23 bytes from userspace
[ 9906.152241] dyndbg: query 0: "class DRM_UT_DRIVER +p" mod:*
[ 9906.152793] dyndbg: split into words: "class" "DRM_UT_DRIVER" "+p"
[ 9906.153388] dyndbg: 

[PATCH 09/32] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
GFX9.4.2 now supports per-VMID debug mode controls registers
(SPI_GDBG_PER_VMID_CNTL).

Because the KFD lets the HWS handle PASID-VMID mapping, the KFD will
forward all debug mode setting register writes to the HWS scheduler
using a new MAP_PROCESS API, so instead of writing to registers, return
the required register values that the HWS needs to write on debug enable
and disable.
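
The "compute the value, let the HWS write it" pattern can be sketched in
user space as below.  The field positions are invented for illustration
and are NOT the real SPI_GDBG_PER_VMID_CNTL layout; demo_set_field() is
a hypothetical stand-in for REG_SET_FIELD.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical field layout, for illustration only. */
#define DEMO_TRAP_EN_SHIFT	0
#define DEMO_TRAP_EN_MASK	(0x1u << DEMO_TRAP_EN_SHIFT)
#define DEMO_EXCP_EN_SHIFT	1
#define DEMO_EXCP_EN_MASK	(0x1u << DEMO_EXCP_EN_SHIFT)
#define DEMO_EXCP_REPLACE_SHIFT	2
#define DEMO_EXCP_REPLACE_MASK	(0x1u << DEMO_EXCP_REPLACE_SHIFT)

/* Clear the field in val, then insert the new field value. */
static uint32_t demo_set_field(uint32_t val, uint32_t mask,
			       unsigned int shift, uint32_t field)
{
	return (val & ~mask) | ((field << shift) & mask);
}

/* Value to program on debug enable: trap on, exception fields cleared. */
static uint32_t demo_enable_debug_trap(void)
{
	uint32_t data = 0;

	data = demo_set_field(data, DEMO_TRAP_EN_MASK, DEMO_TRAP_EN_SHIFT, 1);
	data = demo_set_field(data, DEMO_EXCP_EN_MASK, DEMO_EXCP_EN_SHIFT, 0);
	data = demo_set_field(data, DEMO_EXCP_REPLACE_MASK,
			      DEMO_EXCP_REPLACE_SHIFT, 0);
	return data;
}

/* Value to program on debug disable: trap stays on only if requested. */
static uint32_t demo_disable_debug_trap(bool keep_trap_enabled)
{
	uint32_t data = 0;

	data = demo_set_field(data, DEMO_TRAP_EN_MASK, DEMO_TRAP_EN_SHIFT,
			      keep_trap_enabled ? 1u : 0u);
	data = demo_set_field(data, DEMO_EXCP_EN_MASK, DEMO_EXCP_EN_SHIFT, 0);
	data = demo_set_field(data, DEMO_EXCP_REPLACE_MASK,
			      DEMO_EXCP_REPLACE_SHIFT, 0);
	return data;
}
```

Neither helper touches hardware; the caller forwards the returned value
to whatever component owns the register write.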

v2: add commentary on unused restore_dbg_registers for debug enable.

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  | 43 ++-
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 4485bb29bec9..89868f9927ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,44 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "gc/gc_9_4_2_offset.h"
+#include "gc/gc_9_4_2_sh_mask.h"
+
+/**
+ * Returns TRAP_EN, EXCP_EN and EXCP_REPLACE.
+ *
+ * restore_dbg_registers is ignored here but is a general interface requirement
+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to manually
+ * initialize the debug mode registers after it has disabled GFX off during the
+ * debug session.
+ */
+static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
+
+/* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
+static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, keep_trap_enabled);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
 
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
@@ -41,6 +79,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_atc_vmid_pasid_mapping_info =
kgd_gfx_v9_get_atc_vmid_pasid_mapping_info,
.set_vm_context_page_table_base = 
kgd_gfx_v9_set_vm_context_page_table_base,
-   .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .enable_debug_trap = kgd_aldebaran_enable_debug_trap,
+   .disable_debug_trap = kgd_aldebaran_disable_debug_trap,
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
 };
-- 
2.25.1
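
The idea in the patch above, where a callback composes the register value and
returns it instead of writing the register itself, can be sketched in plain C.
This is only an illustration under stated assumptions: the DEMO_* shifts and
masks are invented for the example and are not the real
SPI_GDBG_PER_VMID_CNTL layout, and set_field() is a hypothetical stand-in for
the driver's REG_SET_FIELD() macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define DEMO_TRAP_EN_SHIFT	0
#define DEMO_TRAP_EN_MASK	0x00000001u
#define DEMO_EXCP_EN_SHIFT	1
#define DEMO_EXCP_EN_MASK	0x000003feu
#define DEMO_EXCP_REPLACE_SHIFT	10
#define DEMO_EXCP_REPLACE_MASK	0x00000400u

/* Clear the field, then merge in the new value (stand-in for REG_SET_FIELD). */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
			  uint32_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}

/* Return the value a scheduler would have to write on debug enable. */
uint32_t demo_enable_debug_trap(void)
{
	uint32_t data = 0;

	data = set_field(data, DEMO_TRAP_EN_MASK, DEMO_TRAP_EN_SHIFT, 1);
	data = set_field(data, DEMO_EXCP_EN_MASK, DEMO_EXCP_EN_SHIFT, 0);
	data = set_field(data, DEMO_EXCP_REPLACE_MASK, DEMO_EXCP_REPLACE_SHIFT, 0);

	return data;
}

/* Return the value to write on debug disable; the trap may stay enabled. */
uint32_t demo_disable_debug_trap(bool keep_trap_enabled)
{
	uint32_t data = 0;

	data = set_field(data, DEMO_TRAP_EN_MASK, DEMO_TRAP_EN_SHIFT,
			 keep_trap_enabled ? 1u : 0u);
	data = set_field(data, DEMO_EXCP_EN_MASK, DEMO_EXCP_EN_SHIFT, 0);
	data = set_field(data, DEMO_EXCP_REPLACE_MASK, DEMO_EXCP_REPLACE_SHIFT, 0);

	return data;
}
```

Neither helper touches hardware, so the caller is free to forward the
returned value to whichever component owns the register write.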



[PATCH v3 10/10] drm/fbdev-generic: Rename struct fb_info 'fbi' to 'info'

2023-01-25 Thread Thomas Zimmermann
The generic fbdev emulation names variables of type struct fb_info
both 'fbi' and 'info'. The latter seems to be more common in fbdev
code, so name fbi accordingly.

Also replace the duplicate variable in drm_fbdev_fb_destroy().

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fbdev_generic.c | 47 ++---
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 68ce652e3a14..43f94aa9e015 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -46,16 +46,15 @@ static int drm_fbdev_fb_release(struct fb_info *info, int 
user)
 static void drm_fbdev_fb_destroy(struct fb_info *info)
 {
struct drm_fb_helper *fb_helper = info->par;
-   struct fb_info *fbi = fb_helper->info;
void *shadow = NULL;
 
if (!fb_helper->dev)
return;
 
-   if (fbi->fbdefio)
-   fb_deferred_io_cleanup(fbi);
+   if (info->fbdefio)
+   fb_deferred_io_cleanup(info);
if (drm_fbdev_use_shadow_fb(fb_helper))
-   shadow = fbi->screen_buffer;
+   shadow = info->screen_buffer;
 
drm_fb_helper_fini(fb_helper);
 
@@ -171,7 +170,7 @@ static int drm_fbdev_fb_probe(struct drm_fb_helper 
*fb_helper,
struct drm_device *dev = fb_helper->dev;
struct drm_client_buffer *buffer;
struct drm_framebuffer *fb;
-   struct fb_info *fbi;
+   struct fb_info *info;
u32 format;
struct iosys_map map;
int ret;
@@ -190,35 +189,35 @@ static int drm_fbdev_fb_probe(struct drm_fb_helper 
*fb_helper,
fb_helper->fb = buffer->fb;
fb = buffer->fb;
 
-   fbi = drm_fb_helper_alloc_info(fb_helper);
-   if (IS_ERR(fbi))
-   return PTR_ERR(fbi);
+   info = drm_fb_helper_alloc_info(fb_helper);
+   if (IS_ERR(info))
+   return PTR_ERR(info);
 
-   fbi->fbops = &drm_fbdev_fb_ops;
-   fbi->screen_size = sizes->surface_height * fb->pitches[0];
-   fbi->fix.smem_len = fbi->screen_size;
-   fbi->flags = FBINFO_DEFAULT;
+   info->fbops = &drm_fbdev_fb_ops;
+   info->screen_size = sizes->surface_height * fb->pitches[0];
+   info->fix.smem_len = info->screen_size;
+   info->flags = FBINFO_DEFAULT;
 
-   drm_fb_helper_fill_info(fbi, fb_helper, sizes);
+   drm_fb_helper_fill_info(info, fb_helper, sizes);
 
if (drm_fbdev_use_shadow_fb(fb_helper)) {
-   fbi->screen_buffer = vzalloc(fbi->screen_size);
-   if (!fbi->screen_buffer)
+   info->screen_buffer = vzalloc(info->screen_size);
+   if (!info->screen_buffer)
return -ENOMEM;
-   fbi->flags |= FBINFO_VIRTFB | FBINFO_READS_FAST;
+   info->flags |= FBINFO_VIRTFB | FBINFO_READS_FAST;
 
-   fbi->fbdefio = &drm_fbdev_defio;
-   fb_deferred_io_init(fbi);
+   info->fbdefio = &drm_fbdev_defio;
+   fb_deferred_io_init(info);
} else {
/* buffer is mapped for HW framebuffer */
ret = drm_client_buffer_vmap(fb_helper->buffer, &map);
if (ret)
return ret;
if (map.is_iomem) {
-   fbi->screen_base = map.vaddr_iomem;
+   info->screen_base = map.vaddr_iomem;
} else {
-   fbi->screen_buffer = map.vaddr;
-   fbi->flags |= FBINFO_VIRTFB;
+   info->screen_buffer = map.vaddr;
+   info->flags |= FBINFO_VIRTFB;
}
 
/*
@@ -227,10 +226,10 @@ static int drm_fbdev_fb_probe(struct drm_fb_helper 
*fb_helper,
 * case.
 */
 #if IS_ENABLED(CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM)
-   if (fb_helper->hint_leak_smem_start && fbi->fix.smem_start == 0 &&
+   if (fb_helper->hint_leak_smem_start && info->fix.smem_start == 0 &&
!drm_WARN_ON_ONCE(dev, map.is_iomem))
-   fbi->fix.smem_start =
-   page_to_phys(virt_to_page(fbi->screen_buffer));
+   info->fix.smem_start =
+   page_to_phys(virt_to_page(info->screen_buffer));
 #endif
}
 
-- 
2.39.0



[PATCH v3 08/10] drm/fbdev-generic: Minimize client unregistering

2023-01-25 Thread Thomas Zimmermann
For uninitialized framebuffers, only release the DRM client and
free the fbdev memory. Do not attempt to clean up the framebuffer.

DRM fbdev clients have a two-step initialization: first create
the DRM client; then create the framebuffer device on the first
successful hotplug event. In cases where the client never creates
the framebuffer, only the client state needs to be released. We
can detect which case it is, full or client-only cleanup, by
looking at the presence of fb_helper's info field.

v3:
* fix typo in commit message (Javier)
* release client before unpreparing fbdev
v2:
* remove test for (fbi != NULL) in drm_fbdev_cleanup() (Sam)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fbdev_generic.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index dd8be5e0f271..a9c519001019 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -51,12 +51,10 @@ static void drm_fbdev_cleanup(struct drm_fb_helper 
*fb_helper)
if (!fb_helper->dev)
return;
 
-   if (fbi) {
-   if (fbi->fbdefio)
-   fb_deferred_io_cleanup(fbi);
-   if (drm_fbdev_use_shadow_fb(fb_helper))
-   shadow = fbi->screen_buffer;
-   }
+   if (fbi->fbdefio)
+   fb_deferred_io_cleanup(fbi);
+   if (drm_fbdev_use_shadow_fb(fb_helper))
+   shadow = fbi->screen_buffer;
 
drm_fb_helper_fini(fb_helper);
 
@@ -362,11 +360,13 @@ static void drm_fbdev_client_unregister(struct 
drm_client_dev *client)
 {
struct drm_fb_helper *fb_helper = drm_fb_helper_from_client(client);
 
-   if (fb_helper->info)
-   /* drm_fbdev_fb_destroy() takes care of cleanup */
+   if (fb_helper->info) {
drm_fb_helper_unregister_info(fb_helper);
-   else
-   drm_fbdev_release(fb_helper);
+   } else {
+   drm_client_release(&fb_helper->client);
+   drm_fb_helper_unprepare(fb_helper);
+   kfree(fb_helper);
+   }
 }
 
 static int drm_fbdev_client_restore(struct drm_client_dev *client)
-- 
2.39.0



[PATCH v3 06/10] drm/fb-helper: Initialize fb-helper's preferred BPP in prepare function

2023-01-25 Thread Thomas Zimmermann
Initialize the fb-helper's preferred_bpp field early from within
drm_fb_helper_prepare() instead of the later client hot-plugging
callback. This simplifies the generic fbdev setup function.

No real changes, but all drivers' fbdev code has to be adapted.

v3:
* build with CONFIG_DRM_FBDEV_EMULATION unset (kernel test robot)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/armada/armada_fbdev.c  |  4 ++--
 drivers/gpu/drm/drm_fb_helper.c| 22 ++
 drivers/gpu/drm/drm_fbdev_generic.c| 19 ++-
 drivers/gpu/drm/exynos/exynos_drm_fbdev.c  |  4 ++--
 drivers/gpu/drm/gma500/framebuffer.c   |  4 ++--
 drivers/gpu/drm/i915/display/intel_fbdev.c | 11 ++-
 drivers/gpu/drm/msm/msm_fbdev.c|  4 ++--
 drivers/gpu/drm/omapdrm/omap_fbdev.c   |  4 ++--
 drivers/gpu/drm/radeon/radeon_fb.c |  4 ++--
 drivers/gpu/drm/tegra/fb.c |  7 +++
 include/drm/drm_fb_helper.h| 11 ++-
 11 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_fbdev.c 
b/drivers/gpu/drm/armada/armada_fbdev.c
index 584cee123bd8..07e410c62b7a 100644
--- a/drivers/gpu/drm/armada/armada_fbdev.c
+++ b/drivers/gpu/drm/armada/armada_fbdev.c
@@ -129,7 +129,7 @@ int armada_fbdev_init(struct drm_device *dev)
 
priv->fbdev = fbh;
 
-   drm_fb_helper_prepare(dev, fbh, &armada_fb_helper_funcs);
+   drm_fb_helper_prepare(dev, fbh, 32, &armada_fb_helper_funcs);
 
ret = drm_fb_helper_init(dev, fbh);
if (ret) {
@@ -137,7 +137,7 @@ int armada_fbdev_init(struct drm_device *dev)
goto err_fb_helper;
}
 
-   ret = drm_fb_helper_initial_config(fbh, 32);
+   ret = drm_fb_helper_initial_config(fbh);
if (ret) {
DRM_ERROR("failed to set initial config\n");
goto err_fb_setup;
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 258103d317ac..28c428e9c530 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -416,14 +416,30 @@ static void drm_fb_helper_damage_work(struct work_struct 
*work)
  * drm_fb_helper_prepare - setup a drm_fb_helper structure
  * @dev: DRM device
  * @helper: driver-allocated fbdev helper structure to set up
+ * @preferred_bpp: Preferred bits per pixel for the device.
  * @funcs: pointer to structure of functions associate with this helper
  *
  * Sets up the bare minimum to make the framebuffer helper usable. This is
  * useful to implement race-free initialization of the polling helpers.
  */
 void drm_fb_helper_prepare(struct drm_device *dev, struct drm_fb_helper 
*helper,
+  unsigned int preferred_bpp,
   const struct drm_fb_helper_funcs *funcs)
 {
+   /*
+* Pick a preferred bpp of 32 if no value has been given. This
+* will select XRGB8888 for the framebuffer formats. All drivers
+* have to support XRGB8888 for backwards compatibility with legacy
+* userspace, so it's the safe choice here.
+*
+* TODO: Replace struct drm_mode_config.preferred_depth and this
+*   bpp value with a preferred format that is given as struct
+*   drm_format_info. Then derive all other values from the
+*   format.
+*/
+   if (!preferred_bpp)
+   preferred_bpp = 32;
+
INIT_LIST_HEAD(&helper->kernel_fb_list);
spin_lock_init(&helper->damage_lock);
INIT_WORK(&helper->resume_work, drm_fb_helper_resume_worker);
@@ -432,6 +448,7 @@ void drm_fb_helper_prepare(struct drm_device *dev, struct 
drm_fb_helper *helper,
mutex_init(&helper->lock);
helper->funcs = funcs;
helper->dev = dev;
+   helper->preferred_bpp = preferred_bpp;
 }
 EXPORT_SYMBOL(drm_fb_helper_prepare);
 
@@ -2183,7 +2200,6 @@ __drm_fb_helper_initial_config_and_unlock(struct 
drm_fb_helper *fb_helper)
 /**
  * drm_fb_helper_initial_config - setup a sane initial connector configuration
  * @fb_helper: fb_helper device struct
- * @bpp_sel: bpp value to use for the framebuffer configuration
  *
  * Scans the CRTCs and connectors and tries to put together an initial setup.
  * At the moment, this is a cloned configuration across all heads with
@@ -2221,15 +2237,13 @@ __drm_fb_helper_initial_config_and_unlock(struct 
drm_fb_helper *fb_helper)
  * RETURNS:
  * Zero if everything went ok, nonzero otherwise.
  */
-int drm_fb_helper_initial_config(struct drm_fb_helper *fb_helper, int bpp_sel)
+int drm_fb_helper_initial_config(struct drm_fb_helper *fb_helper)
 {
int ret;
 
if (!drm_fbdev_emulation)
return 0;
 
-   fb_helper->preferred_bpp = bpp_sel;
-
mutex_lock(&fb_helper->lock);
ret = __drm_fb_helper_initial_config_and_unlock(fb_helper);
 
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 

[PATCH v3 07/10] drm/fbdev-generic: Minimize hotplug error handling

2023-01-25 Thread Thomas Zimmermann
Call drm_fb_helper_fini() in the generic-fbdev hotplug helper
to revert the effects of drm_fb_helper_init(). No full cleanup
is required.

v3:
* fix error in commit message (Javier)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fbdev_generic.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 6ae014040df3..dd8be5e0f271 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -387,25 +387,21 @@ static int drm_fbdev_client_hotplug(struct drm_client_dev 
*client)
 
ret = drm_fb_helper_init(dev, fb_helper);
if (ret)
-   goto err;
+   goto err_drm_err;
 
if (!drm_drv_uses_atomic_modeset(dev))
drm_helper_disable_unused_functions(dev);
 
ret = drm_fb_helper_initial_config(fb_helper);
if (ret)
-   goto err_cleanup;
+   goto err_drm_fb_helper_fini;
 
return 0;
 
-err_cleanup:
-   drm_fbdev_cleanup(fb_helper);
-err:
-   fb_helper->dev = NULL;
-   fb_helper->info = NULL;
-
+err_drm_fb_helper_fini:
+   drm_fb_helper_fini(fb_helper);
+err_drm_err:
drm_err(dev, "fbdev: Failed to setup generic emulation (ret=%d)\n", ret);
-
return ret;
 }
 
-- 
2.39.0
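
The error handling that the patch above arrives at is the usual goto-based
unwind: each failure jumps to a label that undoes only what has already been
set up, and the labels fall through to a shared error report. A minimal
userspace sketch follows. All demo_* names are hypothetical stand-ins for
drm_fb_helper_init(), drm_fb_helper_initial_config() and
drm_fb_helper_fini(); they are not DRM API.

```c
#include <errno.h>
#include <stdio.h>

struct demo_helper {
	int initialized;
};

/* Hypothetical init step; 'fail' forces an error for demonstration. */
static int demo_init(struct demo_helper *h, int fail)
{
	if (fail)
		return -ENOMEM;
	h->initialized = 1;
	return 0;
}

/* Hypothetical configuration step that runs after a successful init. */
static int demo_initial_config(struct demo_helper *h, int fail)
{
	(void)h;
	return fail ? -EINVAL : 0;
}

/* Reverts the effects of demo_init(). */
static void demo_fini(struct demo_helper *h)
{
	h->initialized = 0;
}

int demo_hotplug(struct demo_helper *h, int fail_init, int fail_config)
{
	int ret;

	ret = demo_init(h, fail_init);
	if (ret)
		goto err_report;	/* nothing to undo yet */

	ret = demo_initial_config(h, fail_config);
	if (ret)
		goto err_fini;		/* undo demo_init() only */

	return 0;

err_fini:
	demo_fini(h);
err_report:
	fprintf(stderr, "demo: setup failed (ret=%d)\n", ret);
	return ret;
}
```

Naming each label after the action it performs, as the patch does with
err_drm_fb_helper_fini, keeps the unwind order easy to read.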



[PATCH v3 09/10] drm/fbdev-generic: Inline clean-up helpers into drm_fbdev_fb_destroy()

2023-01-25 Thread Thomas Zimmermann
The fbdev framebuffer cleanup in drm_fbdev_fb_destroy() calls
drm_fbdev_release() and drm_fbdev_cleanup(). Inline both into the
caller. No functional changes.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fbdev_generic.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index a9c519001019..68ce652e3a14 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -43,8 +43,9 @@ static int drm_fbdev_fb_release(struct fb_info *info, int 
user)
return 0;
 }
 
-static void drm_fbdev_cleanup(struct drm_fb_helper *fb_helper)
+static void drm_fbdev_fb_destroy(struct fb_info *info)
 {
+   struct drm_fb_helper *fb_helper = info->par;
struct fb_info *fbi = fb_helper->info;
void *shadow = NULL;
 
@@ -64,24 +65,10 @@ static void drm_fbdev_cleanup(struct drm_fb_helper 
*fb_helper)
drm_client_buffer_vunmap(fb_helper->buffer);
 
drm_client_framebuffer_delete(fb_helper->buffer);
-}
-
-static void drm_fbdev_release(struct drm_fb_helper *fb_helper)
-{
-   drm_fbdev_cleanup(fb_helper);
drm_client_release(&fb_helper->client);
kfree(fb_helper);
 }
 
-/*
- * fb_ops.fb_destroy is called by the last put_fb_info() call at the end of
- * unregister_framebuffer() or fb_release().
- */
-static void drm_fbdev_fb_destroy(struct fb_info *info)
-{
-   drm_fbdev_release(info->par);
-}
-
 static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
 {
struct drm_fb_helper *fb_helper = info->par;
-- 
2.39.0



[PATCH v3 05/10] drm/fb-helper: Remove preferred_bpp parameter from fbdev internals

2023-01-25 Thread Thomas Zimmermann
Store the console's preferred BPP value in struct drm_fb_helper
and remove the respective function parameters from the internal
fbdev code.

The BPP value is only required as a fallback and will now always
be available in the fb-helper instance.

No functional changes.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fb_helper.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 4379bcd7718b..258103d317ac 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -1786,7 +1786,7 @@ static uint32_t 
drm_fb_helper_find_color_mode_format(struct drm_fb_helper *fb_he
return drm_fb_helper_find_format(fb_helper, formats, format_count, bpp, 
depth);
 }
 
-static int __drm_fb_helper_find_sizes(struct drm_fb_helper *fb_helper, int 
preferred_bpp,
+static int __drm_fb_helper_find_sizes(struct drm_fb_helper *fb_helper,
  struct drm_fb_helper_surface_size *sizes)
 {
struct drm_client_dev *client = &fb_helper->client;
@@ -1831,7 +1831,7 @@ static int __drm_fb_helper_find_sizes(struct 
drm_fb_helper *fb_helper, int prefe
surface_format = drm_fb_helper_find_color_mode_format(fb_helper,
  
plane->format_types,
  
plane->format_count,
- 
preferred_bpp);
+ 
fb_helper->preferred_bpp);
if (surface_format != DRM_FORMAT_INVALID)
break; /* found supported format */
}
@@ -1903,7 +1903,7 @@ static int __drm_fb_helper_find_sizes(struct 
drm_fb_helper *fb_helper, int prefe
return 0;
 }
 
-static int drm_fb_helper_find_sizes(struct drm_fb_helper *fb_helper, int 
preferred_bpp,
+static int drm_fb_helper_find_sizes(struct drm_fb_helper *fb_helper,
struct drm_fb_helper_surface_size *sizes)
 {
struct drm_client_dev *client = &fb_helper->client;
@@ -1912,7 +1912,7 @@ static int drm_fb_helper_find_sizes(struct drm_fb_helper 
*fb_helper, int preferr
int ret;
 
mutex_lock(&client->modeset_mutex);
-   ret = __drm_fb_helper_find_sizes(fb_helper, preferred_bpp, sizes);
+   ret = __drm_fb_helper_find_sizes(fb_helper, sizes);
mutex_unlock(&client->modeset_mutex);
 
if (ret)
@@ -1934,15 +1934,14 @@ static int drm_fb_helper_find_sizes(struct 
drm_fb_helper *fb_helper, int preferr
  * Allocates the backing storage and sets up the fbdev info structure through
  * the ->fb_probe callback.
  */
-static int drm_fb_helper_single_fb_probe(struct drm_fb_helper *fb_helper,
-int preferred_bpp)
+static int drm_fb_helper_single_fb_probe(struct drm_fb_helper *fb_helper)
 {
struct drm_client_dev *client = &fb_helper->client;
struct drm_device *dev = fb_helper->dev;
struct drm_fb_helper_surface_size sizes;
int ret;
 
-   ret = drm_fb_helper_find_sizes(fb_helper, preferred_bpp, &sizes);
+   ret = drm_fb_helper_find_sizes(fb_helper, &sizes);
if (ret) {
/* First time: disable all crtc's.. */
if (!fb_helper->deferred_setup)
@@ -2125,8 +2124,7 @@ static void drm_setup_crtcs_fb(struct drm_fb_helper 
*fb_helper)
 
 /* Note: Drops fb_helper->lock before returning. */
 static int
-__drm_fb_helper_initial_config_and_unlock(struct drm_fb_helper *fb_helper,
- int bpp_sel)
+__drm_fb_helper_initial_config_and_unlock(struct drm_fb_helper *fb_helper)
 {
struct drm_device *dev = fb_helper->dev;
struct fb_info *info;
@@ -2137,10 +2135,9 @@ __drm_fb_helper_initial_config_and_unlock(struct 
drm_fb_helper *fb_helper,
height = dev->mode_config.max_height;
 
drm_client_modeset_probe(&fb_helper->client, width, height);
-   ret = drm_fb_helper_single_fb_probe(fb_helper, bpp_sel);
+   ret = drm_fb_helper_single_fb_probe(fb_helper);
if (ret < 0) {
if (ret == -EAGAIN) {
-   fb_helper->preferred_bpp = bpp_sel;
fb_helper->deferred_setup = true;
ret = 0;
}
@@ -2231,8 +2228,10 @@ int drm_fb_helper_initial_config(struct drm_fb_helper 
*fb_helper, int bpp_sel)
if (!drm_fbdev_emulation)
return 0;
 
+   fb_helper->preferred_bpp = bpp_sel;
+
mutex_lock(&fb_helper->lock);
-   ret = __drm_fb_helper_initial_config_and_unlock(fb_helper, bpp_sel);
+   ret = __drm_fb_helper_initial_config_and_unlock(fb_helper);
 
return ret;
 }
@@ -2268,8 +2267,7 @@ int drm_fb_helper_hotplug_event(struct drm_fb_helper 
*fb_helper)
 

[PATCH v3 00/10] drm/fb-helper: Various cleanups

2023-01-25 Thread Thomas Zimmermann
Add various cleanups and changes to DRM's fbdev helpers and the
generic fbdev emulation. There's no clear theme here, just lots
of small things that need to be updated.
 
In the end, the code will better reflect which parts are in the 
DRM client, which is fbdev emulation, and which are shared fbdev
helpers.

v3:
* various minor fixes (Javier)
* build with CONFIG_DRM_FBDEV_EMULATION unset (kernel test robot)
v2:
* cleanups in drm_fbdev_fb_destroy() (Sam)
* fix declaration of drm_fb_helper_unprepare()

Thomas Zimmermann (10):
  drm/client: Test for connectors before sending hotplug event
  drm/client: Add hotplug_failed flag
  drm/fb-helper: Introduce drm_fb_helper_unprepare()
  drm/fbdev-generic: Initialize fb-helper structure in generic setup
  drm/fb-helper: Remove preferred_bpp parameter from fbdev internals
  drm/fb-helper: Initialize fb-helper's preferred BPP in prepare
function
  drm/fbdev-generic: Minimize hotplug error handling
  drm/fbdev-generic: Minimize client unregistering
  drm/fbdev-generic: Inline clean-up helpers into drm_fbdev_fb_destroy()
  drm/fbdev-generic: Rename struct fb_info 'fbi' to 'info'

 drivers/gpu/drm/armada/armada_fbdev.c  |   4 +-
 drivers/gpu/drm/drm_client.c   |  10 ++
 drivers/gpu/drm/drm_fb_helper.c|  58 ++---
 drivers/gpu/drm/drm_fbdev_generic.c| 131 -
 drivers/gpu/drm/exynos/exynos_drm_fbdev.c  |   4 +-
 drivers/gpu/drm/gma500/framebuffer.c   |   4 +-
 drivers/gpu/drm/i915/display/intel_fbdev.c |  11 +-
 drivers/gpu/drm/msm/msm_fbdev.c|   4 +-
 drivers/gpu/drm/omapdrm/omap_fbdev.c   |   4 +-
 drivers/gpu/drm/radeon/radeon_fb.c |   4 +-
 drivers/gpu/drm/tegra/fb.c |   7 +-
 include/drm/drm_client.h   |   8 ++
 include/drm/drm_fb_helper.h|  16 ++-
 13 files changed, 138 insertions(+), 127 deletions(-)


base-commit: 7d3e7f64a42d66ba8da6e7b66a8d85457ef84570
-- 
2.39.0



[PATCH v3 03/10] drm/fb-helper: Introduce drm_fb_helper_unprepare()

2023-01-25 Thread Thomas Zimmermann
Move the fb-helper clean-up code into drm_fb_helper_unprepare(). No
functional changes.

v2:
* declare as static inline (kernel test robot)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fb_helper.c | 14 +-
 include/drm/drm_fb_helper.h |  5 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index c5c13e192b64..4379bcd7718b 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -435,6 +435,18 @@ void drm_fb_helper_prepare(struct drm_device *dev, struct 
drm_fb_helper *helper,
 }
 EXPORT_SYMBOL(drm_fb_helper_prepare);
 
+/**
+ * drm_fb_helper_unprepare - clean up a drm_fb_helper structure
+ * @fb_helper: driver-allocated fbdev helper structure to set up
+ *
+ * Cleans up the framebuffer helper. Inverse of drm_fb_helper_prepare().
+ */
+void drm_fb_helper_unprepare(struct drm_fb_helper *fb_helper)
+{
+   mutex_destroy(&fb_helper->lock);
+}
+EXPORT_SYMBOL(drm_fb_helper_unprepare);
+
 /**
  * drm_fb_helper_init - initialize a &struct drm_fb_helper
  * @dev: drm device
@@ -561,7 +573,7 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper)
}
mutex_unlock(&kernel_fb_helper_lock);
 
-   mutex_destroy(&fb_helper->lock);
+   drm_fb_helper_unprepare(fb_helper);
 
if (!fb_helper->client.funcs)
drm_client_release(&fb_helper->client);
diff --git a/include/drm/drm_fb_helper.h b/include/drm/drm_fb_helper.h
index f443e1f11654..39710c570a04 100644
--- a/include/drm/drm_fb_helper.h
+++ b/include/drm/drm_fb_helper.h
@@ -230,6 +230,7 @@ drm_fb_helper_from_client(struct drm_client_dev *client)
 #ifdef CONFIG_DRM_FBDEV_EMULATION
 void drm_fb_helper_prepare(struct drm_device *dev, struct drm_fb_helper 
*helper,
   const struct drm_fb_helper_funcs *funcs);
+void drm_fb_helper_unprepare(struct drm_fb_helper *fb_helper);
 int drm_fb_helper_init(struct drm_device *dev, struct drm_fb_helper *helper);
 void drm_fb_helper_fini(struct drm_fb_helper *helper);
 int drm_fb_helper_blank(int blank, struct fb_info *info);
@@ -296,6 +297,10 @@ static inline void drm_fb_helper_prepare(struct drm_device 
*dev,
 {
 }
 
+static inline void drm_fb_helper_unprepare(struct drm_fb_helper *fb_helper)
+{
+}
+
 static inline int drm_fb_helper_init(struct drm_device *dev,
   struct drm_fb_helper *helper)
 {
-- 
2.39.0



[PATCH v3 04/10] drm/fbdev-generic: Initialize fb-helper structure in generic setup

2023-01-25 Thread Thomas Zimmermann
Initialize the fb-helper structure immediately after its allocation
in drm_fbdev_generic_setup(). That will make it easier to fill it with
driver-specific values, such as the preferred BPP.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_fbdev_generic.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 135d58b8007b..63f66325a8a5 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -385,8 +385,6 @@ static int drm_fbdev_client_hotplug(struct drm_client_dev 
*client)
if (dev->fb_helper)
return drm_fb_helper_hotplug_event(dev->fb_helper);
 
-   drm_fb_helper_prepare(dev, fb_helper, &drm_fb_helper_generic_funcs);
-
ret = drm_fb_helper_init(dev, fb_helper);
if (ret)
goto err;
@@ -456,12 +454,12 @@ void drm_fbdev_generic_setup(struct drm_device *dev,
fb_helper = kzalloc(sizeof(*fb_helper), GFP_KERNEL);
if (!fb_helper)
return;
+   drm_fb_helper_prepare(dev, fb_helper, &drm_fb_helper_generic_funcs);
 
ret = drm_client_init(dev, &fb_helper->client, "fbdev", &drm_fbdev_client_funcs);
if (ret) {
-   kfree(fb_helper);
drm_err(dev, "Failed to register client: %d\n", ret);
-   return;
+   goto err_drm_client_init;
}
 
/*
@@ -484,5 +482,12 @@ void drm_fbdev_generic_setup(struct drm_device *dev,
drm_dbg_kms(dev, "client hotplug ret=%d\n", ret);
 
drm_client_register(&fb_helper->client);
+
+   return;
+
+err_drm_client_init:
+   drm_fb_helper_unprepare(fb_helper);
+   kfree(fb_helper);
+   return;
 }
 EXPORT_SYMBOL(drm_fbdev_generic_setup);
-- 
2.39.0



[PATCH v3 02/10] drm/client: Add hotplug_failed flag

2023-01-25 Thread Thomas Zimmermann
Signal failed hotplugging with a flag in struct drm_client_dev. If set,
the client helpers will not further try to set up the fbdev display.

This used to be signalled with a combination of cleared pointers in
struct drm_fb_helper, which prevents us from initializing these pointers
early after allocation.

The change also harmonizes behavior among DRM clients. Additional DRM
clients will now handle failed hotplugging like fbdev does.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_client.c| 5 +
 drivers/gpu/drm/drm_fbdev_generic.c | 4 
 include/drm/drm_client.h| 8 
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index 09ac191c202d..009e7b10455c 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -208,8 +208,13 @@ void drm_client_dev_hotplug(struct drm_device *dev)
if (!client->funcs || !client->funcs->hotplug)
continue;
 
+   if (client->hotplug_failed)
+   continue;
+
ret = client->funcs->hotplug(client);
drm_dbg_kms(dev, "%s: ret=%d\n", client->name, ret);
+   if (ret)
+   client->hotplug_failed = true;
}
mutex_unlock(&dev->clientlist_mutex);
 }
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 3d455a2e3fb5..135d58b8007b 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -382,10 +382,6 @@ static int drm_fbdev_client_hotplug(struct drm_client_dev 
*client)
struct drm_device *dev = client->dev;
int ret;
 
-   /* Setup is not retried if it has failed */
-   if (!fb_helper->dev && fb_helper->funcs)
-   return 0;
-
if (dev->fb_helper)
return drm_fb_helper_hotplug_event(dev->fb_helper);
 
diff --git a/include/drm/drm_client.h b/include/drm/drm_client.h
index 4fc8018eddda..39482527a775 100644
--- a/include/drm/drm_client.h
+++ b/include/drm/drm_client.h
@@ -106,6 +106,14 @@ struct drm_client_dev {
 * @modesets: CRTC configurations
 */
struct drm_mode_set *modesets;
+
+   /**
+* @hotplug failed:
+*
+* Set by client hotplug helpers if the hotplugging failed
+* before. It is usually not tried again.
+*/
+   bool hotplug_failed;
 };
 
 int drm_client_init(struct drm_device *dev, struct drm_client_dev *client,
-- 
2.39.0
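
The behavior introduced by the hotplug_failed flag can be modelled outside the
kernel: once a client's hotplug callback has failed, later hotplug events skip
that client while other clients keep receiving them. The sketch below uses a
plain array instead of the kernel's client list and omits locking. struct
demo_client and the demo_* functions are hypothetical and exist only for this
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_client {
	const char *name;
	int (*hotplug)(struct demo_client *client);
	bool hotplug_failed;	/* set once the callback has failed */
	int calls;		/* how often the callback actually ran */
};

/* Example callback that always succeeds. */
int demo_hotplug_ok(struct demo_client *client)
{
	client->calls++;
	return 0;
}

/* Example callback that always fails. */
int demo_hotplug_fail(struct demo_client *client)
{
	client->calls++;
	return -1;
}

/* Deliver one hotplug event to every client that has not failed before. */
void demo_dev_hotplug(struct demo_client *clients, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		struct demo_client *client = &clients[i];

		if (!client->hotplug)
			continue;
		if (client->hotplug_failed)
			continue;

		if (client->hotplug(client))
			client->hotplug_failed = true;
	}
}
```

Keeping the flag in the generic client structure rather than in fbdev-specific
state is what lets every client type share the same retry policy.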



[PATCH v3 01/10] drm/client: Test for connectors before sending hotplug event

2023-01-25 Thread Thomas Zimmermann
Test for connectors in the client code and remove a similar test
from the generic fbdev emulation. Do nothing if the test fails.
Not having connectors indicates a driver bug.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/drm_client.c| 5 +
 drivers/gpu/drm/drm_fbdev_generic.c | 5 -
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index 262ec64d4397..09ac191c202d 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -198,6 +198,11 @@ void drm_client_dev_hotplug(struct drm_device *dev)
if (!drm_core_check_feature(dev, DRIVER_MODESET))
return;
 
+   if (!dev->mode_config.num_connector) {
+   drm_dbg_kms(dev, "No connectors found, will not send hotplug events!\n");
+   return;
+   }
+
mutex_lock(&dev->clientlist_mutex);
list_for_each_entry(client, &dev->clientlist, list) {
if (!client->funcs || !client->funcs->hotplug)
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c
index 0a4c160e0e58..3d455a2e3fb5 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -389,11 +389,6 @@ static int drm_fbdev_client_hotplug(struct drm_client_dev 
*client)
if (dev->fb_helper)
return drm_fb_helper_hotplug_event(dev->fb_helper);
 
-   if (!dev->mode_config.num_connector) {
-   drm_dbg_kms(dev, "No connectors found, will not create framebuffer!\n");
-   return 0;
-   }
-
drm_fb_helper_prepare(dev, fb_helper, &drm_fb_helper_generic_funcs);
 
ret = drm_fb_helper_init(dev, fb_helper);
-- 
2.39.0



[PATCH 31/32] drm/amdkfd: add debug device snapshot operation

2023-01-25 Thread Jonathan Kim
Similar to queue snapshot, return an array of device information using
an entry_size check and return.
Unlike queue snapshots, the debugger needs to pass the correct number of
devices that exist.  If it fails to do so, the KFD will return the
number of actual devices so that the debugger can make a subsequent
successful call.
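
The size negotiation described above can be sketched in userspace C. This is a
rough model under stated assumptions, not the KFD implementation:
struct demo_entry and demo_device_snapshot() are hypothetical, memcpy() stands
in for the copy to user space, and using the caller's entry size as the stride
between entries is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-device record; the real entry has many more fields. */
struct demo_entry {
	uint32_t gpu_id;
	uint32_t revision_id;
};

/*
 * Copy at most *num entries into buf, each truncated to *entry_size bytes.
 * On return, *num holds the number of devices that actually exist and
 * *entry_size holds the number of bytes written per entry, so a caller
 * that passed too small a count can retry with the right one.
 */
int demo_device_snapshot(const struct demo_entry *devs, uint32_t n_devs,
			 void *buf, uint32_t *num, uint32_t *entry_size)
{
	uint32_t caller_stride, to_copy, i;

	if (!devs || !buf || !num || !entry_size)
		return -1;

	caller_stride = *entry_size;
	to_copy = *num < n_devs ? *num : n_devs;
	*num = n_devs;
	if (*entry_size > sizeof(*devs))
		*entry_size = sizeof(*devs);

	for (i = 0; i < to_copy; i++)
		memcpy((char *)buf + (size_t)i * caller_stride, &devs[i],
		       *entry_size);

	return 0;
}
```

A caller would typically check the returned count against the count it
passed in and, if it was too small, allocate a larger buffer and call again.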

v3: was reviewed but re-requesting review with new revision and
subvendor information.
memset 0 device info entry to clear padding.

v2: change buf_size arg to num_devices for more clarity.
expand device entry new members on copy.
fix minimum entry size calculation for queue and device snapshot.
change device snapshot implementation to match queue snapshot
implementation.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  7 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 72 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  5 ++
 3 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 93b288233577..da74a6ef4d9b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2972,8 +2972,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
&args->queue_snapshot.entry_size);
break;
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
-   pr_warn("Debug op %i not supported yet\n", args->op);
-   r = -EACCES;
+   r = kfd_dbg_trap_device_snapshot(target,
+   args->device_snapshot.exception_mask,
+   (void __user *)args->device_snapshot.snapshot_buf_ptr,
+   &args->device_snapshot.num_devices,
+   &args->device_snapshot.entry_size);
break;
default:
pr_err("Invalid option: %i\n", args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index db316f0625f8..d1c4eb9652fd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -22,6 +22,7 @@
 
 #include "kfd_debug.h"
 #include "kfd_device_queue_manager.h"
+#include "kfd_topology.h"
 #include 
 #include 
 
@@ -998,6 +999,77 @@ int kfd_dbg_trap_query_exception_info(struct kfd_process 
*target,
return r;
 }
 
+int kfd_dbg_trap_device_snapshot(struct kfd_process *target,
+   uint64_t exception_clear_mask,
+   void __user *user_info,
+   uint32_t *number_of_device_infos,
+   uint32_t *entry_size)
+{
+   struct kfd_dbg_device_info_entry device_info;
+   uint32_t tmp_entry_size = *entry_size, tmp_num_devices;
+   int i, r = 0;
+
+   if (!(target && user_info && number_of_device_infos && entry_size))
+   return -EINVAL;
+
+   tmp_num_devices = min_t(size_t, *number_of_device_infos, 
target->n_pdds);
+   *number_of_device_infos = target->n_pdds;
+   *entry_size = min_t(size_t, *entry_size, sizeof(device_info));
+
+   if (!tmp_num_devices)
+   return 0;
+
+   memset(&device_info, 0, sizeof(device_info));
+
+   mutex_lock(&target->event_mutex);
+
+   /* Run over all pdd of the process */
+   for (i = 0; i < tmp_num_devices; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+   struct kfd_topology_device *topo_dev = 
kfd_topology_device_by_id(pdd->dev->id);
+
+   device_info.gpu_id = pdd->dev->id;
+   device_info.exception_status = pdd->exception_status;
+   device_info.lds_base = pdd->lds_base;
+   device_info.lds_limit = pdd->lds_limit;
+   device_info.scratch_base = pdd->scratch_base;
+   device_info.scratch_limit = pdd->scratch_limit;
+   device_info.gpuvm_base = pdd->gpuvm_base;
+   device_info.gpuvm_limit = pdd->gpuvm_limit;
+   device_info.location_id = topo_dev->node_props.location_id;
+   device_info.vendor_id = topo_dev->node_props.vendor_id;
+   device_info.device_id = topo_dev->node_props.device_id;
+   device_info.revision_id = pdd->dev->adev->pdev->revision;
+   device_info.subsystem_vendor_id = 
pdd->dev->adev->pdev->subsystem_vendor;
+   device_info.subsystem_device_id = 
pdd->dev->adev->pdev->subsystem_device;
+   device_info.fw_version = pdd->dev->mec_fw_version;
+   device_info.gfx_target_version =
+   topo_dev->node_props.gfx_target_version;
+   device_info.simd_count = topo_dev->node_props.simd_count;
+   device_info.max_waves_per_simd =
+   topo_dev->node_props.max_waves_per_simd;
+   device_info.array_count = topo_dev->node_props.array_count;
+   device_info.simd_arrays_per_engine =
+  

[PATCH 25/32] drm/amdkfd: add debug suspend and resume process queues operation

2023-01-25 Thread Jonathan Kim
In order to inspect waves from the saved context at any point during a
debug session, the debugger must be able to preempt queues to trigger
context save by suspending them.

On queue suspend, the KFD will copy the context save header information
so that the debugger can correctly crawl the appropriate size of the saved
context. The debugger must then also be allowed to resume suspended queues.

A queue that is newly created cannot be suspended because queue ids are
recycled after destruction, so the debugger needs to know that this has
occurred.  Query functions that clear a given queue of its new queue
status will be added later.

A queue cannot be destroyed while it is suspended, to preserve its saved
context during debugger inspection.  Have queue destruction block while
a queue is suspended and unblock when it is resumed.  Likewise, if a
queue is about to be destroyed, it cannot be suspended.

Return the number of queues successfully suspended or resumed along with
a per queue status array where the upper bits per queue status show that
the request was invalid (new/destroyed queue suspend request, missing
queue) or an error occurred (HWS in a fatal state so it can't suspend or
resume queues).

v2: add gfx11/mes support.
prevent header copy on suspend from overwriting user fields.
simplify resume_queues function.
address other nit-picks

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  11 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   7 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 446 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  10 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  14 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  |  11 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +-
 10 files changed, 518 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 60c3b0449d86..d50415fe0475 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -758,6 +758,11 @@ bool amdgpu_amdkfd_have_atomics_support(struct 
amdgpu_device *adev)
return adev->have_atomics_support;
 }
 
+void amdgpu_amdkfd_debug_mem_fence(struct amdgpu_device *adev)
+{
+   amdgpu_device_flush_hdp(adev, NULL);
+}
+
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev, 
bool reset)
 {
amdgpu_umc_poison_handler(adev, reset);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index df782274a4c8..9d1c6ab14331 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -310,6 +310,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct amdgpu_device 
*adev,
  uint64_t va, void *drm_priv,
  struct kgd_mem **mem, uint64_t *size,
  uint64_t *mmap_offset);
+void amdgpu_amdkfd_debug_mem_fence(struct amdgpu_device *adev);
 int amdgpu_amdkfd_get_tile_config(struct amdgpu_device *adev,
struct tile_config *config);
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 205a487d91d2..b62e93b35a44 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -410,6 +410,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
pr_debug("Write ptr address   == 0x%016llX\n",
args->write_pointer_address);
 
+   kfd_dbg_ev_raise(KFD_EC_MASK(EC_QUEUE_NEW), p, dev, queue_id, false, 
NULL, 0);
return 0;
 
 err_create_queue:
@@ -2908,7 +2909,17 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
args->launch_mode.launch_mode);
break;
case KFD_IOC_DBG_TRAP_SUSPEND_QUEUES:
+   r = suspend_queues(target,
+   args->suspend_queues.num_queues,
+   args->suspend_queues.grace_period,
+   args->suspend_queues.exception_mask,
+   (uint32_t 
*)args->suspend_queues.queue_array_ptr);
+
+   break;
case KFD_IOC_DBG_TRAP_RESUME_QUEUES:
+   r = resume_queues(target, args->resume_queues.num_queues,
+   (uint32_t 
*)args->resume_queues.queue_array_ptr);
+   break;
case KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH:
case KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH:
case 

[PATCH 18/32] drm/amdkfd: add send exception operation

2023-01-25 Thread Jonathan Kim
Add a debug operation that allows the debugger to send an exception
directly to runtime through a payload address.

For memory violations, the normal vmfault signal is used to notify
the runtime instead, passing in the exception data that was saved
when the memory violation was raised to the debugger.

For runtime exceptions, this will unblock the runtime enable
function which will be explained and implemented in a follow up
patch.

Signed-off-by: Jonathan Kim 
---
 .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 44 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 71 ++-
 8 files changed, 135 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c 
b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
index 5c8023cba196..62a38cd820fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
@@ -118,9 +118,9 @@ static void cik_event_interrupt_wq(struct kfd_dev *dev,
return;
 
if (info.vmid == vmid)
-   kfd_signal_vm_fault_event(dev, pasid, &info);
+   kfd_signal_vm_fault_event(dev, pasid, &info, NULL);
else
-   kfd_signal_vm_fault_event(dev, pasid, NULL);
+   kfd_signal_vm_fault_event(dev, pasid, NULL, NULL);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 628178126d3b..09fe8576dc8c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2738,6 +2738,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
r = kfd_dbg_trap_disable(target);
break;
case KFD_IOC_DBG_TRAP_SEND_RUNTIME_EVENT:
+   r = kfd_dbg_send_exception_to_runtime(target,
+   args->send_runtime_event.gpu_id,
+   args->send_runtime_event.queue_id,
+   args->send_runtime_event.exception_mask);
+   break;
case KFD_IOC_DBG_TRAP_SET_EXCEPTIONS_ENABLED:
case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_OVERRIDE:
case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_MODE:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index fcd064b13f6a..4174b479ea6f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -125,6 +125,49 @@ bool kfd_dbg_ev_raise(uint64_t event_mask,
return is_subscribed;
 }
 
+int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
+   unsigned int dev_id,
+   unsigned int queue_id,
+   uint64_t error_reason)
+{
+   if (error_reason & KFD_EC_MASK(EC_DEVICE_MEMORY_VIOLATION)) {
+   struct kfd_process_device *pdd = NULL;
+   struct kfd_hsa_memory_exception_data *data;
+   int i;
+
+   for (i = 0; i < p->n_pdds; i++) {
+   if (p->pdds[i]->dev->id == dev_id) {
+   pdd = p->pdds[i];
+   break;
+   }
+   }
+
+   if (!pdd)
+   return -ENODEV;
+
+   data = (struct kfd_hsa_memory_exception_data *)
+   pdd->vm_fault_exc_data;
+
+   kfd_dqm_evict_pasid(pdd->dev->dqm, p->pasid);
+   kfd_signal_vm_fault_event(pdd->dev, p->pasid, NULL, data);
+   error_reason &= ~KFD_EC_MASK(EC_DEVICE_MEMORY_VIOLATION);
+   }
+
+   if (error_reason & (KFD_EC_MASK(EC_PROCESS_RUNTIME))) {
+   /*
+* block should only happen after the debugger receives runtime
+* enable notice.
+*/
+   up(&p->runtime_enable_sema);
+   error_reason &= ~KFD_EC_MASK(EC_PROCESS_RUNTIME);
+   }
+
+   if (error_reason)
+   return kfd_send_exception_to_runtime(p, queue_id, error_reason);
+
+   return 0;
+}
+
 static int kfd_dbg_set_queue_workaround(struct queue *q, bool enable)
 {
struct mqd_update_info minfo = {0};
@@ -175,6 +218,7 @@ static int kfd_dbg_set_workaround(struct kfd_process 
*target, bool enable)
}
 
return r;
+}
 
 static int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 

[PATCH 16/32] drm/amdkfd: add per process hw trap enable and disable functions

2023-01-25 Thread Jonathan Kim
To enable HW debug mode per process, all devices must be debug enabled
successfully.  If a failure occurs, rewind the enablement of debug mode
on the enabled devices.
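
The enable-all-or-rewind rule can be sketched as below.  This is a
simplified model, not the driver code: struct ex_device and the ex_*
helpers are hypothetical, and the hardware programming is reduced to a
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_device {
	bool debug_enabled;
	bool will_fail;		/* test knob: make enabling this device fail */
};

static int ex_enable_one(struct ex_device *d)
{
	if (d->will_fail)
		return -1;
	d->debug_enabled = true;
	return 0;
}

static void ex_disable_one(struct ex_device *d)
{
	d->debug_enabled = false;
}

/* Enable debug mode on every device, or on none of them. */
static int ex_enable_all(struct ex_device *devs, size_t n)
{
	size_t i;
	int r = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		r = ex_enable_one(&devs[i]);
		if (r)
			break;
	}

	if (r) {
		/* Rewind only the i devices that this call enabled. */
		while (i-- > 0)
			ex_disable_one(&devs[i]);
	}
	return r;
}
```

Stopping the rewind at the failing index matters because devices past that
point were never touched by this call.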

A power management scenario that needs to be considered is HW
debug mode setting during GFXOFF.  During GFXOFF, these registers
will be unreachable so we have to transiently disable GFXOFF when
setting.  Also, some devices don't support the RLC save restore
function for these debug registers so we have to disable GFXOFF
completely during a debug session.

Cooperative launch also has debugging restrictions based on HW/FW bugs.
If such bugs exist, the debugger cannot attach to a process that uses GWS
resources nor can GWS resources be requested if a process is being
debugged.

Multi-process debug devices can only enable trap temporaries based
on certain runtime scenarios, which will be explained when the
runtime enable functions are implemented in a follow up patch.

v2: add gfx11 support. fix fw checks. remove asic family name comments.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   5 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 148 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  29 
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   9 ++
 5 files changed, 190 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f5f639de28f0..628178126d3b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1453,6 +1453,11 @@ static int kfd_ioctl_alloc_queue_gws(struct file *filep,
goto out_unlock;
}
 
+   if (!kfd_dbg_has_gws_support(dev) && p->debug_trap_enabled) {
+   retval = -EBUSY;
+   goto out_unlock;
+   }
+
retval = pqm_set_gws(&p->pqm, args->queue_id, args->num_gws ? dev->gws 
: NULL);
mutex_unlock(&p->mutex);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 6e99a0160275..659dfc7411fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -21,6 +21,7 @@
  */
 
 #include "kfd_debug.h"
+#include "kfd_device_queue_manager.h"
 #include 
 
 void debug_event_write_work_handler(struct work_struct *work)
@@ -101,11 +102,68 @@ static int kfd_dbg_set_mes_debug_mode(struct 
kfd_process_device *pdd)
pdd->watch_points, flags);
 }
 
+/* kfd_dbg_trap_deactivate:
+ * target: target process
+ * unwind: If this is unwinding a failed kfd_dbg_trap_enable()
+ * unwind_count:
+ * If unwind == true, how far down the pdd list we need
+ * to unwind
+ * else: ignored
+ */
+static void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, 
int unwind_count)
+{
+   int i, count = 0;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+
+   /* If this is an unwind, and we have unwound the required
+* enable calls on the pdd list, we need to stop now
+* otherwise we may mess up another debugger session.
+*/
+   if (unwind && count == unwind_count)
+   break;
+
+   /* GFX off is already disabled by debug activate if not RLC 
restore supported. */
+   if (kfd_dbg_is_rlc_restore_supported(pdd->dev))
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+   pdd->spi_dbg_override =
+   pdd->dev->kfd2kgd->disable_debug_trap(
+   pdd->dev->adev,
+   target->runtime_info.ttmp_setup,
+   pdd->dev->vm_info.last_vmid_kfd);
+   if (kfd_dbg_is_rlc_restore_supported(pdd->dev))
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
+
+   if (!kfd_dbg_is_per_vmid_supported(pdd->dev) &&
+   release_debug_trap_vmid(pdd->dev->dqm, 
&pdd->qpd))
+   pr_err("Failed to release debug vmid on [%i]\n", 
pdd->dev->id);
+
+   if (!pdd->dev->shared_resources.enable_mes)
+   debug_refresh_runlist(pdd->dev->dqm);
+   else
+   kfd_dbg_set_mes_debug_mode(pdd);
+
+   count++;
+   }
+
+   kfd_dbg_set_workaround(target, false);
+}
+
 int kfd_dbg_trap_disable(struct kfd_process *target)
 {
if (!target->debug_trap_enabled)
return 0;
 
+   /*
+* Defer deactivation to runtime if runtime not enabled otherwise reset
+* attached running target runtime state to enable for re-attach.
+*/
+   if (target->runtime_info.runtime_state == 

[PATCH 26/32] drm/amdkfd: add debug set and clear address watch points operation

2023-01-25 Thread Jonathan Kim
Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger to pass in a watch point to a particular memory
address per device.

Note that there exist only 4 watch points per device to date, so have
the KFD keep track of which watch points are allocated.

v3: add gfx11 support.
cleanup gfx9 kgd calls to set and clear address watch.
use per device spinlock to set watch points.
fixup runlist refresh calls on set/clear address watch.

v2: change dev_id arg to gpu_id for consistency

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  51 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   2 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  78 ++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|   8 ++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |   5 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|  52 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  77 ++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   8 ++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  24 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 136 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   6 +-
 13 files changed, 451 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 4de2066215b4..18baf1cd8c01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -118,6 +118,55 @@ static uint32_t kgd_aldebaran_set_wave_launch_mode(struct 
amdgpu_device *adev,
return data;
 }
 
+#define TCP_WATCH_STRIDE (regTCP_WATCH1_ADDR_H - regTCP_WATCH0_ADDR_H)
+static uint32_t kgd_gfx_aldebaran_set_address_watch(
+   struct amdgpu_device *adev,
+   uint64_t watch_address,
+   uint32_t watch_address_mask,
+   uint32_t watch_id,
+   uint32_t watch_mode,
+   uint32_t debug_vmid)
+{
+   uint32_t watch_address_high;
+   uint32_t watch_address_low;
+   uint32_t watch_address_cntl;
+
+   watch_address_cntl = 0;
+   watch_address_low = lower_32_bits(watch_address);
+   watch_address_high = upper_32_bits(watch_address) & 0x;
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   MODE,
+   watch_mode);
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   MASK,
+   watch_address_mask >> 6);
+
+   watch_address_cntl = REG_SET_FIELD(watch_address_cntl,
+   TCP_WATCH0_CNTL,
+   VALID,
+   1);
+
+   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) +
+   (watch_id * TCP_WATCH_STRIDE)),
+   watch_address_high);
+
+   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_L) +
+   (watch_id * TCP_WATCH_STRIDE)),
+   watch_address_low);
+
+   return watch_address_cntl;
+}
+
+uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
+   uint32_t watch_id)
+{
+   return 0;
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -140,6 +189,8 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.validate_trap_override_request = 
kgd_aldebaran_validate_trap_override_request,
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
+   .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
+   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 500013540356..a7fb5ef13166 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -413,6 +413,8 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.validate_trap_override_request = 
kgd_gfx_v9_validate_trap_override_request,

[PATCH 29/32] drm/amdkfd: add debug query exception info operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to query additional info based on an exception code.
For device exceptions, it's currently only memory violation information.
For process exceptions, it's currently only runtime information.
Queue exceptions only report the queue exception status.

The debugger has the option of clearing the target exception on query.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |   7 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 120 +++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |   6 ++
 3 files changed, 133 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 0ae1237fa193..d3d2026b6e65 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2957,6 +2957,13 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
&args->query_debug_event.exception_mask);
break;
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
+   r = kfd_dbg_trap_query_exception_info(target,
+   args->query_exception_info.source_id,
+   args->query_exception_info.exception_code,
+   args->query_exception_info.clear_exception,
+   (void __user 
*)args->query_exception_info.info_ptr,
+   &args->query_exception_info.info_size);
+   break;
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
pr_warn("Debug op %i not supported yet\n", args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 032207efef15..db316f0625f8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -878,6 +878,126 @@ int kfd_dbg_trap_set_wave_launch_mode(struct kfd_process 
*target,
return r;
 }
 
+int kfd_dbg_trap_query_exception_info(struct kfd_process *target,
+   uint32_t source_id,
+   uint32_t exception_code,
+   bool clear_exception,
+   void __user *info,
+   uint32_t *info_size)
+{
+   bool found = false;
+   int r = 0;
+   uint32_t copy_size, actual_info_size = 0;
+   uint64_t *exception_status_ptr = NULL;
+
+   if (!target)
+   return -EINVAL;
+
+   if (!info || !info_size)
+   return -EINVAL;
+
+   mutex_lock(&target->event_mutex);
+
+   if (KFD_DBG_EC_TYPE_IS_QUEUE(exception_code)) {
+   /* Per queue exceptions */
+   struct queue *queue = NULL;
+   int i;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+   struct qcm_process_device *qpd = &pdd->qpd;
+
+   list_for_each_entry(queue, &qpd->queues_list, list) {
+   if (!found && queue->properties.queue_id == 
source_id) {
+   found = true;
+   break;
+   }
+   }
+   if (found)
+   break;
+   }
+
+   if (!found) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   if (!(queue->properties.exception_status & 
KFD_EC_MASK(exception_code))) {
+   r = -ENODATA;
+   goto out;
+   }
+   exception_status_ptr = &queue->properties.exception_status;
+   } else if (KFD_DBG_EC_TYPE_IS_DEVICE(exception_code)) {
+   /* Per device exceptions */
+   struct kfd_process_device *pdd = NULL;
+   int i;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   pdd = target->pdds[i];
+   if (pdd->dev->id == source_id) {
+   found = true;
+   break;
+   }
+   }
+
+   if (!found) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   if (!(pdd->exception_status & KFD_EC_MASK(exception_code))) {
+   r = -ENODATA;
+   goto out;
+   }
+
+   if (exception_code == EC_DEVICE_MEMORY_VIOLATION) {
+   copy_size = min((size_t)(*info_size), 
pdd->vm_fault_exc_data_size);
+
+   if (copy_to_user(info, pdd->vm_fault_exc_data, 
copy_size)) {
+   r = -EFAULT;
+   goto out;
+   }
+   actual_info_size = pdd->vm_fault_exc_data_size;
+   if 

[PATCH 30/32] drm/amdkfd: add debug queue snapshot operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to get a snapshot of a specified number of queues
containing various queue property information that is copied to the
debugger.

Since the debugger doesn't know how many queues exist at any given time,
allow the debugger to pass the requested number of snapshots as 0 to get
the actual number of potential snapshots to use for a subsequent snapshot
request for actual information.

To prevent future ABI breakage, pass in the requested entry_size.
The KFD will return its own entry_size in case the debugger still wants
to log the information in a core dump on sizing failure.
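
The entry_size handshake can be illustrated with a plain memory copy.  This
sketch assumes the producer copies the smaller of the two entry sizes per
entry while stepping through the destination by the consumer's entry size,
which is what the clamping in the device snapshot code suggests.  The
struct layouts and ex_copy_entries() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical older layout known to the consumer. */
struct ex_entry_v1 {
	uint64_t ring_base;
};

/* Hypothetical newer layout used by the producer. */
struct ex_entry_v2 {
	uint64_t ring_base;
	uint64_t exception_status;
};

/*
 * Copy n entries into dst, whose entries are dst_entry_size bytes apart.
 * Returns the number of bytes actually written per entry.
 */
static uint32_t ex_copy_entries(void *dst, uint32_t dst_entry_size,
				const struct ex_entry_v2 *src, uint32_t n)
{
	uint32_t copy = dst_entry_size < sizeof(*src) ?
			dst_entry_size : (uint32_t)sizeof(*src);
	uint8_t *out = dst;
	uint32_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < n; i++) {
		memcpy(out, &src[i], copy);
		out += dst_entry_size;	/* stride by the consumer's size */
	}
	return copy;
}
```

An older consumer then still reads the leading fields it knows about, as
long as new members are only ever appended to the entry.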

Also allow the debugger to clear exceptions when doing a snapshot.

v3: fix uninitialized return and change queue snapshot to type void for
proper increment on buffer copy.
use memset 0 to init snapshot entry to clear struct padding.

v2: change buf_size arg to num_queues for clarity.
fix minimum entry size calculation.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  6 +++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 +++
 .../amd/amdkfd/kfd_process_queue_manager.c| 41 +++
 5 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d3d2026b6e65..93b288233577 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2965,6 +2965,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
&args->query_exception_info.info_size);
break;
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
+   r = pqm_get_queue_snapshot(&target->pqm,
+   args->queue_snapshot.exception_mask,
+   (void __user 
*)args->queue_snapshot.snapshot_buf_ptr,
+   &args->queue_snapshot.num_queues,
+   &args->queue_snapshot.entry_size);
+   break;
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
pr_warn("Debug op %i not supported yet\n", args->op);
r = -EACCES;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7792fe9491c5..5ae504a512f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3000,6 +3000,42 @@ int suspend_queues(struct kfd_process *p,
return total_suspended;
 }
 
+static uint32_t set_queue_type_for_user(struct queue_properties *q_props)
+{
+   switch (q_props->type) {
+   case KFD_QUEUE_TYPE_COMPUTE:
+   return q_props->format == KFD_QUEUE_FORMAT_PM4
+   ? KFD_IOC_QUEUE_TYPE_COMPUTE
+   : KFD_IOC_QUEUE_TYPE_COMPUTE_AQL;
+   case KFD_QUEUE_TYPE_SDMA:
+   return KFD_IOC_QUEUE_TYPE_SDMA;
+   case KFD_QUEUE_TYPE_SDMA_XGMI:
+   return KFD_IOC_QUEUE_TYPE_SDMA_XGMI;
+   default:
+   WARN_ONCE(true, "queue type not recognized!");
+   return 0x;
+   };
+}
+
+void set_queue_snapshot_entry(struct queue *q,
+ uint64_t exception_clear_mask,
+ struct kfd_queue_snapshot_entry *qss_entry)
+{
+   qss_entry->ring_base_address = q->properties.queue_address;
+   qss_entry->write_pointer_address = (uint64_t)q->properties.write_ptr;
+   qss_entry->read_pointer_address = (uint64_t)q->properties.read_ptr;
+   qss_entry->ctx_save_restore_address =
+   q->properties.ctx_save_restore_area_address;
+   qss_entry->ctx_save_restore_area_size =
+   q->properties.ctx_save_restore_area_size;
+   qss_entry->exception_status = q->properties.exception_status;
+   qss_entry->queue_id = q->properties.queue_id;
+   qss_entry->gpu_id = q->device->id;
+   qss_entry->ring_size = (uint32_t)q->properties.queue_size;
+   qss_entry->queue_type = set_queue_type_for_user(&q->properties);
+   q->properties.exception_status &= ~exception_clear_mask;
+}
+
 int debug_lock_and_unmap(struct device_queue_manager *dqm)
 {
int r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7ccf8d0d1867..89d4a5b293a5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -296,6 +296,9 @@ int suspend_queues(struct kfd_process *p,
 int resume_queues(struct kfd_process *p,
uint32_t num_queues,
uint32_t *usr_queue_id_array);
+void set_queue_snapshot_entry(struct queue *q,
+ uint64_t 

[PATCH 05/32] drm/amdgpu: setup hw debug registers on driver initialization

2023-01-25 Thread Jonathan Kim
Add missing debug trap registers references and initialize all debug
registers on boot by clearing the hardware exception overrides and the
wave allocation ID index.

The debugger requires that TTMPs 6 & 7 save the dispatch ID to map
waves onto dispatch during compute context inspection.
In order to correctly set this up, set the special reserved CP bit by
default whenever the MQD is initialized.

v2: leave TRAP_EN set for multi-process debugging as per process disable
will be taken care of in later patches.
fixup typo in description.
enable ttmp setup for dispatch boundary in mqd init for gfx11.
add trap on wave start and end registers for gfx11.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 30 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  5 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  |  5 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  5 ++
 .../include/asic_reg/gc/gc_10_1_0_offset.h| 14 
 .../include/asic_reg/gc/gc_10_1_0_sh_mask.h   | 69 +++
 .../include/asic_reg/gc/gc_10_3_0_offset.h| 10 +++
 .../include/asic_reg/gc/gc_10_3_0_sh_mask.h   |  4 ++
 .../include/asic_reg/gc/gc_11_0_0_sh_mask.h   |  4 ++
 11 files changed, 173 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 6983acc456b2..a5faf23805b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4823,6 +4823,29 @@ static u32 
gfx_v10_0_init_pa_sc_tile_steering_override(struct amdgpu_device *ade
 
 #define DEFAULT_SH_MEM_BASES   (0x6000)
 
+static void gfx_v10_0_debug_trap_config_init(struct amdgpu_device *adev,
+   uint32_t first_vmid,
+   uint32_t last_vmid)
+{
+   uint32_t data;
+   uint32_t trap_config_vmid_mask = 0;
+   int i;
+
+   /* Calculate trap config vmid mask */
+   for (i = first_vmid; i < last_vmid; i++)
+   trap_config_vmid_mask |= (1 << i);
+
+   data = REG_SET_FIELD(0, SPI_GDBG_TRAP_CONFIG,
+   VMID_SEL, trap_config_vmid_mask);
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   TRAP_EN, 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_CONFIG), data);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA0), 0);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA1), 0);
+}
+
 static void gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
 {
int i;
@@ -4854,6 +4877,9 @@ static void gfx_v10_0_init_compute_vmid(struct 
amdgpu_device *adev)
WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
}
+
+   gfx_v10_0_debug_trap_config_init(adev, adev->vm_manager.first_kfd_vmid,
+   AMDGPU_NUM_VMID);
 }
 
 static void gfx_v10_0_init_gds_vmid(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index c621b2ad7ba3..3ca7a31fb770 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1572,6 +1572,7 @@ static void gfx_v11_0_init_compute_vmid(struct 
amdgpu_device *adev)
/* Enable trap for each kfd vmid. */
data = RREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   WREG32_SOC15(GC, 0, regSPI_GDBG_PER_VMID_CNTL, data);
}
soc21_grbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8ad5c03506f2..222fe87161b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2289,6 +2289,29 @@ static void gfx_v9_0_setup_rb(struct amdgpu_device *adev)
adev->gfx.config.num_rbs = hweight32(active_rbs);
 }
 
+static void gfx_v9_0_debug_trap_config_init(struct amdgpu_device *adev,
+   uint32_t first_vmid,
+   uint32_t last_vmid)
+{
+   uint32_t data;
+   uint32_t trap_config_vmid_mask = 0;
+   int i;
+
+   /* Calculate trap config vmid mask */
+   for (i = first_vmid; i < last_vmid; i++)
+   trap_config_vmid_mask |= (1 << i);
+
+   data = REG_SET_FIELD(0, SPI_GDBG_TRAP_CONFIG,
+   VMID_SEL, trap_config_vmid_mask);
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   TRAP_EN, 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_CONFIG), data);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   

[PATCH 22/32] drm/amdkfd: add debug set exceptions enabled operation

2023-01-25 Thread Jonathan Kim
The debugger subscribes to notification for requested exceptions on attach.
Allow the debugger to change its subscription later on.
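
A minimal sketch of the subscription update described above, written as
plain user-space C rather than kernel code. The names dbg_target and
dbg_set_enabled_mask are hypothetical stand-ins, not part of the KFD API.
The only point illustrated is that changing the mask should wake the
debugger when an exception that is already pending falls inside the new
subscription.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a debug target (not the KFD structs). */
struct dbg_target {
    uint64_t process_status;        /* pending process-level exceptions */
    const uint64_t *queue_status;   /* pending exceptions, one per queue */
    size_t n_queues;
    const uint64_t *device_status;  /* pending exceptions, one per device */
    size_t n_devices;
    uint64_t enable_mask;           /* exceptions the debugger subscribed to */
};

/*
 * Replace the subscription mask.  Returns true when an exception that is
 * already pending is covered by the new mask, i.e. the debugger should be
 * notified right away.
 */
static bool dbg_set_enabled_mask(struct dbg_target *t, uint64_t new_mask)
{
    uint64_t found = t->process_status;
    size_t i;

    for (i = 0; i < t->n_queues; i++)
        found |= t->queue_status[i];
    for (i = 0; i < t->n_devices; i++)
        found |= t->device_status[i];

    t->enable_mask = new_mask;
    return (new_mask & found) != 0;
}
```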

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 36 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  2 ++
 3 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 46f9d453dc5e..9b87ba351eff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2892,6 +2892,9 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
args->send_runtime_event.exception_mask);
break;
case KFD_IOC_DBG_TRAP_SET_EXCEPTIONS_ENABLED:
+   kfd_dbg_set_enabled_debug_exception_mask(target,
+   args->set_exceptions_enabled.exception_mask);
+   break;
case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_OVERRIDE:
case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_MODE:
case KFD_IOC_DBG_TRAP_SUSPEND_QUEUES:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 0c876172db4b..3ea53aaa776b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -529,3 +529,39 @@ int kfd_dbg_trap_enable(struct kfd_process *target, 
uint32_t fd,
 
return r;
 }
+
+void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target,
+   uint64_t exception_set_mask)
+{
+   uint64_t found_mask = 0;
+   struct process_queue_manager *pqm;
+   struct process_queue_node *pqn;
+   static const char write_data = '.';
+   loff_t pos = 0;
+   int i;
+
+   mutex_lock(&target->event_mutex);
+
+   found_mask |= target->exception_status;
+
+   pqm = &target->pqm;
+   list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
+   if (!pqn)
+   continue;
+
+   found_mask |= pqn->q->properties.exception_status;
+   }
+
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+
+   found_mask |= pdd->exception_status;
+   }
+
+   if (exception_set_mask & found_mask)
+   kernel_write(target->dbg_ev_file, &write_data, 1, &pos);
+
+   target->exception_enable_mask = exception_set_mask;
+
+   mutex_unlock(&target->event_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 43284243b2c4..81557579ab04 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -59,6 +59,8 @@ static inline bool kfd_dbg_is_per_vmid_supported(struct 
kfd_dev *dev)
 
 void debug_event_write_work_handler(struct work_struct *work);
 
+void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target,
+   uint64_t exception_set_mask);
 /*
  * If GFX off is enabled, chips that do not support RLC restore for the debug
  * registers will disable GFX off temporarily for the entire debug session.
-- 
2.25.1



[PATCH 21/32] drm/amdkfd: update process interrupt handling for debug events

2023-01-25 Thread Jonathan Kim
The debugger must be notified of any debugger-subscribed exception
that comes from hardware interrupts.

If a debugger session exits, any exceptions it subscribed to may still
have interrupts in the interrupt ring buffer or KGD/KFD pipeline.
To prevent a new session from inheriting stale interrupts, when a new
queue is created, open an interrupt drain and allow the IH ring to drain
from a timestamped checkpoint.  Then inject a custom IV so that once
the custom IV is picked up by the KFD, it's safe to close the drain
and proceed with queue creation.
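
A rough sketch of the drain-and-fence idea above, as a user-space model
only. The marker value DRAIN_FENCE_IV and the names irq_drain and
drain_filter are assumptions made for illustration; the real code works
on the IH ring and KFD interrupt path, which this does not model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker standing in for the injected custom IV. */
#define DRAIN_FENCE_IV 0xffffffffu

struct irq_drain {
    bool open;  /* true while stale interrupts are being discarded */
};

/*
 * Decide whether an interrupt vector should be treated as a debug event.
 * While the drain is open everything is discarded; seeing the fence IV
 * means all older entries have passed, so the drain can be closed.
 */
static bool drain_filter(struct irq_drain *d, uint32_t iv)
{
    if (!d->open)
        return true;

    if (iv == DRAIN_FENCE_IV)
        d->open = false;

    return false;
}
```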

The drain must also occur on debug disable as SW interrupts may still
be processed.  Drain at this time and clear all the exception status.

The debugger may also be neither attached nor subscribed to certain
exceptions, so forward them directly to the runtime.

GFX10 also requires its own IV processing, hence the creation of
kfd_int_process_v10.c.  This is because the IVs from SQ interrupts are
packed into a new contiguous format, unlike GFX9.  To make this clear,
a separate interrupt handling code file was created.

v3: enable gfx11 interrupts
v2: fix interrupt drain on debug disable.
fix interrupt drain on queue create during -ERESTARTSYS.
fix up macros naming for ECODE parsing.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|  16 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   2 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|  85 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   6 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   4 +-
 .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c  | 405 ++
 .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  21 +-
 .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c   |  98 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  12 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  47 ++
 .../amd/amdkfd/kfd_process_queue_manager.c|   4 +
 12 files changed, 681 insertions(+), 20 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 8816853e50c0..60c3b0449d86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -763,6 +763,22 @@ void amdgpu_amdkfd_ras_poison_consumption_handler(struct 
amdgpu_device *adev, bo
amdgpu_umc_poison_handler(adev, reset);
 }
 
+int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev,
+   uint32_t *payload)
+{
+   int ret;
+
+   /* Device or IH ring is not ready so bail. */
+   ret = amdgpu_ih_wait_on_checkpoint_process_ts(adev, &adev->irq.ih);
+   if (ret)
+   return ret;
+
+   /* Send payload to fence KFD interrupts */
+   amdgpu_amdkfd_interrupt(adev, payload);
+
+   return 0;
+}
+
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev)
 {
if (adev->gfx.ras && adev->gfx.ras->query_utcl2_poison_status)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 333780491867..df782274a4c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -241,6 +241,8 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct 
amdgpu_device *dst,
struct amdgpu_device *src,
bool is_min);
 int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev, bool 
is_min);
+int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev,
+   uint32_t *payload);
 
 /* Read user wptr from a specified user address space with page fault
  * disabled. The memory must be pinned and mapped to the hardware when
diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index 747754428073..2ec8f27c5366 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -53,6 +53,7 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_events.o \
$(AMDKFD_PATH)/cik_event_interrupt.o \
$(AMDKFD_PATH)/kfd_int_process_v9.o \
+   $(AMDKFD_PATH)/kfd_int_process_v10.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 16acf3d416eb..0c876172db4b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -125,6 +125,65 @@ bool kfd_dbg_ev_raise(uint64_t event_mask,
return is_subscribed;
 }
 
+/* set pending event queue entry from ring entry  */
+bool kfd_set_dbg_ev_from_interrupt(struct kfd_dev *dev,
+  

[PATCH 23/32] drm/amdkfd: add debug wave launch override operation

2023-01-25 Thread Jonathan Kim
This operation allows the debugger to override the enabled HW
exceptions on the device.

On debug devices that only support the debugging of a single process,
the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK
register.
Because they are global, only address watch exceptions are allowed to
be enabled.  In other words, the debugger must preserve all non-address
watch exception states in normal mode operation by barring a full
replacement override or a non-address watch override request.

For multi-process debugging, all HW exception overrides are per-VMID so
all exceptions can be overridden or fully replaced.

In order for the debugger to know what is permissible, return the
supported override mask to the debugger along with the previously
enabled overrides.
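
As an illustration of how an override request combines with the previous
state, here is a small stand-alone sketch of the mask merge. The helper
name merge_trap_mask is made up for this example; the expression mirrors
the one used in the aldebaran override helper in this patch.

```c
#include <stdint.h>

/*
 * Only the bits selected by `request` take their value from `bits`;
 * every other bit keeps the value it had in `prev`.
 */
static uint32_t merge_trap_mask(uint32_t prev, uint32_t bits,
                                uint32_t request)
{
    return (bits & request) | (prev & ~request);
}
```

With an empty request the previous mask is returned unchanged, and with
an all-ones request the new bits fully replace it.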

v3: v2 was reviewed but requesting re-review for added GFX11 support.

v2: switch unsupported override mode return from EPERM to EINVAL to
support unique EPERM on PTRACE failure.

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  | 47 ++
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  2 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 55 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 10 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |  5 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 86 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 55 
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 10 +++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  7 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 69 +++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  6 ++
 11 files changed, 350 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index a64a53f9efe6..84a9d9391ea4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -25,6 +25,7 @@
 #include "amdgpu_amdkfd_gfx_v9.h"
 #include "gc/gc_9_4_2_offset.h"
 #include "gc/gc_9_4_2_sh_mask.h"
+#include 
 
 /**
  * Returns TRAP_EN, EXCP_EN and EXCP_REPLACE.
@@ -62,6 +63,50 @@ static uint32_t kgd_aldebaran_disable_debug_trap(struct 
amdgpu_device *adev,
return data;
 }
 
+static int kgd_aldebaran_validate_trap_override_request(struct amdgpu_device 
*adev,
+   uint32_t trap_override,
+   uint32_t 
*trap_mask_supported)
+{
+   *trap_mask_supported &= KFD_DBG_TRAP_MASK_FP_INVALID |
+   KFD_DBG_TRAP_MASK_FP_INPUT_DENORMAL |
+   KFD_DBG_TRAP_MASK_FP_DIVIDE_BY_ZERO |
+   KFD_DBG_TRAP_MASK_FP_OVERFLOW |
+   KFD_DBG_TRAP_MASK_FP_UNDERFLOW |
+   KFD_DBG_TRAP_MASK_FP_INEXACT |
+   KFD_DBG_TRAP_MASK_INT_DIVIDE_BY_ZERO |
+   KFD_DBG_TRAP_MASK_DBG_ADDRESS_WATCH |
+   KFD_DBG_TRAP_MASK_DBG_MEMORY_VIOLATION;
+
+   if (trap_override != KFD_DBG_TRAP_OVERRIDE_OR &&
+   trap_override != KFD_DBG_TRAP_OVERRIDE_REPLACE)
+   return -EPERM;
+
+   return 0;
+}
+
+/* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
+static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct 
amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t trap_override,
+   uint32_t trap_mask_bits,
+   uint32_t trap_mask_request,
+   uint32_t *trap_mask_prev,
+   uint32_t kfd_dbg_trap_cntl_prev)
+
+{
+   uint32_t data = 0;
+
+   *trap_mask_prev = REG_GET_FIELD(kfd_dbg_trap_cntl_prev, 
SPI_GDBG_PER_VMID_CNTL, EXCP_EN);
+   trap_mask_bits = (trap_mask_bits & trap_mask_request) |
+   (*trap_mask_prev & ~trap_mask_request);
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 
trap_mask_bits);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 
trap_override);
+
+   return data;
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -81,6 +126,8 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.set_vm_context_page_table_base = 
kgd_gfx_v9_set_vm_context_page_table_base,
.enable_debug_trap = kgd_aldebaran_enable_debug_trap,
.disable_debug_trap = kgd_aldebaran_disable_debug_trap,
+   .validate_trap_override_request = 
kgd_aldebaran_validate_trap_override_request,
+   

[PATCH 27/32] drm/amdkfd: add debug set flags operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to set single memory and single ALU operations.

Some exceptions are imprecise (memory violations, address watch) in the
sense that a trap occurs only when the exception interrupt occurs and
not at the non-halting faulty instruction.  Trap temporaries 0 & 1 save
the program counter address, which means that these values will not point
to the faulty instruction address but to wherever the wave was executing
when the interrupt was raised.

Setting the Single Memory Operations flag will inject an automatic wait
on every memory operation instruction forcing imprecise memory exceptions
to become precise at the cost of performance.  This setting is not
permitted on debug devices that support only a global setting of this
option.

Return the previously set flags to the debugger as well.
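
The following user-space sketch shows the intended contract of the set
flags operation: reject the single memory operation flag if any device
lacks per-VMID support, otherwise store it and hand back the previous
value.  The struct dbg_proc, the function dbg_set_flags and the bit value
of DBG_FLAG_SINGLE_MEM_OP are assumptions for illustration, and the
runlist/MES refresh step is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DBG_FLAG_SINGLE_MEM_OP 0x1u  /* hypothetical bit value */

/* Hypothetical, simplified process state. */
struct dbg_proc {
    uint32_t flags;                  /* currently applied debug flags */
    const bool *per_vmid_supported;  /* one entry per device */
    size_t n_devices;
};

/*
 * On return *flags always holds the previously set flags.
 * Returns 0 on success or -EACCES if the request is not permitted.
 */
static int dbg_set_flags(struct dbg_proc *p, uint32_t *flags)
{
    uint32_t prev = p->flags;
    size_t i;

    for (i = 0; i < p->n_devices; i++) {
        if (!p->per_vmid_supported[i] &&
            (*flags & DBG_FLAG_SINGLE_MEM_OP)) {
            *flags = prev;
            return -EACCES;
        }
    }

    p->flags = *flags & DBG_FLAG_SINGLE_MEM_OP;
    *flags = prev;
    return 0;
}
```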

v3: make precise mem op the only available flag for now.

v2: add gfx11 support.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 38 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  1 +
 3 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8f2ede781863..c34caa14b84e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2947,6 +2947,8 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
args->clear_node_address_watch.id);
break;
case KFD_IOC_DBG_TRAP_SET_FLAGS:
+   r = kfd_dbg_trap_set_flags(target, &args->set_flags.flags);
+   break;
case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 8d2e1adb442d..77ba7da2bb9d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -23,6 +23,7 @@
 #include "kfd_debug.h"
 #include "kfd_device_queue_manager.h"
 #include 
+#include 
 
 #define MAX_WATCH_ADDRESSES 4
 
@@ -425,6 +426,40 @@ static void kfd_dbg_clear_process_address_watch(struct 
kfd_process *target)
kfd_dbg_trap_clear_dev_address_watch(target->pdds[i], 
j);
 }
 
+int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)
+{
+   uint32_t prev_flags = target->dbg_flags;
+   int i, r = 0;
+
+   for (i = 0; i < target->n_pdds; i++) {
+   if (!kfd_dbg_is_per_vmid_supported(target->pdds[i]->dev) &&
+   (*flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP)) {
+   *flags = prev_flags;
+   return -EACCES;
+   }
+   }
+
+   target->dbg_flags = *flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP;
+   *flags = prev_flags;
+   for (i = 0; i < target->n_pdds; i++) {
+   struct kfd_process_device *pdd = target->pdds[i];
+
+   if (!kfd_dbg_is_per_vmid_supported(pdd->dev))
+   continue;
+
+   if (!pdd->dev->shared_resources.enable_mes)
+   r = debug_refresh_runlist(pdd->dev->dqm);
+   else
+   r = kfd_dbg_set_mes_debug_mode(pdd);
+
+   if (r) {
+   target->dbg_flags = prev_flags;
+   break;
+   }
+   }
+
+   return r;
+}
 
 /* kfd_dbg_trap_deactivate:
  * target: target process
@@ -439,9 +474,12 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, 
bool unwind, int unwind
int i, count = 0;
 
if (!unwind) {
+   uint32_t flags = 0;
cancel_work_sync(&target->debug_event_workarea);
kfd_dbg_clear_process_address_watch(target);
kfd_dbg_trap_set_wave_launch_mode(target, 0);
+
+   kfd_dbg_trap_set_flags(target, &flags);
}
 
for (i = 0; i < target->n_pdds; i++) {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 63c716ce5ab9..782362d82890 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -57,6 +57,7 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t watch_address_mask,
uint32_t *watch_id,
uint32_t watch_mode);
+int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags);
 int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
unsigned int dev_id,
unsigned int queue_id,
-- 
2.25.1



[PATCH 28/32] drm/amdkfd: add debug query event operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to query a single queue, device or process
exception.
The KFD should also return the GPU or queue id of the exception.
The debugger also has the option of clearing exceptions after
they have been queried.
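
A simplified model of the query order used by this operation: queues are
checked first, then devices, then the process itself, and the first
subscribed pending exception is reported and optionally cleared.  All
type and function names below are hypothetical and locking is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified exception sources. */
struct dbg_queue  { uint32_t queue_id; uint32_t gpu_id; uint64_t status; };
struct dbg_device { uint32_t gpu_id; uint64_t status; };

struct dbg_state {
    uint64_t enable_mask;       /* exceptions the debugger subscribed to */
    struct dbg_queue *queues;
    size_t n_queues;
    struct dbg_device *devices;
    size_t n_devices;
    uint64_t process_status;
};

/* Returns 0 when an event was reported, -EAGAIN when nothing is pending. */
static int dbg_query_event(struct dbg_state *st, uint64_t clear_mask,
                           uint32_t *queue_id, uint32_t *gpu_id,
                           uint64_t *event_status)
{
    size_t i;

    *event_status = 0;
    *queue_id = 0;
    *gpu_id = 0;

    /* Queue events take priority. */
    for (i = 0; i < st->n_queues; i++) {
        struct dbg_queue *q = &st->queues[i];

        if (!(st->enable_mask & q->status))
            continue;
        *event_status = q->status;
        *queue_id = q->queue_id;
        *gpu_id = q->gpu_id;
        q->status &= ~clear_mask;
        return 0;
    }

    /* Then device events. */
    for (i = 0; i < st->n_devices; i++) {
        struct dbg_device *d = &st->devices[i];

        if (!(st->enable_mask & d->status))
            continue;
        *event_status = d->status;
        *gpu_id = d->gpu_id;
        d->status &= ~clear_mask;
        return 0;
    }

    /* Finally process events. */
    if (st->enable_mask & st->process_status) {
        *event_status = st->process_status;
        st->process_status &= ~clear_mask;
        return 0;
    }

    return -EAGAIN;
}
```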

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  6 +++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 64 
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  5 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index c34caa14b84e..0ae1237fa193 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2950,6 +2950,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
r = kfd_dbg_trap_set_flags(target, &args->set_flags.flags);
break;
case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
+   r = kfd_dbg_ev_query_debug_event(target,
+   &args->query_debug_event.queue_id,
+   &args->query_debug_event.gpu_id,
+   args->query_debug_event.exception_mask,
+   &args->query_debug_event.exception_mask);
+   break;
case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 77ba7da2bb9d..032207efef15 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -27,6 +27,70 @@
 
 #define MAX_WATCH_ADDRESSES 4
 
+int kfd_dbg_ev_query_debug_event(struct kfd_process *process,
+ unsigned int *queue_id,
+ unsigned int *gpu_id,
+ uint64_t exception_clear_mask,
+ uint64_t *event_status)
+{
+   struct process_queue_manager *pqm;
+   struct process_queue_node *pqn;
+   int i;
+
+   if (!(process && process->debug_trap_enabled))
+   return -ENODATA;
+
+   mutex_lock(&process->event_mutex);
+   *event_status = 0;
+   *queue_id = 0;
+   *gpu_id = 0;
+
+   /* find and report queue events */
+   pqm = &process->pqm;
+   list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
+   uint64_t tmp = process->exception_enable_mask;
+
+   if (!pqn->q)
+   continue;
+
+   tmp &= pqn->q->properties.exception_status;
+
+   if (!tmp)
+   continue;
+
+   *event_status = pqn->q->properties.exception_status;
+   *queue_id = pqn->q->properties.queue_id;
+   *gpu_id = pqn->q->device->id;
+   pqn->q->properties.exception_status &= ~exception_clear_mask;
+   goto out;
+   }
+
+   /* find and report device events */
+   for (i = 0; i < process->n_pdds; i++) {
+   struct kfd_process_device *pdd = process->pdds[i];
+   uint64_t tmp = process->exception_enable_mask
+   & pdd->exception_status;
+
+   if (!tmp)
+   continue;
+
+   *event_status = pdd->exception_status;
+   *gpu_id = pdd->dev->id;
+   pdd->exception_status &= ~exception_clear_mask;
+   goto out;
+   }
+
+   /* report process events */
+   if (process->exception_enable_mask & process->exception_status) {
+   *event_status = process->exception_status;
+   process->exception_status &= ~exception_clear_mask;
+   }
+
+out:
+   mutex_unlock(&process->event_mutex);
+   return *event_status ? 0 : -EAGAIN;
+}
+
 void debug_event_write_work_handler(struct work_struct *work)
 {
struct kfd_process *process;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 782362d82890..4f2195d57ff0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -27,6 +27,11 @@
 
 void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, int 
unwind_count);
 int kfd_dbg_trap_activate(struct kfd_process *target);
+int kfd_dbg_ev_query_debug_event(struct kfd_process *process,
+   unsigned int *queue_id,
+   unsigned int *gpu_id,
+   uint64_t exception_clear_mask,
+   uint64_t *event_status);
 bool kfd_set_dbg_ev_from_interrupt(struct kfd_dev *dev,
   unsigned int pasid,
   uint32_t doorbell_id,
-- 
2.25.1



[PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11

2023-01-25 Thread Jonathan Kim
Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. user cu mask requests only 2nd shader
array as an available resource leading to zero HW resources available)
nor can runtime be alerted of any of these changes during execution.

Make user CU masking and debugging mutually exclusive with respect to
availability.

If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.
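
The attach outcomes described above can be modelled roughly as follows.
This is a user-space sketch with hypothetical names (attach_status,
wa_queue, dbg_attach_workaround); the unwind of already relocated queues
on failure is an assumption of this example and is not taken from the
patch description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status values reported back to the debugger. */
enum attach_status {
    ATTACH_ENABLED,
    ATTACH_ENABLED_BUSY,   /* a queue already uses a user CU mask */
    ATTACH_ENABLED_ERROR,  /* waves could not be relocated */
};

/* Hypothetical, simplified queue state. */
struct wa_queue {
    bool has_user_cu_mask;  /* user requested a CU mask for this queue */
    bool relocate_ok;       /* stands in for the relocation succeeding */
    bool is_dbg_wa;         /* workaround currently applied */
};

static enum attach_status dbg_attach_workaround(struct wa_queue *q, size_t n)
{
    size_t i;

    /* User CU masking and debugging are mutually exclusive. */
    for (i = 0; i < n; i++)
        if (q[i].has_user_cu_mask)
            return ATTACH_ENABLED_BUSY;

    for (i = 0; i < n; i++) {
        if (!q[i].relocate_ok) {
            /* Assumption: undo the queues that were already moved. */
            while (i--)
                q[i].is_dbg_wa = false;
            return ATTACH_ENABLED_ERROR;
        }
        q[i].is_dbg_wa = true;
    }

    return ATTACH_ENABLED;
}
```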

In addition, like any other multi-process debug supported device,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   |  2 +
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|  7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 -
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 64 +++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  3 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  | 42 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  9 ++-
 13 files changed, 124 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index d20df0cf0d88..b5f5eed2b5ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -219,6 +219,8 @@ struct mes_add_queue_input {
uint32_t gws_size;
uint64_t tba_addr;
uint64_t tma_addr;
+   uint32_t trap_en;
+   uint32_t skip_process_ctx_clear;
uint32_t is_kfd_process;
uint32_t is_aql_queue;
uint32_t queue_size;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index fbacdc42efac..38c7a0cbf264 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -197,17 +197,14 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.gws_size = input->gws_size;
mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
mes_add_queue_pkt.tma_addr = input->tma_addr;
+   mes_add_queue_pkt.trap_en = input->trap_en;
+   mes_add_queue_pkt.skip_process_ctx_clear = 
input->skip_process_ctx_clear;
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
 
/* For KFD, gds_size is re-used for queue size (needed in MES for AQL 
queues) */
mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
mes_add_queue_pkt.gds_size = input->queue_size;
 
-   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
- (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
- (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3
-   mes_add_queue_pkt.trap_en = 1;
-
/* For KFD, gds_size is re-used for queue size (needed in MES for AQL 
queues) */
mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
mes_add_queue_pkt.gds_size = input->queue_size;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ee05c2e54ef6..f5f639de28f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -530,8 +530,6 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct 
kfd_process *p,
goto out;
}
 
-   minfo.update_flag = UPDATE_FLAG_CU_MASK;
-
mutex_lock(&p->mutex);
 
retval = pqm_update_mqd(&p->pqm, args->queue_id, &minfo);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index f6ea6db266b4..6e99a0160275 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -37,6 +37,70 @@ void debug_event_write_work_handler(struct work_struct *work)
kernel_write(process->dbg_ev_file, &write_data, 1, &pos);
 }
 
+static int kfd_dbg_set_queue_workaround(struct queue *q, bool enable)
+{
+   struct mqd_update_info minfo = {0};
+   int err;
+
+   if (!q || (!q->properties.is_dbg_wa && !enable))
+   return 0;
+
+   

[PATCH 24/32] drm/amdkfd: add debug wave launch mode operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to set wave behaviour to either operate normally,
halt at launch, trap on every instruction, terminate immediately or
stall on allocation.
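
For readers unfamiliar with the register field helpers, the sketch below
shows how a launch mode and a VMID selection might be packed into one
register word.  The field shifts and masks are invented for this example
and do not reflect the real SPI_GDBG_WAVE_CNTL2 layout, which comes from
the hardware register headers; set_field is a hypothetical stand-in for
the REG_SET_FIELD macro.

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define WAVE_CNTL2_VMID_MASK_SHIFT 0
#define WAVE_CNTL2_VMID_MASK_MASK  0x0000ffffu
#define WAVE_CNTL2_MODE_SHIFT      16
#define WAVE_CNTL2_MODE_MASK       0x00030000u

/* Clear the field in `reg`, then insert `val` at the field position. */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
                          uint32_t val)
{
    return (reg & ~mask) | ((val << shift) & mask);
}

/*
 * Build the register value for a launch mode.  Mode 0 (normal operation)
 * selects no VMID at all.  With this made-up layout vmid must be below 16.
 */
static uint32_t build_wave_launch_mode(uint8_t mode, uint32_t vmid)
{
    uint32_t data = 0;
    uint32_t vmid_bit = mode ? (1u << vmid) : 0;

    data = set_field(data, WAVE_CNTL2_VMID_MASK_MASK,
                     WAVE_CNTL2_VMID_MASK_SHIFT, vmid_bit);
    data = set_field(data, WAVE_CNTL2_MODE_MASK,
                     WAVE_CNTL2_MODE_SHIFT, mode);
    return data;
}
```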

v2: add gfx11 support and remove deprecated launch mode options

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  | 12 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  1 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 25 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  3 ++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |  3 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 14 +++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 25 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 36 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  5 ++-
 11 files changed, 124 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 84a9d9391ea4..4de2066215b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -107,6 +107,17 @@ static uint32_t 
kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device
return data;
 }
 
+static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+   uint8_t wave_launch_mode,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, LAUNCH_MODE, 
wave_launch_mode);
+
+   return data;
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -128,6 +139,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.disable_debug_trap = kgd_aldebaran_disable_debug_trap,
.validate_trap_override_request = 
kgd_aldebaran_validate_trap_override_request,
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
+   .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0405725e95e3..500013540356 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -412,6 +412,7 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.disable_debug_trap = kgd_arcturus_disable_debug_trap,
.validate_trap_override_request = 
kgd_gfx_v9_validate_trap_override_request,
.set_wave_launch_trap_override = 
kgd_gfx_v9_set_wave_launch_trap_override,
+   .set_wave_launch_mode = kgd_gfx_v9_set_wave_launch_mode,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 32a6e5fbeacd..7591145bc69f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -854,6 +854,30 @@ uint32_t kgd_gfx_v10_set_wave_launch_trap_override(struct 
amdgpu_device *adev,
return 0;
 }
 
+uint32_t kgd_gfx_v10_set_wave_launch_mode(struct amdgpu_device *adev,
+   uint8_t wave_launch_mode,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+   bool is_mode_set = !!wave_launch_mode;
+
+   mutex_lock(&adev->grbm_idx_mutex);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true);
+
+   data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL2,
+   VMID_MASK, is_mode_set ? 1 << vmid : 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL2,
+   MODE, is_mode_set ? wave_launch_mode : 0);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL2), data);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+}
+
 /* kgd_gfx_v10_get_iq_wait_times: Returns the mmCP_IQ_WAIT_TIME1/2 values
  * The values read are:
  * ib_offload_wait_time -- Wait Count for Indirect Buffer Offloads.
@@ -941,6 +965,7 @@ const struct kfd2kgd_calls gfx_v10_kfd2kgd = {
.disable_debug_trap = kgd_gfx_v10_disable_debug_trap,
.validate_trap_override_request = 
kgd_gfx_v10_validate_trap_override_request,

[PATCH 14/32] drm/amdgpu: expose debug api for mes

2023-01-25 Thread Jonathan Kim
Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process
debug API.

The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES
from clearing the process context when the first queue is added to the
scheduler in order to maintain debug mode settings during queue preemption
and restore.  The MES clears the process context in this case due to an
unresolved FW caching bug during normal mode operations.
During debug mode, the KFD will hold a reference to the target process
so the process context should never go stale and MES can afford to skip
this requirement.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   | 32 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   | 20 
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++
 drivers/gpu/drm/amd/include/mes_v11_api_def.h | 21 +++-
 4 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 82e27bd4f038..4916e0b0156f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -924,6 +924,38 @@ int amdgpu_mes_reg_wait(struct amdgpu_device *adev, 
uint32_t reg,
return r;
 }
 
+int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
+   uint64_t process_context_addr,
+   uint32_t spi_gdbg_per_vmid_cntl,
+   const uint32_t *tcp_watch_cntl,
+   uint32_t flags)
+{
+   struct mes_misc_op_input op_input = {0};
+   int r;
+
+   if (!adev->mes.funcs->misc_op) {
+   DRM_ERROR("mes set shader debugger is not supported!\n");
+   return -EINVAL;
+   }
+
+   op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER;
+   op_input.set_shader_debugger.process_context_addr = 
process_context_addr;
+   op_input.set_shader_debugger.flags.u32all = flags;
+   op_input.set_shader_debugger.spi_gdbg_per_vmid_cntl = 
spi_gdbg_per_vmid_cntl;
+   memcpy(op_input.set_shader_debugger.tcp_watch_cntl, tcp_watch_cntl,
+   sizeof(op_input.set_shader_debugger.tcp_watch_cntl));
+
+   amdgpu_mes_lock(&adev->mes);
+
+   r = adev->mes.funcs->misc_op(&adev->mes, &op_input);
+   if (r)
+   DRM_ERROR("failed to set_shader_debugger\n");
+
+   amdgpu_mes_unlock(&adev->mes);
+
+   return r;
+}
+
 static void
 amdgpu_mes_ring_to_queue_props(struct amdgpu_device *adev,
   struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 547ec35691fa..d20df0cf0d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -256,6 +256,7 @@ enum mes_misc_opcode {
MES_MISC_OP_READ_REG,
MES_MISC_OP_WRM_REG_WAIT,
MES_MISC_OP_WRM_REG_WR_WAIT,
+   MES_MISC_OP_SET_SHADER_DEBUGGER,
 };
 
 struct mes_misc_op_input {
@@ -278,6 +279,20 @@ struct mes_misc_op_input {
uint32_t   reg0;
uint32_t   reg1;
} wrm_reg;
+
+   struct {
+   uint64_t process_context_addr;
+   union {
+   struct {
+   uint64_t single_memop : 1;
+   uint64_t single_alu_op : 1;
+   uint64_t reserved: 30;
+   };
+   uint32_t u32all;
+   } flags;
+   uint32_t spi_gdbg_per_vmid_cntl;
+   uint32_t tcp_watch_cntl[4];
+   } set_shader_debugger;
};
 };
 
@@ -340,6 +355,11 @@ int amdgpu_mes_reg_wait(struct amdgpu_device *adev, uint32_t reg,
 int amdgpu_mes_reg_write_reg_wait(struct amdgpu_device *adev,
  uint32_t reg0, uint32_t reg1,
  uint32_t ref, uint32_t mask);
+int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
+   uint64_t process_context_addr,
+   uint32_t spi_gdbg_per_vmid_cntl,
+   const uint32_t *tcp_watch_cntl,
+   uint32_t flags);
 
 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id,
int queue_type, int idx,
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 62cdd2113135..fbacdc42efac 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -334,6 +334,18 @@ static int mes_v11_0_misc_op(struct amdgpu_mes *mes,
misc_pkt.wait_reg_mem.reg_offset1 = input->wrm_reg.reg0;
misc_pkt.wait_reg_mem.reg_offset2 

[PATCH 11/32] drm/amdgpu: add configurable grace period for unmap queues

2023-01-25 Thread Jonathan Kim
The HWS schedule allows a grace period for wave completion prior to
preemption for better performance by avoiding CWSR on waves that can
potentially complete quickly. The debugger, on the other hand, will
want to inspect wave status immediately after it actively triggers
preemption (a suspend function to be provided).

To minimize latency between preemption and debugger wave inspection, allow
immediate preemption by setting the grace period to 0.

Note that setting the preemption grace period to 0 will result in an
infinite grace period being set due to a CP FW bug so set it to 1 for now.
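
A minimal sketch of the workaround described above. The helper name
effective_grace_period is hypothetical and is not part of this patch; it
only illustrates substituting 1 for a requested grace period of 0.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
uint32_t effective_grace_period(uint32_t requested)
{
	/*
	 * The CP FW would treat 0 as an infinite grace period, so use the
	 * smallest non-zero value instead. Other values pass through.
	 */
	return requested == 0 ? 1 : requested;
}
```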

v2: clarify purpose in the description of this patch

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  2 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  2 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  6 ++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  |  2 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 43 
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  9 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 -
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 32 +
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 39 +++
 .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h   | 65 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 ++
 13 files changed, 291 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 89868f9927ae..a64a53f9efe6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -81,5 +81,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.set_vm_context_page_table_base = kgd_gfx_v9_set_vm_context_page_table_base,
.enable_debug_trap = kgd_aldebaran_enable_debug_trap,
.disable_debug_trap = kgd_aldebaran_disable_debug_trap,
+   .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
+   .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index d5bb86ccd617..ef8befc31fc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -410,6 +410,8 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
kgd_gfx_v9_set_vm_context_page_table_base,
.enable_debug_trap = kgd_arcturus_enable_debug_trap,
.disable_debug_trap = kgd_arcturus_disable_debug_trap,
+   .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
+   .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index c09b45de02d0..2491402afd58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -801,6 +801,47 @@ uint32_t kgd_gfx_v10_disable_debug_trap(struct amdgpu_device *adev,
return 0;
 }
 
+/* kgd_gfx_v10_get_iq_wait_times: Returns the mmCP_IQ_WAIT_TIME1/2 values
+ * The values read are:
+ * ib_offload_wait_time -- Wait Count for Indirect Buffer Offloads.
+ * atomic_offload_wait_time -- Wait Count for L2 and GDS Atomics Offloads.
+ * wrm_offload_wait_time-- Wait Count for WAIT_REG_MEM Offloads.
+ * gws_wait_time-- Wait Count for Global Wave Syncs.
+ * que_sleep_wait_time  -- Wait Count for Dequeue Retry.
+ * sch_wave_wait_time   -- Wait Count for Scheduling Wave Message.
+ * sem_rearm_wait_time  -- Wait Count for Semaphore re-arm.
+ * deq_retry_wait_time  -- Wait Count for Global Wave Syncs.
+ */
+void kgd_gfx_v10_get_iq_wait_times(struct amdgpu_device *adev,
+   uint32_t *wait_times)
+
+{
+   *wait_times = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_IQ_WAIT_TIME2));
+}
+
+void kgd_gfx_v10_build_grace_period_packet_info(struct amdgpu_device *adev,
+   uint32_t wait_times,
+   uint32_t grace_period,
+   uint32_t *reg_offset,
+   uint32_t *reg_data)
+{
+   *reg_data = wait_times;
+
+   /*
+* The CP cannot handle a 0 grace period input and will result in
+* an infinite grace period being set so set to 1 to prevent this.
+*/
+  

[PATCH 17/32] drm/amdkfd: add raise exception event function

2023-01-25 Thread Jonathan Kim
Exception events can be generated from interrupts or queue activity.

The raise event function will save exception status of a queue, device
or process then notify the debugger of the status change by writing to
a debugger polled file descriptor that the debugger provides during
debug attach.
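
As a rough illustration of the flow above (save the status first, then
notify only if the debugger subscribed to the event), here is a
user-space sketch. The struct and function names are hypothetical and do
not mirror the KFD types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a debugged queue, device or process. */
struct dbg_target {
	uint64_t exception_status;      /* exceptions recorded so far */
	uint64_t exception_enable_mask; /* exceptions the debugger subscribed to */
};

/* Records the event; returns true if the debugger should be notified. */
bool dbg_raise_event(struct dbg_target *t, uint64_t event_mask)
{
	/* The status is always saved, even for unsubscribed events. */
	t->exception_status |= event_mask;

	return (t->exception_enable_mask & event_mask) != 0;
}
```

A caller would write to the debugger's polled file descriptor only when
this returns true; the saved status stays available for a later query
either way.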

For memory violation exceptions, extra exception data will be saved.

The debugger will be able to query the saved exception states through a
query operation that will be provided by follow-up patches.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 91 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h |  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  7 ++
 3 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 659dfc7411fe..fcd064b13f6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -38,6 +38,93 @@ void debug_event_write_work_handler(struct work_struct *work)
kernel_write(process->dbg_ev_file, &write_data, 1, &pos);
 }
 
+/* update process/device/queue exception status, write to descriptor
+ * only if exception_status is enabled.
+ */
+bool kfd_dbg_ev_raise(uint64_t event_mask,
+   struct kfd_process *process, struct kfd_dev *dev,
+   unsigned int source_id, bool use_worker,
+   void *exception_data, size_t exception_data_size)
+{
+   struct process_queue_manager *pqm;
+   struct process_queue_node *pqn;
+   int i;
+   static const char write_data = '.';
+   loff_t pos = 0;
+   bool is_subscribed = true;
+
+   if (!(process && process->debug_trap_enabled))
+   return false;
+
+   mutex_lock(&process->event_mutex);
+
+   if (event_mask & KFD_EC_MASK_DEVICE) {
+   for (i = 0; i < process->n_pdds; i++) {
+   struct kfd_process_device *pdd = process->pdds[i];
+
+   if (pdd->dev != dev)
+   continue;
+
+   pdd->exception_status |= event_mask & KFD_EC_MASK_DEVICE;
+
+   if (event_mask & KFD_EC_MASK(EC_DEVICE_MEMORY_VIOLATION)) {
+   if (!pdd->vm_fault_exc_data) {
+   pdd->vm_fault_exc_data = kmemdup(
+   exception_data,
+   exception_data_size,
+   GFP_KERNEL);
+   if (!pdd->vm_fault_exc_data)
+   pr_debug("Failed to allocate exception data memory");
+   } else {
+   pr_debug("Debugger exception data not saved\n");
+   print_hex_dump_bytes("exception data: ",
+   DUMP_PREFIX_OFFSET,
+   exception_data,
+   exception_data_size);
+   }
+   }
+   break;
+   }
+   } else if (event_mask & KFD_EC_MASK_PROCESS) {
+   process->exception_status |= event_mask & KFD_EC_MASK_PROCESS;
+   } else {
+   pqm = &process->pqm;
+   list_for_each_entry(pqn, &pqm->queues,
+   process_queue_list) {
+   int target_id;
+
+   if (!pqn->q)
+   continue;
+
+   target_id = event_mask & KFD_EC_MASK(EC_QUEUE_NEW) ?
+   pqn->q->properties.queue_id :
+   pqn->q->doorbell_id;
+
+   if (pqn->q->device != dev || target_id != source_id)
+   continue;
+
+   pqn->q->properties.exception_status |= event_mask;
+   break;
+   }
+   }
+
+   if (process->exception_enable_mask & event_mask) {
+   if (use_worker)
+   schedule_work(&process->debug_event_workarea);
+   else
+   kernel_write(process->dbg_ev_file,
+   &write_data,
+   1,
+   &pos);
+   } else {
+   is_subscribed = false;
+   }
+
+   mutex_unlock(&process->event_mutex);
+
+   return is_subscribed;
+}
+
 static int kfd_dbg_set_queue_workaround(struct queue *q, bool enable)
 {
struct mqd_update_info minfo = {0};
@@ -88,7 +175,6 @@ static int kfd_dbg_set_workaround(struct kfd_process *target, bool enable)
}
 
return r;
-}
 
 static int 

[PATCH 19/32] drm/amdkfd: add runtime enable operation

2023-01-25 Thread Jonathan Kim
The debugger can attach to a process prior to HSA enablement (i.e.
inferior is spawned by the debugger and attached to immediately before
target process has been enabled for HSA dispatches) or it
can attach to a running target that is already HSA enabled.  Either
way, the debugger needs to know the enablement status to know when
it can inspect queues.

For the scenario where the debugger spawns the target process,
it will have to wait for ROCr's runtime enable request from the target.
The runtime enable request will be able to see that its process has been
debug attached.  ROCr raises an EC_PROCESS_RUNTIME signal to the
debugger then blocks the target process while waiting for the debugger's
response. Once the debugger has received the runtime signal, it will
unblock the target process.

For the scenario where the debugger attaches to a running target
process, ROCr will set the target process' runtime status as enabled so
that on an attach request, the debugger will be able to see this
status and will continue with debug enablement as normal.

A secondary requirement is to conditionally enable the trap temporaries only
if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger
attaches with HSA runtime enabled.  This is because setting up the trap
temporaries incurs a performance overhead that is unacceptable for
microbench performance in normal mode for certain customers.

In the scenario where the debugger spawns the target process, when ROCr
detects that the debugger has attached during the runtime enable
request, it will enable the trap temporaries before it blocks the target
process while waiting for the debugger to respond.

In the scenario where the debugger attaches to a running target process,
it will enable the trap temporaries itself.

Finally, there is an additional restriction that is required to be
enforced with runtime enable and HW debug mode setting. The debugger must
first ensure that HW debug mode has been enabled before permitting HW debug
mode operations.

With single process debug devices, allowing the debugger to set debug
HW modes prior to trap activation means that debug HW mode setting can
occur before the KFD has reserved the debug VMID (0xf) from the hardware
scheduler's VMID allocation resource pool.  This can result in the
hardware scheduler assigning VMID 0xf to a non-debugged process and
having that process inherit debug HW mode settings intended for the
debugged target process instead, which is both incorrect and potentially
fatal for normal mode operation.

With multi process debug devices, allowing the debugger to set debug
HW modes prior to trap activation means that non-debugged processes
migrating to a new VMID could inherit unintended debug settings.

All debug operations that touch HW settings must require trap activation
where trap activation is triggered by both debug attach and runtime
enablement (target has KFD opened and is ready to dispatch work).
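
The rule above amounts to a two-input gate. A sketch with hypothetical
names (target_state and hw_debug_op_permitted are illustrative only and
are not the driver's own types):

```c
#include <stdbool.h>

/* Hypothetical per-target state, for illustration only. */
struct target_state {
	bool debug_attached;   /* debugger has attached to the target */
	bool runtime_enabled;  /* target has completed runtime enable */
};

/* HW debug mode operations are allowed only once the trap is active. */
bool hw_debug_op_permitted(const struct target_state *s)
{
	return s->debug_attached && s->runtime_enabled;
}
```

Either ordering of attach and runtime enable reaches the permitted state;
what matters is that both have happened before any HW setting is touched.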

v2: fix up hierarchy of semantics in description.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 150 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   1 +
 4 files changed, 157 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 09fe8576dc8c..46f9d453dc5e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2654,11 +2654,147 @@ static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void *data)
return ret;
 }
 
-static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process *p, void *data)
+static int runtime_enable(struct kfd_process *p, uint64_t r_debug,
+   bool enable_ttmp_setup)
 {
+   int i = 0, ret = 0;
+
+   if (p->is_runtime_retry)
+   goto retry;
+
+   if (p->runtime_info.runtime_state != DEBUG_RUNTIME_STATE_DISABLED)
+   return -EBUSY;
+
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (pdd->qpd.queue_count)
+   return -EEXIST;
+   }
+
+   p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;
+   p->runtime_info.r_debug = r_debug;
+   p->runtime_info.ttmp_setup = enable_ttmp_setup;
+
+   if (p->runtime_info.ttmp_setup) {
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+   pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   true,
+ 

[PATCH 32/32] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-01-25 Thread Jonathan Kim
Bump the minor version to declare debugging capability is now
available.
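
User space could gate on this version before issuing debug ioctls. A
sketch, assuming the major/minor pair has already been read into a
kfd_ioctl_get_version_args via the existing get-version ioctl;
kfd_has_debug_api is a hypothetical name, and the check assumes the major
version stays at 1 as in the header below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the reported version includes the debug API. */
bool kfd_has_debug_api(uint32_t major, uint32_t minor)
{
	/* Debugger API is declared available from 1.12 onwards. */
	return major == 1 && minor >= 12;
}
```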

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 -
 include/uapi/linux/kfd_ioctl.h   | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index da74a6ef4d9b..c28d4b2dd0ef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2896,7 +2896,6 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
if (!r)
target->exception_enable_mask = args->enable.exception_mask;
 
-   pr_warn("Debug functions limited\n");
break;
case KFD_IOC_DBG_TRAP_DISABLE:
r = kfd_dbg_trap_disable(target);
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ef4eed45c19..a0efe1ccdbd6 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -37,9 +37,10 @@
  * - 1.9 - Add available memory ioctl
  * - 1.10 - Add SMI profiler event log
  * - 1.11 - Add unified memory for ctx save/restore area
+ * - 1.12 - Add debugger API
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 11
+#define KFD_IOCTL_MINOR_VERSION 12
 
 struct kfd_ioctl_get_version_args {
__u32 major_version; /* from KFD */
-- 
2.25.1



[PATCH 10/32] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode
for GFX11.

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
index 7e80caa05060..34aeff692eba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
@@ -30,6 +30,7 @@
 #include "soc15d.h"
 #include "v11_structs.h"
 #include "soc21.h"
+#include 
 
 enum hqd_dequeue_request_type {
NO_ACTION = 0,
@@ -606,6 +607,42 @@ static void set_vm_context_page_table_base_v11(struct amdgpu_device *adev,
adev->gfxhub.funcs->setup_vm_pt_regs(adev, vmid, page_table_base);
 }
 
+/**
+ * Returns TRAP_EN, EXCP_EN and EXCP_REPLACE.
+ *
+ * restore_dbg_registers is ignored here but is a general interface requirement
+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to manually
+ * initialize the debug mode registers after it has disabled GFX off during the
+ * debug session.
+ */
+static uint32_t kgd_gfx_v11_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
+
+/* Returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
+static uint32_t kgd_gfx_v11_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, keep_trap_enabled);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
+
+   return data;
+}
+
 const struct kfd2kgd_calls gfx_v11_kfd2kgd = {
.program_sh_mem_settings = program_sh_mem_settings_v11,
.set_pasid_vmid_mapping = set_pasid_vmid_mapping_v11,
@@ -622,4 +659,6 @@ const struct kfd2kgd_calls gfx_v11_kfd2kgd = {
.wave_control_execute = wave_control_execute_v11,
.get_atc_vmid_pasid_mapping_info = NULL,
.set_vm_context_page_table_base = set_vm_context_page_table_base_v11,
+   .enable_debug_trap = kgd_gfx_v11_enable_debug_trap,
+   .disable_debug_trap = kgd_gfx_v11_disable_debug_trap
 };
-- 
2.25.1



[PATCH 20/32] drm/amdkfd: add debug trap enabled flag to tma

2023-01-25 Thread Jonathan Kim
From: Jay Cornwall 

Trap handler behavior will differ when a debugger is attached.

Make the debug trap flag available in the trap handler TMA.
Update it when the debug trap ioctl is invoked.

v4: fix up comments to clarify flagging implementation.

v3: Rebase for upstream

v2:
Add missing debug flag setup on APUs

Signed-off-by: Jay Cornwall 
Reviewed-by: Felix Kuehling 
Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 +++
 3 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 47f8425a0db3..16acf3d416eb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -257,6 +257,8 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, int unwind
if (unwind && count == unwind_count)
break;
 
+   kfd_process_set_trap_debug_flag(&pdd->qpd, false);
+
/* GFX off is already disabled by debug activate if not RLC restore supported. */
if (kfd_dbg_is_rlc_restore_supported(pdd->dev))
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
@@ -355,6 +357,15 @@ int kfd_dbg_trap_activate(struct kfd_process *target)
if (kfd_dbg_is_rlc_restore_supported(pdd->dev))
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
+   /**
+* Setting the debug flag in the trap handler requires that the TMA has been
+* allocated, which occurs during CWSR initialization.
+* In the event that CWSR has not been initialized at this point, setting the
+* flag will be called again during CWSR initialization if the target process
+* is still debug enabled.
+*/
+   kfd_process_set_trap_debug_flag(&pdd->qpd, true);
+
if (!pdd->dev->shared_resources.enable_mes)
r = debug_refresh_runlist(pdd->dev->dqm);
else
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 63c59ad2a4ca..d7f00181ae6b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1104,6 +1104,8 @@ int kfd_init_apertures(struct kfd_process *process);
 void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
  uint64_t tba_addr,
  uint64_t tma_addr);
+void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
+bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 8519604f7249..5da1edd36bd2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1250,6 +1250,8 @@ int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
 
memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size);
 
+   kfd_process_set_trap_debug_flag(qpd, p->debug_trap_enabled);
+
qpd->tma_addr = qpd->tba_addr + KFD_CWSR_TMA_OFFSET;
pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for pqm.\n",
qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr);
@@ -1286,6 +1288,9 @@ static int kfd_process_device_init_cwsr_dgpu(struct kfd_process_device *pdd)
 
memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size);
 
+   kfd_process_set_trap_debug_flag(&pdd->qpd,
+   pdd->process->debug_trap_enabled);
+
qpd->tma_addr = qpd->tba_addr + KFD_CWSR_TMA_OFFSET;
pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for pqm.\n",
 qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr);
@@ -1372,6 +1377,16 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool supported)
return true;
 }
 
+void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
+bool enabled)
+{
+   if (qpd->cwsr_kaddr) {
+   uint64_t *tma =
+   (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
+   tma[2] = enabled;
+   }
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1



[PATCH 12/32] drm/amdkfd: prepare map process for single process debug devices

2023-01-25 Thread Jonathan Kim
Older HW only supports debugging on a single process because the
SPI debug mode setting registers are device global.

The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS
for debug purposes. To pin the VMID, the KFD will remove the VMID from
the HWS dynamic VMID allocation via SET_RESOURCES so that a debugged
process will never migrate away from its pinned VMID.

The KFD is responsible for reserving and releasing this pinned VMID
accordingly whenever the debugger attaches and detaches respectively.
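
Reserving and releasing the pinned VMID comes down to clearing and
setting one bit in the compute VMID bitmap handed to the HWS. A sketch
with hypothetical helper names; the bitmap values in any caller are
illustrative:

```c
#include <stdint.h>

/* Hypothetical helper: remove a VMID from the dynamically allocated set. */
uint32_t vmid_mask_reserve(uint32_t mask, unsigned int vmid)
{
	return mask & ~(UINT32_C(1) << vmid);
}

/* Hypothetical helper: hand a VMID back to the dynamically allocated set. */
uint32_t vmid_mask_release(uint32_t mask, unsigned int vmid)
{
	return mask | (UINT32_C(1) << vmid);
}
```

For example, with VMIDs 8 to 15 available (mask 0xFF00), reserving VMID
0xf leaves 0x7F00, and releasing it restores 0xFF00.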

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 101 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   5 +
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c|   9 ++
 .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h   |   5 +-
 4 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7556f80d41e4..0cd3a5e9ff25 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1490,7 +1490,7 @@ static int initialize_cpsch(struct device_queue_manager *dqm)
dqm->active_cp_queue_count = 0;
dqm->gws_queue_count = 0;
dqm->active_runlist = false;
-   INIT_WORK(&dqm->hw_exception_work, kfd_process_hw_exception);
+   dqm->trap_debug_vmid = 0;
 
init_sdma_bitmaps(dqm);
 
@@ -1933,8 +1933,7 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
if (!dqm->dev->shared_resources.enable_mes) {
decrement_queue_count(dqm, qpd, q);
retval = execute_queues_cpsch(dqm,
- KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0,
- USE_DEFAULT_GRACE_PERIOD);
+ KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
if (retval == -ETIME)
qpd->reset_wavefronts = true;
} else {
@@ -2463,6 +2462,98 @@ static void kfd_process_hw_exception(struct work_struct *work)
amdgpu_amdkfd_gpu_reset(dqm->dev->adev);
 }
 
+int reserve_debug_trap_vmid(struct device_queue_manager *dqm,
+   struct qcm_process_device *qpd)
+{
+   int r;
+   int updated_vmid_mask;
+
+   if (dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("Unsupported on sched_policy: %i\n", dqm->sched_policy);
+   return -EINVAL;
+   }
+
+   dqm_lock(dqm);
+
+   if (dqm->trap_debug_vmid != 0) {
+   pr_err("Trap debug id already reserved\n");
+   r = -EBUSY;
+   goto out_unlock;
+   }
+
+   r = unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0,
+   USE_DEFAULT_GRACE_PERIOD, false);
+   if (r)
+   goto out_unlock;
+
+   updated_vmid_mask = dqm->dev->shared_resources.compute_vmid_bitmap;
+   updated_vmid_mask &= ~(1 << dqm->dev->vm_info.last_vmid_kfd);
+
+   dqm->dev->shared_resources.compute_vmid_bitmap = updated_vmid_mask;
+   dqm->trap_debug_vmid = dqm->dev->vm_info.last_vmid_kfd;
+   r = set_sched_resources(dqm);
+   if (r)
+   goto out_unlock;
+
+   r = map_queues_cpsch(dqm);
+   if (r)
+   goto out_unlock;
+
+   pr_debug("Reserved VMID for trap debug: %i\n", dqm->trap_debug_vmid);
+
+out_unlock:
+   dqm_unlock(dqm);
+   return r;
+}
+
+/*
+ * Releases vmid for the trap debugger
+ */
+int release_debug_trap_vmid(struct device_queue_manager *dqm,
+   struct qcm_process_device *qpd)
+{
+   int r;
+   int updated_vmid_mask;
+   uint32_t trap_debug_vmid;
+
+   if (dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("Unsupported on sched_policy: %i\n", dqm->sched_policy);
+   return -EINVAL;
+   }
+
+   dqm_lock(dqm);
+   trap_debug_vmid = dqm->trap_debug_vmid;
+   if (dqm->trap_debug_vmid == 0) {
+   pr_err("Trap debug id is not reserved\n");
+   r = -EINVAL;
+   goto out_unlock;
+   }
+
+   r = unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0,
+   USE_DEFAULT_GRACE_PERIOD, false);
+   if (r)
+   goto out_unlock;
+
+   updated_vmid_mask = dqm->dev->shared_resources.compute_vmid_bitmap;
+   updated_vmid_mask |= (1 << dqm->dev->vm_info.last_vmid_kfd);
+
+   dqm->dev->shared_resources.compute_vmid_bitmap = updated_vmid_mask;
+   dqm->trap_debug_vmid = 0;
+   r = set_sched_resources(dqm);
+   if (r)
+   goto out_unlock;
+
+   r = map_queues_cpsch(dqm);
+   if (r)
+   goto out_unlock;
+
+   pr_debug("Released VMID for trap debug: %i\n", 

[PATCH 01/32] drm/amdkfd: add debug and runtime enable interface

2023-01-25 Thread Jonathan Kim
Introduce the GPU debug operations interface.

For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU
instruction set, provide the interface the debugger needs to set HW
debug modes and query exceptions per HSA queue, process or device.

The runtime_enable interface coordinates exception handling with the
HSA runtime.

Usage is documented in the kernel-doc comments in uapi/linux/kfd_ioctl.h.

v2: was previously reviewed but removed deprecated wave launch modes
(kill and disable).
Also remove non-needed dbg flag option.
Add revision and subvendor info to debug device snapshot entry.
Add trap on wave start and end override option.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  48 ++
 include/uapi/linux/kfd_ioctl.h   | 663 ++-
 2 files changed, 710 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f79b8e964140..d3b019e64093 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2645,6 +2645,48 @@ static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void *data)
return ret;
 }
 
+static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process *p, void *data)
+{
+   return 0;
+}
+
+static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, void *data)
+{
+   struct kfd_ioctl_dbg_trap_args *args = data;
+   int r = 0;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("Debugging does not support sched_policy %i", sched_policy);
+   return -EINVAL;
+   }
+
+   switch (args->op) {
+   case KFD_IOC_DBG_TRAP_ENABLE:
+   case KFD_IOC_DBG_TRAP_DISABLE:
+   case KFD_IOC_DBG_TRAP_SEND_RUNTIME_EVENT:
+   case KFD_IOC_DBG_TRAP_SET_EXCEPTIONS_ENABLED:
+   case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_OVERRIDE:
+   case KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_MODE:
+   case KFD_IOC_DBG_TRAP_SUSPEND_QUEUES:
+   case KFD_IOC_DBG_TRAP_RESUME_QUEUES:
+   case KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH:
+   case KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH:
+   case KFD_IOC_DBG_TRAP_SET_FLAGS:
+   case KFD_IOC_DBG_TRAP_QUERY_DEBUG_EVENT:
+   case KFD_IOC_DBG_TRAP_QUERY_EXCEPTION_INFO:
+   case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
+   case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
+   pr_warn("Debugging not supported yet\n");
+   r = -EACCES;
+   break;
+   default:
+   pr_err("Invalid option: %i\n", args->op);
+   r = -EINVAL;
+   }
+
+   return r;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -2754,6 +2796,12 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_AVAILABLE_MEMORY,
kfd_ioctl_get_available_memory, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_RUNTIME_ENABLE,
+   kfd_ioctl_runtime_enable, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
+   kfd_ioctl_set_debug_trap, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT ARRAY_SIZE(amdkfd_ioctls)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 42b60198b6c5..9ef4eed45c19 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -109,6 +109,32 @@ struct kfd_ioctl_get_available_memory_args {
__u32 pad;
 };
 
+struct kfd_dbg_device_info_entry {
+   __u64 exception_status;
+   __u64 lds_base;
+   __u64 lds_limit;
+   __u64 scratch_base;
+   __u64 scratch_limit;
+   __u64 gpuvm_base;
+   __u64 gpuvm_limit;
+   __u32 gpu_id;
+   __u32 location_id;
+   __u32 vendor_id;
+   __u32 device_id;
+   __u32 revision_id;
+   __u32 subsystem_vendor_id;
+   __u32 subsystem_device_id;
+   __u32 fw_version;
+   __u32 gfx_target_version;
+   __u32 simd_count;
+   __u32 max_waves_per_simd;
+   __u32 array_count;
+   __u32 simd_arrays_per_engine;
+   __u32 capability;
+   __u32 debug_prop;
+   __u32 pad;
+};
+
 /* For kfd_ioctl_set_memory_policy_args.default_policy and alternate_policy */
 #define KFD_IOC_CACHE_POLICY_COHERENT 0
 #define KFD_IOC_CACHE_POLICY_NONCOHERENT 1
@@ -766,6 +792,635 @@ struct kfd_ioctl_set_xnack_mode_args {
__s32 xnack_enabled;
 };
 
+/* Wave launch override modes */
+enum kfd_dbg_trap_override_mode {
+   KFD_DBG_TRAP_OVERRIDE_OR = 0,
+   KFD_DBG_TRAP_OVERRIDE_REPLACE = 1
+};
+
+/* Wave launch overrides */
+enum kfd_dbg_trap_mask {
+   KFD_DBG_TRAP_MASK_FP_INVALID = 1,
+   KFD_DBG_TRAP_MASK_FP_INPUT_DENORMAL = 2,
+   KFD_DBG_TRAP_MASK_FP_DIVIDE_BY_ZERO = 4,
+   KFD_DBG_TRAP_MASK_FP_OVERFLOW = 8,
+   

[PATCH 13/32] drm/amdgpu: prepare map process for multi-process debug devices

2023-01-25 Thread Jonathan Kim
Unlike single process debug devices, multi-process debug devices allow
debug mode setting per-VMID (non-device-global).

Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows
the KFD to forward the required SPI debug register write requests.

To request a new debug mode setting change, the KFD must be able to
preempt all queues then remap all queues with these new setting
requests for MAP_PROCESS to take effect.
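
The preempt-then-remap sequence described above can be sketched in plain
userspace C as follows.  This is only an illustration under assumptions:
struct dqm_sketch and every sketch_* name are hypothetical stand-ins for the
device queue manager, and a boolean replaces the real DQM lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device queue manager state. */
struct dqm_sketch {
	bool locked;        /* models dqm_lock()/dqm_unlock() */
	bool queues_mapped; /* true while the runlist is active */
	bool unmap_fails;   /* inject a preemption failure for testing */
	int debug_setting;  /* setting carried by the next MAP_PROCESS */
};

/* Take the lock and preempt all queues; on failure the lock is dropped. */
static int sketch_lock_and_unmap(struct dqm_sketch *dqm)
{
	assert(!dqm->locked);
	dqm->locked = true;

	if (dqm->unmap_fails) {
		dqm->locked = false;
		return -EIO;
	}

	dqm->queues_mapped = false;
	return 0; /* lock stays held for the caller */
}

/* Remap all queues so the new setting takes effect, then unlock. */
static int sketch_map_and_unlock(struct dqm_sketch *dqm)
{
	assert(dqm->locked);
	dqm->queues_mapped = true;
	dqm->locked = false;
	return 0;
}

/* Change a debug setting while the queues are guaranteed to be unmapped. */
static int sketch_set_debug_mode(struct dqm_sketch *dqm, int setting)
{
	int r = sketch_lock_and_unmap(dqm);

	if (r)
		return r;

	dqm->debug_setting = setting;
	return sketch_map_and_unlock(dqm);
}
```

The point of splitting the helpers is that the lock is held across the
setting change, and a failed preemption leaves both the lock and the old
setting untouched.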

Note that by default, trap enablement in non-debug mode must be disabled
for performance reasons for multi-process debug devices due to setup
overhead in FW.

v2: remove asic family code name comment in per vmid support check

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  7 +++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 50 +++
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 ++
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 15 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  9 
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  5 ++
 6 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index 8aa7a3ad4e97..53c5a3e55bd2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -32,5 +32,12 @@ int kfd_dbg_trap_disable(struct kfd_process *target);
 int kfd_dbg_trap_enable(struct kfd_process *target, uint32_t fd,
void __user *runtime_info,
uint32_t *runtime_info_size);
+
+static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_dev *dev)
+{
+   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2);
+}
+
 void debug_event_write_work_handler(struct work_struct *work);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 0cd3a5e9ff25..2517716d7cbc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2554,6 +2554,56 @@ int release_debug_trap_vmid(struct device_queue_manager 
*dqm,
return r;
 }
 
+int debug_lock_and_unmap(struct device_queue_manager *dqm)
+{
+   int r;
+
+   if (dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("Unsupported on sched_policy: %i\n", dqm->sched_policy);
+   return -EINVAL;
+   }
+
+   if (!kfd_dbg_is_per_vmid_supported(dqm->dev))
+   return 0;
+
+   dqm_lock(dqm);
+
+   r = unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 0, false);
+   if (r)
+   dqm_unlock(dqm);
+
+   return r;
+}
+
+int debug_map_and_unlock(struct device_queue_manager *dqm)
+{
+   int r;
+
+   if (dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("Unsupported on sched_policy: %i\n", dqm->sched_policy);
+   return -EINVAL;
+   }
+
+   if (!kfd_dbg_is_per_vmid_supported(dqm->dev))
+   return 0;
+
+   r = map_queues_cpsch(dqm);
+
+   dqm_unlock(dqm);
+
+   return r;
+}
+
+int debug_refresh_runlist(struct device_queue_manager *dqm)
+{
+   int r = debug_lock_and_unmap(dqm);
+
+   if (r)
+   return r;
+
+   return debug_map_and_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 0cb1504d24cf..bef3be84c5cc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -286,6 +286,9 @@ int reserve_debug_trap_vmid(struct device_queue_manager 
*dqm,
struct qcm_process_device *qpd);
 int release_debug_trap_vmid(struct device_queue_manager *dqm,
struct qcm_process_device *qpd);
+int debug_lock_and_unmap(struct device_queue_manager *dqm);
+int debug_map_and_unlock(struct device_queue_manager *dqm);
+int debug_refresh_runlist(struct device_queue_manager *dqm);
 
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 363cf8e005cc..f19c506da23d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -88,6 +88,10 @@ static int pm_map_process_aldebaran(struct packet_manager 
*pm,
 {
struct pm4_mes_map_process_aldebaran *packet;
uint64_t vm_page_table_base_addr = qpd->page_table_base;
+   struct kfd_dev *kfd = pm->dqm->dev;
+   struct kfd_process_device *pdd =
+   container_of(qpd, struct kfd_process_device, qpd);
+   int i;
 
packet = (struct pm4_mes_map_process_aldebaran *)buffer;
memset(buffer, 0, sizeof(struct 

[PATCH 08/32] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
Similar to GFX9 debug devices, set the hardware debug mode by draining
the SPI appropriately prior to the mode setting request.

Because GFX10 has waves allocated by the work group boundary and each
SE's SPI instances do not communicate, the SPI drain time is much longer.
This long drain time will be fixed for GFX11 onwards.

Also remove a bunch of deprecated misplaced references for GFX10.3.

Signed-off-by: Jonathan Kim 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  95 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  28 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c  | 147 +-
 3 files changed, 126 insertions(+), 144 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 9378fc79e9ea..c09b45de02d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -708,6 +708,99 @@ static void set_vm_context_page_table_base(struct 
amdgpu_device *adev,
adev->gfxhub.funcs->setup_vm_pt_regs(adev, vmid, page_table_base);
 }
 
+/*
+ * GFX10 helper for wave launch stall requirements on debug trap setting.
+ *
+ * vmid:
+ *   Target VMID to stall/unstall.
+ *
+ * stall:
+ *   0-unstall wave launch (enable), 1-stall wave launch (disable).
+ *   After wavefront launch has been stalled, allocated waves must drain from
+ *   SPI in order for debug trap settings to take effect on those waves.
+ *   This is roughly a ~3500 clock cycle wait on SPI where a read on
+ *   SPI_GDBG_WAVE_CNTL translates to ~32 clock cycles.
+ *   KGD_GFX_V10_WAVE_LAUNCH_SPI_DRAIN_LATENCY indicates the number of reads required.
+ *
+ *   NOTE: We can afford to clear the entire STALL_VMID field on unstall
+ *   because current GFX10 chips cannot support multi-process debugging due to
+ *   trap configuration and masking being limited to global scope.  Always
+ *   assume single process conditions.
+ *
+ */
+
+#define KGD_GFX_V10_WAVE_LAUNCH_SPI_DRAIN_LATENCY  110
+static void kgd_gfx_v10_set_wave_launch_stall(struct amdgpu_device *adev, uint32_t vmid, bool stall)
+{
+   uint32_t data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+   int i;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_VMID,
+   stall ? 1 << vmid : 0);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL), data);
+
+   if (!stall)
+   return;
+
+   for (i = 0; i < KGD_GFX_V10_WAVE_LAUNCH_SPI_DRAIN_LATENCY; i++)
+   RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+}
+
+uint32_t kgd_gfx_v10_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+
+   mutex_lock(&adev->grbm_idx_mutex);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true);
+
+   /* assume gfx off is disabled for the debug session if rlc restore not supported. */
+   if (restore_dbg_registers) {
+   uint32_t data = 0;
+
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   VMID_SEL, 1 << vmid);
+   data = REG_SET_FIELD(data, SPI_GDBG_TRAP_CONFIG,
+   TRAP_EN, 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_CONFIG), data);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA0), 0);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_DATA1), 0);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+   }
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+}
+
+uint32_t kgd_gfx_v10_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   mutex_lock(&adev->grbm_idx_mutex);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   kgd_gfx_v10_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return 0;
+}
+
 static void program_trap_handler_settings(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr)
 {
@@ -750,5 +843,7 @@ const struct kfd2kgd_calls gfx_v10_kfd2kgd = {
.get_atc_vmid_pasid_mapping_info =
get_atc_vmid_pasid_mapping_info,
.set_vm_context_page_table_base = set_vm_context_page_table_base,
+   .enable_debug_trap = kgd_gfx_v10_enable_debug_trap,
+   .disable_debug_trap = kgd_gfx_v10_disable_debug_trap,
  

[PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable

2023-01-25 Thread Jonathan Kim
The ROCm debugger will attach to a process to debug by PTRACE and will
expect the KFD to prepare a process for the target PID, whether the
target PID has opened the KFD device or not.

This patch is to explicitly handle this requirement.  Further HW mode
setting and runtime coordination requirements will be handled in
following patches.

In the case where the target process has not opened the KFD device,
a new KFD process must be created for the target PID.
The debugger as well as the target process for this case will not have
acquired any VMs so handle process restoration to correctly account for
this.

To coordinate with HSA runtime, the debugger must be aware of the target
process' runtime enablement status and will copy the runtime status
information into the debugged KFD process for later query.

On enablement, the debugger will subscribe to a set of exceptions where
each exception event will notify the debugger through a pollable FIFO
file descriptor that the debugger provides to the KFD to manage.
Some events will be synchronously raised while others are scheduled,
which is why a debug_event_workarea worker is initialized.
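
As a rough userspace illustration of the debugger side of that notification
path, the sketch below waits on a pollable descriptor with poll(2).  It makes
several assumptions: a pipe(2) stands in for the FIFO, the one-byte payload is
invented for the example, and the sketch_* helpers are hypothetical names that
are not part of the KFD interface.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for an event byte on fd.
 * Returns 1 and stores the byte if one arrived, 0 on timeout, -1 on error.
 */
static int sketch_wait_event(int fd, int timeout_ms, unsigned char *event)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int r;

	assert(event);

	r = poll(&pfd, 1, timeout_ms);
	if (r <= 0)
		return r; /* 0: timeout, -1: poll error */

	if (!(pfd.revents & POLLIN))
		return -1;

	return read(fd, event, 1) == 1 ? 1 : -1;
}

/* Stand-in for the kernel side raising an event: write one byte. */
static int sketch_raise_event(int fd, unsigned char event)
{
	return write(fd, &event, 1) == 1 ? 0 : -1;
}
```

A debugger loop would typically call the wait helper with a finite timeout so
that it can also notice the target exiting.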

Finally on process termination of either the debugger or the target,
debugging must be disabled if this has not already been done.

v3: fix typo on debug trap disable and PTRACE ATTACH relax check.
remove unnecessary queue eviction counter reset when there's nothing
to evict.
change err code to EALREADY if attaching to an already attached process.
move debug disable to release worker to avoid race with disable from
ioctl call.

v2: relax debug trap disable and PTRACE ATTACH requirement.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/Makefile   |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 88 -
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 94 +++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 33 +++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 34 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 63 +
 7 files changed, 308 insertions(+), 29 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_debug.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index e758c2a24cd0..747754428073 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -55,7 +55,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v9.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
-   $(AMDKFD_PATH)/kfd_crat.o
+   $(AMDKFD_PATH)/kfd_crat.o \
+   $(AMDKFD_PATH)/kfd_debug.o
 
 ifneq ($(CONFIG_AMD_IOMMU_V2),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d3b019e64093..ee05c2e54ef6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -44,6 +44,7 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_dma_buf.h"
+#include "kfd_debug.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -142,10 +143,15 @@ static int kfd_open(struct inode *inode, struct file 
*filep)
return -EPERM;
}
 
-   process = kfd_create_process(filep);
+   process = kfd_create_process(current);
if (IS_ERR(process))
return PTR_ERR(process);
 
+   if (kfd_process_init_cwsr_apu(process, filep)) {
+   kfd_unref_process(process);
+   return -EFAULT;
+   }
+
if (kfd_is_locked()) {
dev_dbg(kfd_device, "kfd is locked!\n"
"process %d unreferenced", process->pasid);
@@ -2653,6 +2659,9 @@ static int kfd_ioctl_runtime_enable(struct file *filep, 
struct kfd_process *p, v
 static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, 
void *data)
 {
struct kfd_ioctl_dbg_trap_args *args = data;
+   struct task_struct *thread = NULL;
+   struct pid *pid = NULL;
+   struct kfd_process *target = NULL;
int r = 0;
 
if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
@@ -2660,9 +2669,71 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
return -EINVAL;
}
 
+   pid = find_get_pid(args->pid);
+   if (!pid) {
+   pr_debug("Cannot find pid info for %i\n", args->pid);
+   r = -ESRCH;
+   goto out;
+   }
+
+   thread = get_pid_task(pid, PIDTYPE_PID);
+
+   if (args->op == KFD_IOC_DBG_TRAP_ENABLE) {
+   bool create_process;
+
+   rcu_read_lock();
+   

[PATCH 07/32] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
On GFX9.4.1, the implicit wait count instruction on s_barrier is
disabled by default in the driver during normal operation for
performance requirements.

There is a hardware bug in GFX9.4.1 where if the implicit wait count
instruction after an s_barrier instruction is disabled, any wave that
hits an exception may step over the s_barrier when returning from the
trap handler with the barrier logic having no ability to be
aware of this, thereby causing other waves to wait at the barrier
indefinitely resulting in a shader hang.  This bug has been corrected
for GFX9.4.2 and onward.

Since the debugger subscribes to hardware exceptions, in order to avoid
this bug, the debugger must enable implicit wait count on s_barrier
for a debug session and disable it on detach.

In order to change this setting in the device global SQ_CONFIG
register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
will either dispatch work through the compute ring buffers used for
image post processing or through the hardware scheduler by the KFD.

Have the KGD suspend and drain the compute ring buffer, then suspend the
hardware scheduler and block any future KFD process job requests before
changing the implicit wait count setting.  Once set, resume all work.

v2: remove flush on kfd suspend as that will be a general fix required
outside of this patch series.
comment on trap enable/disable ignored variables.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   | 118 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |   4 +-
 3 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 872450a3a164..3c03e34c194c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1041,6 +1041,9 @@ struct amdgpu_device {
struct pci_saved_state  *pci_state;
pci_channel_state_t pci_channel_state;
 
+   /* Track auto wait count on s_barrier settings */
+   bool barrier_has_auto_waitcnt;
+
struct amdgpu_reset_control *reset_cntl;
uint32_t
ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 4191af5a3f13..d5bb86ccd617 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -26,6 +26,7 @@
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
+#include "amdgpu_reset.h"
 #include "sdma0/sdma0_4_2_2_offset.h"
 #include "sdma0/sdma0_4_2_2_sh_mask.h"
 #include "sdma1/sdma1_4_2_2_offset.h"
@@ -48,6 +49,8 @@
 #include "amdgpu_amdkfd_gfx_v9.h"
 #include "gfxhub_v1_0.h"
 #include "mmhub_v9_4.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
 
 #define HQD_N_REGS 56
 #define DUMP_REG(addr) do {\
@@ -276,6 +279,117 @@ int kgd_arcturus_hqd_sdma_destroy(struct amdgpu_device 
*adev, void *mqd,
return 0;
 }
 
+/*
+ * Helper used to suspend/resume gfx pipe for image post process work to set
+ * barrier behaviour.
+ */
+static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool suspend)
+{
+   int i, r = 0;
+
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+   struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
+
+   if (!(ring && ring->sched.thread))
+   continue;
+
+   /* stop scheduler and drain ring. */
+   if (suspend) {
+   drm_sched_stop(&ring->sched, NULL);
+   r = amdgpu_fence_wait_empty(ring);
+   if (r)
+   goto out;
+   } else {
+   drm_sched_start(&ring->sched, false);
+   }
+   }
+
+out:
+   /* return on resume or failure to drain rings. */
+   if (!suspend || r)
+   return r;
+
+   return amdgpu_device_ip_wait_for_idle(adev, GC_HWIP);
+}
+
+static void set_barrier_auto_waitcnt(struct amdgpu_device *adev, bool enable_waitcnt)
+{
+   uint32_t data;
+
+   WRITE_ONCE(adev->barrier_has_auto_waitcnt, enable_waitcnt);
+
+   if (!down_read_trylock(&adev->reset_domain->sem))
+   return;
+
+   amdgpu_amdkfd_suspend(adev, false);
+
+   if (suspend_resume_compute_scheduler(adev, true))
+   goto out;
+
+   data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CONFIG));
+   data = REG_SET_FIELD(data, SQ_CONFIG, DISABLE_BARRIER_WAITCNT,
+   enable_waitcnt ? 0 : 1);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CONFIG), data);
+
+out:
+   suspend_resume_compute_scheduler(adev, false);
+
+   amdgpu_amdkfd_resume(adev, false);
+
+ 

[PATCH 04/32] drm/amdgpu: add kgd hw debug mode setting interface

2023-01-25 Thread Jonathan Kim
Introduce the required KGD debug calls that will execute hardware debug
mode setting.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
---
 .../gpu/drm/amd/include/kgd_kfd_interface.h   | 34 +++
 1 file changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 5cb3e8634739..15e7a5c920a0 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -289,6 +289,40 @@ struct kfd2kgd_calls {
uint32_t vmid, uint64_t page_table_base);
uint32_t (*read_vmid_from_vmfault_reg)(struct amdgpu_device *adev);
 
+   uint32_t (*enable_debug_trap)(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid);
+   uint32_t (*disable_debug_trap)(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid);
+   int (*validate_trap_override_request)(struct amdgpu_device *adev,
+   uint32_t trap_override,
+   uint32_t *trap_mask_supported);
+   uint32_t (*set_wave_launch_trap_override)(struct amdgpu_device *adev,
+uint32_t vmid,
+uint32_t trap_override,
+uint32_t trap_mask_bits,
+uint32_t trap_mask_request,
+uint32_t *trap_mask_prev,
+uint32_t kfd_dbg_trap_cntl_prev);
+   uint32_t (*set_wave_launch_mode)(struct amdgpu_device *adev,
+   uint8_t wave_launch_mode,
+   uint32_t vmid);
+   uint32_t (*set_address_watch)(struct amdgpu_device *adev,
+   uint64_t watch_address,
+   uint32_t watch_address_mask,
+   uint32_t watch_id,
+   uint32_t watch_mode,
+   uint32_t debug_vmid);
+   uint32_t (*clear_address_watch)(struct amdgpu_device *adev,
+   uint32_t watch_id);
+   void (*get_iq_wait_times)(struct amdgpu_device *adev,
+   uint32_t *wait_times);
+   void (*build_grace_period_packet_info)(struct amdgpu_device *adev,
+   uint32_t wait_times,
+   uint32_t grace_period,
+   uint32_t *reg_offset,
+   uint32_t *reg_data);
void (*get_cu_occupancy)(struct amdgpu_device *adev, int pasid,
int *wave_cnt, int *max_waves_per_cu);
void (*program_trap_handler_settings)(struct amdgpu_device *adev,
-- 
2.25.1



[PATCH 02/32] drm/amdkfd: display debug capabilities

2023-01-25 Thread Jonathan Kim
Expose debug capabilities in the KFD topology node's HSA capabilities and
debug properties flags.

Ensure correct capabilities are exposed based on firmware support.
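
One way to picture the firmware gating is a small lookup table, sketched
below in standalone C.  This is not how the patch implements it (the patch
uses a switch statement); the table holds only a subset of the thresholds
that appear in the diff, the GFX11 MES check is left out, and
SKETCH_IP_VERSION, struct fw_requirement and sketch_dbg_fw_supported are
hypothetical names whose version packing is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's IP_VERSION() packing; assumed layout. */
#define SKETCH_IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

struct fw_requirement {
	uint32_t gc_version;
	uint32_t min_mec_fw; /* UINT32_MAX: never supported */
};

/* Subset of the thresholds from the switch statement in this patch. */
static const struct fw_requirement fw_table[] = {
	{ SKETCH_IP_VERSION(9, 4, 1),   60 },
	{ SKETCH_IP_VERSION(9, 4, 2),   51 },
	{ SKETCH_IP_VERSION(10, 1, 10), 144 },
	{ SKETCH_IP_VERSION(10, 3, 0),  89 },
	{ SKETCH_IP_VERSION(10, 3, 3),  UINT32_MAX },
};

static bool sketch_dbg_fw_supported(uint32_t gc_version, uint32_t mec_fw)
{
	size_t i;

	for (i = 0; i < sizeof(fw_table) / sizeof(fw_table[0]); i++) {
		if (fw_table[i].gc_version != gc_version)
			continue;
		if (fw_table[i].min_mec_fw == UINT32_MAX)
			return false;
		return mec_fw >= fw_table[i].min_mec_fw;
	}

	/* Unlisted devices are assumed to support exception handling. */
	return true;
}
```

As in the patch, a device that is not listed is treated as supported, so a
new entry is only needed when a device has a minimum firmware requirement.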

Flag definitions can be referenced in uapi/linux/kfd_sysfs.h.

v2: v1 was reviewed but re-requesting review for the following.
- remove asic family code name comments in firmware support checking
- add gfx11 requirements in fw support checks and debug props and caps

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 --
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 ++
 include/uapi/linux/kfd_sysfs.h|  15 
 3 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3fdaba56be6f..647a14142da9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -551,6 +551,8 @@ static ssize_t node_show(struct kobject *kobj, struct 
attribute *attr,
  dev->gpu->mec_fw_version);
sysfs_show_32bit_prop(buffer, offs, "capability",
  dev->node_props.capability);
+   sysfs_show_64bit_prop(buffer, offs, "debug_prop",
+ dev->node_props.debug_prop);
sysfs_show_32bit_prop(buffer, offs, "sdma_fw_version",
  dev->gpu->sdma_fw_version);
sysfs_show_64bit_prop(buffer, offs, "unique_id",
@@ -1865,6 +1867,97 @@ static int kfd_topology_add_device_locked(struct kfd_dev 
*gpu, uint32_t gpu_id,
return res;
 }
 
+static void kfd_topology_set_dbg_firmware_support(struct kfd_topology_device 
*dev)
+{
+   bool firmware_supported = true;
+
+   if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(11, 0, 0) &&
+   KFD_GC_VERSION(dev->gpu) < IP_VERSION(12, 0, 0)) {
+   firmware_supported =
+   (dev->gpu->adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 9;
+   goto out;
+   }
+
+   /*
+* Note: Any unlisted devices here are assumed to support exception handling.
+* Add additional checks here as needed.
+*/
+   switch (KFD_GC_VERSION(dev->gpu)) {
+   case IP_VERSION(9, 0, 1):
+   firmware_supported = dev->gpu->mec_fw_version >= 459 + 32768;
+   break;
+   case IP_VERSION(9, 1, 0):
+   case IP_VERSION(9, 2, 1):
+   case IP_VERSION(9, 2, 2):
+   case IP_VERSION(9, 3, 0):
+   case IP_VERSION(9, 4, 0):
+   firmware_supported = dev->gpu->mec_fw_version >= 459;
+   break;
+   case IP_VERSION(9, 4, 1):
+   firmware_supported = dev->gpu->mec_fw_version >= 60;
+   break;
+   case IP_VERSION(9, 4, 2):
+   firmware_supported = dev->gpu->mec_fw_version >= 51;
+   break;
+   case IP_VERSION(10, 1, 10):
+   case IP_VERSION(10, 1, 2):
+   case IP_VERSION(10, 1, 1):
+   firmware_supported = dev->gpu->mec_fw_version >= 144;
+   break;
+   case IP_VERSION(10, 3, 0):
+   case IP_VERSION(10, 3, 2):
+   case IP_VERSION(10, 3, 1):
+   case IP_VERSION(10, 3, 4):
+   case IP_VERSION(10, 3, 5):
+   firmware_supported = dev->gpu->mec_fw_version >= 89;
+   break;
+   case IP_VERSION(10, 1, 3):
+   case IP_VERSION(10, 3, 3):
+   firmware_supported = false;
+   break;
+   default:
+   break;
+   }
+
+out:
+   if (firmware_supported)
+   dev->node_props.capability |= HSA_CAP_TRAP_DEBUG_FIRMWARE_SUPPORTED;
+}
+
+static void kfd_topology_set_capabilities(struct kfd_topology_device *dev)
+{
+   dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 <<
+   HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+   HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+
+   dev->node_props.capability |= HSA_CAP_TRAP_DEBUG_SUPPORT |
+   HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_TRAP_OVERRIDE_SUPPORTED |
+   HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
+
+   if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
+   dev->node_props.debug_prop |= 
HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+
+   if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
+   dev->node_props.debug_prop |=
+   HSA_DBG_DISPATCH_INFO_ALWAYS_VALID;
+   else
+   dev->node_props.capability |=
+   
HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED;
+   } else {
+   dev->node_props.debug_prop |= 
HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10 |
+ 

[PATCH 06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode for
GFX9 prior to GFX9.4.1.

GFX9.4.1 and onward will require their own enable/disable sequence as
follow-on patches.

When hardware debug mode setting is requested, waves will inherit
these settings in the Shader Processor Input's (SPI) Sequencer Global
Block (SQG). This means that the KGD must drain all waves from the SPI
into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
to ensure that the order of operations that the debugger expects with
regard to debug mode setting transaction requests and wave inheritance
of that mode is upheld.
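
The drain wait is expressed as a number of register reads.  As a small
worked example, assuming (as the code comments in this series state) that
one SPI_GDBG_WAVE_CNTL read costs about 32 SPI clock cycles, the read count
is the drain time divided by 32 and rounded up.  SKETCH_CYCLES_PER_READ and
sketch_drain_reads are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate SPI clock cycles consumed by one SPI_GDBG_WAVE_CNTL read. */
#define SKETCH_CYCLES_PER_READ 32u

/* Number of register reads needed to cover drain_cycles (rounded up). */
static uint32_t sketch_drain_reads(uint32_t drain_cycles)
{
	assert(drain_cycles > 0);
	return (drain_cycles + SKETCH_CYCLES_PER_READ - 1) / SKETCH_CYCLES_PER_READ;
}
```

For the roughly 96 cycles quoted here this gives 3 reads, and for the
roughly 3500 cycles quoted in the GFX10 patch of this series it gives 110,
matching the two DRAIN_LATENCY constants.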

Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.

v2: remove unnecessary static srbm lock renaming.
add comments to explain ignored arguments for debug trap enable and
disable.

Signed-off-by: Jonathan Kim 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 93 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  9 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|  3 +
 3 files changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index e92b93557c13..94a9fd9bd984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -646,6 +646,97 @@ int kgd_gfx_v9_wave_control_execute(struct amdgpu_device 
*adev,
return 0;
 }
 
+/*
+ * GFX9 helper for wave launch stall requirements on debug trap setting.
+ *
+ * vmid:
+ *   Target VMID to stall/unstall.
+ *
+ * stall:
+ *   0-unstall wave launch (enable), 1-stall wave launch (disable).
+ *   After wavefront launch has been stalled, allocated waves must drain from
+ *   SPI in order for debug trap settings to take effect on those waves.
+ *   This is roughly a ~96 clock cycle wait on SPI where a read on
+ *   SPI_GDBG_WAVE_CNTL translates to ~32 clock cycles.
+ *   KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY indicates the number of reads required.
+ *
+ *   NOTE: We can afford to clear the entire STALL_VMID field on unstall
+ *   because GFX9.4.1 cannot support multi-process debugging due to trap
+ *   configuration and masking being limited to global scope.  Always assume
+ *   single process conditions.
+
+ */
+#define KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY   3
+void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
+   uint32_t vmid,
+   bool stall)
+{
+   int i;
+   uint32_t data = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+
+   if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 1))
+   data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_VMID,
+   stall ? 1 << vmid : 0);
+   else
+   data = REG_SET_FIELD(data, SPI_GDBG_WAVE_CNTL, STALL_RA,
+   stall ? 1 : 0);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL), data);
+
+   if (!stall)
+   return;
+
+   for (i = 0; i < KGD_GFX_V9_WAVE_LAUNCH_SPI_DRAIN_LATENCY; i++)
+   RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_WAVE_CNTL));
+}
+
+/**
+ * restore_dbg_registers is ignored here but is a general interface requirement
+ * for devices that support GFXOFF and where the RLC save/restore list
+ * does not support hw registers for debugging i.e. the driver has to manually
+ * initialize the debug mode registers after it has disabled GFX off during the
+ * debug session.
+ */
+uint32_t kgd_gfx_v9_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid)
+{
+   mutex_lock(>grbm_idx_mutex);
+
+   kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(>grbm_idx_mutex);
+
+   return 0;
+}
+
+/**
+ * keep_trap_enabled is ignored here but is a general interface requirement
+ * for devices that support multi-process debugging where the performance
+ * overhead from trap temporary setup needs to be bypassed when the debug
+ * session has ended.
+ */
+uint32_t kgd_gfx_v9_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid)
+{
+   mutex_lock(>grbm_idx_mutex);
+
+   kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true);
+
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_GDBG_TRAP_MASK), 0);
+
+   kgd_gfx_v9_set_wave_launch_stall(adev, vmid, false);
+
+   mutex_unlock(>grbm_idx_mutex);
+
+   return 0;
+}
+
 void kgd_gfx_v9_set_vm_context_page_table_base(struct amdgpu_device *adev,
uint32_t vmid, 

[PATCH 00/32] Upstream of kernel support for AMDGPU ISA debugging

2023-01-25 Thread Jonathan Kim
AMDGPU kernel upstream support for debugging of compute ISA.

Current production ROCm GDB interface for ISA debugging:
https://rocmdocs.amd.com/en/latest/ROCm_Tools/ROCgdb.html

WIP upstream source for ROCm GDB API, ROC Kernel and ROC Thunk can be referenced here:
https://github.com/ROCm-Developer-Tools/ROCdbgapi/tree/wip-dbgapi
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/wip-dbgapi
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/wip-dbgapi




RE: [PATCH] drm/amdgpu: update wave data type to 3 for gfx11

2023-01-25 Thread Joshi, Mukul
Reviewed-by: Mukul Joshi 

> -Original Message-
> From: amd-gfx  On Behalf Of
> Graham Sider
> Sent: Tuesday, January 17, 2023 2:42 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Sider, Graham 
> Subject: [PATCH] drm/amdgpu: update wave data type to 3 for gfx11
> 
> 
> SQ_WAVE_INST_DW0 isn't present on gfx11 compared to gfx10, so update
> wave data type to signify a difference.
> 
> Signed-off-by: Graham Sider 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index f98c67d07a9b..f821309f48c9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -754,8 +754,8 @@ static void gfx_v11_0_read_wave_data(struct
> amdgpu_device *adev, uint32_t simd,
>  * zero here */
> WARN_ON(simd != 0);
> 
> -   /* type 2 wave data */
> -   dst[(*no_fields)++] = 2;
> +   /* type 3 wave data */
> +   dst[(*no_fields)++] = 3;
> dst[(*no_fields)++] = wave_read_ind(adev, wave,
> ixSQ_WAVE_STATUS);
> dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_PC_LO);
> dst[(*no_fields)++] = wave_read_ind(adev, wave, ixSQ_WAVE_PC_HI);
> --
> 2.25.1


Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 10:33 AM Matthew Wilcox  wrote:
>
> On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> > +/* Use when VMA is not part of the VMA tree and needs no locking */
> > +static inline void init_vm_flags(struct vm_area_struct *vma,
> > +  unsigned long flags)
> > +{
> > + vma->vm_flags = flags;
>
> vm_flags are supposed to have type vm_flags_t.  That's not been
> fully realised yet, but perhaps we could avoid making it worse?
>
> >   pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > +  * Flags, see mm.h.
> > +  * WARNING! Do not modify directly.
> > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > +  */
> > + unsigned long vm_flags;
>
> Including changing this line to vm_flags_t

Good point. Will make the change. Thanks!



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 10:37 AM Matthew Wilcox  wrote:
>
> On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > > + /*
> > > > +  * Flags, see mm.h.
> > > > +  * WARNING! Do not modify directly.
> > > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > > +  */
> > > > + unsigned long vm_flags;
> > >
> > > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> >
> > Thanks for pointing this out, Peter! I guess for that I'll need to
> > convert all read accesses and provide get_vm_flags() too? That will
> > > cause some additional churn (a quick search shows 801 hits over 248
> > files) but maybe it's worth it? I think Michal suggested that too in
> > another patch. Should I do that while we are at it?
>
> Here's a trick I saw somewhere in the VFS:
>
> union {
> const vm_flags_t vm_flags;
> vm_flags_t __private __vm_flags;
> };
>
> Now it can be read by anybody but written only by those using
> ACCESS_PRIVATE.

Huh, this is quite nice! I think it does not save us from the cases
where vma->vm_flags is passed by reference and modified indirectly,
as in ksm_madvise()? Though maybe such use cases are so rare (I found
only 2) that we can ignore them?



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > + /*
> > > +  * Flags, see mm.h.
> > > +  * WARNING! Do not modify directly.
> > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > +  */
> > > + unsigned long vm_flags;
> >
> > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> 
> Thanks for pointing this out, Peter! I guess for that I'll need to
> convert all read accesses and provide get_vm_flags() too? That will
> cause some additional churn (a quick search shows 801 hits over 248
> files) but maybe it's worth it? I think Michal suggested that too in
> another patch. Should I do that while we are at it?

Here's a trick I saw somewhere in the VFS:

union {
const vm_flags_t vm_flags;
vm_flags_t __private __vm_flags;
};

Now it can be read by anybody but written only by those using
ACCESS_PRIVATE.
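
For readers who want to try the idea outside the kernel, here is a
minimal user-space sketch of the union trick. It makes several
assumptions: struct vma_sketch is a hypothetical stand-in for
struct vm_area_struct, ACCESS_PRIVATE() is reduced to a plain member
access (in the kernel it strips the sparse-only __private annotation,
which is omitted here), the helper signatures are guesses based on the
names in the patch comment, and no VMA locking is modeled. Like the
kernel, it relies on the compiler tolerating a write through the
non-const view of storage that is read through the const view.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long vm_flags_t;

/* User-space stand-in: the kernel macro also removes the __private
 * address-space annotation that sparse checks. */
#define ACCESS_PRIVATE(p, member) ((p)->member)

/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_sketch {
	union {
		const vm_flags_t vm_flags;  /* read-only view for everyone */
		vm_flags_t __vm_flags;      /* writable view for the helpers */
	};
};

/* Both views must alias the same storage for the trick to work. */
static_assert(offsetof(struct vma_sketch, vm_flags) ==
	      offsetof(struct vma_sketch, __vm_flags),
	      "const and private views must overlap");

/* Overwrite all flags (sketch of init_vm_flags, no locking). */
static inline void init_vm_flags(struct vma_sketch *vma, vm_flags_t flags)
{
	ACCESS_PRIVATE(vma, __vm_flags) = flags;
}

/* Set the given bits. */
static inline void set_vm_flags(struct vma_sketch *vma, vm_flags_t flags)
{
	ACCESS_PRIVATE(vma, __vm_flags) |= flags;
}

/* Clear the given bits. */
static inline void clear_vm_flags(struct vma_sketch *vma, vm_flags_t flags)
{
	ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
}

/* Set some bits and clear others in one step. */
static inline void mod_vm_flags(struct vma_sketch *vma,
				vm_flags_t set, vm_flags_t clear)
{
	ACCESS_PRIVATE(vma, __vm_flags) =
		(ACCESS_PRIVATE(vma, __vm_flags) | set) & ~clear;
}
```

With this layout, a direct assignment such as "vma->vm_flags = 0" is
rejected by the compiler because the member is const, while reads of
vma->vm_flags need no conversion.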



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +  unsigned long flags)
> +{
> + vma->vm_flags = flags;

vm_flags are supposed to have type vm_flags_t.  That's not been
fully realised yet, but perhaps we could avoid making it worse?

>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

Including changing this line to vm_flags_t



Re: [PATCH] drm/amd: Allow s0ix without BIOS support

2023-01-25 Thread Alex Deucher
On Wed, Jan 25, 2023 at 1:33 PM Mario Limonciello
 wrote:
>
> We guard the suspend entry code from running unless we have proper
> BIOS support for either S3 mode or s0ix mode.
>
> If a user's system doesn't support either of these modes the kernel
> still does offer s2idle in `/sys/power/mem_sleep` so there is an
> expectation from users that it works even if the power consumption
> remains very high.
>
> Rafael Ávila de Espíndola reports that a system of his has a
> non-functional graphics stack after resuming.  That system doesn't
> support S3 and the FADT doesn't indicate support for low power idle.
>
> Through some experimentation it was concluded that even without the
> hardware s0i3 support provided by the amd_pmc driver the power
> consumption over suspend is decreased by running amdgpu's s0ix
> suspend routine.
>
> The numbers over suspend showed:
> * No patch: 9.2W
> * Skip amdgpu suspend entirely: 10.5W
> * Run amdgpu s0ix routine: 7.7W
>
> As this does improve the power, remove some of the guard rails in
> `amdgpu_acpi.c` for only running s0ix suspend routines in the right
> circumstances.
>
> However if this turns out to cause regressions for anyone, we should
> revert this change and instead opt for skipping suspend/resume routines
> entirely or try to fix the underlying behavior that makes graphics fail
> after resume without underlying platform support.
>
> Reported-by: Rafael Ávila de Espíndola 
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
> Signed-off-by: Mario Limonciello 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 57b5e11446c65..fa7375b97fd47 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -1079,20 +1079,16 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
>  * S0ix even though the system is suspending to idle, so return false
>  * in that case.
>  */
> -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> dev_warn_once(adev->dev,
>   "Power consumption will be higher as BIOS has not been configured for suspend-to-idle.\n"
>   "To use suspend-to-idle change the sleep mode in BIOS setup.\n");
> -   return false;
> -   }
>
>  #if !IS_ENABLED(CONFIG_AMD_PMC)
> dev_warn_once(adev->dev,
>   "Power consumption will be higher as the kernel has not been compiled with CONFIG_AMD_PMC.\n");
> -   return false;
> -#else
> -   return true;
>  #endif /* CONFIG_AMD_PMC */
> +   return true;
>  }
>
>  #endif /* CONFIG_SUSPEND */
> --
> 2.25.1
>


[PATCH] drm/amd: Allow s0ix without BIOS support

2023-01-25 Thread Mario Limonciello
We guard the suspend entry code from running unless we have proper
BIOS support for either S3 mode or s0ix mode.

If a user's system doesn't support either of these modes the kernel
still does offer s2idle in `/sys/power/mem_sleep` so there is an
expectation from users that it works even if the power consumption
remains very high.

Rafael Ávila de Espíndola reports that a system of his has a
non-functional graphics stack after resuming.  That system doesn't
support S3 and the FADT doesn't indicate support for low power idle.

Through some experimentation it was concluded that even without the
hardware s0i3 support provided by the amd_pmc driver the power
consumption over suspend is decreased by running amdgpu's s0ix
suspend routine.

The numbers over suspend showed:
* No patch: 9.2W
* Skip amdgpu suspend entirely: 10.5W
* Run amdgpu s0ix routine: 7.7W

As this does improve the power, remove some of the guard rails in
`amdgpu_acpi.c` for only running s0ix suspend routines in the right
circumstances.

However if this turns out to cause regressions for anyone, we should
revert this change and instead opt for skipping suspend/resume routines
entirely or try to fix the underlying behavior that makes graphics fail
after resume without underlying platform support.

Reported-by: Rafael Ávila de Espíndola 
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 57b5e11446c65..fa7375b97fd47 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1079,20 +1079,16 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
 * S0ix even though the system is suspending to idle, so return false
 * in that case.
 */
-   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
+   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
dev_warn_once(adev->dev,
  "Power consumption will be higher as BIOS has not been configured for suspend-to-idle.\n"
  "To use suspend-to-idle change the sleep mode in BIOS setup.\n");
-   return false;
-   }
 
 #if !IS_ENABLED(CONFIG_AMD_PMC)
dev_warn_once(adev->dev,
  "Power consumption will be higher as the kernel has not been compiled with CONFIG_AMD_PMC.\n");
-   return false;
-#else
-   return true;
 #endif /* CONFIG_AMD_PMC */
+   return true;
 }
 
 #endif /* CONFIG_SUSPEND */
-- 
2.25.1
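
The behavioral change in this hunk can be summarized with a small
user-space sketch. It models only the tail of
amdgpu_acpi_is_s0ix_active() that the diff shows; any earlier checks in
the real function are not visible in the hunk and are left out. All
names below are hypothetical, SKETCH_FADT_LOW_POWER_S0 is a stand-in
bit chosen for illustration rather than the ACPICA definition, the
CONFIG_AMD_PMC build option is modeled as a runtime boolean, and
warnings are counted instead of printed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for ACPI_FADT_LOW_POWER_S0; the bit position is illustrative. */
#define SKETCH_FADT_LOW_POWER_S0 (1u << 21)

struct s0ix_decision {
	bool active;   /* what the function would return */
	int warnings;  /* how many dev_warn_once() calls would fire */
};

/* Before the patch: either missing piece warns and disables s0ix. */
static inline struct s0ix_decision
s0ix_before_patch(uint32_t fadt_flags, bool amd_pmc_built)
{
	struct s0ix_decision d = { .active = false, .warnings = 0 };

	if (!(fadt_flags & SKETCH_FADT_LOW_POWER_S0)) {
		d.warnings++;
		return d;
	}
	if (!amd_pmc_built) {
		d.warnings++;
		return d;
	}
	d.active = true;
	return d;
}

/* After the patch: the warnings remain, but s0ix is always reported active. */
static inline struct s0ix_decision
s0ix_after_patch(uint32_t fadt_flags, bool amd_pmc_built)
{
	struct s0ix_decision d = { .active = true, .warnings = 0 };

	if (!(fadt_flags & SKETCH_FADT_LOW_POWER_S0))
		d.warnings++;
	if (!amd_pmc_built)
		d.warnings++;
	return d;
}
```

On a system like the one in the report (no low power idle bit in the
FADT), the first function reports s0ix as inactive, while the second
keeps the warning and lets the s0ix suspend routine run.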



[PATCH] drm/amd/display: disable S/G display on DCN 3.1.2/3

2023-01-25 Thread Alex Deucher
Causes flickering or white screens in some configurations.
Disable it for now until we can fix the issue.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2352
Cc: roman...@amd.com
Cc: yifan1.zh...@amd.com
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 42d99bf4bbc9..0c6b60183b0d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1536,8 +1536,6 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
break;
case IP_VERSION(2, 1, 0):
case IP_VERSION(3, 0, 1):
-   case IP_VERSION(3, 1, 2):
-   case IP_VERSION(3, 1, 3):
case IP_VERSION(3, 1, 6):
init_data.flags.gpu_vm_support = true;
break;
-- 
2.39.1



Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 9:08 AM Michal Hocko  wrote:
>
> On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote:
> > On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team
> >  wrote:
> > >
> > > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > > > Replace indirect modifications to vma->vm_flags with calls to modifier
> > > > functions to be able to track flag changes and to keep vma locking
> > > > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > > > vm_flags modification attempts.
> > >
> > > Those BUG_ONs scream too much IMHO. KSM is MM-internal code so I
> > > guess we should be willing to trust it.
> >
> > Yes, but I really want to prevent an indirect misuse since it was not
> > easy to find these. If you feel strongly about it I will remove them
> > or if you have a better suggestion I'm all for it.
>
> You can avoid that by making flags inaccessible directly, right?

Ah, you mean Peter's suggestion of using __private? I guess that would
cover it. I'll drop these BUG_ONs in the next version. Thanks!

>
> --
> Michal Hocko
> SUSE Labs



[PATCH 2/3] drm/amdgpu: always sending PSP messages LOAD_ASD and UNLOAD_TA

2023-01-25 Thread vitaly.prosyak
From: Vitaly Prosyak 

We allow sending PSP messages LOAD_ASD and UNLOAD_TA without
acquiring a lock in drm_dev_enter during driver unload
because we must call drm_dev_unplug at the beginning
of the driver unload sequence.
Add a WARNING if other PSP messages are sent without a lock.
After this commit, the following commands work:
 - sudo modprobe -r amdgpu
 - sudo modprobe amdgpu

Signed-off-by: Vitaly Prosyak 
Reviewed-by: Alex Deucher 
Change-Id: I57f65fe820e2f7055f8065cd18c63fe6ff3ab694
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index a8391f269cd0..40929f34447c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -606,12 +606,21 @@ psp_cmd_submit_buf(struct psp_context *psp,
int timeout = 2;
bool ras_intr = false;
bool skip_unsupport = false;
+   bool dev_entered;
 
if (psp->adev->no_hw_access)
return 0;
 
-   if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
-   return 0;
+   dev_entered = drm_dev_enter(adev_to_drm(psp->adev), &idx);
+   /*
+* We allow sending PSP messages LOAD_ASD and UNLOAD_TA without acquiring
+* a lock in drm_dev_enter during driver unload because we must call
+* drm_dev_unplug at the beginning of the driver unload sequence. It is very
+* crucial that userspace can't access device instances anymore.
+*/
+   if (!dev_entered)
+   WARN_ON(psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_LOAD_ASD &&
+   psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_UNLOAD_TA);
 
memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
 
@@ -676,7 +685,8 @@ psp_cmd_submit_buf(struct psp_context *psp,
}
 
 exit:
-   drm_dev_exit(idx);
+   if (dev_entered)
+   drm_dev_exit(idx);
return ret;
 }
 
-- 
2.25.1



[PATCH 3/3] drm/amdgpu: use pci_dev_is_disconnected

2023-01-25 Thread vitaly.prosyak
From: Vitaly Prosyak 

Add a pci_dev_is_disconnected() condition and keep the
drm_dev_is_unplugged() check when deciding whether to unmap MMIO.
Using pci_dev_is_disconnected() was suggested by Alex.
Keeping drm_dev_is_unplugged() was suggested by Christian.

Signed-off-by: Vitaly Prosyak 
Reviewed-by: Alex Deucher 
Reviewed-by: Christian Koenig 
Change-Id: I618c471cd398437d4ed6dec6d22be78e12683ae6
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a10b627c8357..d3568e1ded23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -78,6 +78,8 @@
 
 #include 
 
+#include "../../../../pci/pci.h"
+
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -4031,7 +4033,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
amdgpu_gart_dummy_page_fini(adev);
 
-   if (drm_dev_is_unplugged(adev_to_drm(adev)))
+   if (pci_dev_is_disconnected(adev->pdev) &&
+   drm_dev_is_unplugged(adev_to_drm(adev)))
amdgpu_device_unmap_mmio(adev);
 
 }
-- 
2.25.1


