Re: [PATCH] drm: amd: clean up dcn32_fpu.c kernel-doc

2022-10-03 Thread Rodrigo Siqueira Jordao




On 2022-10-01 00:33, Randy Dunlap wrote:

Rectify multiple kernel-doc warnings in dcn32_fpu.c.
E.g.:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:247: warning: 
This comment starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
 * Finds dummy_latency_index when MCLK switching using firmware based
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:484: warning: 
Function parameter or member 'phantom_stream' not described in 
'dcn32_set_phantom_stream_timing'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:601: warning: 
Function parameter or member 'dc' not described in 'dcn32_assign_subvp_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:601: warning: 
Function parameter or member 'context' not described in 
'dcn32_assign_subvp_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:601: warning: 
Function parameter or member 'index' not described in 'dcn32_assign_subvp_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2140: warning: 
Function parameter or member 'dc' not described in 
'dcn32_update_bw_bounding_box_fpu'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2140: warning: 
Function parameter or member 'bw_params' not described in 
'dcn32_update_bw_bounding_box_fpu'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2140: warning: 
expecting prototype for dcn32_update_bw_bounding_box(). Prototype was for 
dcn32_update_bw_bounding_box_fpu() instead

Reported-by: kernel test robot 
Signed-off-by: Randy Dunlap 
Cc: George Shen 
Cc: Alvin Lee 
Cc: Nevenko Stupar 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Rodrigo Siqueira 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: Alex Deucher 
Cc: Christian König 
Cc: "Pan, Xinhui" 
---
  drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c |  116 --
  1 file changed, 49 insertions(+), 67 deletions(-)

--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -243,7 +243,7 @@ void dcn32_build_wm_range_table_fpu(stru

clk_mgr->base.bw_params->wm_table.nv_entries[WM_D].pmfw_breakdown.max_uclk = 
0x;
  }
  
-/**

+/*
   * Finds dummy_latency_index when MCLK switching using firmware based
   * vblank stretch is enabled. This function will iterate through the
   * table of dummy pstate latencies until the lowest value that allows
@@ -290,15 +290,14 @@ int dcn32_find_dummy_latency_index_for_f
  /**
   * dcn32_helper_populate_phantom_dlg_params - Get DLG params for phantom pipes
   * and populate pipe_ctx with those params.
- *
- * This function must be called AFTER the phantom pipes are added to context
- * and run through DML (so that the DLG params for the phantom pipes can be
- * populated), and BEFORE we program the timing for the phantom pipes.
- *
   * @dc: [in] current dc state
   * @context: [in] new dc state
   * @pipes: [in] DML pipe params array
   * @pipe_cnt: [in] DML pipe count
+ *
+ * This function must be called AFTER the phantom pipes are added to context
+ * and run through DML (so that the DLG params for the phantom pipes can be
+ * populated), and BEFORE we program the timing for the phantom pipes.
   */
  void dcn32_helper_populate_phantom_dlg_params(struct dc *dc,
  struct dc_state *context,
@@ -331,8 +330,9 @@ void dcn32_helper_populate_phantom_dlg_p
  }
  
  /**

- * 
***
- * dcn32_predict_pipe_split: Predict if pipe split will occur for a given DML 
pipe
+ * dcn32_predict_pipe_split - Predict if pipe split will occur for a given DML 
pipe
+ * @context: [in] New DC state to be programmed
+ * @pipe_e2e: [in] DML pipe end to end context
   *
   * This function takes in a DML pipe (pipe_e2e) and predicts if pipe split is 
required (both
   * ODM and MPC). For pipe split, ODM combine is determined by the ODM mode, 
and MPC combine is
@@ -343,12 +343,7 @@ void dcn32_helper_populate_phantom_dlg_p
   * - MPC combine is only chosen if there is no ODM combine requirements / 
policy in place, and
   *   MPC is required
   *
- * @param [in]: context: New DC state to be programmed
- * @param [in]: pipe_e2e: DML pipe end to end context
- *
- * @return: Number of splits expected (1 for 2:1 split, 3 for 4:1 split, 0 for 
no splits).
- *
- * 
***
+ * Return: Number of splits expected (1 for 2:1 split, 3 for 4:1 split, 0 for 
no splits).
   */
  uint8_t dcn32_predict_pipe_split(struct dc_state *context,
  display_e2e_pipe_params_st *pipe_e2e)
@@ -504,7 +499,14 @@ void insert_entry_into_table_sorted(stru
  }
  
  /**

- * dcn32_set_phantom_stream_timing: Set timing params for the phantom stream
+ * dcn32_set_phantom_stream_timing - Set 

Re: [PATCH v5] drm/amd/display: Fix vblank refcount in vrr transition

2022-10-03 Thread Rodrigo Siqueira Jordao




On 2022-09-21 17:20, Yunxiang Li wrote:

manage_dm_interrupts disable/enable vblank using drm_crtc_vblank_off/on
which causes drm_crtc_vblank_get in vrr_transition to fail, and later
when drm_crtc_vblank_put is called the refcount on vblank will be messed
up. Therefore move the call to after manage_dm_interrupts.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1247
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1380

Signed-off-by: Yunxiang Li 
---
v2: check the return code for calls that might fail and warn on them
v3/v4: make the sequence closer to the original and remove redundant local 
variables
v5: add bug tracking info

  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 55 +--
  1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ece2003a74cc..97cc8ceaeea0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7484,15 +7484,15 @@ static void amdgpu_dm_handle_vrr_transition(struct 
dm_crtc_state *old_state,
 * We also need vupdate irq for the actual core vblank handling
 * at end of vblank.
 */
-   dm_set_vupdate_irq(new_state->base.crtc, true);
-   drm_crtc_vblank_get(new_state->base.crtc);
+   WARN_ON(dm_set_vupdate_irq(new_state->base.crtc, true) != 0);
+   WARN_ON(drm_crtc_vblank_get(new_state->base.crtc) != 0);
DRM_DEBUG_DRIVER("%s: crtc=%u VRR off->on: Get vblank ref\n",
 __func__, new_state->base.crtc->base.id);
} else if (old_vrr_active && !new_vrr_active) {
/* Transition VRR active -> inactive:
 * Allow vblank irq disable again for fixed refresh rate.
 */
-   dm_set_vupdate_irq(new_state->base.crtc, false);
+   WARN_ON(dm_set_vupdate_irq(new_state->base.crtc, false) != 0);
drm_crtc_vblank_put(new_state->base.crtc);
DRM_DEBUG_DRIVER("%s: crtc=%u VRR on->off: Drop vblank ref\n",
 __func__, new_state->base.crtc->base.id);
@@ -8257,23 +8257,6 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
mutex_unlock(>dc_lock);
}
  
-	/* Count number of newly disabled CRTCs for dropping PM refs later. */

-   for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state,
- new_crtc_state, i) {
-   if (old_crtc_state->active && !new_crtc_state->active)
-   crtc_disable_count++;
-
-   dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
-   dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
-
-   /* For freesync config update on crtc state and params for irq 
*/
-   update_stream_irq_parameters(dm, dm_new_crtc_state);
-
-   /* Handle vrr on->off / off->on transitions */
-   amdgpu_dm_handle_vrr_transition(dm_old_crtc_state,
-   dm_new_crtc_state);
-   }
-
/**
 * Enable interrupts for CRTCs that are newly enabled or went through
 * a modeset. It was intentionally deferred until after the front end
@@ -8283,16 +8266,29 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, 
new_crtc_state, i) {
struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);
  #ifdef CONFIG_DEBUG_FS
-   bool configure_crc = false;
enum amdgpu_dm_pipe_crc_source cur_crc_src;
  #if defined(CONFIG_DRM_AMD_SECURE_DISPLAY)
-   struct crc_rd_work *crc_rd_wrk = dm->crc_rd_wrk;
+   struct crc_rd_work *crc_rd_wrk;
+#endif
+#endif
+   /* Count number of newly disabled CRTCs for dropping PM refs 
later. */
+   if (old_crtc_state->active && !new_crtc_state->active)
+   crtc_disable_count++;
+
+   dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
+   dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
+
+   /* For freesync config update on crtc state and params for irq 
*/
+   update_stream_irq_parameters(dm, dm_new_crtc_state);
+
+#ifdef CONFIG_DEBUG_FS
+#if defined(CONFIG_DRM_AMD_SECURE_DISPLAY)
+   crc_rd_wrk = dm->crc_rd_wrk;
  #endif
spin_lock_irqsave(_to_drm(adev)->event_lock, flags);
cur_crc_src = acrtc->dm_irq_params.crc_src;
spin_unlock_irqrestore(_to_drm(adev)->event_lock, flags);
  #endif
-   dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
  
  		if (new_crtc_state->active &&

(!old_crtc_state->active ||
@@ -8300,16 +8296,19 @@ static 

[PATCH] drm/amd/amdgpu: Update debugfs SRBM/GRBM logic

2022-10-03 Thread Tom St Denis
Currently our debugfs bank selection logic only
supports a single GC instance.  This updates the functions
amdgpu_gfx_select_se_sh() and amdgpu_gfx_select_me_pipe_q()
to support a GC instance parameter and ultimately a GC
instance selection via the IOCTL to debugfs.

Signed-off-by: Tom St Denis 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 44 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  7 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_umr.h | 15 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 17 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 17 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 18 +++--
 9 files changed, 98 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 6066aebf491c..3a09028277f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -139,7 +139,7 @@ static int  amdgpu_debugfs_process_reg_op(bool read, struct 
file *f,
sh_bank, instance_bank);
} else if (use_ring) {
mutex_lock(>srbm_mutex);
-   amdgpu_gfx_select_me_pipe_q(adev, me, pipe, queue, vmid);
+   amdgpu_gfx_select_me_pipe_q(adev, me, pipe, queue, vmid, 0);
}
 
if (pm_pg_lock)
@@ -172,7 +172,7 @@ static int  amdgpu_debugfs_process_reg_op(bool read, struct 
file *f,
amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x);
mutex_unlock(>grbm_idx_mutex);
} else if (use_ring) {
-   amdgpu_gfx_select_me_pipe_q(adev, 0, 0, 0, 0);
+   amdgpu_gfx_select_me_pipe_q(adev, 0, 0, 0, 0, 0);
mutex_unlock(>srbm_mutex);
}
 
@@ -261,15 +261,15 @@ static ssize_t amdgpu_debugfs_regs2_op(struct file *f, 
char __user *buf, u32 off
return -EINVAL;
}
mutex_lock(>grbm_idx_mutex);
-   amdgpu_gfx_select_se_sh(adev, rd->id.grbm.se,
+   amdgpu_gfx_select_se_sh_instanced(adev, rd->id.grbm.se,
rd->id.grbm.sh,
-   
rd->id.grbm.instance);
+   
rd->id.grbm.instance, rd->id.gc_instance);
}
 
if (rd->id.use_srbm) {
mutex_lock(>srbm_mutex);
amdgpu_gfx_select_me_pipe_q(adev, rd->id.srbm.me, 
rd->id.srbm.pipe,
-   
rd->id.srbm.queue, rd->id.srbm.vmid);
+   
rd->id.srbm.queue, rd->id.srbm.vmid, rd->id.gc_instance);
}
 
if (rd->id.pg_lock)
@@ -295,12 +295,12 @@ static ssize_t amdgpu_debugfs_regs2_op(struct file *f, 
char __user *buf, u32 off
}
 end:
if (rd->id.use_grbm) {
-   amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x);
+   amdgpu_gfx_select_se_sh_instanced(adev, 0x, 0x, 
0x, 0);
mutex_unlock(>grbm_idx_mutex);
}
 
if (rd->id.use_srbm) {
-   amdgpu_gfx_select_me_pipe_q(adev, 0, 0, 0, 0);
+   amdgpu_gfx_select_me_pipe_q(adev, 0, 0, 0, 0, 0);
mutex_unlock(>srbm_mutex);
}
 
@@ -319,17 +319,39 @@ static ssize_t amdgpu_debugfs_regs2_op(struct file *f, 
char __user *buf, u32 off
 static long amdgpu_debugfs_regs2_ioctl(struct file *f, unsigned int cmd, 
unsigned long data)
 {
struct amdgpu_debugfs_regs2_data *rd = f->private_data;
+   struct amdgpu_debugfs_regs2_iocdata v1_data;
int r;
 
+   mutex_lock(>lock);
+
switch (cmd) {
+   case AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE_V2:
+   r = copy_from_user(>id, (struct 
amdgpu_debugfs_regs2_iocdata_v2 *)data, sizeof rd->id);
+   if (r)
+   return r ? -EINVAL : 0;
+   goto done;
case AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE:
-   mutex_lock(>lock);
-   r = copy_from_user(>id, (struct 
amdgpu_debugfs_regs2_iocdata *)data, sizeof rd->id);
-   mutex_unlock(>lock);
-   return r ? -EINVAL : 0;
+   r = copy_from_user(_data, (struct 
amdgpu_debugfs_regs2_iocdata *)data, sizeof v1_data);
+   if (r)
+   return r ? -EINVAL : 0;
+   goto v1_copy;
default:
return -EINVAL;
}
+
+v1_copy:
+   rd->id.use_srbm = v1_data.use_srbm;
+   rd->id.use_grbm = v1_data.use_grbm;
+   rd->id.pg_lock = v1_data.pg_lock;
+   rd->id.grbm.se = v1_data.grbm.se;

Re: [PATCH] drm/amd/display: Enable dpia support for dcn314

2022-10-03 Thread Limonciello, Mario

On 10/3/2022 13:00, roman...@amd.com wrote:

From: Roman Li 

[Why]
DCN 3.1.4 supports DPIA.

[How]
  - Set dpia_supported flag for dcn314 in dmub_hw_init()
  - Remove comment that becomes irrelevant after this change.

Signed-off-by: Roman Li 
Reviewed-by: Nicholas Kazlauskas 


FYI As this is just adding the ID, I think this is a good stable candidate.

Cc: sta...@vger.kernel.org # 6.0
Reviewed-by: Mario Limonciello 


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ece2003a74cc..8471c21496c9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1105,7 +1105,8 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
hw_params.fb[i] = _info->fb[i];
  
  	switch (adev->ip_versions[DCE_HWIP][0]) {

-   case IP_VERSION(3, 1, 3): /* Only for this asic hw internal rev B0 */
+   case IP_VERSION(3, 1, 3):
+   case IP_VERSION(3, 1, 4):
hw_params.dpia_supported = true;
hw_params.disable_dpia = 
adev->dm.dc->debug.dpia_debug.bits.disable_dpia;
break;




[PATCH] drm/amd/display: Enable dpia support for dcn314

2022-10-03 Thread Roman.Li
From: Roman Li 

[Why]
DCN 3.1.4 supports DPIA.

[How]
 - Set dpia_supported flag for dcn314 in dmub_hw_init()
 - Remove comment that becomes irrelevant after this change.

Signed-off-by: Roman Li 
Reviewed-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ece2003a74cc..8471c21496c9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1105,7 +1105,8 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
hw_params.fb[i] = _info->fb[i];
 
switch (adev->ip_versions[DCE_HWIP][0]) {
-   case IP_VERSION(3, 1, 3): /* Only for this asic hw internal rev B0 */
+   case IP_VERSION(3, 1, 3):
+   case IP_VERSION(3, 1, 4):
hw_params.dpia_supported = true;
hw_params.disable_dpia = 
adev->dm.dc->debug.dpia_debug.bits.disable_dpia;
break;
-- 
2.17.1



Re: [PATCH v2 1/8] mm/memory.c: Fix race when faulting a device private page

2022-10-03 Thread Felix Kuehling

Am 2022-10-02 um 20:53 schrieb Alistair Popple:

Felix Kuehling  writes:


On 2022-09-28 08:01, Alistair Popple wrote:

When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called. However no
reference is taken on the faulting page. Therefore a concurrent
migration of the device private page can free the page and possibly the
underlying pgmap. This results in a race which can crash the kernel due
to the migrate_to_ram() function pointer becoming invalid. It also means
drivers can't reliably read the zone_device_data field because the page
may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl
to ensure it has not been freed. Unfortunately the elevated reference
count will cause the migration required to handle the fault to fail. To
avoid this failure pass the faulting page into the migrate_vma functions
so that if an elevated reference count is found it can be checked to see
if it's expected or not.

Do we really have to drag the fault_page all the way into the migrate structure?
Is the elevated refcount still needed at that time? Maybe it would be easier to
drop the refcount early in the ops->migrage_to_ram callbacks, so we won't have
to deal with it in all the migration code.

That would also work. Honestly I don't really like either solution :-)


Then we agree. :)



I didn't like having to plumb it all through the migration code
but I ended up going this way because I felt it was easier to explain
the life time of vmf->page for the migrate_to_ram() callback. This way
vmf->page is guaranteed to be valid for the duration of the
migrate_to_ram() callbak.

As you suggest we could instead have drivers call put_page(vmf->page)
somewhere in migrate_to_ram() before calling migrate_vma_setup(). The
reason I didn't go this way is IMHO it's more subtle because in general
the page will remain valid after that put_page() anyway. So it would be
easy for drivers to introduce a bug assuming the vmf->page is still
valid and not notice because most of the time it is.


I guess I'm biased because my migrate_to_ram implementation doesn't use 
the vmf->page at all. I agree that dropping the refcount in the callback 
is subtle. But handling an elevated refcount for just one special page 
in the migration code also looks a bit fragile to me.


It's not my call to make. But my preference is very clear. Either way, 
if the decision is to go with your solution, then you have my


Acked-by: Felix Kuehling 




Let me know if you disagree with my reasoning though - would appreciate
any review here.


Regards,
   Felix



Signed-off-by: Alistair Popple 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: Ralph Campbell 
Cc: Michael Ellerman 
Cc: Felix Kuehling 
Cc: Lyude Paul 
---
   arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
   drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
   include/linux/migrate.h  |  8 ++-
   lib/test_hmm.c   |  7 ++---
   mm/memory.c  | 16 +++-
   mm/migrate.c | 34 ++---
   mm/migrate_device.c  | 18 +
   9 files changed, 87 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5980063..d4eacf4 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
   static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
unsigned long start,
unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+   struct kvm *kvm, unsigned long gpa, struct page *fault_page)
   {
unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
+   struct migrate_vma mig = { 0 };
struct page *dpage, *spage;
struct kvmppc_uvmem_page_pvt *pvt;
unsigned long pfn;
@@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
mig.dst = _pfn;
mig.pgmap_owner = _uvmem_pgmap;
mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   mig.fault_page = fault_page;
/* The requested page is already paged-out, nothing to do */
if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
@@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
*vma,
   static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
  unsigned long start, unsigned long end,
  unsigned long page_shift,
- struct kvm *kvm, unsigned long gpa)
+

[PATCH 1/1] drm/amdgpu: Set vmbo destroy after pt bo is created

2022-10-03 Thread Philip Yang
Under VRAM usage pression, map to GPU may fail to create pt bo and
vmbo->shadow_list is not initialized, then ttm_bo_release calling
amdgpu_bo_vm_destroy to access vmbo->shadow_list generates below
dmesg and NULL pointer access backtrace:

Set vmbo destroy callback to amdgpu_bo_vm_destroy only after creating pt
bo successfully, otherwise use default callback amdgpu_bo_destroy.

amdgpu: amdgpu_vm_bo_update failed
amdgpu: update_gpuvm_pte() failed
amdgpu: Failed to map bo to gpuvm
amdgpu :43:00.0: amdgpu: Failed to map peer::43:00.0 mem_domain:2
BUG: kernel NULL pointer dereference, address:
 RIP: 0010:amdgpu_bo_vm_destroy+0x4d/0x80 [amdgpu]
 Call Trace:
  
  ttm_bo_release+0x207/0x320 [amdttm]
  amdttm_bo_init_reserved+0x1d6/0x210 [amdttm]
  amdgpu_bo_create+0x1ba/0x520 [amdgpu]
  amdgpu_bo_create_vm+0x3a/0x80 [amdgpu]
  amdgpu_vm_pt_create+0xde/0x270 [amdgpu]
  amdgpu_vm_ptes_update+0x63b/0x710 [amdgpu]
  amdgpu_vm_update_range+0x2e7/0x6e0 [amdgpu]
  amdgpu_vm_bo_update+0x2bd/0x600 [amdgpu]
  update_gpuvm_pte+0x160/0x420 [amdgpu]
  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x313/0x1130 [amdgpu]
  kfd_ioctl_map_memory_to_gpu+0x115/0x390 [amdgpu]
  kfd_ioctl+0x24a/0x5b0 [amdgpu]

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4570ad449390..ae924db72b62 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -688,11 +688,11 @@ int amdgpu_bo_create_vm(struct amdgpu_device *adev,
 * num of amdgpu_vm_pt entries.
 */
BUG_ON(bp->bo_ptr_size < sizeof(struct amdgpu_bo_vm));
-   bp->destroy = _bo_vm_destroy;
r = amdgpu_bo_create(adev, bp, _ptr);
if (r)
return r;
 
+   bo_ptr->tbo.destroy = _bo_vm_destroy;
*vmbo_ptr = to_amdgpu_bo_vm(bo_ptr);
INIT_LIST_HEAD(&(*vmbo_ptr)->shadow_list);
return r;
-- 
2.35.1



Re: [PATCH] drm/amd/display: disable psr whenever applicable

2022-10-03 Thread S, Shirish

Ping!

Regards,

Shirish S

On 9/30/2022 7:17 PM, S, Shirish wrote:



On 9/30/2022 6:59 PM, Harry Wentland wrote:

+Leo

On 9/30/22 06:27, Shirish S wrote:

[Why]
psr feature continues to be enabled for non capable links.


Do you have more info on what issues you're seeing with this?


Code wise without this change we end up setting 
"vblank_disable_immediate" parameter to false for the failing links also.


Issue wise there is a remote chance of this leading to eDP/connected 
monitor not lighting up.



[How]
disable the feature on links that are not capable of the same.

Signed-off-by: Shirish S
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
index 8ca10ab3dfc1..f73af028f312 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
@@ -60,11 +60,17 @@ static bool link_supports_psrsu(struct dc_link *link)
   */
  void amdgpu_dm_set_psr_caps(struct dc_link *link)
  {
-   if (!(link->connector_signal & SIGNAL_TYPE_EDP))
+   if (!(link->connector_signal & SIGNAL_TYPE_EDP)) {
+   DRM_ERROR("Disabling PSR as connector is not eDP\n")

I don't think we should log an error here.


My objective of logging an error was to inform user/developer that 
this boot PSR enablement had issues.


Am fine with moving it to INFO or remove it, if you insist.

Thanks for your comments.

Regards,

Shirish S


+   link->psr_settings.psr_feature_enabled = false;
return;
+   }
  
-	if (link->type == dc_connection_none)

+   if (link->type == dc_connection_none) {
+   DRM_ERROR("Disabling PSR as eDP connection type is invalid\n")

Same here, this doesn't warrant an error log.

Harry


+   link->psr_settings.psr_feature_enabled = false;
return;
+   }
  
  	if (link->dpcd_caps.psr_info.psr_version == 0) {

link->psr_settings.psr_version = DC_PSR_VERSION_UNSUPPORTED;

[PATCH] drm: amd: clean up dcn32_fpu.c kernel-doc

2022-10-03 Thread Randy Dunlap
Rectify multiple kernel-doc warnings in dcn32_fpu.c.
E.g.:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:247: warning: 
This comment starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
* Finds dummy_latency_index when MCLK switching using firmware based
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:484: warning: 
Function parameter or member 'phantom_stream' not described in 
'dcn32_set_phantom_stream_timing'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:601: warning: 
Function parameter or member 'dc' not described in 'dcn32_assign_subvp_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:601: warning: 
Function parameter or member 'context' not described in 
'dcn32_assign_subvp_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:601: warning: 
Function parameter or member 'index' not described in 'dcn32_assign_subvp_pipe'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2140: warning: 
Function parameter or member 'dc' not described in 
'dcn32_update_bw_bounding_box_fpu'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2140: warning: 
Function parameter or member 'bw_params' not described in 
'dcn32_update_bw_bounding_box_fpu'
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/dcn32_fpu.c:2140: warning: 
expecting prototype for dcn32_update_bw_bounding_box(). Prototype was for 
dcn32_update_bw_bounding_box_fpu() instead

Reported-by: kernel test robot 
Signed-off-by: Randy Dunlap 
Cc: George Shen 
Cc: Alvin Lee 
Cc: Nevenko Stupar 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Rodrigo Siqueira 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: Alex Deucher 
Cc: Christian König 
Cc: "Pan, Xinhui" 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c |  116 --
 1 file changed, 49 insertions(+), 67 deletions(-)

--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -243,7 +243,7 @@ void dcn32_build_wm_range_table_fpu(stru

clk_mgr->base.bw_params->wm_table.nv_entries[WM_D].pmfw_breakdown.max_uclk = 
0x;
 }
 
-/**
+/*
  * Finds dummy_latency_index when MCLK switching using firmware based
  * vblank stretch is enabled. This function will iterate through the
  * table of dummy pstate latencies until the lowest value that allows
@@ -290,15 +290,14 @@ int dcn32_find_dummy_latency_index_for_f
 /**
  * dcn32_helper_populate_phantom_dlg_params - Get DLG params for phantom pipes
  * and populate pipe_ctx with those params.
- *
- * This function must be called AFTER the phantom pipes are added to context
- * and run through DML (so that the DLG params for the phantom pipes can be
- * populated), and BEFORE we program the timing for the phantom pipes.
- *
  * @dc: [in] current dc state
  * @context: [in] new dc state
  * @pipes: [in] DML pipe params array
  * @pipe_cnt: [in] DML pipe count
+ *
+ * This function must be called AFTER the phantom pipes are added to context
+ * and run through DML (so that the DLG params for the phantom pipes can be
+ * populated), and BEFORE we program the timing for the phantom pipes.
  */
 void dcn32_helper_populate_phantom_dlg_params(struct dc *dc,
  struct dc_state *context,
@@ -331,8 +330,9 @@ void dcn32_helper_populate_phantom_dlg_p
 }
 
 /**
- * 
***
- * dcn32_predict_pipe_split: Predict if pipe split will occur for a given DML 
pipe
+ * dcn32_predict_pipe_split - Predict if pipe split will occur for a given DML 
pipe
+ * @context: [in] New DC state to be programmed
+ * @pipe_e2e: [in] DML pipe end to end context
  *
  * This function takes in a DML pipe (pipe_e2e) and predicts if pipe split is 
required (both
  * ODM and MPC). For pipe split, ODM combine is determined by the ODM mode, 
and MPC combine is
@@ -343,12 +343,7 @@ void dcn32_helper_populate_phantom_dlg_p
  * - MPC combine is only chosen if there is no ODM combine requirements / 
policy in place, and
  *   MPC is required
  *
- * @param [in]: context: New DC state to be programmed
- * @param [in]: pipe_e2e: DML pipe end to end context
- *
- * @return: Number of splits expected (1 for 2:1 split, 3 for 4:1 split, 0 for 
no splits).
- *
- * 
***
+ * Return: Number of splits expected (1 for 2:1 split, 3 for 4:1 split, 0 for 
no splits).
  */
 uint8_t dcn32_predict_pipe_split(struct dc_state *context,
  display_e2e_pipe_params_st *pipe_e2e)
@@ -504,7 +499,14 @@ void insert_entry_into_table_sorted(stru
 }
 
 /**
- * dcn32_set_phantom_stream_timing: Set timing params for the phantom stream
+ * dcn32_set_phantom_stream_timing - Set timing params for the phantom stream
+ * @dc: current dc state
+ * @context: new dc 

Re: [PATCH v2 1/8] mm/memory.c: Fix race when faulting a device private page

2022-10-03 Thread Alistair Popple


Felix Kuehling  writes:

> On 2022-09-28 08:01, Alistair Popple wrote:
>> When the CPU tries to access a device private page the migrate_to_ram()
>> callback associated with the pgmap for the page is called. However no
>> reference is taken on the faulting page. Therefore a concurrent
>> migration of the device private page can free the page and possibly the
>> underlying pgmap. This results in a race which can crash the kernel due
>> to the migrate_to_ram() function pointer becoming invalid. It also means
>> drivers can't reliably read the zone_device_data field because the page
>> may have been freed with memunmap_pages().
>>
>> Close the race by getting a reference on the page while holding the ptl
>> to ensure it has not been freed. Unfortunately the elevated reference
>> count will cause the migration required to handle the fault to fail. To
>> avoid this failure pass the faulting page into the migrate_vma functions
>> so that if an elevated reference count is found it can be checked to see
>> if it's expected or not.
>
> Do we really have to drag the fault_page all the way into the migrate 
> structure?
> Is the elevated refcount still needed at that time? Maybe it would be easier 
> to
> drop the refcount early in the ops->migrage_to_ram callbacks, so we won't have
> to deal with it in all the migration code.

That would also work. Honestly I don't really like either solution :-)

I didn't like having to plumb it all through the migration code
but I ended up going this way because I felt it was easier to explain
the life time of vmf->page for the migrate_to_ram() callback. This way
vmf->page is guaranteed to be valid for the duration of the
migrate_to_ram() callbak.

As you suggest we could instead have drivers call put_page(vmf->page)
somewhere in migrate_to_ram() before calling migrate_vma_setup(). The
reason I didn't go this way is IMHO it's more subtle because in general
the page will remain valid after that put_page() anyway. So it would be
easy for drivers to introduce a bug assuming the vmf->page is still
valid and not notice because most of the time it is.

Let me know if you disagree with my reasoning though - would appreciate
any review here.

> Regards,
>   Felix
>
>
>>
>> Signed-off-by: Alistair Popple 
>> Cc: Jason Gunthorpe 
>> Cc: John Hubbard 
>> Cc: Ralph Campbell 
>> Cc: Michael Ellerman 
>> Cc: Felix Kuehling 
>> Cc: Lyude Paul 
>> ---
>>   arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>>   include/linux/migrate.h  |  8 ++-
>>   lib/test_hmm.c   |  7 ++---
>>   mm/memory.c  | 16 +++-
>>   mm/migrate.c | 34 ++---
>>   mm/migrate_device.c  | 18 +
>>   9 files changed, 87 insertions(+), 41 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
>> b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> index 5980063..d4eacf4 100644
>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>>   static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>>  unsigned long start,
>>  unsigned long end, unsigned long page_shift,
>> -struct kvm *kvm, unsigned long gpa)
>> +struct kvm *kvm, unsigned long gpa, struct page *fault_page)
>>   {
>>  unsigned long src_pfn, dst_pfn = 0;
>> -struct migrate_vma mig;
>> +struct migrate_vma mig = { 0 };
>>  struct page *dpage, *spage;
>>  struct kvmppc_uvmem_page_pvt *pvt;
>>  unsigned long pfn;
>> @@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
>> *vma,
>>  mig.dst = _pfn;
>>  mig.pgmap_owner = _uvmem_pgmap;
>>  mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>> +mig.fault_page = fault_page;
>>  /* The requested page is already paged-out, nothing to do */
>>  if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
>> @@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
>> *vma,
>>   static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>>unsigned long start, unsigned long end,
>>unsigned long page_shift,
>> -  struct kvm *kvm, unsigned long gpa)
>> +  struct kvm *kvm, unsigned long gpa,
>> +  struct page *fault_page)
>>   {
>>  int ret;
>>  mutex_lock(>arch.uvmem_lock);
>> -ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
>> +ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
>> +fault_page);

Re: [RFC PATCH 1/1] drm/amdgpu: Fix null-ptr-deref in amdgpu_device_fini_sw()

2022-10-03 Thread Steven J Abner

I had done more delving into this also, thankfully was not forgotten.
Additional info to solve blackout, was going to contact AMD:
The problem as far as I could trace occurs in amdgpu_psp.c in function 
psp_cmd_submit_buf().
Normal counter 'timeout' seems to use a max of about 400 with 
usleep_range(5, 80).
Under normal operation 'psp->fence_buf' will equal 'index' and drop 
from loop. I could not track what
fills the kernel buffer objects 'virtual' pointer's reference 
(psp->fence_buf). I assume that
it's to a firmware file that on read is returning an error, and getting 
stuck in a loop lock.
The error condition I found occurs when 'psp->fence_buf' does not equal 
'index',

breaking from loop with 'timeout' == 0.
Blackout seemed to be about 80% of reboots, but found that in 
reconfiguration of kernel,
on 5.18.19, with CONFIG_ATA_ACPI=y drops blackouts to about 30% of 
reboots. I have now avoided
all blackouts with addition of CONFIG_SATA_PMP=y (at least a week 
free?).
This is pure guess but, maybe the ARM PSP processor is dependent on 
libata procedures, one

of which causes a lock up some of the time.
Steve

On Fri, Sep 30, 2022 at 21:41, Zhang Boyang  
wrote:
After amdgpu_device_init() failed, adev->reset_domain may be NULL. 
Thus
subsequent call to amdgpu_device_fini_sw() may result in 
null-ptr-deref.


This patch fixes the problem by adding a NULL pointer check around the
code of releasing adev->reset_domain in amdgpu_device_fini_sw().

Fixes: cfbb6b004744 ("drm/amdgpu: Rework reset domain to be 
refcounted.")


Signed-off-by: Zhang Boyang 
Link: 
https://lore.kernel.org/lkml/a8bce489-8ccc-aa95-3de6-f854e03ad...@suddenlinkmail.com/

Link: https://lore.kernel.org/lkml/at9whr.3z1t3vi9a2...@att.net/
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index be7aff2d4a57..204daad06b32 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4021,8 +4021,10 @@ void amdgpu_device_fini_sw(struct 
amdgpu_device *adev)

if (adev->mman.discovery_bin)
amdgpu_discovery_fini(adev);

-   amdgpu_reset_put_reset_domain(adev->reset_domain);
-   adev->reset_domain = NULL;
+   if (adev->reset_domain) {
+   amdgpu_reset_put_reset_domain(adev->reset_domain);
+   adev->reset_domain = NULL;
+   }

kfree(adev->pci_state);

--
2.30.2






RE: [PATCH 1/2] drm/amdkfd: correct RB_SIZE in SDMA0_QUEUE0_RB_CNTL

2022-10-03 Thread Zhang, Yifan
[AMD Official Use Only - General]

Hi Felix,

Yes, I just realized ffs start counting from 1. Sorry for the noise.

Best Regards,
Yifan

-Original Message-
From: Kuehling, Felix  
Sent: Friday, September 30, 2022 10:55 PM
To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Ji, Ruili 
Subject: Re: [PATCH 1/2] drm/amdkfd: correct RB_SIZE in SDMA0_QUEUE0_RB_CNTL


Am 2022-09-30 um 02:16 schrieb Yifan Zhang:
> In SDMA0_QUEUE0_RB_CNTL, queue size is 2^RB_SIZE, not 2^(RB_SIZE +1).
>
> Signed-off-by: Yifan Zhang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> index d3e2b6a599a4..03699a9ad3d9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> @@ -329,7 +329,7 @@ static void update_mqd_sdma(struct mqd_manager *mm, void 
> *mqd,
>   struct v10_sdma_mqd *m;
>   
>   m = get_sdma_mqd(mqd);
> - m->sdmax_rlcx_rb_cntl = (ffs(q->queue_size / sizeof(unsigned int)) - 1)
> + m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)

I think these two are equivalent. ffs(1) == 1. order_base_2(1) == 0. 
You're not correcting anything. You're just writing it differently.

Regards,
   Felix


>   << SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
>   q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
>   1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT | diff 
> --git 
> a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> index 26b53b6d673e..451fcb9bb051 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> @@ -329,7 +329,7 @@ static void update_mqd_sdma(struct mqd_manager *mm, void 
> *mqd,
>   struct v11_sdma_mqd *m;
>   
>   m = get_sdma_mqd(mqd);
> - m->sdmax_rlcx_rb_cntl = (ffs(q->queue_size / sizeof(unsigned int)) - 1)
> + m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
>   << SDMA0_QUEUE0_RB_CNTL__RB_SIZE__SHIFT |
>   q->vmid << SDMA0_QUEUE0_RB_CNTL__RB_VMID__SHIFT |
>   1 << SDMA0_QUEUE0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |


Re: KMS atomic state sets, full vs. minimal (Re: [PATCH v3 0/6] Add support for atomic async page-flips)

2022-10-03 Thread Ville Syrjälä
On Mon, Oct 03, 2022 at 11:48:49AM +0300, Pekka Paalanen wrote:
> On Fri, 30 Sep 2022 18:45:09 +0300
> Ville Syrjälä  wrote:
> 
> > On Fri, Sep 30, 2022 at 06:37:00PM +0300, Pekka Paalanen wrote:
> > > On Fri, 30 Sep 2022 18:09:55 +0300
> > > Ville Syrjälä  wrote:
> > >   
> > > > That would actively discourage people from even attempting the
> > > > "just dump all the state into the ioctl" approach with async flips
> > > > since even the props whose value isn't even changing would be rejected. 
> > > >  
> > > 
> > > About that.
> > > 
> > > To me it looks like just a classic case of broken communication.
> > > 
> > > The kernel developers assume that of course userspace will minimize the
> > > state set to only those properties that change, because...?
> > > 
> > > Userspace developers think that of course the kernel will optimize the
> > > state set into minimal changes, because the kernel actually has the
> > > authoritative knowledge of what the current state is, and the driver
> > > actually knows which properties are costly and need to be optimized and
> > > which ones don't matter. It has never even occurred to me that the
> > > kernel would not compare next state to current state.
> > > 
> > > No-one ever documented any expectations, did they?  
> > 
> > Do you really want that for async flips? Async flips can't be
> > done atomically with anything else, so why are you even asking
> > the kernel to do that?
> 
> I'm not talking about async flips only.
> 
> I'm talking about atomic commits in general. I don't think it's a good
> idea to make async atomic commits behave fundamentally different from
> sync atomic commits wrt. what non-changing state you are allowed to
> list in your state set or not.
> 
> Isn't there common DRM code to convert an atomic commit state set into
> state objects? It probably starts by copying the current state, and
> then playing through the commit state set, setting all listed
> properties to their new values? Why wouldn't that loop maintain the
> knowledge of what actually changed?

Any such book keeping is entirely ad-hoc atm. It's also not super
obvious how much of it is actually useful.

You have to do a real commit on the crtc anyway if the crtc (or
on any of its associated objects) is listed in the commit, so
there's not necessarily much to be gained by tracking chages in
all properties.

And that behaviour again enters very muddy waters when combined
with the async flip flag for the entire commit. The current approach
being proposed seems to suggest that we can instead short circuit
async commits when nothing has changed. That is not at all how
sync atomic commits work.

> When you copy the current data, reset all changed-flags to false. When
> you apply each property from the commit set, compare the value and set
> the changed-flag only if the value changes.
> 
> This manufacturing of the new tentative state set happens at atomic
> commit ioctl time before the ioctl returns, right? So the current state
> is well-defined: any previous atomic sync or async commit can be assumed to
> have succeeded even if it hasn't applied in hardware yet if the commit
> ioctl for it succeeded, right?

Yes. We could certainly try to fully track all changes in
properties, but no has measured if there's any real benefit
of doing so.

-- 
Ville Syrjälä
Intel


Re: KMS atomic state sets, full vs. minimal (Re: [PATCH v3 0/6] Add support for atomic async page-flips)

2022-10-03 Thread Pekka Paalanen
On Fri, 30 Sep 2022 18:45:09 +0300
Ville Syrjälä  wrote:

> On Fri, Sep 30, 2022 at 06:37:00PM +0300, Pekka Paalanen wrote:
> > On Fri, 30 Sep 2022 18:09:55 +0300
> > Ville Syrjälä  wrote:
> >   
> > > That would actively discourage people from even attempting the
> > > "just dump all the state into the ioctl" approach with async flips
> > > since even the props whose value isn't even changing would be rejected.  
> > 
> > About that.
> > 
> > To me it looks like just a classic case of broken communication.
> > 
> > The kernel developers assume that of course userspace will minimize the
> > state set to only those properties that change, because...?
> > 
> > Userspace developers think that of course the kernel will optimize the
> > state set into minimal changes, because the kernel actually has the
> > authoritative knowledge of what the current state is, and the driver
> > actually knows which properties are costly and need to be optimized and
> > which ones don't matter. It has never even occurred to me that the
> > kernel would not compare next state to current state.
> > 
> > No-one ever documented any expectations, did they?  
> 
> Do you really want that for async flips? Async flips can't be
> done atomically with anything else, so why are you even asking
> the kernel to do that?

I'm not talking about async flips only.

I'm talking about atomic commits in general. I don't think it's a good
idea to make async atomic commits behave fundamentally different from
sync atomic commits wrt. what non-changing state you are allowed to
list in your state set or not.

Isn't there common DRM code to convert an atomic commit state set into
state objects? It probably starts by copying the current state, and
then playing through the commit state set, setting all listed
properties to their new values? Why wouldn't that loop maintain the
knowledge of what actually changed?

When you copy the current data, reset all changed-flags to false. When
you apply each property from the commit set, compare the value and set
the changed-flag only if the value changes.

This manufacturing of the new tentative state set happens at atomic
commit ioctl time before the ioctl returns, right? So the current state
is well-defined: any previous atomic sync or async commit can be assumed to
have succeeded even if it hasn't applied in hardware yet if the commit
ioctl for it succeeded, right?


Thanks,
pq


pgp0OLizrAqiL.pgp
Description: OpenPGP digital signature