from:"André Almeida"

[ANNOUNCE] Graphics & DRM MC at LPC 2024

2024-05-13 Thread André Almeida

The Graphics and DRM Microconference was accepted for Linux Plumbers 
2024! Please, submit your proposal in the following page:


https://lpc.events/event/18/abstracts/

Remember to select "Graphics & DRM MC" in the Track field. The deadline 
for submissions is Sunday 30th June. The accepted proposals will be 
listed here:


https://lpc.events/event/18/contributions/1664/

LPC 2024 will happen in Vienna, Austria, on September 18-20:

https://lpc.events/event/18/

Thanks!

[RESEND PATCH v4 3/3] drm/amdgpu: Make it possible to async flip overlay planes

2024-03-08 Thread André Almeida

amdgpu can handle async flips on overlay planes, so mark it as true
during the plane initialization.

Signed-off-by: André Almeida 
---
v4: new patch

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 8a4c40b4c27e..dc5392c08a87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1708,6 +1708,7 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager 
*dm,
} else if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
unsigned int zpos = 1 + drm_plane_index(plane);
drm_plane_create_zpos_property(plane, zpos, 1, 254);
+   plane->async_flip = true;
} else if (plane->type == DRM_PLANE_TYPE_CURSOR) {
drm_plane_create_zpos_immutable_property(plane, 255);
}
-- 
2.43.0

[RESEND PATCH v4 2/3] drm: Allow drivers to choose plane types to async flip

2024-03-08 Thread André Almeida

Different planes may have different capabilities of doing async flips,
so create a field to let drivers allow async flip per plane type.

Signed-off-by: André Almeida 
---
v4: new patch

 drivers/gpu/drm/drm_atomic_uapi.c | 4 ++--
 drivers/gpu/drm/drm_plane.c   | 3 +++
 include/drm/drm_plane.h   | 5 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 1eecfb9e240d..5c66289188be 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1075,9 +1075,9 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
+   if (async_flip && !plane_state->plane->async_flip) {
drm_dbg_atomic(prop->dev,
-  "[OBJECT:%d] Only primary planes can be 
changed during async flip\n",
+  "[OBJECT:%d] This type of plane cannot 
be changed during async flip\n",
   obj->id);
ret = -EINVAL;
break;
diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
index 672c655c7a8e..71ada690222a 100644
--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -366,6 +366,9 @@ static int __drm_universal_plane_init(struct drm_device 
*dev,
 
drm_modeset_lock_init(>mutex);
 
+   if (type == DRM_PLANE_TYPE_PRIMARY)
+   plane->async_flip = true;
+
plane->base.properties = >properties;
plane->dev = dev;
plane->funcs = funcs;
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index 641fe298052d..5e9efb7321ac 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -779,6 +779,11 @@ struct drm_plane {
 * @hotspot_y_property: property to set mouse hotspot y offset.
 */
struct drm_property *hotspot_y_property;
+
+   /**
+* @async_flip: indicates if a plane can do async flips
+*/
+   bool async_flip;
 };
 
 #define obj_to_plane(x) container_of(x, struct drm_plane, base)
-- 
2.43.0

[RESEND PATCH v4 1/3] drm/atomic: Allow userspace to use explicit sync with atomic async flips

2024-03-08 Thread André Almeida

Allow userspace to use explicit synchronization with atomic async flips.
That means that the flip will wait for some hardware fence, and then
will flip as soon as possible (async) in regard of the vblank.

Signed-off-by: André Almeida 
---
v4: no changes

 drivers/gpu/drm/drm_atomic_uapi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 29d4940188d4..1eecfb9e240d 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1066,7 +1066,9 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && prop != config->prop_fb_id) {
+   if (async_flip &&
+   prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd) {
ret = drm_atomic_plane_get_property(plane, plane_state,
prop, _val);
ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
-- 
2.43.0

[RESEND PATCH v4 0/3] drm/atomic: Allow drivers to write their own plane check for async

2024-03-08 Thread André Almeida

Hi,

AMD hardware can do async flips with overlay planes, so this patchset does a
small redesign to allow drivers to choose per plane type if they can or cannot
do async flips.

It also allows async commits with IN_FENCE_ID in any driver.

Changes from v3:
- Major patchset redesign 
v3: https://lore.kernel.org/lkml/20240128212515.630345-1-andrealm...@igalia.com/

Changes from v2:
 - Allow IN_FENCE_ID for any driver
 - Allow overlay planes again
v2: https://lore.kernel.org/lkml/20240119181235.255060-1-andrealm...@igalia.com/

Changes from v1:
 - Drop overlay planes option for now
v1: 
https://lore.kernel.org/dri-devel/20240116045159.1015510-1-andrealm...@igalia.com/

André Almeida (3):
  drm/atomic: Allow userspace to use explicit sync with atomic async
flips
  drm: Allow drivers to choose plane types to async flip
  drm/amdgpu: Make it possible to async flip overlay planes

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 1 +
 drivers/gpu/drm/drm_atomic_uapi.c   | 8 +---
 drivers/gpu/drm/drm_plane.c | 3 +++
 include/drm/drm_plane.h | 5 +
 4 files changed, 14 insertions(+), 3 deletions(-)

-- 
2.43.0

[PATCH v4 3/3] drm/amdgpu: Make it possible to async flip overlay planes

2024-02-08 Thread André Almeida

amdgpu can handle async flips on overlay planes, so mark it as true
during the plane initialization.

Signed-off-by: André Almeida 
---
v4: new patch

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 8a4c40b4c27e..dc5392c08a87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1708,6 +1708,7 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager 
*dm,
} else if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
unsigned int zpos = 1 + drm_plane_index(plane);
drm_plane_create_zpos_property(plane, zpos, 1, 254);
+   plane->async_flip = true;
} else if (plane->type == DRM_PLANE_TYPE_CURSOR) {
drm_plane_create_zpos_immutable_property(plane, 255);
}
-- 
2.43.0

[PATCH v4 1/3] drm/atomic: Allow userspace to use explicit sync with atomic async flips

2024-02-08 Thread André Almeida

Allow userspace to use explicit synchronization with atomic async flips.
That means that the flip will wait for some hardware fence, and then
will flip as soon as possible (async) in regard of the vblank.

Signed-off-by: André Almeida 
---
v4: no changes

 drivers/gpu/drm/drm_atomic_uapi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 29d4940188d4..1eecfb9e240d 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1066,7 +1066,9 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && prop != config->prop_fb_id) {
+   if (async_flip &&
+   prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd) {
ret = drm_atomic_plane_get_property(plane, plane_state,
prop, _val);
ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
-- 
2.43.0

[PATCH v4 2/3] drm: Allow drivers to choose plane types to async flip

2024-02-08 Thread André Almeida

Different planes may have different capabilities of doing async flips,
so create a field to let drivers allow async flip per plane type.

Signed-off-by: André Almeida 
---
v4: new patch

 drivers/gpu/drm/drm_atomic_uapi.c | 4 ++--
 drivers/gpu/drm/drm_plane.c   | 3 +++
 include/drm/drm_plane.h   | 5 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 1eecfb9e240d..5c66289188be 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1075,9 +1075,9 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
+   if (async_flip && !plane_state->plane->async_flip) {
drm_dbg_atomic(prop->dev,
-  "[OBJECT:%d] Only primary planes can be 
changed during async flip\n",
+  "[OBJECT:%d] This type of plane cannot 
be changed during async flip\n",
   obj->id);
ret = -EINVAL;
break;
diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
index 672c655c7a8e..71ada690222a 100644
--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -366,6 +366,9 @@ static int __drm_universal_plane_init(struct drm_device 
*dev,
 
drm_modeset_lock_init(>mutex);
 
+   if (type == DRM_PLANE_TYPE_PRIMARY)
+   plane->async_flip = true;
+
plane->base.properties = >properties;
plane->dev = dev;
plane->funcs = funcs;
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index 641fe298052d..5e9efb7321ac 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -779,6 +779,11 @@ struct drm_plane {
 * @hotspot_y_property: property to set mouse hotspot y offset.
 */
struct drm_property *hotspot_y_property;
+
+   /**
+* @async_flip: indicates if a plane can do async flips
+*/
+   bool async_flip;
 };
 
 #define obj_to_plane(x) container_of(x, struct drm_plane, base)
-- 
2.43.0

[PATCH v4 0/3] drm/atomic: Allow drivers to write their own plane check for async

2024-02-08 Thread André Almeida

Hi,

AMD hardware can do async flips with overlay planes, so this patchset does a
small redesign to allow drivers to choose per plane type if they can or cannot
do async flips.

It also allows async commits with IN_FENCE_ID in any driver.

Changes from v3:
- Major patchset redesign 
v3: https://lore.kernel.org/lkml/20240128212515.630345-1-andrealm...@igalia.com/

Changes from v2:
 - Allow IN_FENCE_ID for any driver
 - Allow overlay planes again
v2: https://lore.kernel.org/lkml/20240119181235.255060-1-andrealm...@igalia.com/

Changes from v1:
 - Drop overlay planes option for now
v1: 
https://lore.kernel.org/dri-devel/20240116045159.1015510-1-andrealm...@igalia.com/

André Almeida (3):
  drm/atomic: Allow userspace to use explicit sync with atomic async
flips
  drm: Allow drivers to choose plane types to async flip
  drm/amdgpu: Make it possible to async flip overlay planes

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 1 +
 drivers/gpu/drm/drm_atomic_uapi.c   | 8 +---
 drivers/gpu/drm/drm_plane.c | 3 +++
 include/drm/drm_plane.h | 5 +
 4 files changed, 14 insertions(+), 3 deletions(-)

-- 
2.43.0

Re: [PATCH v3 3/3] drm/amdgpu: Implement check_async_props for planes

2024-02-01 Thread André Almeida


Hi Sima,

Em 30/01/2024 07:56, Daniel Vetter escreveu:

On Sun, Jan 28, 2024 at 06:25:15PM -0300, André Almeida wrote:

AMD GPUs can do async flips with changes on more properties than just
the FB ID, so implement a custom check_async_props for AMD planes.

Allow amdgpu to do async flips with overlay planes as well.

Signed-off-by: André Almeida 
---
v3: allow overlay planes


This comment very much written with a lack of clearly better ideas, but:

Do we really need this much flexibility, especially for the first driver
adding the first few additional properties?

A simple bool on struct drm_plane to indicate whether async flips are ok
or not should also do this job here? Maybe a bit of work to roll that out
to the primary planes for current drivers, but not much. And wouldn't need
drivers to implement some very uapi-marshalling atomic code ...


Right, perhaps I can first write in the way you suggest, and later 
expand to the form I have proposed here if/when new properties arise.




Also we could probably remove the current drm_mode_config.async_flip flag
and entirely replace it with the per-plane one.
-Sima


  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +++
  1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 116121e647ca..ed75b69636b4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -25,6 +25,7 @@
   */
  
  #include 

+#include 
  #include 
  #include 
  #include 
@@ -1430,6 +1431,33 @@ static void 
amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane,
drm_atomic_helper_plane_destroy_state(plane, state);
  }
  
+static int amdgpu_dm_plane_check_async_props(struct drm_property *prop,

+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY &&
+   plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary or overlay planes can be 
changed during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
  static const struct drm_plane_funcs dm_plane_funcs = {
.update_plane   = drm_atomic_helper_update_plane,
.disable_plane  = drm_atomic_helper_disable_plane,
@@ -1438,6 +1466,7 @@ static const struct drm_plane_funcs dm_plane_funcs = {
.atomic_duplicate_state = amdgpu_dm_plane_drm_plane_duplicate_state,
.atomic_destroy_state = amdgpu_dm_plane_drm_plane_destroy_state,
.format_mod_supported = amdgpu_dm_plane_format_mod_supported,
+   .check_async_props = amdgpu_dm_plane_check_async_props,
  };
  
  int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,

--
2.43.0

Re: [PATCH v3 1/3] drm/atomic: Allow drivers to write their own plane check for async flips

2024-02-01 Thread André Almeida


Hi Pekka,

Em 29/01/2024 05:49, Pekka Paalanen escreveu:

On Sun, 28 Jan 2024 18:25:13 -0300
André Almeida  wrote:


Some hardware are more flexible on what they can flip asynchronously, so
rework the plane check so drivers can implement their own check, lifting
up some of the restrictions.

Signed-off-by: André Almeida 
---
v3: no changes

  drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
  include/drm/drm_atomic_uapi.h | 12 ++
  include/drm/drm_plane.h   |  5 +++
  3 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index aee4a65d4959..6d5b9fec90c7 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -620,7 +620,7 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
return 0;
  }
  
-static int

+int
  drm_atomic_plane_get_property(struct drm_plane *plane,
const struct drm_plane_state *state,
struct drm_property *property, uint64_t *val)
@@ -683,6 +683,7 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
  
  	return 0;

  }
+EXPORT_SYMBOL(drm_atomic_plane_get_property);
  
  static int drm_atomic_set_writeback_fb_for_connector(

struct drm_connector_state *conn_state,
@@ -1026,18 +1027,54 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
  }
  
-static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t prop_value,

+int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,


Hi,

should the word "async" be somewhere in the function name?


 struct drm_property *prop)
  {
if (ret != 0 || old_val != prop_value) {
drm_dbg_atomic(prop->dev,
-  "[PROP:%d:%s] No prop can be changed during async 
flip\n",
+  "[PROP:%d:%s] This prop cannot be changed during 
async flip\n",
   prop->base.id, prop->name);
return -EINVAL;
}
  
  	return 0;

  }
+EXPORT_SYMBOL(drm_atomic_check_prop_changes);
+
+/* plane changes may have exceptions, so we have a special function for them */
+static int drm_atomic_check_plane_changes(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (plane->funcs->check_async_props)
+   return plane->funcs->check_async_props(prop, plane, plane_state,
+obj, prop_value, 
old_val);


Is it really ok to allow drivers to opt-in to also *reject* the FB_ID
and IN_FENCE_FD props on async commits?

Either intentionally or by accident.



Right, perhaps I should write this function in a way that you can only 
lift restrictions, and not add new ones.



+
+   /*
+* if you are trying to change something other than the FB ID, your
+* change will be either rejected or ignored, so we can stop the check
+* here
+*/
+   if (prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);


When I first read this code, it seemed to say: "If the prop is not
FB_ID, then do the usual prop change checking that happens on all
changes, not only async.". Reading this patch a few more times over, I
finally realized drm_atomic_check_prop_changes() is about async
specifically.



I see that the lack of the async word is giving some confusion, so I'll 
add it to the functions.


Thanks,
André


+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
  
  int drm_atomic_set_property(struct drm_atomic_state *state,

struct drm_file *file_priv,
@@ -1100,7 +1137,6 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
-   struct drm_mode_config *config = >dev->mode_config;
  
  		plane_state = drm_atomic_get_

[PATCH v3 3/3] drm/amdgpu: Implement check_async_props for planes

2024-01-28 Thread André Almeida

AMD GPUs can do async flips with changes on more properties than just
the FB ID, so implement a custom check_async_props for AMD planes.

Allow amdgpu to do async flips with overlay planes as well.

Signed-off-by: André Almeida 
---
v3: allow overlay planes

 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 116121e647ca..ed75b69636b4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -25,6 +25,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1430,6 +1431,33 @@ static void 
amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane,
drm_atomic_helper_plane_destroy_state(plane, state);
 }
 
+static int amdgpu_dm_plane_check_async_props(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY &&
+   plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary or overlay planes can 
be changed during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static const struct drm_plane_funcs dm_plane_funcs = {
.update_plane   = drm_atomic_helper_update_plane,
.disable_plane  = drm_atomic_helper_disable_plane,
@@ -1438,6 +1466,7 @@ static const struct drm_plane_funcs dm_plane_funcs = {
.atomic_duplicate_state = amdgpu_dm_plane_drm_plane_duplicate_state,
.atomic_destroy_state = amdgpu_dm_plane_drm_plane_destroy_state,
.format_mod_supported = amdgpu_dm_plane_format_mod_supported,
+   .check_async_props = amdgpu_dm_plane_check_async_props,
 };
 
 int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
-- 
2.43.0

[PATCH v3 0/3] drm/atomic: Allow drivers to write their own plane check for async

2024-01-28 Thread André Almeida

Hi,

AMD hardware can do more on the async flip path than just the primary plane, so
to lift up the current restrictions, this patchset allows drivers to write their
own check for planes for async flips.

This patchset allows for async commits with IN_FENCE_ID in any driver and
overlay planes on AMD. Userspace can query if a driver supports this with
TEST_ONLY commits.

Changes from v2:
 - Allow IN_FENCE_ID for any driver
 - Allow overlay planes again
v2: https://lore.kernel.org/lkml/20240119181235.255060-1-andrealm...@igalia.com/

Changes from v1:
 - Drop overlay planes option for now
v1: 
https://lore.kernel.org/dri-devel/20240116045159.1015510-1-andrealm...@igalia.com/

André Almeida (3):
  drm/atomic: Allow drivers to write their own plane check for async
flips
  drm/atomic: Allow userspace to use explicit sync with atomic async
flips
  drm/amdgpu: Implement check_async_props for planes

 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +
 drivers/gpu/drm/drm_atomic_uapi.c | 63 ++-
 include/drm/drm_atomic_uapi.h | 12 
 include/drm/drm_plane.h   |  5 ++
 4 files changed, 92 insertions(+), 17 deletions(-)

-- 
2.43.0

[PATCH v3 1/3] drm/atomic: Allow drivers to write their own plane check for async flips

2024-01-28 Thread André Almeida

Some hardware are more flexible on what they can flip asynchronously, so
rework the plane check so drivers can implement their own check, lifting
up some of the restrictions.

Signed-off-by: André Almeida 
---
v3: no changes

 drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
 include/drm/drm_atomic_uapi.h | 12 ++
 include/drm/drm_plane.h   |  5 +++
 3 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index aee4a65d4959..6d5b9fec90c7 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -620,7 +620,7 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
return 0;
 }
 
-static int
+int
 drm_atomic_plane_get_property(struct drm_plane *plane,
const struct drm_plane_state *state,
struct drm_property *property, uint64_t *val)
@@ -683,6 +683,7 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
 
return 0;
 }
+EXPORT_SYMBOL(drm_atomic_plane_get_property);
 
 static int drm_atomic_set_writeback_fb_for_connector(
struct drm_connector_state *conn_state,
@@ -1026,18 +1027,54 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
-static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
 struct drm_property *prop)
 {
if (ret != 0 || old_val != prop_value) {
drm_dbg_atomic(prop->dev,
-  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  "[PROP:%d:%s] This prop cannot be changed during 
async flip\n",
   prop->base.id, prop->name);
return -EINVAL;
}
 
return 0;
 }
+EXPORT_SYMBOL(drm_atomic_check_prop_changes);
+
+/* plane changes may have exceptions, so we have a special function for them */
+static int drm_atomic_check_plane_changes(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (plane->funcs->check_async_props)
+   return plane->funcs->check_async_props(prop, plane, plane_state,
+obj, prop_value, 
old_val);
+
+   /*
+* if you are trying to change something other than the FB ID, your
+* change will be either rejected or ignored, so we can stop the check
+* here
+*/
+   if (prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
 
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
@@ -1100,7 +1137,6 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
-   struct drm_mode_config *config = >dev->mode_config;
 
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
@@ -1108,19 +1144,11 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && prop != config->prop_fb_id) {
-   ret = drm_atomic_plane_get_property(plane, plane_state,
-   prop, _val);
-   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
-   break;
-   }
-
-   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
-   drm_dbg_atomic(prop->dev,
-  "[OBJECT:%d] Only primary planes can be 
changed during async flip\n",
-

[PATCH v3 2/3] drm/atomic: Allow userspace to use explicit sync with atomic async flips

2024-01-28 Thread André Almeida

Allow userspace to use explicit synchronization with atomic async flips.
That means that the flip will wait for some hardware fence, and then
will flip as soon as possible (async) in regard of the vblank.

Signed-off-by: André Almeida 
---
v3: new patch

 drivers/gpu/drm/drm_atomic_uapi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 6d5b9fec90c7..edae7924ad69 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1060,7 +1060,8 @@ static int drm_atomic_check_plane_changes(struct 
drm_property *prop,
 * change will be either rejected or ignored, so we can stop the check
 * here
 */
-   if (prop != config->prop_fb_id) {
+   if (prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd) {
ret = drm_atomic_plane_get_property(plane, plane_state,
prop, _val);
return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
-- 
2.43.0

Re: [PATCH v2 2/2] drm/amdgpu: Implement check_async_props for planes

2024-01-24 Thread André Almeida


Hi Ville,

Em 19/01/2024 15:25, Ville Syrjälä escreveu:

On Fri, Jan 19, 2024 at 03:12:35PM -0300, André Almeida wrote:

AMD GPUs can do async flips with changes on more properties than just
the FB ID, so implement a custom check_async_props for AMD planes.

Allow amdgpu to do async flips with IN_FENCE_ID and FB_DAMAGE_CLIPS
properties. For userspace to check if a driver support this two
properties, the strategy for now is to use TEST_ONLY commits.

Signed-off-by: André Almeida 
---
v2: Drop overlay plane option for now

  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +++
  1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 116121e647ca..7afe8c1b62d4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -25,6 +25,7 @@
   */
  
  #include 

+#include 
  #include 
  #include 
  #include 
@@ -1430,6 +1431,33 @@ static void 
amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane,
drm_atomic_helper_plane_destroy_state(plane, state);
  }
  
+static int amdgpu_dm_plane_check_async_props(struct drm_property *prop,

+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd &&


IN_FENCE should just be allowed always.


Do you mean that the common path should allow IN_FENCE_FD for all drivers?

[PATCH v2 2/2] drm/amdgpu: Implement check_async_props for planes

2024-01-19 Thread André Almeida

AMD GPUs can do async flips with changes on more properties than just
the FB ID, so implement a custom check_async_props for AMD planes.

Allow amdgpu to do async flips with IN_FENCE_ID and FB_DAMAGE_CLIPS
properties. For userspace to check if a driver support this two
properties, the strategy for now is to use TEST_ONLY commits.

Signed-off-by: André Almeida 
---
v2: Drop overlay plane option for now

 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 116121e647ca..7afe8c1b62d4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -25,6 +25,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1430,6 +1431,33 @@ static void 
amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane,
drm_atomic_helper_plane_destroy_state(plane, state);
 }
 
+static int amdgpu_dm_plane_check_async_props(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd &&
+   prop != config->prop_fb_damage_clips) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static const struct drm_plane_funcs dm_plane_funcs = {
.update_plane   = drm_atomic_helper_update_plane,
.disable_plane  = drm_atomic_helper_disable_plane,
@@ -1438,6 +1466,7 @@ static const struct drm_plane_funcs dm_plane_funcs = {
.atomic_duplicate_state = amdgpu_dm_plane_drm_plane_duplicate_state,
.atomic_destroy_state = amdgpu_dm_plane_drm_plane_destroy_state,
.format_mod_supported = amdgpu_dm_plane_format_mod_supported,
+   .check_async_props = amdgpu_dm_plane_check_async_props,
 };
 
 int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
-- 
2.43.0

[PATCH v2 0/2] drm/atomic: Allow drivers to write their own plane check for async

2024-01-19 Thread André Almeida

Hi,

AMD hardware can do more on the async flip path than just the primary plane, so
to lift up the current restrictions, this patchset allows drivers to write their
own check for planes for async flips.

For now this patchset only allow for async commits with IN_FENCE_ID and
FB_DAMAGE_CLIPS. Userspace can query if a driver supports this with TEST_ONLY
commits.

I will left overlay planes for a next iteration.

Changes from v1:
 - Drop overlay planes option for now
v1: 
https://lore.kernel.org/dri-devel/20240116045159.1015510-1-andrealm...@igalia.com/

André

André Almeida (2):
  drm/atomic: Allow drivers to write their own plane check for async
flips
  drm/amdgpu: Implement check_async_props for planes

 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +
 drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
 include/drm/drm_atomic_uapi.h | 12 
 include/drm/drm_plane.h   |  5 ++
 4 files changed, 91 insertions(+), 17 deletions(-)

-- 
2.43.0

[PATCH v2 1/2] drm/atomic: Allow drivers to write their own plane check for async flips

2024-01-19 Thread André Almeida

Some hardware are more flexible on what they can flip asynchronously, so
rework the plane check so drivers can implement their own check, lifting
up some of the restrictions.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
 include/drm/drm_atomic_uapi.h | 12 ++
 include/drm/drm_plane.h   |  5 +++
 3 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index aee4a65d4959..6d5b9fec90c7 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -620,7 +620,7 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
return 0;
 }
 
-static int
+int
 drm_atomic_plane_get_property(struct drm_plane *plane,
const struct drm_plane_state *state,
struct drm_property *property, uint64_t *val)
@@ -683,6 +683,7 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
 
return 0;
 }
+EXPORT_SYMBOL(drm_atomic_plane_get_property);
 
 static int drm_atomic_set_writeback_fb_for_connector(
struct drm_connector_state *conn_state,
@@ -1026,18 +1027,54 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
-static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
 struct drm_property *prop)
 {
if (ret != 0 || old_val != prop_value) {
drm_dbg_atomic(prop->dev,
-  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  "[PROP:%d:%s] This prop cannot be changed during 
async flip\n",
   prop->base.id, prop->name);
return -EINVAL;
}
 
return 0;
 }
+EXPORT_SYMBOL(drm_atomic_check_prop_changes);
+
+/* plane changes may have exceptions, so we have a special function for them */
+static int drm_atomic_check_plane_changes(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (plane->funcs->check_async_props)
+   return plane->funcs->check_async_props(prop, plane, plane_state,
+obj, prop_value, 
old_val);
+
+   /*
+* if you are trying to change something other than the FB ID, your
+* change will be either rejected or ignored, so we can stop the check
+* here
+*/
+   if (prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
 
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
@@ -1100,7 +1137,6 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
-   struct drm_mode_config *config = >dev->mode_config;
 
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
@@ -1108,19 +1144,11 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && prop != config->prop_fb_id) {
-   ret = drm_atomic_plane_get_property(plane, plane_state,
-   prop, _val);
-   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
-   break;
-   }
-
-   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
-   drm_dbg_atomic(prop->dev,
-  "[OBJECT:%d] Only primary planes can be 
changed during async flip\n",
-  obj->id);
-

Re: [PATCH 0/2] drm/atomic: Allow drivers to write their own plane check for async

2024-01-16 Thread André Almeida


+ Joshua

Em 16/01/2024 10:14, Pekka Paalanen escreveu:

On Tue, 16 Jan 2024 08:50:59 -0300
André Almeida  wrote:


Hi Pekka,

Em 16/01/2024 06:45, Pekka Paalanen escreveu:

On Tue, 16 Jan 2024 01:51:57 -0300
André Almeida  wrote:
   

Hi,

AMD hardware can do more on the async flip path than just the primary plane, so
to lift up the current restrictions, this patchset allows drivers to write their
own check for planes for async flips.


Hi,

what's the userspace story for this, how could userspace know it could do more?
What kind of userspace would take advantage of this and in what situations?

Or is this not meant for generic userspace?


Sorry, I forgot to document this. So the idea is that userspace will
query what they can do here with DRM_MODE_ATOMIC_TEST_ONLY calls,
instead of having capabilities for each prop.


That's the theory, but do you have a practical example?

What other planes and props would one want change in some specific use
case?

Is it just "all or nothing", or would there be room to choose and pick
which props you change and which you don't based on what the driver
supports? If the latter, then relying on TEST_ONLY might be yet another
combinatorial explosion to iterate through.



That's a good question, maybe Simon, Xaver or Joshua can share how they 
were planning to use this on Gamescope or Kwin.




Thanks,
pq


I'm not sure if adding something new to drm_plane_funcs is the right way to do,
because if we want to expand the other object types (crtc, connector) we would
need to add their own drm_XXX_funcs, so feedbacks are welcome!

    André

André Almeida (2):
drm/atomic: Allow drivers to write their own plane check for async
  flips
drm/amdgpu: Implement check_async_props for planes

   .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 30 +
   drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
   include/drm/drm_atomic_uapi.h | 12 
   include/drm/drm_plane.h   |  5 ++
   4 files changed, 92 insertions(+), 17 deletions(-)

Re: [PATCH 0/2] drm/atomic: Allow drivers to write their own plane check for async

2024-01-16 Thread André Almeida


Hi Pekka,

Em 16/01/2024 06:45, Pekka Paalanen escreveu:

On Tue, 16 Jan 2024 01:51:57 -0300
André Almeida  wrote:


Hi,

AMD hardware can do more on the async flip path than just the primary plane, so
to lift up the current restrictions, this patchset allows drivers to write their
own check for planes for async flips.


Hi,

what's the userspace story for this, how could userspace know it could do more?
What kind of userspace would take advantage of this and in what situations?

Or is this not meant for generic userspace?


Sorry, I forgot to document this. So the idea is that userspace will 
query what they can do here with DRM_MODE_ATOMIC_TEST_ONLY calls, 
instead of having capabilities for each prop.





Thanks,
pq


I'm not sure if adding something new to drm_plane_funcs is the right way to do,
because if we want to expand the other object types (crtc, connector) we would
need to add their own drm_XXX_funcs, so feedbacks are welcome!

André

André Almeida (2):
   drm/atomic: Allow drivers to write their own plane check for async
 flips
   drm/amdgpu: Implement check_async_props for planes

  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 30 +
  drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
  include/drm/drm_atomic_uapi.h | 12 
  include/drm/drm_plane.h   |  5 ++
  4 files changed, 92 insertions(+), 17 deletions(-)

[PATCH 2/2] drm/amdgpu: Implement check_async_props for planes

2024-01-15 Thread André Almeida

AMD GPUs can do async flips with overlay planes and other props rather
than just FB ID, so implement a custom check_async_props for AMD planes.

Signed-off-by: André Almeida 
---
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 30 +++
 1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 116121e647ca..127cae1f9fb4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -25,6 +25,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1430,6 +1431,34 @@ static void 
amdgpu_dm_plane_drm_plane_destroy_state(struct drm_plane *plane,
drm_atomic_helper_plane_destroy_state(plane, state);
 }
 
+static int amdgpu_dm_plane_check_async_props(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd &&
+   prop != config->prop_fb_damage_clips) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY
+   && plane_state->plane->type != DRM_PLANE_TYPE_OVERLAY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary or overlay planes can 
be changed during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static const struct drm_plane_funcs dm_plane_funcs = {
.update_plane   = drm_atomic_helper_update_plane,
.disable_plane  = drm_atomic_helper_disable_plane,
@@ -1438,6 +1467,7 @@ static const struct drm_plane_funcs dm_plane_funcs = {
.atomic_duplicate_state = amdgpu_dm_plane_drm_plane_duplicate_state,
.atomic_destroy_state = amdgpu_dm_plane_drm_plane_destroy_state,
.format_mod_supported = amdgpu_dm_plane_format_mod_supported,
+   .check_async_props = amdgpu_dm_plane_check_async_props,
 };
 
 int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm,
-- 
2.43.0

[PATCH 0/2] drm/atomic: Allow drivers to write their own plane check for async

2024-01-15 Thread André Almeida

Hi,

AMD hardware can do more on the async flip path than just the primary plane, so
to lift up the current restrictions, this patchset allows drivers to write their
own check for planes for async flips.

I'm not sure if adding something new to drm_plane_funcs is the right way to do,
because if we want to expand the other object types (crtc, connector) we would
need to add their own drm_XXX_funcs, so feedbacks are welcome!

André

André Almeida (2):
  drm/atomic: Allow drivers to write their own plane check for async
flips
  drm/amdgpu: Implement check_async_props for planes

 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 30 +
 drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
 include/drm/drm_atomic_uapi.h | 12 
 include/drm/drm_plane.h   |  5 ++
 4 files changed, 92 insertions(+), 17 deletions(-)

-- 
2.43.0

[PATCH 1/2] drm/atomic: Allow drivers to write their own plane check for async flips

2024-01-15 Thread André Almeida

Some hardware are more flexible on what they can flip asynchronously, so
rework the plane check so drivers can implement their own check, lifting
up some of the restrictions.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/drm_atomic_uapi.c | 62 ++-
 include/drm/drm_atomic_uapi.h | 12 ++
 include/drm/drm_plane.h   |  5 +++
 3 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index aee4a65d4959..6d5b9fec90c7 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -620,7 +620,7 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
return 0;
 }
 
-static int
+int
 drm_atomic_plane_get_property(struct drm_plane *plane,
const struct drm_plane_state *state,
struct drm_property *property, uint64_t *val)
@@ -683,6 +683,7 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
 
return 0;
 }
+EXPORT_SYMBOL(drm_atomic_plane_get_property);
 
 static int drm_atomic_set_writeback_fb_for_connector(
struct drm_connector_state *conn_state,
@@ -1026,18 +1027,54 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
-static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
 struct drm_property *prop)
 {
if (ret != 0 || old_val != prop_value) {
drm_dbg_atomic(prop->dev,
-  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  "[PROP:%d:%s] This prop cannot be changed during 
async flip\n",
   prop->base.id, prop->name);
return -EINVAL;
}
 
return 0;
 }
+EXPORT_SYMBOL(drm_atomic_check_prop_changes);
+
+/* plane changes may have exceptions, so we have a special function for them */
+static int drm_atomic_check_plane_changes(struct drm_property *prop,
+ struct drm_plane *plane,
+ struct drm_plane_state *plane_state,
+ struct drm_mode_object *obj,
+ u64 prop_value, u64 old_val)
+{
+   struct drm_mode_config *config = >dev->mode_config;
+   int ret;
+
+   if (plane->funcs->check_async_props)
+   return plane->funcs->check_async_props(prop, plane, plane_state,
+obj, prop_value, 
old_val);
+
+   /*
+* if you are trying to change something other than the FB ID, your
+* change will be either rejected or ignored, so we can stop the check
+* here
+*/
+   if (prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   return drm_atomic_check_prop_changes(ret, old_val, prop_value, 
prop);
+   }
+
+   if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+  "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+  obj->id);
+   return -EINVAL;
+   }
+
+   return 0;
+}
 
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
@@ -1100,7 +1137,6 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
-   struct drm_mode_config *config = >dev->mode_config;
 
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
@@ -1108,19 +1144,11 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
-   if (async_flip && prop != config->prop_fb_id) {
-   ret = drm_atomic_plane_get_property(plane, plane_state,
-   prop, _val);
-   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
-   break;
-   }
-
-   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
-   drm_dbg_atomic(prop->dev,
-  "[OBJECT:%d] Only primary planes can be 
changed during async flip\n",
-  obj->id);
-

Re: [PATCH] drm: allow IN_FENCE_FD and FB_DAMAGE_CLIPS to be changed with async commits

2024-01-11 Thread André Almeida


Em 11/01/2024 14:59, Xaver Hugl escreveu:
Am Do., 11. Jan. 2024 um 18:13 Uhr schrieb Simon Ser 
mailto:cont...@emersion.fr>>:


Are we sure that all drivers handle these two props properly with async
page-flips? This is a new codepath not taken by the legacy uAPI.

I've only tested on amdgpu so far. Afacs the other drivers that would need
testing / that support atomic and async pageflips are
- i915
- noueveau (though atomic is disabled by default, so maybe it doesn't 
matter?)

- vc4
- atmel-hlcdc

The first two I can test, the latter I don't have the hardware for. I 
don't know if I can
extensively test fb_damage_clips either / how I'd even know if it's 
being applied
correctly, but in the worst case I'd expect the driver to not do the 
optimizations the

property allows.

As an alternative to this, would it be okay to expose a driver hook for 
optional
driver-specific checks that drm_atomic_set_property can delegate to, and 
only

allow this with the properties and hardware that's been tested? Then more
properties (like cursor position changes on amdgpu) could be easily 
added later

on too.


I'm working on some mechanism to allow overlay planes on amdgpu, and I 
think I can add your needs to it. I'll share in the mailing list when I 
have something more concrete.

Re: [PATCH] drm: allow IN_FENCE_FD and FB_DAMAGE_CLIPS to be changed with async commits

2024-01-11 Thread André Almeida


Hi Xaver,

Em 11/01/2024 13:56, Xaver Hugl escreveu:

Like with FB_ID, the driver never has to do bandwidth validation to use
these properties, so there's no reason not to allow them.

Signed-off-by: Xaver Hugl 


Reviewed-by: André Almeida 


---
  drivers/gpu/drm/drm_atomic_uapi.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index aee4a65d4959..06d476f5a746 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1108,7 +1108,9 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
  
-		if (async_flip && prop != config->prop_fb_id) {

+   if (async_flip && prop != config->prop_fb_id &&
+   prop != config->prop_in_fence_fd &&
+   prop != config->prop_fb_damage_clips) {
ret = drm_atomic_plane_get_property(plane, plane_state,
prop, _val);
ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);

[PATCH] drm/doc: Define KMS atomic state set

2023-11-30 Thread André Almeida

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 
---

This is a standalone patch from the following serie, the other patches are
already merged:
https://lore.kernel.org/lkml/20231122161941.320564-1-andrealm...@igalia.com/

 Documentation/gpu/drm-uapi.rst | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 370d820be248..d0693f902a5c 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -570,3 +570,50 @@ dma-buf interoperability
 
 Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
 information on how dma-buf is integrated and exposed within DRM.
+
+KMS atomic state
+
+
+An atomic commit can change multiple KMS properties in an atomic fashion,
+without ever applying intermediate or partial state changes.  Either the whole
+commit succeeds or fails, and it will never be applied partially. This is the
+fundamental improvement of the atomic API over the older non-atomic API which 
is
+referred to as the "legacy API".  Applying intermediate state could 
unexpectedly
+fail, cause visible glitches, or delay reaching the final state.
+
+An atomic commit can be flagged with DRM_MODE_ATOMIC_TEST_ONLY, which means the
+complete state change is validated but not applied.  Userspace should use this
+flag to validate any state change before asking to apply it. If validation 
fails
+for any reason, userspace should attempt to fall back to another, perhaps
+simpler, final state.  This allows userspace to probe for various 
configurations
+without causing visible glitches on screen and without the need to undo a
+probing change.
+
+The changes recorded in an atomic commit apply on top the current KMS state in
+the kernel. Hence, the complete new KMS state is the complete old KMS state 
with
+the committed property settings done on top. The kernel will try to avoid
+no-operation changes, so it is safe for userspace to send redundant property
+settings.  However, not every situation allows for no-op changes, due to the
+need to acquire locks for some attributes. Userspace needs to be aware that 
some
+redundant information might result in oversynchronization issues.  No-operation
+changes do not count towards actually needed changes, e.g.  setting MODE_ID to 
a
+different blob with identical contents as the current KMS state shall not be a
+modeset on its own. As a special exception for VRR needs, explicitly setting
+FB_ID to its current value is not a no-op.
+
+A "modeset" is a change in KMS state that might enable, disable, or temporarily
+disrupt the emitted video signal, possibly causing visible glitches on screen. 
A
+modeset may also take considerably more time to complete than other kinds of
+changes, and the video sink might also need time to adapt to the new signal
+properties. Therefore a modeset must be explicitly allowed with the flag
+DRM_MODE_ATOMIC_ALLOW_MODESET.  This in combination with
+DRM_MODE_ATOMIC_TEST_ONLY allows userspace to determine if a state change is
+likely to cause visible disruption on screen and avoid such changes when end
+users do not expect them.
+
+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. Changing any other property will cause the commit to be
+rejected. Each driver may relax this restriction if they have guarantees that
+such property change doesn't cause modesets. Userspace can use TEST_ONLY 
commits
+to query the driver about this.
-- 
2.43.0

Re: [PATCH v9 0/4] drm: Add support for atomic async page-flip

2023-11-23 Thread André Almeida


Em 23/11/2023 13:24, Simon Ser escreveu:

Thanks! This iteration of the first 3 patches LGTM, I've pushed them to
drm-misc-next. I've made two adjustments to make checkpatch.pl happy:



Thank you!


- s/uint64_t/u64/
- Fix indentation for a drm_dbg_atomic()


ops :)

Re: [PATCH v9 0/4] drm: Add support for atomic async page-flip

2023-11-22 Thread André Almeida


Hi Hamza,

Em 22/11/2023 17:23, Hamza Mahfooz escreveu:

Hi André,
On 11/22/23 11:19, André Almeida wrote:

Hi,

This work from me and Simon adds support for DRM_MODE_PAGE_FLIP_ASYNC 
through
the atomic API. This feature is already available via the legacy API. 
The use

case is to be able to present a new frame immediately (or as soon as
possible), even if after missing a vblank. This might result in 
tearing, but

it's useful when a high framerate is desired, such as for gaming.

Differently from earlier versions, this one refuses to flip if any 
prop changes
for async flips. The idea is that the fast path of immediate page 
flips doesn't

play well with modeset changes, so only the fb_id can be changed.

Tested with:
  - Intel TigerLake-LP GT2
  - AMD VanGogh


Have you had a chance to test this with VRR enabled? Since, I suspect
this series might break that feature.



Someone asked this question in an earlier version of this patch, and the 
result is that VRR still works as expected. You can follow the thread at 
this link:


https://lore.kernel.org/lkml/b48bd1fc-fcb0-481b-8413-9210d44d7...@igalia.com/

I should have included this note at my cover letter, my bad.

Thanks,
André

[PATCH v9 4/4] drm/doc: Define KMS atomic state set

2023-11-22 Thread André Almeida

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 
---
v9:
- no changes
v8:
- no changes
v7:
- add a note that drivers can make exceptions for ad-hoc prop changes
- add a note about flipping the same FB_ID as a no-op
---
---
 Documentation/gpu/drm-uapi.rst | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 370d820be248..d0693f902a5c 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -570,3 +570,50 @@ dma-buf interoperability
 
 Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
 information on how dma-buf is integrated and exposed within DRM.
+
+KMS atomic state
+
+
+An atomic commit can change multiple KMS properties in an atomic fashion,
+without ever applying intermediate or partial state changes.  Either the whole
+commit succeeds or fails, and it will never be applied partially. This is the
+fundamental improvement of the atomic API over the older non-atomic API which 
is
+referred to as the "legacy API".  Applying intermediate state could 
unexpectedly
+fail, cause visible glitches, or delay reaching the final state.
+
+An atomic commit can be flagged with DRM_MODE_ATOMIC_TEST_ONLY, which means the
+complete state change is validated but not applied.  Userspace should use this
+flag to validate any state change before asking to apply it. If validation 
fails
+for any reason, userspace should attempt to fall back to another, perhaps
+simpler, final state.  This allows userspace to probe for various 
configurations
+without causing visible glitches on screen and without the need to undo a
+probing change.
+
+The changes recorded in an atomic commit apply on top the current KMS state in
+the kernel. Hence, the complete new KMS state is the complete old KMS state 
with
+the committed property settings done on top. The kernel will try to avoid
+no-operation changes, so it is safe for userspace to send redundant property
+settings.  However, not every situation allows for no-op changes, due to the
+need to acquire locks for some attributes. Userspace needs to be aware that 
some
+redundant information might result in oversynchronization issues.  No-operation
+changes do not count towards actually needed changes, e.g.  setting MODE_ID to 
a
+different blob with identical contents as the current KMS state shall not be a
+modeset on its own. As a special exception for VRR needs, explicitly setting
+FB_ID to its current value is not a no-op.
+
+A "modeset" is a change in KMS state that might enable, disable, or temporarily
+disrupt the emitted video signal, possibly causing visible glitches on screen. 
A
+modeset may also take considerably more time to complete than other kinds of
+changes, and the video sink might also need time to adapt to the new signal
+properties. Therefore a modeset must be explicitly allowed with the flag
+DRM_MODE_ATOMIC_ALLOW_MODESET.  This in combination with
+DRM_MODE_ATOMIC_TEST_ONLY allows userspace to determine if a state change is
+likely to cause visible disruption on screen and avoid such changes when end
+users do not expect them.
+
+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. Changing any other property will cause the commit to be
+rejected. Each driver may relax this restriction if they have guarantees that
+such property change doesn't cause modesets. Userspace can use TEST_ONLY 
commits
+to query the driver about this.
-- 
2.42.1

[PATCH v9 3/4] drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP

2023-11-22 Thread André Almeida

From: Simon Ser 

This new kernel capability indicates whether async page-flips are
supported via the atomic uAPI. DRM clients can use it to check
for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.

Make it clear that DRM_CAP_ASYNC_PAGE_FLIP is for legacy uAPI only.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
v9: no changes
---
---
 drivers/gpu/drm/drm_ioctl.c |  4 
 include/uapi/drm/drm.h  | 10 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 44fda68c28ae..f461ed862480 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -301,6 +301,10 @@ static int drm_getcap(struct drm_device *dev, void *data, 
struct drm_file *file_
case DRM_CAP_CRTC_IN_VBLANK_EVENT:
req->value = 1;
break;
+   case DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP:
+   req->value = drm_core_check_feature(dev, DRIVER_ATOMIC) &&
+dev->mode_config.async_page_flip;
+   break;
default:
return -EINVAL;
}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 8662b5aeea0c..796de831f4a0 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -713,7 +713,8 @@ struct drm_gem_open {
 /**
  * DRM_CAP_ASYNC_PAGE_FLIP
  *
- * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC.
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for legacy
+ * page-flips.
  */
 #define DRM_CAP_ASYNC_PAGE_FLIP0x7
 /**
@@ -773,6 +774,13 @@ struct drm_gem_open {
  * :ref:`drm_sync_objects`.
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE   0x14
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP 0x15
 
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
-- 
2.42.1

[PATCH v9 2/4] drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits

2023-11-22 Thread André Almeida

From: Simon Ser 

If the driver supports it, allow user-space to supply the
DRM_MODE_PAGE_FLIP_ASYNC flag to request an async page-flip.
Set drm_crtc_state.async_flip accordingly.

Document that drivers will reject atomic commits if an async
flip isn't possible. This allows user-space to fall back to
something else. For instance, Xorg falls back to a blit.
Another option is to wait as close to the next vblank as
possible before performing the page-flip to reduce latency.

Signed-off-by: Simon Ser 
Reviewed-by: Alex Deucher 
Co-developed-by: André Almeida 
Signed-off-by: André Almeida 
---
v9: dropped atomic_async_page_flip_not_supported
---
 drivers/gpu/drm/drm_atomic_uapi.c | 25 ++---
 include/uapi/drm/drm_mode.h   |  9 +
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index ed46133a2dd7..de4265423ddc 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1368,6 +1368,18 @@ static void complete_signaling(struct drm_device *dev,
kfree(fence_state);
 }
 
+static void
+set_async_flip(struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *crtc_state;
+   int i;
+
+   for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+   crtc_state->async_flip = true;
+   }
+}
+
 int drm_mode_atomic_ioctl(struct drm_device *dev,
  void *data, struct drm_file *file_priv)
 {
@@ -1409,9 +1421,13 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
}
 
if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) {
-   drm_dbg_atomic(dev,
-  "commit failed: invalid flag 
DRM_MODE_PAGE_FLIP_ASYNC\n");
-   return -EINVAL;
+   if (!dev->mode_config.async_page_flip) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported\n");
+   return -EINVAL;
+   }
+
+   async_flip = true;
}
 
/* can't test and expect an event at the same time. */
@@ -1514,6 +1530,9 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
if (ret)
goto out;
 
+   if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC)
+   set_async_flip(state);
+
if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) {
ret = drm_atomic_check_only(state);
} else if (arg->flags & DRM_MODE_ATOMIC_NONBLOCK) {
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 09e7a471ee30..95630f170110 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -957,6 +957,15 @@ struct hdr_output_metadata {
  * Request that the page-flip is performed as soon as possible, ie. with no
  * delay due to waiting for vblank. This may cause tearing to be visible on
  * the screen.
+ *
+ * When used with atomic uAPI, the driver will return an error if the hardware
+ * doesn't support performing an asynchronous page-flip for this update.
+ * User-space should handle this, e.g. by falling back to a regular page-flip.
+ *
+ * Note, some hardware might need to perform one last synchronous page-flip
+ * before being able to switch to asynchronous page-flips. As an exception,
+ * the driver will return success even though that first page-flip is not
+ * asynchronous.
  */
 #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
 #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
-- 
2.42.1

[PATCH v9 1/4] drm: Refuse to async flip with atomic prop changes

2023-11-22 Thread André Almeida

Given that prop changes may lead to modesetting, which would defeat the
fast path of the async flip, refuse any atomic prop change for async
flips in atomic API. The only exception is the framebuffer ID to flip
to. Currently the only plane type supported is the primary one.

Signed-off-by: André Almeida 
Reviewed-by: Simon Ser 
---
v9: no changes
v8: add a check for plane type, we can only flip primary planes
v7: drop the mode_id exception for prop changes
---
 drivers/gpu/drm/drm_atomic_uapi.c   | 52 +++--
 drivers/gpu/drm/drm_crtc_internal.h |  2 +-
 drivers/gpu/drm/drm_mode_object.c   |  2 +-
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 98d3b10c08ae..ed46133a2dd7 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1006,13 +1006,28 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
+static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+struct drm_property *prop)
+{
+   if (ret != 0 || old_val != prop_value) {
+   drm_dbg_atomic(prop->dev,
+  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  prop->base.id, prop->name);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
struct drm_mode_object *obj,
struct drm_property *prop,
-   uint64_t prop_value)
+   uint64_t prop_value,
+   bool async_flip)
 {
struct drm_mode_object *ref;
+   uint64_t old_val;
int ret;
 
if (!drm_property_change_valid_get(prop, prop_value, ))
@@ -1029,6 +1044,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_connector_get_property(connector, 
connector_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_connector_set_property(connector,
connector_state, file_priv,
prop, prop_value);
@@ -1044,6 +1066,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_crtc_get_property(crtc, crtc_state,
+  prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_crtc_set_property(crtc,
crtc_state, prop, prop_value);
break;
@@ -1051,6 +1080,7 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
+   struct drm_mode_config *config = >dev->mode_config;
 
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
@@ -1058,6 +1088,21 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip && prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
+   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+   "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+   obj->id);
+   ret = -EINVAL;
+   break;
+   }
+
ret = drm_atomic_plane_set_property(plane,
plane_state, file_priv,
prop, prop_value);
@@ -1337,6 +1382,7 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
struct drm_out_fence_state *fence_state;
int ret = 0;
unsigned int i, j, num_fences;
+   bool async_fli

[PATCH v9 0/4] drm: Add support for atomic async page-flip

2023-11-22 Thread André Almeida

Hi,

This work from me and Simon adds support for DRM_MODE_PAGE_FLIP_ASYNC through
the atomic API. This feature is already available via the legacy API. The use
case is to be able to present a new frame immediately (or as soon as
possible), even if after missing a vblank. This might result in tearing, but
it's useful when a high framerate is desired, such as for gaming.

Differently from earlier versions, this one refuses to flip if any prop changes
for async flips. The idea is that the fast path of immediate page flips doesn't
play well with modeset changes, so only the fb_id can be changed.

Tested with:
 - Intel TigerLake-LP GT2
 - AMD VanGogh

Thanks,
André

- User-space patch: https://github.com/Plagman/gamescope/pull/595
- IGT tests: 
https://lore.kernel.org/all/20231110163811.24158-1-andrealm...@igalia.com/

Changes from v8:
- Dropped atomic_async_page_flip_not_supported, giving that current design works
with any driver that support atomic and async at the same time.
- Dropped the patch that disabled atomic_async_page_flip_not_supported for AMD.
- Reordered commits
v8: https://lore.kernel.org/all/20231025005318.293690-1-andrealm...@igalia.com/

Changes from v7:
- Only accept flips to primary planes. If a driver support flips in different
planes, support will be added  later.
v7: 
https://lore.kernel.org/dri-devel/20231017092837.32428-1-andrealm...@igalia.com/

Changes from v6:
- Dropped the exception to allow MODE_ID changes (Simon)
- Clarify what happens when flipping with the same FB_ID (Pekka)

v6: 
https://lore.kernel.org/dri-devel/20230815185710.159779-1-andrealm...@igalia.com/

Changes from v5:
- Add note in the docs that not every redundant attribute will result in no-op,
  some might cause oversynchronization issues.

v5: 
https://lore.kernel.org/dri-devel/20230707224059.305474-1-andrealm...@igalia.com/

Changes from v4:
 - Documentation rewrote by Pekka Paalanen

v4: 
https://lore.kernel.org/dri-devel/20230701020917.143394-1-andrealm...@igalia.com/

Changes from v3:
 - Add new patch to reject prop changes
 - Add a documentation clarifying the KMS atomic state set

v3: 
https://lore.kernel.org/dri-devel/20220929184307.258331-1-cont...@emersion.fr/

André Almeida (1):
  drm: Refuse to async flip with atomic prop changes

Pekka Paalanen (1):
  drm/doc: Define KMS atomic state set

Simon Ser (2):
  drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits
  drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP

 Documentation/gpu/drm-uapi.rst  | 47 ++
 drivers/gpu/drm/drm_atomic_uapi.c   | 77 ++---
 drivers/gpu/drm/drm_crtc_internal.h |  2 +-
 drivers/gpu/drm/drm_ioctl.c |  4 ++
 drivers/gpu/drm/drm_mode_object.c   |  2 +-
 include/uapi/drm/drm.h  | 10 +++-
 include/uapi/drm/drm_mode.h |  9 
 7 files changed, 142 insertions(+), 9 deletions(-)

-- 
2.42.1

[PATCH v2] drm/amd: Document device reset methods

2023-11-10 Thread André Almeida

Document what each amdgpu driver reset method does.

Signed-off-by: André Almeida 
---
v2: Add more details and small correction (Alex)

 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a79d53bdbe13..c4675572f907 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -504,6 +504,31 @@ struct amdgpu_allowed_register_entry {
bool grbm_indexed;
 };
 
+/**
+ * enum amd_reset_method - Methods for resetting AMD GPU devices
+ *
+ * @AMD_RESET_METHOD_NONE: The device will not be reset.
+ * @AMD_RESET_LEGACY: Method reserved for SI, CIK and VI ASICs.
+ * @AMD_RESET_MODE0: Reset the entire ASIC. Not currently available for the
+ *   any device.
+ * @AMD_RESET_MODE1: Resets all IP blocks on the ASIC (SDMA, GFX, VCN, etc.)
+ *   individually. Suitable only for some discrete GPU, not
+ *   available for all ASICs.
+ * @AMD_RESET_MODE2: Resets a lesser level of IPs compared to MODE1. Which IPs
+ *   are reset depends on the ASIC. Notably doesn't reset IPs
+ *   shared with the CPU on APUs or the memory controllers (so
+ *   VRAM is not lost). Not available on all ASICs.
+ * @AMD_RESET_BACO: BACO (Bus Alive, Chip Off) method powers off and on the 
card
+ *  but without powering off the PCI bus. Suitable only for
+ *  discrete GPUs.
+ * @AMD_RESET_PCI: Does a full bus reset using core Linux subsystem PCI reset
+ * and does a secondary bus reset or FLR, depending on what the
+ * underlying hardware supports.
+ *
+ * Methods available for AMD GPU driver for resetting the device. Not all
+ * methods are suitable for every device. User can overwrite the method using
+ * module parameter `reset_method`.
+ */
 enum amd_reset_method {
AMD_RESET_METHOD_NONE = -1,
AMD_RESET_METHOD_LEGACY = 0,
-- 
2.42.1

[PATCH i-g-t 2/2] tests/kms_async_flips: Add test for atomic prop changes

2023-11-10 Thread André Almeida

DRM atomic API should reject modesetting during an async flip, so
test if the kernel properly rejects to flip with prop changes.

Signed-off-by: André Almeida 
---
 tests/kms_async_flips.c | 68 +
 1 file changed, 68 insertions(+)

diff --git a/tests/kms_async_flips.c b/tests/kms_async_flips.c
index 6bfcca474..edc19b5ef 100644
--- a/tests/kms_async_flips.c
+++ b/tests/kms_async_flips.c
@@ -305,6 +305,9 @@ static void test_async_flip_atomic(data_t *data)
 
test_init(data);
 
+   igt_plane_set_fb(data->plane, >bufs[0]);
+   igt_display_commit_atomic(>display, 
DRM_MODE_ATOMIC_ALLOW_MODESET, data);
+
gettimeofday(, NULL);
frame = 1;
do {
@@ -326,6 +329,55 @@ static void test_async_flip_atomic(data_t *data)
 "FPS should be significantly higher than the refresh 
rate\n");
 }
 
+static void test_invalid_atomic(data_t *data)
+{
+   int flags = DRM_MODE_PAGE_FLIP_ASYNC | DRM_MODE_PAGE_FLIP_EVENT;
+   int ret;
+
+   test_init(data);
+
+   igt_plane_set_fb(data->plane, >bufs[0]);
+   igt_display_commit_atomic(>display, 
DRM_MODE_ATOMIC_ALLOW_MODESET, data);
+
+   /* Trying to change plane position  */
+   igt_plane_set_position(data->plane, 15, 15);
+   igt_plane_set_fb(data->plane, >bufs[1]);
+   ret = igt_display_try_commit_atomic(>display, flags, data);
+   igt_assert(ret == -EINVAL);
+   igt_plane_set_position(data->plane, 0, 0);
+
+   /* Trying to change plane rotation  */
+   igt_plane_set_rotation(data->plane, IGT_ROTATION_180);
+   igt_plane_set_fb(data->plane, >bufs[1]);
+   ret = igt_display_try_commit_atomic(>display, flags, data);
+   igt_assert(ret == -EINVAL);
+   igt_plane_set_rotation(data->plane, IGT_ROTATION_0);
+}
+
+static void test_atomic_modeset(data_t *data)
+{
+   int flags = DRM_MODE_PAGE_FLIP_ASYNC | DRM_MODE_PAGE_FLIP_EVENT;
+   igt_output_t *output = data->output;
+   int ret;
+
+   test_init(data);
+
+   igt_plane_set_fb(data->plane, >bufs[0]);
+   igt_display_commit_atomic(>display, 
DRM_MODE_ATOMIC_ALLOW_MODESET, data);
+
+   /*
+* Modesetting is forbidden during atomic async flips. Mode changes that
+* require modeset are rejected.
+*/
+   for_each_connector_mode(output) {
+   drmModeModeInfo *m = >config.connector->modes[j__];
+   igt_output_override_mode(output, m);
+   ret = igt_display_try_commit_atomic(>display, flags, 
data);
+   igt_assert(ret == -EINVAL);
+   }
+   igt_output_override_mode(output, NULL);
+}
+
 static void wait_for_vblank(data_t *data, unsigned long *vbl_time, unsigned 
int *seq)
 {
drmVBlank wait_vbl;
@@ -757,6 +809,22 @@ igt_main
run_test(, test_async_flip_atomic);
}
 
+   igt_describe("Negative case to verify if any atomic changes are 
rejected from kernel as expected");
+   igt_subtest_with_dynamic("invalid-atomic-async-flip") {
+   require_monotonic_timestamp(data.drm_fd);
+   igt_require_f(igt_has_drm_cap(data.drm_fd, 
DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP),
+ "Atomic async page-flips are not supported\n");
+   run_test(, test_invalid_atomic);
+   }
+
+   igt_describe("Verify mode changes during atomic async flips");
+   igt_subtest_with_dynamic("modeset-atomic-async-flip") {
+   require_monotonic_timestamp(data.drm_fd);
+   igt_require_f(igt_has_drm_cap(data.drm_fd, 
DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP),
+ "Atomic async page-flips are not supported\n");
+   run_test(, test_atomic_modeset);
+   }
+
igt_fixture {
for (i = 0; i < NUM_FBS; i++)
igt_remove_fb(data.drm_fd, [i]);
-- 
2.42.1

[PATCH i-g-t 1/2] tests/kms_async_flips: add atomic test

2023-11-10 Thread André Almeida

From: Simon Ser 

This adds a simple test for DRM_MODE_PAGE_FLIP_ASYNC with the
atomic uAPI.

Signed-off-by: Simon Ser 
Signed-off-by: André Almeida 
---
 include/drm-uapi/drm.h  |  8 
 tests/kms_async_flips.c | 37 +
 2 files changed, 45 insertions(+)

diff --git a/include/drm-uapi/drm.h b/include/drm-uapi/drm.h
index 02540248d..500019ce0 100644
--- a/include/drm-uapi/drm.h
+++ b/include/drm-uapi/drm.h
@@ -768,6 +768,14 @@ struct drm_gem_open {
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE   0x14
 
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP 0x15
+
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
__u64 capability;
diff --git a/tests/kms_async_flips.c b/tests/kms_async_flips.c
index 6c97558be..6bfcca474 100644
--- a/tests/kms_async_flips.c
+++ b/tests/kms_async_flips.c
@@ -297,6 +297,35 @@ static void test_async_flip(data_t *data)
}
 }
 
+static void test_async_flip_atomic(data_t *data)
+{
+   int frame;
+   long long int fps;
+   struct timeval start, end, diff;
+
+   test_init(data);
+
+   gettimeofday(, NULL);
+   frame = 1;
+   do {
+   uint32_t flags = DRM_MODE_PAGE_FLIP_ASYNC | 
DRM_MODE_PAGE_FLIP_EVENT;
+
+   igt_plane_set_fb(data->plane, >bufs[frame % 4]);
+   igt_display_commit_atomic(>display, flags, data);
+
+   wait_flip_event(data);
+
+   gettimeofday(, NULL);
+   timersub(, , );
+
+   frame++;
+   } while (diff.tv_sec < RUN_TIME);
+
+   fps = frame * 1000 / RUN_TIME;
+   igt_assert_f((fps / 1000) > (data->refresh_rate * MIN_FLIPS_PER_FRAME),
+"FPS should be significantly higher than the refresh 
rate\n");
+}
+
 static void wait_for_vblank(data_t *data, unsigned long *vbl_time, unsigned 
int *seq)
 {
drmVBlank wait_vbl;
@@ -720,6 +749,14 @@ igt_main
run_test(, test_crc);
}
 
+   igt_describe("Verify the async flip functionality and the fps during 
atomic async flips");
+   igt_subtest_with_dynamic("atomic-async-flip") {
+   require_monotonic_timestamp(data.drm_fd);
+   igt_require_f(igt_has_drm_cap(data.drm_fd, 
DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP),
+ "Atomic async page-flips are not supported\n");
+   run_test(, test_async_flip_atomic);
+   }
+
igt_fixture {
for (i = 0; i < NUM_FBS; i++)
igt_remove_fb(data.drm_fd, [i]);
-- 
2.42.1

[PATCH i-g-t 0/2] Add test for atomic async page flip

2023-11-10 Thread André Almeida

This series adds tests for async page flip for DRM Atomic API. Kernel patches:
https://lore.kernel.org/dri-devel/20231025005318.293690-1-andrealm...@igalia.com/

André Almeida (1):
  tests/kms_async_flips: Add test for atomic prop changes

Simon Ser (1):
  tests/kms_async_flips: add atomic test

 include/drm-uapi/drm.h  |   8 +++
 tests/kms_async_flips.c | 105 
 2 files changed, 113 insertions(+)

-- 
2.42.1

[PATCH] drm/amd: Document device reset methods

2023-11-10 Thread André Almeida

Document what each amdgpu driver reset method does.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a79d53bdbe13..500f86c79eb7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -504,6 +504,26 @@ struct amdgpu_allowed_register_entry {
bool grbm_indexed;
 };
 
+/**
+ * enum amd_reset_method - Methods for resetting AMD GPU devices
+ *
+ * @AMD_RESET_METHOD_NONE: The device will not be reset.
+ * @AMD_RESET_LEGACY: Method reserved for SI/CIK asics.
+ * @AMD_RESET_MODE0: High level PCIe reset.
+ * @AMD_RESET_MODE1: Resets each IP block (SDMA, GFX, VCN, etc.) individually.
+ *   Suitable only for some discrete GPUs.
+ * @AMD_RESET_MODE2: Resets only the GFX block. Useful for APUs, giving that
+ *   the rest of IP blocks and SMU is shared with the CPU.
+ * @AMD_RESET_BACO: BACO (Bus Alive, Chip Off) method powers off and on the 
card
+ *  but without powering off the PCI bus. Suitable only for
+ *  discrete GPUs.
+ * @AMD_RESET_PCI: Does a full bus reset, including powering on and off the
+ * card.
+ *
+ * Methods available for AMD GPU driver for resetting the device. Not all
+ * methods are suitable for every device. User can overwrite the method using
+ * module parameter `reset_method`.
+ */
 enum amd_reset_method {
AMD_RESET_METHOD_NONE = -1,
AMD_RESET_METHOD_LEGACY = 0,
-- 
2.42.1

[PATCH v8 6/6] amd/display: indicate support for atomic async page-flips on DC

2023-10-24 Thread André Almeida

From: Simon Ser 

amdgpu_dm_commit_planes() already sets the flip_immediate flag for
async page-flips. This flag is used to set the UNP_FLIP_CONTROL
register. Thus, no additional change is required to handle async
page-flips with the atomic uAPI.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index dec6e43e7198..45b8fd61a044 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4003,7 +4003,6 @@ static int amdgpu_dm_mode_config_init(struct 
amdgpu_device *adev)
adev_to_drm(adev)->mode_config.prefer_shadow = 1;
/* indicates support for immediate flip */
adev_to_drm(adev)->mode_config.async_page_flip = true;
-   adev_to_drm(adev)->mode_config.atomic_async_page_flip_not_supported = 
true;
 
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
-- 
2.42.0

[PATCH v8 5/6] drm/doc: Define KMS atomic state set

2023-10-24 Thread André Almeida

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 
---
v8:
- no changes
v7:
- add a note that drivers can make exceptions for ad-hoc prop changes
- add a note about flipping the same FB_ID as a no-op
---
---
 Documentation/gpu/drm-uapi.rst | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 632989df3727..34bd02270ee7 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -570,3 +570,50 @@ dma-buf interoperability
 
 Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
 information on how dma-buf is integrated and exposed within DRM.
+
+KMS atomic state
+
+
+An atomic commit can change multiple KMS properties in an atomic fashion,
+without ever applying intermediate or partial state changes.  Either the whole
+commit succeeds or fails, and it will never be applied partially. This is the
+fundamental improvement of the atomic API over the older non-atomic API which 
is
+referred to as the "legacy API".  Applying intermediate state could 
unexpectedly
+fail, cause visible glitches, or delay reaching the final state.
+
+An atomic commit can be flagged with DRM_MODE_ATOMIC_TEST_ONLY, which means the
+complete state change is validated but not applied.  Userspace should use this
+flag to validate any state change before asking to apply it. If validation 
fails
+for any reason, userspace should attempt to fall back to another, perhaps
+simpler, final state.  This allows userspace to probe for various 
configurations
+without causing visible glitches on screen and without the need to undo a
+probing change.
+
+The changes recorded in an atomic commit apply on top the current KMS state in
+the kernel. Hence, the complete new KMS state is the complete old KMS state 
with
+the committed property settings done on top. The kernel will try to avoid
+no-operation changes, so it is safe for userspace to send redundant property
+settings.  However, not every situation allows for no-op changes, due to the
+need to acquire locks for some attributes. Userspace needs to be aware that 
some
+redundant information might result in oversynchronization issues.  No-operation
+changes do not count towards actually needed changes, e.g.  setting MODE_ID to 
a
+different blob with identical contents as the current KMS state shall not be a
+modeset on its own. As a special exception for VRR needs, explicitly setting
+FB_ID to its current value is not a no-op.
+
+A "modeset" is a change in KMS state that might enable, disable, or temporarily
+disrupt the emitted video signal, possibly causing visible glitches on screen. 
A
+modeset may also take considerably more time to complete than other kinds of
+changes, and the video sink might also need time to adapt to the new signal
+properties. Therefore a modeset must be explicitly allowed with the flag
+DRM_MODE_ATOMIC_ALLOW_MODESET.  This in combination with
+DRM_MODE_ATOMIC_TEST_ONLY allows userspace to determine if a state change is
+likely to cause visible disruption on screen and avoid such changes when end
+users do not expect them.
+
+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. Changing any other property will cause the commit to be
+rejected. Each driver may relax this restriction if they have guarantees that
+such property change doesn't cause modesets. Userspace can use TEST_ONLY 
commits
+to query the driver about this.
-- 
2.42.0

[PATCH v8 4/6] drm: Refuse to async flip with atomic prop changes

2023-10-24 Thread André Almeida

Given that prop changes may lead to modesetting, which would defeat the
fast path of the async flip, refuse any atomic prop change for async
flips in atomic API. The only exception is the framebuffer ID to flip
to. Currently the only plane type supported is the primary one.

Reviewed-by: Simon Ser 
Signed-off-by: André Almeida 
---
v8: add a check for plane type, we can only flip primary planes
v7: drop the mode_id exception for prop changes
---
---
 drivers/gpu/drm/drm_atomic_uapi.c   | 54 +++--
 drivers/gpu/drm/drm_crtc_internal.h |  2 +-
 drivers/gpu/drm/drm_mode_object.c   |  2 +-
 3 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index a15121e75a0a..ebaa6413d5a0 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1006,13 +1006,28 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
+static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+struct drm_property *prop)
+{
+   if (ret != 0 || old_val != prop_value) {
+   drm_dbg_atomic(prop->dev,
+  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  prop->base.id, prop->name);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
struct drm_mode_object *obj,
struct drm_property *prop,
-   uint64_t prop_value)
+   uint64_t prop_value,
+   bool async_flip)
 {
struct drm_mode_object *ref;
+   uint64_t old_val;
int ret;
 
if (!drm_property_change_valid_get(prop, prop_value, ))
@@ -1029,6 +1044,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_connector_get_property(connector, 
connector_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_connector_set_property(connector,
connector_state, file_priv,
prop, prop_value);
@@ -1044,6 +1066,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_crtc_get_property(crtc, crtc_state,
+  prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_crtc_set_property(crtc,
crtc_state, prop, prop_value);
break;
@@ -1051,6 +1080,7 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
+   struct drm_mode_config *config = >dev->mode_config;
 
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
@@ -1058,6 +1088,21 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip && prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
+   if (async_flip && plane_state->plane->type != 
DRM_PLANE_TYPE_PRIMARY) {
+   drm_dbg_atomic(prop->dev,
+   "[OBJECT:%d] Only primary planes can be changed 
during async flip\n",
+   obj->id);
+   ret = -EINVAL;
+   break;
+   }
+
ret = drm_atomic_plane_set_property(plane,
plane_state, file_priv,
prop, prop_value);
@@ -1349,6 +1394,7 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
struct drm_out_fence_state *fence_state;
int ret = 0;
unsigned int i, j, num_fences;
+   bool async_fli

[PATCH v8 3/6] drm: introduce drm_mode_config.atomic_async_page_flip_not_supported

2023-10-24 Thread André Almeida

From: Simon Ser 

This new field indicates whether the driver has the necessary logic
to support async page-flips via the atomic uAPI. This is leveraged by
the next commit to allow user-space to use this functionality.

All atomic drivers setting drm_mode_config.async_page_flip are updated
to also set drm_mode_config.atomic_async_page_flip_not_supported. We
will gradually check and update these drivers to properly handle
drm_crtc_state.async_flip in their atomic logic.

The goal of this negative flag is the same as
fb_modifiers_not_supported: we want to eventually get rid of all
drivers missing atomic support for async flips. New drivers should not
set this flag, instead they should support atomic async flips (if
they support async flips at all). IOW, we don't want more drivers
with async flip support for legacy but not atomic.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   |  1 +
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c|  1 +
 drivers/gpu/drm/i915/display/intel_display_driver.c |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c   |  1 +
 include/drm/drm_mode_config.h   | 11 +++
 5 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 45b8fd61a044..dec6e43e7198 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4003,6 +4003,7 @@ static int amdgpu_dm_mode_config_init(struct 
amdgpu_device *adev)
adev_to_drm(adev)->mode_config.prefer_shadow = 1;
/* indicates support for immediate flip */
adev_to_drm(adev)->mode_config.async_page_flip = true;
+   adev_to_drm(adev)->mode_config.atomic_async_page_flip_not_supported = 
true;
 
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c 
b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
index 84c54e8622d1..f1d9bb1d7c34 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
@@ -639,6 +639,7 @@ static int atmel_hlcdc_dc_modeset_init(struct drm_device 
*dev)
dev->mode_config.max_height = dc->desc->max_height;
dev->mode_config.funcs = _config_funcs;
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c 
b/drivers/gpu/drm/i915/display/intel_display_driver.c
index 44b59ac301e6..6142c83fba06 100644
--- a/drivers/gpu/drm/i915/display/intel_display_driver.c
+++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
@@ -126,6 +126,7 @@ static void intel_mode_config_init(struct drm_i915_private 
*i915)
mode_config->helper_private = _mode_config_funcs;
 
mode_config->async_page_flip = HAS_ASYNC_FLIPS(i915);
+   mode_config->atomic_async_page_flip_not_supported = true;
 
/*
 * Maximum framebuffer dimensions, chosen to match
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index d8c92521226d..586aa51794e8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -720,6 +720,7 @@ nouveau_display_create(struct drm_device *dev)
dev->mode_config.async_page_flip = false;
else
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
drm_kms_helper_poll_init(dev);
drm_kms_helper_poll_disable(dev);
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 973119a9176b..47b005671e6a 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -918,6 +918,17 @@ struct drm_mode_config {
 */
bool async_page_flip;
 
+   /**
+* @atomic_async_page_flip_not_supported:
+*
+* If true, the driver does not support async page-flips with the
+* atomic uAPI. This is only used by old drivers which haven't yet
+* accomodated for _crtc_state.async_flip in their atomic logic,
+* even if they have _mode_config.async_page_flip set to true.
+* New drivers shall not set this flag.
+*/
+   bool atomic_async_page_flip_not_supported;
+
/**
 * @fb_modifiers_not_supported:
 *
-- 
2.42.0

[PATCH v8 2/6] drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP

2023-10-24 Thread André Almeida

From: Simon Ser 

This new kernel capability indicates whether async page-flips are
supported via the atomic uAPI. DRM clients can use it to check
for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.

Make it clear that DRM_CAP_ASYNC_PAGE_FLIP is for legacy uAPI only.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/drm_ioctl.c |  5 +
 include/uapi/drm/drm.h  | 10 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 77590b0f38fa..a96e7acb9071 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -301,6 +301,11 @@ static int drm_getcap(struct drm_device *dev, void *data, 
struct drm_file *file_
case DRM_CAP_CRTC_IN_VBLANK_EVENT:
req->value = 1;
break;
+   case DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP:
+   req->value = drm_core_check_feature(dev, DRIVER_ATOMIC) &&
+dev->mode_config.async_page_flip &&
+
!dev->mode_config.atomic_async_page_flip_not_supported;
+   break;
default:
return -EINVAL;
}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 794c1d857677..58baefe32c23 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -713,7 +713,8 @@ struct drm_gem_open {
 /**
  * DRM_CAP_ASYNC_PAGE_FLIP
  *
- * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC.
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for legacy
+ * page-flips.
  */
 #define DRM_CAP_ASYNC_PAGE_FLIP0x7
 /**
@@ -773,6 +774,13 @@ struct drm_gem_open {
  * :ref:`drm_sync_objects`.
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE   0x14
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP 0x15
 
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
-- 
2.42.0

[PATCH v8 1/6] drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits

2023-10-24 Thread André Almeida

From: Simon Ser 

If the driver supports it, allow user-space to supply the
DRM_MODE_PAGE_FLIP_ASYNC flag to request an async page-flip.
Set drm_crtc_state.async_flip accordingly.

Document that drivers will reject atomic commits if an async
flip isn't possible. This allows user-space to fall back to
something else. For instance, Xorg falls back to a blit.
Another option is to wait as close to the next vblank as
possible before performing the page-flip to reduce latency.

Signed-off-by: Simon Ser 
Reviewed-by: Alex Deucher 
Co-developed-by: André Almeida 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/drm_atomic_uapi.c | 28 +---
 include/uapi/drm/drm_mode.h   |  9 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 98d3b10c08ae..a15121e75a0a 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1323,6 +1323,18 @@ static void complete_signaling(struct drm_device *dev,
kfree(fence_state);
 }
 
+static void
+set_async_flip(struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *crtc_state;
+   int i;
+
+   for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+   crtc_state->async_flip = true;
+   }
+}
+
 int drm_mode_atomic_ioctl(struct drm_device *dev,
  void *data, struct drm_file *file_priv)
 {
@@ -1363,9 +1375,16 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
}
 
if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) {
-   drm_dbg_atomic(dev,
-  "commit failed: invalid flag 
DRM_MODE_PAGE_FLIP_ASYNC\n");
-   return -EINVAL;
+   if (!dev->mode_config.async_page_flip) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported\n");
+   return -EINVAL;
+   }
+   if (dev->mode_config.atomic_async_page_flip_not_supported) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported with atomic\n");
+   return -EINVAL;
+   }
}
 
/* can't test and expect an event at the same time. */
@@ -1468,6 +1487,9 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
if (ret)
goto out;
 
+   if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC)
+   set_async_flip(state);
+
if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) {
ret = drm_atomic_check_only(state);
} else if (arg->flags & DRM_MODE_ATOMIC_NONBLOCK) {
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index ea1b639bcb28..04e6a3caa675 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -957,6 +957,15 @@ struct hdr_output_metadata {
  * Request that the page-flip is performed as soon as possible, ie. with no
  * delay due to waiting for vblank. This may cause tearing to be visible on
  * the screen.
+ *
+ * When used with atomic uAPI, the driver will return an error if the hardware
+ * doesn't support performing an asynchronous page-flip for this update.
+ * User-space should handle this, e.g. by falling back to a regular page-flip.
+ *
+ * Note, some hardware might need to perform one last synchronous page-flip
+ * before being able to switch to asynchronous page-flips. As an exception,
+ * the driver will return success even though that first page-flip is not
+ * asynchronous.
  */
 #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
 #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
-- 
2.42.0

[PATCH v8 0/6] drm: Add support for atomic async page-flip

2023-10-24 Thread André Almeida

Hi,

This work from me and Simon adds support for DRM_MODE_PAGE_FLIP_ASYNC through
the atomic API. This feature is already available via the legacy API. The use
case is to be able to present a new frame immediately (or as soon as
possible), even if after missing a vblank. This might result in tearing, but
it's useful when a high framerate is desired, such as for gaming.

Differently from earlier versions, this one refuses to flip if any prop changes
for async flips. The idea is that the fast path of immediate page flips doesn't
play well with modeset changes, so only the fb_id can be changed.

Thanks,
André

- User-space patch: https://github.com/Plagman/gamescope/pull/595
- IGT tests: 
https://gitlab.freedesktop.org/andrealmeid/igt-gpu-tools/-/tree/atomic_async_page_flip

Changes from v7:
- Only accept flips to primary planes. If a driver support flips in different
planes, support will be added  later.
v7: 
https://lore.kernel.org/dri-devel/20231017092837.32428-1-andrealm...@igalia.com/

Changes from v6:
- Dropped the exception to allow MODE_ID changes (Simon)
- Clarify what happens when flipping with the same FB_ID (Pekka)

v6: 
https://lore.kernel.org/dri-devel/20230815185710.159779-1-andrealm...@igalia.com/

Changes from v5:
- Add note in the docs that not every redundant attribute will result in no-op,
  some might cause oversynchronization issues.

v5: 
https://lore.kernel.org/dri-devel/20230707224059.305474-1-andrealm...@igalia.com/

Changes from v4:
 - Documentation rewrote by Pekka Paalanen

v4: 
https://lore.kernel.org/dri-devel/20230701020917.143394-1-andrealm...@igalia.com/

Changes from v3:
 - Add new patch to reject prop changes
 - Add a documentation clarifying the KMS atomic state set

v3: 
https://lore.kernel.org/dri-devel/20220929184307.258331-1-cont...@emersion.fr/

André Almeida (1):
  drm: Refuse to async flip with atomic prop changes

Pekka Paalanen (1):
  drm/doc: Define KMS atomic state set

Simon Ser (4):
  drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits
  drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
  drm: introduce drm_mode_config.atomic_async_page_flip_not_supported
  amd/display: indicate support for atomic async page-flips on DC

 Documentation/gpu/drm-uapi.rst| 47 +++
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c  |  1 +
 drivers/gpu/drm/drm_atomic_uapi.c | 82 +--
 drivers/gpu/drm/drm_crtc_internal.h   |  2 +-
 drivers/gpu/drm/drm_ioctl.c   |  5 ++
 drivers/gpu/drm/drm_mode_object.c |  2 +-
 .../drm/i915/display/intel_display_driver.c   |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c |  1 +
 include/drm/drm_mode_config.h | 11 +++
 include/uapi/drm/drm.h| 10 ++-
 include/uapi/drm/drm_mode.h   |  9 ++
 11 files changed, 162 insertions(+), 9 deletions(-)

-- 
2.42.0

[PATCH v7 4/6] drm: Refuse to async flip with atomic prop changes

2023-10-17 Thread André Almeida

Given that prop changes may lead to modesetting, which would defeat the
fast path of the async flip, refuse any atomic prop change for async
flips in atomic API. The only exceptions are the framebuffer ID to flip
to and the mode ID, that could be referring to an identical mode.

Signed-off-by: André Almeida 
---
v7: drop the mode_id exception for prop changes
---
 drivers/gpu/drm/drm_atomic_uapi.c   | 47 +++--
 drivers/gpu/drm/drm_crtc_internal.h |  2 +-
 drivers/gpu/drm/drm_mode_object.c   |  2 +-
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index a15121e75a0a..b358de1bf4e7 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1006,13 +1006,28 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
+static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+struct drm_property *prop)
+{
+   if (ret != 0 || old_val != prop_value) {
+   drm_dbg_atomic(prop->dev,
+  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  prop->base.id, prop->name);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
struct drm_mode_object *obj,
struct drm_property *prop,
-   uint64_t prop_value)
+   uint64_t prop_value,
+   bool async_flip)
 {
struct drm_mode_object *ref;
+   uint64_t old_val;
int ret;
 
if (!drm_property_change_valid_get(prop, prop_value, ))
@@ -1029,6 +1044,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_connector_get_property(connector, 
connector_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_connector_set_property(connector,
connector_state, file_priv,
prop, prop_value);
@@ -1037,6 +1059,7 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_CRTC: {
struct drm_crtc *crtc = obj_to_crtc(obj);
struct drm_crtc_state *crtc_state;
+   struct drm_mode_config *config = >dev->mode_config;
 
crtc_state = drm_atomic_get_crtc_state(state, crtc);
if (IS_ERR(crtc_state)) {
@@ -1044,6 +1067,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_crtc_get_property(crtc, crtc_state,
+  prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_crtc_set_property(crtc,
crtc_state, prop, prop_value);
break;
@@ -1051,6 +1081,7 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_PLANE: {
struct drm_plane *plane = obj_to_plane(obj);
struct drm_plane_state *plane_state;
+   struct drm_mode_config *config = >dev->mode_config;
 
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
@@ -1058,6 +1089,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip && prop != config->prop_fb_id) {
+   ret = drm_atomic_plane_get_property(plane, plane_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_plane_set_property(plane,
plane_state, file_priv,
prop, prop_value);
@@ -1349,6 +1387,7 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
struct drm_out_fence_state *fence_state;
int ret = 0;
unsigned int i, j, num_fences;
+   bool async_flip = false;
 
/* disallow for drivers not supporting atomic: */
if (!drm_core_che

[PATCH v7 5/6] drm/doc: Define KMS atomic state set

2023-10-17 Thread André Almeida

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 
---
v7:
- add a note that drivers can make exceptions for ad-hoc prop changes
- add a note about flipping the same FB_ID as a no-op
---
 Documentation/gpu/drm-uapi.rst | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 632989df3727..34bd02270ee7 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -570,3 +570,50 @@ dma-buf interoperability
 
 Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
 information on how dma-buf is integrated and exposed within DRM.
+
+KMS atomic state
+
+
+An atomic commit can change multiple KMS properties in an atomic fashion,
+without ever applying intermediate or partial state changes.  Either the whole
+commit succeeds or fails, and it will never be applied partially. This is the
+fundamental improvement of the atomic API over the older non-atomic API which 
is
+referred to as the "legacy API".  Applying intermediate state could 
unexpectedly
+fail, cause visible glitches, or delay reaching the final state.
+
+An atomic commit can be flagged with DRM_MODE_ATOMIC_TEST_ONLY, which means the
+complete state change is validated but not applied.  Userspace should use this
+flag to validate any state change before asking to apply it. If validation 
fails
+for any reason, userspace should attempt to fall back to another, perhaps
+simpler, final state.  This allows userspace to probe for various 
configurations
+without causing visible glitches on screen and without the need to undo a
+probing change.
+
+The changes recorded in an atomic commit apply on top the current KMS state in
+the kernel. Hence, the complete new KMS state is the complete old KMS state 
with
+the committed property settings done on top. The kernel will try to avoid
+no-operation changes, so it is safe for userspace to send redundant property
+settings.  However, not every situation allows for no-op changes, due to the
+need to acquire locks for some attributes. Userspace needs to be aware that 
some
+redundant information might result in oversynchronization issues.  No-operation
+changes do not count towards actually needed changes, e.g.  setting MODE_ID to 
a
+different blob with identical contents as the current KMS state shall not be a
+modeset on its own. As a special exception for VRR needs, explicitly setting
+FB_ID to its current value is not a no-op.
+
+A "modeset" is a change in KMS state that might enable, disable, or temporarily
+disrupt the emitted video signal, possibly causing visible glitches on screen. 
A
+modeset may also take considerably more time to complete than other kinds of
+changes, and the video sink might also need time to adapt to the new signal
+properties. Therefore a modeset must be explicitly allowed with the flag
+DRM_MODE_ATOMIC_ALLOW_MODESET.  This in combination with
+DRM_MODE_ATOMIC_TEST_ONLY allows userspace to determine if a state change is
+likely to cause visible disruption on screen and avoid such changes when end
+users do not expect them.
+
+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. Changing any other property will cause the commit to be
+rejected. Each driver may relax this restriction if they have guarantees that
+such property change doesn't cause modesets. Userspace can use TEST_ONLY 
commits
+to query the driver about this.
-- 
2.42.0

[PATCH v7 6/6] amd/display: indicate support for atomic async page-flips on DC

2023-10-17 Thread André Almeida

From: Simon Ser 

amdgpu_dm_commit_planes() already sets the flip_immediate flag for
async page-flips. This flag is used to set the UNP_FLIP_CONTROL
register. Thus, no additional change is required to handle async
page-flips with the atomic uAPI.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 9d5742923aed..c6fd34bab358 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3997,7 +3997,6 @@ static int amdgpu_dm_mode_config_init(struct 
amdgpu_device *adev)
adev_to_drm(adev)->mode_config.prefer_shadow = 1;
/* indicates support for immediate flip */
adev_to_drm(adev)->mode_config.async_page_flip = true;
-   adev_to_drm(adev)->mode_config.atomic_async_page_flip_not_supported = 
true;
 
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
-- 
2.42.0

[PATCH v7 2/6] drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP

2023-10-17 Thread André Almeida

From: Simon Ser 

This new kernel capability indicates whether async page-flips are
supported via the atomic uAPI. DRM clients can use it to check
for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.

Make it clear that DRM_CAP_ASYNC_PAGE_FLIP is for legacy uAPI only.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/drm_ioctl.c |  5 +
 include/uapi/drm/drm.h  | 10 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 77590b0f38fa..a96e7acb9071 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -301,6 +301,11 @@ static int drm_getcap(struct drm_device *dev, void *data, 
struct drm_file *file_
case DRM_CAP_CRTC_IN_VBLANK_EVENT:
req->value = 1;
break;
+   case DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP:
+   req->value = drm_core_check_feature(dev, DRIVER_ATOMIC) &&
+dev->mode_config.async_page_flip &&
+
!dev->mode_config.atomic_async_page_flip_not_supported;
+   break;
default:
return -EINVAL;
}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 794c1d857677..58baefe32c23 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -713,7 +713,8 @@ struct drm_gem_open {
 /**
  * DRM_CAP_ASYNC_PAGE_FLIP
  *
- * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC.
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for legacy
+ * page-flips.
  */
 #define DRM_CAP_ASYNC_PAGE_FLIP0x7
 /**
@@ -773,6 +774,13 @@ struct drm_gem_open {
  * :ref:`drm_sync_objects`.
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE   0x14
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports _MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP 0x15
 
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
-- 
2.42.0

[PATCH v7 3/6] drm: introduce drm_mode_config.atomic_async_page_flip_not_supported

2023-10-17 Thread André Almeida

From: Simon Ser 

This new field indicates whether the driver has the necessary logic
to support async page-flips via the atomic uAPI. This is leveraged by
the next commit to allow user-space to use this functionality.

All atomic drivers setting drm_mode_config.async_page_flip are updated
to also set drm_mode_config.atomic_async_page_flip_not_supported. We
will gradually check and update these drivers to properly handle
drm_crtc_state.async_flip in their atomic logic.

The goal of this negative flag is the same as
fb_modifiers_not_supported: we want to eventually get rid of all
drivers missing atomic support for async flips. New drivers should not
set this flag, instead they should support atomic async flips (if
they support async flips at all). IOW, we don't want more drivers
with async flip support for legacy but not atomic.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   |  1 +
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c|  1 +
 drivers/gpu/drm/i915/display/intel_display_driver.c |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c   |  1 +
 include/drm/drm_mode_config.h   | 11 +++
 5 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c6fd34bab358..9d5742923aed 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3997,6 +3997,7 @@ static int amdgpu_dm_mode_config_init(struct 
amdgpu_device *adev)
adev_to_drm(adev)->mode_config.prefer_shadow = 1;
/* indicates support for immediate flip */
adev_to_drm(adev)->mode_config.async_page_flip = true;
+   adev_to_drm(adev)->mode_config.atomic_async_page_flip_not_supported = 
true;
 
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c 
b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
index 84c54e8622d1..f1d9bb1d7c34 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
@@ -639,6 +639,7 @@ static int atmel_hlcdc_dc_modeset_init(struct drm_device 
*dev)
dev->mode_config.max_height = dc->desc->max_height;
dev->mode_config.funcs = _config_funcs;
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c 
b/drivers/gpu/drm/i915/display/intel_display_driver.c
index 44b59ac301e6..6142c83fba06 100644
--- a/drivers/gpu/drm/i915/display/intel_display_driver.c
+++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
@@ -126,6 +126,7 @@ static void intel_mode_config_init(struct drm_i915_private 
*i915)
mode_config->helper_private = _mode_config_funcs;
 
mode_config->async_page_flip = HAS_ASYNC_FLIPS(i915);
+   mode_config->atomic_async_page_flip_not_supported = true;
 
/*
 * Maximum framebuffer dimensions, chosen to match
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index d8c92521226d..586aa51794e8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -720,6 +720,7 @@ nouveau_display_create(struct drm_device *dev)
dev->mode_config.async_page_flip = false;
else
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
drm_kms_helper_poll_init(dev);
drm_kms_helper_poll_disable(dev);
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 973119a9176b..47b005671e6a 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -918,6 +918,17 @@ struct drm_mode_config {
 */
bool async_page_flip;
 
+   /**
+* @atomic_async_page_flip_not_supported:
+*
+* If true, the driver does not support async page-flips with the
+* atomic uAPI. This is only used by old drivers which haven't yet
+* accomodated for _crtc_state.async_flip in their atomic logic,
+* even if they have _mode_config.async_page_flip set to true.
+* New drivers shall not set this flag.
+*/
+   bool atomic_async_page_flip_not_supported;
+
/**
 * @fb_modifiers_not_supported:
 *
-- 
2.42.0

[PATCH v7 0/6] drm: Add support for atomic async page-flip

2023-10-17 Thread André Almeida

Hi,

This work from me and Simon adds support for DRM_MODE_PAGE_FLIP_ASYNC through
the atomic API. This feature is already available via the legacy API. The use
case is to be able to present a new frame immediately (or as soon as
possible), even if after missing a vblank. This might result in tearing, but
it's useful when a high framerate is desired, such as for gaming.

Differently from earlier versions, this one refuses to flip if any prop changes
for async flips. The idea is that the fast path of immediate page flips doesn't
play well with modeset changes, so only the fb_id can be changed.
Thanks,
André

- User-space patch: https://github.com/Plagman/gamescope/pull/595
- IGT tests: 
https://gitlab.freedesktop.org/andrealmeid/igt-gpu-tools/-/tree/atomic_async_page_flip

Changes from v6:
- Dropped the exception to allow MODE_ID changes (Simon)
- Clarify what happens when flipping with the same FB_ID (Pekka)

v6: 
https://lore.kernel.org/dri-devel/20230815185710.159779-1-andrealm...@igalia.com/

Changes from v5:
- Add note in the docs that not every redundant attribute will result in no-op,
  some might cause oversynchronization issues.

v5: 
https://lore.kernel.org/dri-devel/20230707224059.305474-1-andrealm...@igalia.com/

Changes from v4:
 - Documentation rewrote by Pekka Paalanen

v4: 
https://lore.kernel.org/dri-devel/20230701020917.143394-1-andrealm...@igalia.com/

Changes from v3:
 - Add new patch to reject prop changes
 - Add a documentation clarifying the KMS atomic state set

v3: 
https://lore.kernel.org/dri-devel/20220929184307.258331-1-cont...@emersion.fr/

André Almeida (1):
  drm: Refuse to async flip with atomic prop changes

Pekka Paalanen (1):
  drm/doc: Define KMS atomic state set

Simon Ser (4):
  drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits
  drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
  drm: introduce drm_mode_config.atomic_async_page_flip_not_supported
  amd/display: indicate support for atomic async page-flips on DC

 Documentation/gpu/drm-uapi.rst| 47 
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c  |  1 +
 drivers/gpu/drm/drm_atomic_uapi.c | 75 +--
 drivers/gpu/drm/drm_crtc_internal.h   |  2 +-
 drivers/gpu/drm/drm_ioctl.c   |  5 ++
 drivers/gpu/drm/drm_mode_object.c |  2 +-
 .../drm/i915/display/intel_display_driver.c   |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c |  1 +
 include/drm/drm_mode_config.h | 11 +++
 include/uapi/drm/drm.h| 10 ++-
 include/uapi/drm/drm_mode.h   |  9 +++
 11 files changed, 155 insertions(+), 9 deletions(-)

-- 
2.42.0

[PATCH v7 1/6] drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits

2023-10-17 Thread André Almeida

From: Simon Ser 

If the driver supports it, allow user-space to supply the
DRM_MODE_PAGE_FLIP_ASYNC flag to request an async page-flip.
Set drm_crtc_state.async_flip accordingly.

Document that drivers will reject atomic commits if an async
flip isn't possible. This allows user-space to fall back to
something else. For instance, Xorg falls back to a blit.
Another option is to wait as close to the next vblank as
possible before performing the page-flip to reduce latency.

Signed-off-by: Simon Ser 
Reviewed-by: Alex Deucher 
Co-developed-by: André Almeida 
Signed-off-by: André Almeida 
---
 drivers/gpu/drm/drm_atomic_uapi.c | 28 +---
 include/uapi/drm/drm_mode.h   |  9 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 98d3b10c08ae..a15121e75a0a 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1323,6 +1323,18 @@ static void complete_signaling(struct drm_device *dev,
kfree(fence_state);
 }
 
+static void
+set_async_flip(struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *crtc_state;
+   int i;
+
+   for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+   crtc_state->async_flip = true;
+   }
+}
+
 int drm_mode_atomic_ioctl(struct drm_device *dev,
  void *data, struct drm_file *file_priv)
 {
@@ -1363,9 +1375,16 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
}
 
if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) {
-   drm_dbg_atomic(dev,
-  "commit failed: invalid flag 
DRM_MODE_PAGE_FLIP_ASYNC\n");
-   return -EINVAL;
+   if (!dev->mode_config.async_page_flip) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported\n");
+   return -EINVAL;
+   }
+   if (dev->mode_config.atomic_async_page_flip_not_supported) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported with atomic\n");
+   return -EINVAL;
+   }
}
 
/* can't test and expect an event at the same time. */
@@ -1468,6 +1487,9 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
if (ret)
goto out;
 
+   if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC)
+   set_async_flip(state);
+
if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) {
ret = drm_atomic_check_only(state);
} else if (arg->flags & DRM_MODE_ATOMIC_NONBLOCK) {
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index ea1b639bcb28..04e6a3caa675 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -957,6 +957,15 @@ struct hdr_output_metadata {
  * Request that the page-flip is performed as soon as possible, ie. with no
  * delay due to waiting for vblank. This may cause tearing to be visible on
  * the screen.
+ *
+ * When used with atomic uAPI, the driver will return an error if the hardware
+ * doesn't support performing an asynchronous page-flip for this update.
+ * User-space should handle this, e.g. by falling back to a regular page-flip.
+ *
+ * Note, some hardware might need to perform one last synchronous page-flip
+ * before being able to switch to asynchronous page-flips. As an exception,
+ * the driver will return success even though that first page-flip is not
+ * asynchronous.
  */
 #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
 #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
-- 
2.42.0

Re: [PATCH v6 6/6] drm/doc: Define KMS atomic state set

2023-10-16 Thread André Almeida




On 10/16/23 16:52, Pekka Paalanen wrote:

On Mon, 16 Oct 2023 15:42:16 +0200
André Almeida  wrote:


Hi Pekka,

On 10/16/23 14:18, Pekka Paalanen wrote:

On Mon, 16 Oct 2023 12:52:32 +0200
André Almeida  wrote:
  

Hi Michel,

On 8/17/23 12:37, Michel Dänzer wrote:

On 8/15/23 20:57, André Almeida wrote:

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 

[...]
 

+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. [...]

During the hackfest in Brno, it was mentioned that a commit which re-sets the 
same FB_ID could actually have an effect with VRR: It could trigger scanout of 
the next frame before vertical blank has reached its maximum duration. Some 
kind of mechanism is required for this in order to allow user space to perform 
low frame rate compensation.
 

Xaver tested this hypothesis in a flipping the same fb in a VRR monitor
and it worked as expected, so this shouldn't be a concern.

Right, so it must have some effect. It cannot be simply ignored like in
the proposed doc wording. Do we special-case re-setting the same FB_ID
as "not a no-op" or "not ignored" or some other way?

There's an effect in the refresh rate, the image won't change but it
will report that a flip had happened asynchronously so the reported
framerate will be increased. Maybe an additional wording could be like:

Flipping to the same FB_ID will result in a immediate flip as if it was
changing to a different one, with no effect on the image but effecting
the reported frame rate.

Re-setting FB_ID to its current value is a special case regardless of
PAGE_FLIP_ASYNC, is it not?

So it should be called out somewhere that applies regardless of
PAGE_FLIP_ASYNC. Maybe to the end of the earlier paragraph:


+The changes recorded in an atomic commit apply on top the current KMS state in
+the kernel. Hence, the complete new KMS state is the complete old KMS state 
with
+the committed property settings done on top. The kernel will try to avoid
+no-operation changes, so it is safe for userspace to send redundant property
+settings.  However, not every situation allows for no-op changes, due to the
+need to acquire locks for some attributes. Userspace needs to be aware that 
some
+redundant information might result in oversynchronization issues.  No-operation
+changes do not count towards actually needed changes, e.g.  setting MODE_ID to 
a
+different blob with identical contents as the current KMS state shall not be a
+modeset on its own.

+As a special exception for VRR needs, explicitly setting FB_ID to its
+current value is not a no-op.

Would that work?


I liked this suggestion, thanks! I'll wrap up a v7


I'd like to try to avoid being more specific about what it does
exactly, because that's not the topic here. Such things can be
documented with the property itself. This is a summary of what is or is
not a no-op or a modeset.


Thanks,
pq

Re: [PATCH v6 6/6] drm/doc: Define KMS atomic state set

2023-10-16 Thread André Almeida


Hi Pekka,

On 10/16/23 14:18, Pekka Paalanen wrote:

On Mon, 16 Oct 2023 12:52:32 +0200
André Almeida  wrote:


Hi Michel,

On 8/17/23 12:37, Michel Dänzer wrote:

On 8/15/23 20:57, André Almeida wrote:

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 

[...]
  

+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. [...]

During the hackfest in Brno, it was mentioned that a commit which re-sets the 
same FB_ID could actually have an effect with VRR: It could trigger scanout of 
the next frame before vertical blank has reached its maximum duration. Some 
kind of mechanism is required for this in order to allow user space to perform 
low frame rate compensation.
  

Xaver tested this hypothesis in a flipping the same fb in a VRR monitor
and it worked as expected, so this shouldn't be a concern.

Right, so it must have some effect. It cannot be simply ignored like in
the proposed doc wording. Do we special-case re-setting the same FB_ID
as "not a no-op" or "not ignored" or some other way?
There's an effect in the refresh rate, the image won't change but it 
will report that a flip had happened asynchronously so the reported 
framerate will be increased. Maybe an additional wording could be like:


Flipping to the same FB_ID will result in a immediate flip as if it was 
changing to a different one, with no effect on the image but effecting 
the reported frame rate.





Thanks,
pq

Re: [PATCH v6 6/6] drm/doc: Define KMS atomic state set

2023-10-16 Thread André Almeida


Hi Michel,

On 8/17/23 12:37, Michel Dänzer wrote:

On 8/15/23 20:57, André Almeida wrote:

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 

[...]


+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. [...]

During the hackfest in Brno, it was mentioned that a commit which re-sets the 
same FB_ID could actually have an effect with VRR: It could trigger scanout of 
the next frame before vertical blank has reached its maximum duration. Some 
kind of mechanism is required for this in order to allow user space to perform 
low frame rate compensation.

Xaver tested this hypothesis in a flipping the same fb in a VRR monitor 
and it worked as expected, so this shouldn't be a concern.


Thanks,
    André

Re: [PATCH v2 2/2] drm: Replace drm_framebuffer plane size functions with its equivalents

2023-10-01 Thread André Almeida


On 9/26/23 16:15, Carlos Eduardo Gallo Filho wrote:

The functions drm_framebuffer_plane_{width,height} and
fb_plane_{width,height} do exactly the same job of its
equivalents drm_format_info_plane_{width,height} from drm_fourcc.

The only reason to have these functions on drm_framebuffer
would be if they would added a abstraction layer to call it just
passing a drm_framebuffer pointer and the desired plane index,
which is not the case, where these functions actually implements
just part of it. In the actual implementation, every call to both
drm_framebuffer_plane_{width,height} and fb_plane_{width,height} should
pass some drm_framebuffer attribute, which is the same as calling the
drm_format_info_plane_{width,height} functions.

The drm_format_info_pane_{width,height} functions are much more
consistent in both its implementation and its location on code. The
kind of calculation that they do is intrinsically derivated from the
drm_format_info struct and has not to do with drm_framebuffer, except
by the potential motivation described above, which is still not a good
justification to have drm_framebuffer functions to calculate it.

So, replace each drm_framebuffer_plane_{width,height} and
fb_plane_{width,height} call to drm_format_info_plane_{width,height}
and remove them.

Signed-off-by: Carlos Eduardo Gallo Filho 

Reviewed-by: André Almeida

Re: [PATCH v2 1/2] drm: Remove plane hsub/vsub alignment requirement for core helpers

2023-10-01 Thread André Almeida


Hi Carlos,

On 9/26/23 16:15, Carlos Eduardo Gallo Filho wrote:

The drm_format_info_plane_{height,width} functions was implemented using
regular division for the plane size calculation, which cause issues [1][2]
when used on contexts where the dimensions are misaligned with relation
to the subsampling factors. So, replace the regular division by the
DIV_ROUND_UP macro.

This allows these functions to be used in more drivers, making further
work to bring more core presence on them possible.

[1] 
http://patchwork.freedesktop.org/patch/msgid/20170321181218.10042-3-ville.syrj...@linux.intel.com
[2] 
https://patchwork.freedesktop.org/patch/msgid/20211026225105.2783797-2-imre.d...@intel.com


Prefer to use lore:

https://lore.kernel.org/dri-devel/20170321181218.10042-3-ville.syrj...@linux.intel.com/

https://lore.kernel.org/intel-gfx/20211026225105.2783797-2-imre.d...@intel.com/


Other than that,

Reviewed-by: André Almeida 



Signed-off-by: Carlos Eduardo Gallo Filho 
---
  include/drm/drm_fourcc.h | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/drm/drm_fourcc.h b/include/drm/drm_fourcc.h
index 532ae78ca747..ccf91daa4307 100644
--- a/include/drm/drm_fourcc.h
+++ b/include/drm/drm_fourcc.h
@@ -22,6 +22,7 @@
  #ifndef __DRM_FOURCC_H__
  #define __DRM_FOURCC_H__
  
+#include 

  #include 
  #include 
  
@@ -279,7 +280,7 @@ int drm_format_info_plane_width(const struct drm_format_info *info, int width,

if (plane == 0)
return width;
  
-	return width / info->hsub;

+   return DIV_ROUND_UP(width, info->hsub);
  }
  
  /**

@@ -301,7 +302,7 @@ int drm_format_info_plane_height(const struct 
drm_format_info *info, int height,
if (plane == 0)
return height;
  
-	return height / info->vsub;

+   return DIV_ROUND_UP(height, info->vsub);
  }
  
  const struct drm_format_info *__drm_format_info(u32 format);

[PATCH v8] drm/doc: Document DRM device reset expectations

2023-09-29 Thread André Almeida

Create a section that specifies how to deal with DRM device resets for
kernel and userspace drivers.

Acked-by: Pekka Paalanen 
Acked-by: Sebastian Wick 
Signed-off-by: André Almeida 
---
v8 changes:
- Add acked-by tags

v7: 
https://lore.kernel.org/dri-devel/20230818200642.276735-1-andrealm...@igalia.com/

v7 changes:
 - s/application/graphical API contex/ in the robustness part (Michel)
 - Grammar fixes (Randy)

v6: https://lore.kernel.org/lkml/20230815185710.159779-1-andrealm...@igalia.com/

v6 changes:
 - Due to substantial changes in the content, dropped Pekka's Acked-by
 - Grammar fixes (Randy)
 - Add paragraph about disabling device resets
 - Add note about integrating reset tracking in drm/sched
 - Add note that KMD should return failure for contexts affected by
   resets and UMD should check for this
 - Add note about lack of consensus around what to do about non-robust
   apps

v5: 
https://lore.kernel.org/dri-devel/20230627132323.115440-1-andrealm...@igalia.com/
---
 Documentation/gpu/drm-uapi.rst | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index eef5fd19bc92..632989df3727 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -285,6 +285,83 @@ for GPU1 and GPU2 from different vendors, and a third 
handler for
 mmapped regular files. Threads cause additional pain with signal
 handling as well.
 
+Device reset
+
+
+The GPU stack is really complex and is prone to errors, from hardware bugs,
+faulty applications and everything in between the many layers. Some errors
+require resetting the device in order to make the device usable again. This
+section describes the expectations for DRM and usermode drivers when a
+device resets and how to propagate the reset status.
+
+Device resets can not be disabled without tainting the kernel, which can lead 
to
+hanging the entire kernel through shrinkers/mmu_notifiers. Userspace role in
+device resets is to propagate the message to the application and apply any
+special policy for blocking guilty applications, if any. Corollary is that
+debugging a hung GPU context require hardware support to be able to preempt 
such
+a GPU context while it's stopped.
+
+Kernel Mode Driver
+--
+
+The KMD is responsible for checking if the device needs a reset, and to perform
+it as needed. Usually a hang is detected when a job gets stuck executing. KMD
+should keep track of resets, because userspace can query any time about the
+reset status for a specific context. This is needed to propagate to the rest of
+the stack that a reset has happened. Currently, this is implemented by each
+driver separately, with no common DRM interface. Ideally this should be 
properly
+integrated at DRM scheduler to provide a common ground for all drivers. After a
+reset, KMD should reject new command submissions for affected contexts.
+
+User Mode Driver
+
+
+After command submission, UMD should check if the submission was accepted or
+rejected. After a reset, KMD should reject submissions, and UMD can issue an
+ioctl to the KMD to check the reset status, and this can be checked more often
+if the UMD requires it. After detecting a reset, UMD will then proceed to 
report
+it to the application using the appropriate API error code, as explained in the
+section below about robustness.
+
+Robustness
+--
+
+The only way to try to keep a graphical API context working after a reset is if
+it complies with the robustness aspects of the graphical API that it is using.
+
+Graphical APIs provide ways to applications to deal with device resets. 
However,
+there is no guarantee that the app will use such features correctly, and a
+userspace that doesn't support robust interfaces (like a non-robust
+OpenGL context or API without any robustness support like libva) leave the
+robustness handling entirely to the userspace driver. There is no strong
+community consensus on what the userspace driver should do in that case,
+since all reasonable approaches have some clear downsides.
+
+OpenGL
+~~
+
+Apps using OpenGL should use the available robust interfaces, like the
+extension ``GL_ARB_robustness`` (or ``GL_EXT_robustness`` for OpenGL ES). This
+interface tells if a reset has happened, and if so, all the context state is
+considered lost and the app proceeds by creating new ones. There's no consensus
+on what to do to if robustness is not in use.
+
+Vulkan
+~~
+
+Apps using Vulkan should check for ``VK_ERROR_DEVICE_LOST`` for submissions.
+This error code means, among other things, that a device reset has happened and
+it needs to recreate the contexts to keep going.
+
+Reporting causes of resets
+--
+
+Apart from propagating the reset through the stack so apps can recover, it's
+really useful for driver developers to learn more about what caused the reset 
in
+the first place. DRM devices

Re: [PATCH v4 0/2] Merge all debug module parameters

2023-09-11 Thread André Almeida





Em 11/09/2023 14:21, Hamza Mahfooz escreveu:

On 9/11/23 13:12, André Almeida wrote:

As suggested by Christian at [0], this patchset merges all debug modules
parameters and creates a new one for disabling soft recovery:


Maybe we can overload the amdgpu_gpu_recovery module option with this.
Or even better merge all the developer module parameter into a
amdgpu_debug option. This way it should be pretty obvious that this
isn't meant to be used by someone who doesn't know how to use it.


[0] 
https://lore.kernel.org/dri-devel/55f69184-1aa2-55d6-4a10-1560d75c7...@amd.com/


I have applied the series, thanks!



Thank you!



Changelog:
- rebased on top of current amd-staging-drm-next
v3: 
https://lore.kernel.org/lkml/20230831152903.521404-1-andrealm...@igalia.com


- move enum from include/amd_shared.h to amdgpu/amdgpu_drv.c
v2: 
https://lore.kernel.org/lkml/20230830220808.421935-1-andrealm...@igalia.com/


- drop old module params
- use BIT() macros
- replace global var with adev-> vars
v1: 
https://lore.kernel.org/lkml/20230824162505.173399-1-andrealm...@igalia.com/


André Almeida (2):
   drm/amdgpu: Merge debug module parameters
   drm/amdgpu: Create an option to disable soft recovery

  drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  5 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 63 
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  7 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c    |  2 +-
  8 files changed, 59 insertions(+), 26 deletions(-)

[PATCH v4 2/2] drm/amdgpu: Create an option to disable soft recovery

2023-09-11 Thread André Almeida

Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 7 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 37eb9b3790a0..f30490abb3fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1103,6 +1103,7 @@ struct amdgpu_device {
/* Debug */
booldebug_vm;
booldebug_largebar;
+   booldebug_disable_soft_recovery;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 830146bd61c0..3ab7eac131e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -124,6 +124,7 @@
 enum AMDGPU_DEBUG_MASK {
AMDGPU_DEBUG_VM = BIT(0),
AMDGPU_DEBUG_LARGEBAR = BIT(1),
+   AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
 };
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
@@ -945,6 +946,7 @@ MODULE_PARM_DESC(enforce_isolation, "enforce process 
isolation between graphics
  * - 0x2: Enable simulating large-bar capability on non-large bar system. This
  *   limits the VRAM size reported to ROCm applications to the visible
  *   size, usually 256MB.
+ * - 0x4: Disable GPU soft recovery, always do a full reset
  */
 MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
 module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
@@ -2064,6 +2066,11 @@ static void amdgpu_init_debug_options(struct 
amdgpu_device *adev)
pr_info("debug: enabled simulating large-bar capability on 
non-large bar system\n");
adev->debug_largebar = true;
}
+
+   if (amdgpu_debug_mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+   pr_info("debug: soft reset for GPU recovery disabled\n");
+   adev->debug_disable_soft_recovery = true;
+   }
 }
 
 static int amdgpu_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index da26c555af24..231d49132a56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -433,7 +433,12 @@ void amdgpu_ring_emit_reg_write_reg_wait_helper(struct 
amdgpu_ring *ring,
 bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
   struct dma_fence *fence)
 {
-   ktime_t deadline = ktime_add_us(ktime_get(), 1);
+   ktime_t deadline;
+
+   if (unlikely(ring->adev->debug_disable_soft_recovery))
+   return false;
+
+   deadline = ktime_add_us(ktime_get(), 1);
 
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || 
!fence)
return false;
-- 
2.42.0

[PATCH v4 1/2] drm/amdgpu: Merge debug module parameters

2023-09-11 Thread André Almeida

Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.

Drop the obsolete module options in favor of the new ones.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
v3:
- move from include/amd_shared.h to amdgpu/amdgpu_drv.c
v2:
- drop old module params
- use BIT() macros
- replace global var with adev-> vars
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 56 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
 7 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 83a9607a87b8..37eb9b3790a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1099,6 +1099,10 @@ struct amdgpu_device {
booldc_enabled;
/* Mask of active clusters */
uint32_taid_mask;
+
+   /* Debug */
+   booldebug_vm;
+   booldebug_largebar;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index db5ecde8f0ec..74769afaa33d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1181,7 +1181,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo);
}
 
-   if (amdgpu_vm_debug) {
+   if (adev->debug_vm) {
/* Invalidate all BOs to test for userspace bugs */
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ef713806dd60..830146bd61c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -118,6 +118,14 @@
 #define KMS_DRIVER_MINOR   54
 #define KMS_DRIVER_PATCHLEVEL  0
 
+/*
+ * amdgpu.debug module options. Are all disabled by default
+ */
+enum AMDGPU_DEBUG_MASK {
+   AMDGPU_DEBUG_VM = BIT(0),
+   AMDGPU_DEBUG_LARGEBAR = BIT(1),
+};
+
 unsigned int amdgpu_vram_limit = UINT_MAX;
 int amdgpu_vis_vram_limit;
 int amdgpu_gart_size = -1; /* auto */
@@ -140,7 +148,6 @@ int amdgpu_vm_size = -1;
 int amdgpu_vm_fragment_size = -1;
 int amdgpu_vm_block_size = -1;
 int amdgpu_vm_fault_stop;
-int amdgpu_vm_debug;
 int amdgpu_vm_update_mode = -1;
 int amdgpu_exp_hw_support;
 int amdgpu_dc = -1;
@@ -195,6 +202,7 @@ int amdgpu_vcnfw_log;
 int amdgpu_sg_display = -1; /* auto */
 int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
 int amdgpu_umsch_mm;
+uint amdgpu_debug_mask;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -406,13 +414,6 @@ module_param_named(vm_block_size, amdgpu_vm_block_size, 
int, 0444);
 MODULE_PARM_DESC(vm_fault_stop, "Stop on VM fault (0 = never (default), 1 = 
print first, 2 = always)");
 module_param_named(vm_fault_stop, amdgpu_vm_fault_stop, int, 0444);
 
-/**
- * DOC: vm_debug (int)
- * Debug VM handling (0 = disabled, 1 = enabled). The default is 0 (Disabled).
- */
-MODULE_PARM_DESC(vm_debug, "Debug VM handling (0 = disabled (default), 1 = 
enabled)");
-module_param_named(vm_debug, amdgpu_vm_debug, int, 0644);
-
 /**
  * DOC: vm_update_mode (int)
  * Override VM update mode. VM updated by using CPU (0 = never, 1 = Graphics 
only, 2 = Compute only, 3 = Both). The default
@@ -744,18 +745,6 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = 
enable)");
 
-/**
- * DOC: debug_largebar (int)
- * Set debug_largebar as 1 to enable simulating large-bar capability on 
non-large bar
- * system. This limits the VRAM size reported to ROCm applications to the 
visible
- * size, usually 256MB.
- * Default value is 0, diabled.
- */
-int debug_largebar;
-module_param(debug_largebar, int, 0444);
-MODULE_PARM_DESC(debug_largebar,
-   "Debug large-bar flag used to simulate large-bar capability on 
non-large bar machine (0 = disable, 1 = enable)");
-
 /**
  * DOC: halt_if_hws_hang (int)
  * Halt if HWS hang is detected. Default value, 0, disables the halt on hang.
@@ -948,6 +937,18 @@ module_param_named(user_partt_mode, 
amdgpu_user_partt_mode, uint, 0444);
 module_param(enforce_isolation, bool, 0444);
 MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between 
graphics and compu

[PATCH v4 0/2] Merge all debug module parameters

2023-09-11 Thread André Almeida

As suggested by Christian at [0], this patchset merges all debug modules
parameters and creates a new one for disabling soft recovery:

> Maybe we can overload the amdgpu_gpu_recovery module option with this. 
> Or even better merge all the developer module parameter into a 
> amdgpu_debug option. This way it should be pretty obvious that this 
> isn't meant to be used by someone who doesn't know how to use it.

[0] 
https://lore.kernel.org/dri-devel/55f69184-1aa2-55d6-4a10-1560d75c7...@amd.com/

Changelog:
- rebased on top of current amd-staging-drm-next 
v3: https://lore.kernel.org/lkml/20230831152903.521404-1-andrealm...@igalia.com

- move enum from include/amd_shared.h to amdgpu/amdgpu_drv.c
v2: https://lore.kernel.org/lkml/20230830220808.421935-1-andrealm...@igalia.com/

- drop old module params
- use BIT() macros
- replace global var with adev-> vars
v1: https://lore.kernel.org/lkml/20230824162505.173399-1-andrealm...@igalia.com/

André Almeida (2):
  drm/amdgpu: Merge debug module parameters
  drm/amdgpu: Create an option to disable soft recovery

 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 63 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
 8 files changed, 59 insertions(+), 26 deletions(-)

-- 
2.42.0

Re: [PATCH v3 0/2] Merge all debug module parameters

2023-09-11 Thread André Almeida


Christian, Alex, I think this series is ready to be picked as well.

Em 31/08/2023 12:29, André Almeida escreveu:

As suggested by Christian at [0], this patchset merges all debug modules
parameters and creates a new one for disabling soft recovery:


Maybe we can overload the amdgpu_gpu_recovery module option with this.
Or even better merge all the developer module parameter into a
amdgpu_debug option. This way it should be pretty obvious that this
isn't meant to be used by someone who doesn't know how to use it.


[0] 
https://lore.kernel.org/dri-devel/55f69184-1aa2-55d6-4a10-1560d75c7...@amd.com/

Changelog:
- move enum from include/amd_shared.h to amdgpu/amdgpu_drv.c
v2: https://lore.kernel.org/lkml/20230830220808.421935-1-andrealm...@igalia.com/

- drop old module params
- use BIT() macros
- replace global var with adev-> vars
v1: https://lore.kernel.org/lkml/20230824162505.173399-1-andrealm...@igalia.com/

André Almeida (2):
   drm/amdgpu: Merge debug module parameters
   drm/amdgpu: Create an option to disable soft recovery

  drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  5 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 63 
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  6 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
  8 files changed, 58 insertions(+), 26 deletions(-)

[PATCH v6 0/5] drm/amdgpu: Rework coredump memory allocation

2023-09-10 Thread André Almeida

Hi,

The patches of this set are a rework to alloc devcoredump dynamically and to
move it to a better source file.

Thanks,
André

Changelog:

v5: https://lore.kernel.org/lkml/20230817182050.205925-1-andrealm...@igalia.com/
- Added Shashank Sharma R-b tag

v4: 
https://lore.kernel.org/dri-devel/20230815195100.294458-1-andrealm...@igalia.com/
- New patch to encapsulate all reset info in a struct

v3: 
https://lore.kernel.org/dri-devel/20230810192330.198326-1-andrealm...@igalia.com/
- Changed from kmalloc to kzalloc
- Dropped "Create a module param to disable soft recovery" for now

v2: 
https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealm...@igalia.com/
- Drop the IB and ring patch
- Drop patch that limited information from kernel threads
- Add patch to move coredump to amdgpu_reset

v1: 
https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealm...@igalia.com/
 - Drop "Mark contexts guilty for causing soft recoveries" patch
 - Use GFP_NOWAIT for devcoredump allocatio

André Almeida (5):
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Rework coredump to use memory dynamically
  drm/amdgpu: Encapsulate all device reset info
  drm/amdgpu: Move coredump code to amdgpu_reset file
  drm/amdgpu: Create version number for coredumps

 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 21 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 75 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c   | 77 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h   | 13 
 5 files changed, 114 insertions(+), 82 deletions(-)

-- 
2.42.0

[PATCH v6 2/5] drm/amdgpu: Rework coredump to use memory dynamically

2023-09-10 Thread André Almeida

Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida 
Reviewed-by: Shashank Sharma 
---
v6: no change
v5: no change
v4: change kmalloc to kzalloc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 ++
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
uint32_t*reset_dump_reg_list;
uint32_t*reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-   struct amdgpu_task_info reset_task_info;
-   boolreset_vram_lost;
-   struct timespec64   reset_time;
-#endif
 
boolscpm_enabled;
uint32_tscpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
uint32_taid_mask;
 };
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+   struct amdgpu_device*adev;
+   struct amdgpu_task_info reset_task_info;
+   struct timespec64   reset_time;
+   boolreset_vram_lost;
+};
+#endif
+
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bf4781551f88..b5b879bcc5c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4799,12 +4799,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
return 0;
 }
 
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+   struct amdgpu_reset_context *reset_context)
+{
+}
+#else
 static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
size_t count, void *data, size_t datalen)
 {
struct drm_printer p;
-   struct amdgpu_device *adev = data;
+   struct amdgpu_coredump_info *coredump = data;
struct drm_print_iterator iter;
int i;
 
@@ -4818,21 +4823,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
drm_printf(, " AMDGPU Device Coredump \n");
drm_printf(, "kernel: " UTS_RELEASE "\n");
drm_printf(, "module: " KBUILD_MODNAME "\n");
-   drm_printf(, "time: %lld.%09ld\n", adev->reset_time.tv_sec, 
adev->reset_time.tv_nsec);
-   if (adev->reset_task_info.pid)
+   drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
+   if (coredump->reset_task_info.pid)
drm_printf(, "process_name: %s PID: %d\n",
-  adev->reset_task_info.process_name,
-  adev->reset_task_info.pid);
+  coredump->reset_task_info.process_name,
+  coredump->reset_task_info.pid);
 
-   if (adev->reset_vram_lost)
+   if (coredump->reset_vram_lost)
drm_printf(, "VRAM is lost due to GPU reset!\n");
-   if (adev->num_regs) {
+   if (coredump->adev->num_regs) {
drm_printf(, "AMDGPU register dumps:\nOffset: Value:\n");
 
-   for (i = 0; i < adev->num_regs; i++)
+   for (i = 0; i < coredump->adev->num_regs; i++)
drm_printf(, "0x%08x: 0x%08x\n",
-  adev->reset_dump_reg_list[i],
-  adev->reset_dump_reg_value[i]);
+  coredump->adev->reset_dump_reg_list[i],
+  coredump->adev->reset_dump_reg_value[i]);
}
 
return count - iter.remain;
@@ -4840,14 +4845,32 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
 
 static void amdgpu_devcoredump_free(void *data)
 {
+   kfree(data);
 }
 
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+   struct amdgpu_reset_context *reset_context)
 {
+   struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);

[PATCH v6 1/5] drm/amdgpu: Allocate coredump memory in a nonblocking way

2023-09-10 Thread André Almeida

During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
v5: no change
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index aa171db68639..bf4781551f88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4847,7 +4847,7 @@ static void amdgpu_reset_capture_coredumpm(struct 
amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
 
ktime_get_ts64(>reset_time);
-   dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+   dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
  amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif
-- 
2.42.0

[PATCH v6 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file

2023-09-10 Thread André Almeida

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida 
Reviewed-by: Shashank Sharma 
---
v6: no change
v5: no change
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  9 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 78 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 76 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 10 +++
 4 files changed, 86 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 56d78ca6e917..b11187d153ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -781,15 +781,6 @@ struct amdgpu_mqd {
 #define AMDGPU_PRODUCT_NAME_LEN 64
 struct amdgpu_reset_domain;
 
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
-   struct amdgpu_device*adev;
-   struct amdgpu_task_info reset_task_info;
-   struct timespec64   reset_time;
-   boolreset_vram_lost;
-};
-#endif
-
 struct amdgpu_reset_info {
/* reset dump register */
u32 *reset_dump_reg_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 96975591841d..883953f2ae53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 
@@ -4799,82 +4797,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
return 0;
 }
 
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-   struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
-   size_t count, void *data, size_t datalen)
-{
-   struct drm_printer p;
-   struct amdgpu_coredump_info *coredump = data;
-   struct drm_print_iterator iter;
-   int i;
-
-   iter.data = buffer;
-   iter.offset = 0;
-   iter.start = offset;
-   iter.remain = count;
-
-   p = drm_coredump_printer();
-
-   drm_printf(, " AMDGPU Device Coredump \n");
-   drm_printf(, "kernel: " UTS_RELEASE "\n");
-   drm_printf(, "module: " KBUILD_MODNAME "\n");
-   drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
-   if (coredump->reset_task_info.pid)
-   drm_printf(, "process_name: %s PID: %d\n",
-  coredump->reset_task_info.process_name,
-  coredump->reset_task_info.pid);
-
-   if (coredump->reset_vram_lost)
-   drm_printf(, "VRAM is lost due to GPU reset!\n");
-   if (coredump->adev->reset_info.num_regs) {
-   drm_printf(, "AMDGPU register dumps:\nOffset: Value:\n");
-
-   for (i = 0; i < coredump->adev->reset_info.num_regs; i++)
-   drm_printf(, "0x%08x: 0x%08x\n",
-  
coredump->adev->reset_info.reset_dump_reg_list[i],
-  
coredump->adev->reset_info.reset_dump_reg_value[i]);
-   }
-
-   return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
-   kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-   struct amdgpu_reset_context *reset_context)
-{
-   struct amdgpu_coredump_info *coredump;
-   struct drm_device *dev = adev_to_drm(adev);
-
-   coredump = kzalloc(sizeof(*coredump), GFP_NOWAIT);
-
-   if (!coredump) {
-   DRM_ERROR("%s: failed to allocate memory for coredump\n", 
__func__);
-   return;
-   }
-
-   coredump->reset_vram_lost = vram_lost;
-
-   if (reset_context->job && reset_context->job->vm)
-   coredump->reset_task_info = reset_context->job->vm->task_info;
-
-   coredump->adev = adev;
-
-   ktime_get_ts64(>reset_time);
-
-   dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
- amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
 int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 struct amdgpu_reset_context *reset_context)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 5fed06ffcc6b..579b70a3cdab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
  *
  */
 
+#include 
+#include 
+
 #include "amdgpu_reset.h"
 #include "aldeb

[PATCH v6 5/5] drm/amdgpu: Create version number for coredumps

2023-09-10 Thread André Almeida

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida 
Reviewed-by: Shashank Sharma 
---
v6: no change
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 579b70a3cdab..e92c81ff27be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t 
offset,
p = drm_coredump_printer();
 
drm_printf(, " AMDGPU Device Coredump \n");
+   drm_printf(, "version: " AMDGPU_COREDUMP_VERSION "\n");
drm_printf(, "kernel: " UTS_RELEASE "\n");
drm_printf(, "module: " KBUILD_MODNAME "\n");
drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 01e8183ade4b..ec3a409ec509 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
 };
 
 #ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
 struct amdgpu_coredump_info {
struct amdgpu_device*adev;
struct amdgpu_task_info reset_task_info;
-- 
2.42.0

[PATCH v6 3/5] drm/amdgpu: Encapsulate all device reset info

2023-09-10 Thread André Almeida

To better organize struct amdgpu_device, keep all reset information
related fields together in a separated struct.

Signed-off-by: André Almeida 
Reviewed-by: Shashank Sharma 
---
v6: no changes
v5: new patch, as required by Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 34 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 16 +-
 3 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0d560b713948..56d78ca6e917 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -781,6 +781,26 @@ struct amdgpu_mqd {
 #define AMDGPU_PRODUCT_NAME_LEN 64
 struct amdgpu_reset_domain;
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+   struct amdgpu_device*adev;
+   struct amdgpu_task_info reset_task_info;
+   struct timespec64   reset_time;
+   boolreset_vram_lost;
+};
+#endif
+
+struct amdgpu_reset_info {
+   /* reset dump register */
+   u32 *reset_dump_reg_list;
+   u32 *reset_dump_reg_value;
+   int num_regs;
+
+#ifdef CONFIG_DEV_COREDUMP
+   struct amdgpu_coredump_info *coredump_info;
+#endif
+};
+
 /*
  * Non-zero (true) if the GPU has VRAM. Zero (false) otherwise.
  */
@@ -1084,10 +1104,7 @@ struct amdgpu_device {
 
struct mutexbenchmark_mutex;
 
-   /* reset dump register */
-   uint32_t*reset_dump_reg_list;
-   uint32_t*reset_dump_reg_value;
-   int num_regs;
+   struct amdgpu_reset_inforeset_info;
 
boolscpm_enabled;
uint32_tscpm_status;
@@ -1100,15 +1117,6 @@ struct amdgpu_device {
uint32_taid_mask;
 };
 
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
-   struct amdgpu_device*adev;
-   struct amdgpu_task_info reset_task_info;
-   struct timespec64   reset_time;
-   boolreset_vram_lost;
-};
-#endif
-
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a4faea4fa0b5..3136a0774dd9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2016,8 +2016,8 @@ static ssize_t 
amdgpu_reset_dump_register_list_read(struct file *f,
if (ret)
return ret;
 
-   for (i = 0; i < adev->num_regs; i++) {
-   sprintf(reg_offset, "0x%x\n", adev->reset_dump_reg_list[i]);
+   for (i = 0; i < adev->reset_info.num_regs; i++) {
+   sprintf(reg_offset, "0x%x\n", 
adev->reset_info.reset_dump_reg_list[i]);
up_read(>reset_domain->sem);
if (copy_to_user(buf + len, reg_offset, strlen(reg_offset)))
return -EFAULT;
@@ -2074,9 +2074,9 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
if (ret)
goto error_free;
 
-   swap(adev->reset_dump_reg_list, tmp);
-   swap(adev->reset_dump_reg_value, new);
-   adev->num_regs = i;
+   swap(adev->reset_info.reset_dump_reg_list, tmp);
+   swap(adev->reset_info.reset_dump_reg_value, new);
+   adev->reset_info.num_regs = i;
up_write(>reset_domain->sem);
ret = size;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b5b879bcc5c9..96975591841d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4790,10 +4790,10 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
 
lockdep_assert_held(>reset_domain->sem);
 
-   for (i = 0; i < adev->num_regs; i++) {
-   adev->reset_dump_reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);
-   trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list[i],
-adev->reset_dump_reg_value[i]);
+   for (i = 0; i < adev->reset_info.num_regs; i++) {
+   adev->reset_info.reset_dump_reg_value[i] = 
RREG32(adev->reset_info.reset_dump_reg_list[i]);
+   
trace_amdgpu_reset_reg_dumps(adev->reset_info.reset_dump_reg_list[i],
+
adev->reset_info.reset_dump_reg_value[i]);
}
 
return 0;
@@ -4831,13 +4831,13 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
 
if (coredump->reset_vram_lost)
drm

[PATCH v3 2/2] drm/amdgpu: Create an option to disable soft recovery

2023-08-31 Thread André Almeida

Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 82eaccfce347..5f49e2c0ae7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1105,6 +1105,7 @@ struct amdgpu_device {
/* Debug */
booldebug_vm;
booldebug_largebar;
+   booldebug_disable_soft_recovery;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 33fddf7b1c4f..861830e1afa8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -124,6 +124,7 @@
 enum AMDGPU_DEBUG_MASK {
AMDGPU_DEBUG_VM = BIT(0),
AMDGPU_DEBUG_LARGEBAR = BIT(1),
+   AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
 };
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
@@ -935,6 +936,7 @@ MODULE_PARM_DESC(enforce_isolation, "enforce process 
isolation between graphics
  * - 0x2: Enable simulating large-bar capability on non-large bar system. This
  *   limits the VRAM size reported to ROCm applications to the visible
  *   size, usually 256MB.
+ * - 0x4: Disable GPU soft recovery, always do a full reset
  */
 MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
 module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
@@ -2054,6 +2056,11 @@ static void amdgpu_init_debug_options(struct 
amdgpu_device *adev)
pr_info("debug: enabled simulating large-bar capability on 
non-large bar system\n");
adev->debug_largebar = true;
}
+
+   if (amdgpu_debug_mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+   pr_info("debug: soft reset for GPU recovery disabled\n");
+   adev->debug_disable_soft_recovery = true;
+   }
 }
 
 static int amdgpu_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..6a80d3ec887e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, 
unsigned int vmid,
   struct dma_fence *fence)
 {
unsigned long flags;
+   ktime_t deadline;
 
-   ktime_t deadline = ktime_add_us(ktime_get(), 1);
+   if (unlikely(ring->adev->debug_disable_soft_recovery))
+   return false;
+
+   deadline = ktime_add_us(ktime_get(), 1);
 
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || 
!fence)
return false;
-- 
2.42.0

[PATCH v3 1/2] drm/amdgpu: Merge debug module parameters

2023-08-31 Thread André Almeida

Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.

Drop the obsolete module options in favor of the new ones.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
v3:
- move from include/amd_shared.h to amdgpu/amdgpu_drv.c
v2:
- drop old module params
- use BIT() macros
- replace global var with adev-> vars
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 56 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
 7 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4de074243c4d..82eaccfce347 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1101,6 +1101,10 @@ struct amdgpu_device {
booldc_enabled;
/* Mask of active clusters */
uint32_taid_mask;
+
+   /* Debug */
+   booldebug_vm;
+   booldebug_largebar;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fb78a8f47587..8a26bed76505 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1191,7 +1191,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo);
}
 
-   if (amdgpu_vm_debug) {
+   if (adev->debug_vm) {
/* Invalidate all BOs to test for userspace bugs */
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f5856b82605e..33fddf7b1c4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -118,6 +118,14 @@
 #define KMS_DRIVER_MINOR   54
 #define KMS_DRIVER_PATCHLEVEL  0
 
+/*
+ * amdgpu.debug module options. Are all disabled by default
+ */
+enum AMDGPU_DEBUG_MASK {
+   AMDGPU_DEBUG_VM = BIT(0),
+   AMDGPU_DEBUG_LARGEBAR = BIT(1),
+};
+
 unsigned int amdgpu_vram_limit = UINT_MAX;
 int amdgpu_vis_vram_limit;
 int amdgpu_gart_size = -1; /* auto */
@@ -140,7 +148,6 @@ int amdgpu_vm_size = -1;
 int amdgpu_vm_fragment_size = -1;
 int amdgpu_vm_block_size = -1;
 int amdgpu_vm_fault_stop;
-int amdgpu_vm_debug;
 int amdgpu_vm_update_mode = -1;
 int amdgpu_exp_hw_support;
 int amdgpu_dc = -1;
@@ -194,6 +201,7 @@ int amdgpu_use_xgmi_p2p = 1;
 int amdgpu_vcnfw_log;
 int amdgpu_sg_display = -1; /* auto */
 int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
+uint amdgpu_debug_mask;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -405,13 +413,6 @@ module_param_named(vm_block_size, amdgpu_vm_block_size, 
int, 0444);
 MODULE_PARM_DESC(vm_fault_stop, "Stop on VM fault (0 = never (default), 1 = 
print first, 2 = always)");
 module_param_named(vm_fault_stop, amdgpu_vm_fault_stop, int, 0444);
 
-/**
- * DOC: vm_debug (int)
- * Debug VM handling (0 = disabled, 1 = enabled). The default is 0 (Disabled).
- */
-MODULE_PARM_DESC(vm_debug, "Debug VM handling (0 = disabled (default), 1 = 
enabled)");
-module_param_named(vm_debug, amdgpu_vm_debug, int, 0644);
-
 /**
  * DOC: vm_update_mode (int)
  * Override VM update mode. VM updated by using CPU (0 = never, 1 = Graphics 
only, 2 = Compute only, 3 = Both). The default
@@ -743,18 +744,6 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = 
enable)");
 
-/**
- * DOC: debug_largebar (int)
- * Set debug_largebar as 1 to enable simulating large-bar capability on 
non-large bar
- * system. This limits the VRAM size reported to ROCm applications to the 
visible
- * size, usually 256MB.
- * Default value is 0, diabled.
- */
-int debug_largebar;
-module_param(debug_largebar, int, 0444);
-MODULE_PARM_DESC(debug_largebar,
-   "Debug large-bar flag used to simulate large-bar capability on 
non-large bar machine (0 = disable, 1 = enable)");
-
 /**
  * DOC: halt_if_hws_hang (int)
  * Halt if HWS hang is detected. Default value, 0, disables the halt on hang.
@@ -938,6 +927,18 @@ module_param_named(user_partt_mode, 
amdgpu_user_partt_mode, uint, 0444);
 module_param(enforce_isolation, bool, 0444);
 MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between 
graphics and

[PATCH v3 0/2] Merge all debug module parameters

2023-08-31 Thread André Almeida

As suggested by Christian at [0], this patchset merges all debug modules
parameters and creates a new one for disabling soft recovery:

> Maybe we can overload the amdgpu_gpu_recovery module option with this. 
> Or even better merge all the developer module parameter into a 
> amdgpu_debug option. This way it should be pretty obvious that this 
> isn't meant to be used by someone who doesn't know how to use it.

[0] 
https://lore.kernel.org/dri-devel/55f69184-1aa2-55d6-4a10-1560d75c7...@amd.com/

Changelog:
- move enum from include/amd_shared.h to amdgpu/amdgpu_drv.c
v2: https://lore.kernel.org/lkml/20230830220808.421935-1-andrealm...@igalia.com/

- drop old module params
- use BIT() macros
- replace global var with adev-> vars
v1: https://lore.kernel.org/lkml/20230824162505.173399-1-andrealm...@igalia.com/

André Almeida (2):
  drm/amdgpu: Merge debug module parameters
  drm/amdgpu: Create an option to disable soft recovery

 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 63 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
 8 files changed, 58 insertions(+), 26 deletions(-)

-- 
2.42.0

[PATCH v2 1/2] drm/amdgpu: Merge debug module parameters

2023-08-30 Thread André Almeida

Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.

Drop the obsolete module options in favor of the new ones.

Signed-off-by: André Almeida 
---
v2:
- drop old module params
- use BIT() macros
- replace global var with adev-> vars
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 48 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
 drivers/gpu/drm/amd/include/amd_shared.h |  8 
 8 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4de074243c4d..82eaccfce347 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1101,6 +1101,10 @@ struct amdgpu_device {
booldc_enabled;
/* Mask of active clusters */
uint32_taid_mask;
+
+   /* Debug */
+   booldebug_vm;
+   booldebug_largebar;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fb78a8f47587..8a26bed76505 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1191,7 +1191,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo);
}
 
-   if (amdgpu_vm_debug) {
+   if (adev->debug_vm) {
/* Invalidate all BOs to test for userspace bugs */
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f5856b82605e..0cd48c025433 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -140,7 +140,6 @@ int amdgpu_vm_size = -1;
 int amdgpu_vm_fragment_size = -1;
 int amdgpu_vm_block_size = -1;
 int amdgpu_vm_fault_stop;
-int amdgpu_vm_debug;
 int amdgpu_vm_update_mode = -1;
 int amdgpu_exp_hw_support;
 int amdgpu_dc = -1;
@@ -194,6 +193,7 @@ int amdgpu_use_xgmi_p2p = 1;
 int amdgpu_vcnfw_log;
 int amdgpu_sg_display = -1; /* auto */
 int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
+uint amdgpu_debug_mask;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -405,13 +405,6 @@ module_param_named(vm_block_size, amdgpu_vm_block_size, 
int, 0444);
 MODULE_PARM_DESC(vm_fault_stop, "Stop on VM fault (0 = never (default), 1 = 
print first, 2 = always)");
 module_param_named(vm_fault_stop, amdgpu_vm_fault_stop, int, 0444);
 
-/**
- * DOC: vm_debug (int)
- * Debug VM handling (0 = disabled, 1 = enabled). The default is 0 (Disabled).
- */
-MODULE_PARM_DESC(vm_debug, "Debug VM handling (0 = disabled (default), 1 = 
enabled)");
-module_param_named(vm_debug, amdgpu_vm_debug, int, 0644);
-
 /**
  * DOC: vm_update_mode (int)
  * Override VM update mode. VM updated by using CPU (0 = never, 1 = Graphics 
only, 2 = Compute only, 3 = Both). The default
@@ -743,18 +736,6 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = 
enable)");
 
-/**
- * DOC: debug_largebar (int)
- * Set debug_largebar as 1 to enable simulating large-bar capability on 
non-large bar
- * system. This limits the VRAM size reported to ROCm applications to the 
visible
- * size, usually 256MB.
- * Default value is 0, diabled.
- */
-int debug_largebar;
-module_param(debug_largebar, int, 0444);
-MODULE_PARM_DESC(debug_largebar,
-   "Debug large-bar flag used to simulate large-bar capability on 
non-large bar machine (0 = disable, 1 = enable)");
-
 /**
  * DOC: halt_if_hws_hang (int)
  * Halt if HWS hang is detected. Default value, 0, disables the halt on hang.
@@ -938,6 +919,18 @@ module_param_named(user_partt_mode, 
amdgpu_user_partt_mode, uint, 0444);
 module_param(enforce_isolation, bool, 0444);
 MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between 
graphics and compute . enforce_isolation = on");
 
+/**
+ * DOC: debug_mask (uint)
+ * Debug options for amdgpu, work as a binary mask with the following options:
+ *
+ * - 0x1: Debug VM handling
+ * - 0x2: Enable simulating large-bar capability on non-large bar system. This
+ *   limits the VRAM size reported to ROCm applications to the visible
+ *   size, usually 256MB.
+ */
+MODULE_PARM_DESC(debug_mask, "debug

[PATCH v2 2/2] drm/amdgpu: Create an option to disable soft recovery

2023-08-30 Thread André Almeida

Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
 drivers/gpu/drm/amd/include/amd_shared.h | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 82eaccfce347..5f49e2c0ae7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1105,6 +1105,7 @@ struct amdgpu_device {
/* Debug */
booldebug_vm;
booldebug_largebar;
+   booldebug_disable_soft_recovery;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0cd48c025433..59e9fe594b51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -927,6 +927,7 @@ MODULE_PARM_DESC(enforce_isolation, "enforce process 
isolation between graphics
  * - 0x2: Enable simulating large-bar capability on non-large bar system. This
  *   limits the VRAM size reported to ROCm applications to the visible
  *   size, usually 256MB.
+ * - 0x4: Disable GPU soft recovery
  */
 MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
 module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
@@ -2046,6 +2047,11 @@ static void amdgpu_init_debug_options(struct 
amdgpu_device *adev)
pr_info("debug: enabled simulating large-bar capability on 
non-large bar system\n");
adev->debug_largebar = true;
}
+
+   if (amdgpu_debug_mask & AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+   pr_info("debug: soft reset for GPU recovery disabled\n");
+   adev->debug_disable_soft_recovery = true;
+   }
 }
 
 static int amdgpu_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..6a80d3ec887e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, 
unsigned int vmid,
   struct dma_fence *fence)
 {
unsigned long flags;
+   ktime_t deadline;
 
-   ktime_t deadline = ktime_add_us(ktime_get(), 1);
+   if (unlikely(ring->adev->debug_disable_soft_recovery))
+   return false;
+
+   deadline = ktime_add_us(ktime_get(), 1);
 
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || 
!fence)
return false;
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index 2fd6af2183cc..32ee982be99e 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -263,6 +263,7 @@ enum amd_dpm_forced_level;
 enum AMDGPU_DEBUG_MASK {
AMDGPU_DEBUG_VM = BIT(0),
AMDGPU_DEBUG_LARGEBAR = BIT(1),
+   AMDGPU_DEBUG_DISABLE_GPU_SOFT_RECOVERY = BIT(2),
 };
 
 /**
-- 
2.41.0

[PATCH v2 0/2] Merge all debug module parameters

2023-08-30 Thread André Almeida

As suggested by Christian at [0], this patchset merges all debug modules
parameters and creates a new one for disabling soft recovery:

> Maybe we can overload the amdgpu_gpu_recovery module option with this. 
> Or even better merge all the developer module parameter into a 
> amdgpu_debug option. This way it should be pretty obvious that this 
> isn't meant to be used by someone who doesn't know how to use it.

[0] 
https://lore.kernel.org/dri-devel/55f69184-1aa2-55d6-4a10-1560d75c7...@amd.com/

Changelog:
- drop old module params
- use BIT() macros
- replace global var with adev-> vars
v1: https://lore.kernel.org/lkml/20230824162505.173399-1-andrealm...@igalia.com/

André Almeida (2):
  drm/amdgpu: Merge debug module parameters
  drm/amdgpu: Create an option to disable soft recovery

 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  5 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 54 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c|  2 +-
 drivers/gpu/drm/amd/include/amd_shared.h |  9 
 9 files changed, 58 insertions(+), 26 deletions(-)

-- 
2.41.0

Re: [PATCH 1/2] drm/amdgpu: Merge debug module parameters

2023-08-25 Thread André Almeida


Em 25/08/2023 09:29, Christian König escreveu:

Am 25.08.23 um 14:24 schrieb André Almeida:

Em 25/08/2023 03:56, Christian König escreveu:
> Am 24.08.23 um 18:25 schrieb André Almeida:
>> Merge all developer debug options available as separated module
>> parameters in one, making it obvious that are for developers.
>>
>> Signed-off-by: André Almeida 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 24 


>>   drivers/gpu/drm/amd/include/amd_shared.h |  9 +
>>   2 files changed, 33 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index f5856b82605e..d53e4097acc0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -194,6 +194,7 @@ int amdgpu_use_xgmi_p2p = 1;
>>   int amdgpu_vcnfw_log;
>>   int amdgpu_sg_display = -1; /* auto */
>>   int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
>> +uint amdgpu_debug_mask;
>>   static void amdgpu_drv_delayed_reset_work_handler(struct work_struct
>> *work);
>> @@ -938,6 +939,9 @@ module_param_named(user_partt_mode,
>> amdgpu_user_partt_mode, uint, 0444);
>>   module_param(enforce_isolation, bool, 0444);
>>   MODULE_PARM_DESC(enforce_isolation, "enforce process isolation
>> between graphics and compute . enforce_isolation = on");
>> +MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by
>> default");
>> +module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
>> +
>>   /* These devices are not supported by amdgpu.
>>    * They are supported by the mach64, r128, radeon drivers
>>    */
>> @@ -2871,6 +2875,24 @@ static struct pci_driver 
amdgpu_kms_pci_driver = {

>>   .dev_groups = amdgpu_sysfs_groups,
>>   };
>> +static void amdgpu_init_debug_options(void)
>> +{
>> +    if (amdgpu_debug_mask & DEBUG_VERBOSE_EVICTIONS) {
>> +    pr_info("debug: eviction debug messages enabled\n");
>> +    debug_evictions = true;
>> +    }
>> +
>> +    if (amdgpu_debug_mask & DEBUG_VM) {
>> +    pr_info("debug: VM handling debug enabled\n");
>> +    amdgpu_vm_debug = true;
>> +    }
>> +
>> +    if (amdgpu_debug_mask & DEBUG_LARGEBAR) {
>> +    pr_info("debug: enabled simulating large-bar capability on
>> non-large bar system\n");
>> +    debug_largebar = true;
>
> How should that work???

Ops, I thought it was a boolean. It should be

+    debug_largebar = 1;



That's not the problem, the question is since when do we have a 
debug_largebar option and what should that one do?




It should work exactly like the other one, but instead of using 
amdgpu.large_bar=1, one would use amdgpu.debug_mask=0x4 to activate it, 
as the plan is to merge all current debug options in a single one right?



Regards,
Christian.

Re: [PATCH 1/2] drm/amdgpu: Merge debug module parameters

2023-08-25 Thread André Almeida


Em 25/08/2023 03:56, Christian König escreveu:
> Am 24.08.23 um 18:25 schrieb André Almeida:
>> Merge all developer debug options available as separated module
>> parameters in one, making it obvious that are for developers.
>>
>> Signed-off-by: André Almeida 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 24 
>>   drivers/gpu/drm/amd/include/amd_shared.h |  9 +
>>   2 files changed, 33 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index f5856b82605e..d53e4097acc0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -194,6 +194,7 @@ int amdgpu_use_xgmi_p2p = 1;
>>   int amdgpu_vcnfw_log;
>>   int amdgpu_sg_display = -1; /* auto */
>>   int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
>> +uint amdgpu_debug_mask;
>>   static void amdgpu_drv_delayed_reset_work_handler(struct work_struct
>> *work);
>> @@ -938,6 +939,9 @@ module_param_named(user_partt_mode,
>> amdgpu_user_partt_mode, uint, 0444);
>>   module_param(enforce_isolation, bool, 0444);
>>   MODULE_PARM_DESC(enforce_isolation, "enforce process isolation
>> between graphics and compute . enforce_isolation = on");
>> +MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by
>> default");
>> +module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
>> +
>>   /* These devices are not supported by amdgpu.
>>* They are supported by the mach64, r128, radeon drivers
>>*/
>> @@ -2871,6 +2875,24 @@ static struct pci_driver 
amdgpu_kms_pci_driver = {

>>   .dev_groups = amdgpu_sysfs_groups,
>>   };
>> +static void amdgpu_init_debug_options(void)
>> +{
>> +if (amdgpu_debug_mask & DEBUG_VERBOSE_EVICTIONS) {
>> +pr_info("debug: eviction debug messages enabled\n");
>> +debug_evictions = true;
>> +}
>> +
>> +if (amdgpu_debug_mask & DEBUG_VM) {
>> +pr_info("debug: VM handling debug enabled\n");
>> +amdgpu_vm_debug = true;
>> +}
>> +
>> +if (amdgpu_debug_mask & DEBUG_LARGEBAR) {
>> +pr_info("debug: enabled simulating large-bar capability on
>> non-large bar system\n");
>> +debug_largebar = true;
>
> How should that work???

Ops, I thought it was a boolean. It should be

+debug_largebar = 1;

>
>> +}
>> +}
>> +
>>   static int __init amdgpu_init(void)
>>   {
>>   int r;
>> @@ -2893,6 +2915,8 @@ static int __init amdgpu_init(void)
>>   /* Ignore KFD init failures. Normal when CONFIG_HSA_AMD is not
>> set. */
>>   amdgpu_amdkfd_init();
>> +amdgpu_init_debug_options();
>> +
>>   /* let modprobe override vga console setting */
>>   return pci_register_driver(_kms_pci_driver);
>> diff --git a/drivers/gpu/drm/amd/include/amd_shared.h
>> b/drivers/gpu/drm/amd/include/amd_shared.h
>> index 67d7b7ee8a2a..6fa644c249a5 100644
>> --- a/drivers/gpu/drm/amd/include/amd_shared.h
>> +++ b/drivers/gpu/drm/amd/include/amd_shared.h
>> @@ -257,6 +257,15 @@ enum DC_DEBUG_MASK {
>>   enum amd_dpm_forced_level;
>> +/*
>> + * amdgpu.debug module options. Are all disabled by default
>> + */
>> +enum AMDGPU_DEBUG_MASK {
>> +DEBUG_VERBOSE_EVICTIONS = (1 << 0),// 0x1
>> +DEBUG_VM = (1 << 1),// 0x2
>> +DEBUG_LARGEBAR = (1 << 2),// 0x4
>
> Good start, but please give the symbol names an AMDGPU_ prefix. Stuff
> like DEBUG_VM is just way to general and could clash.
>
> Apart from that comments on the same line and using // style comments
> are frowned upon. You should probably rather use the BIT() macro here.
>

Ack, I'll change that for next version

> Regards,
> Christian.
>
>> +
>>   /**
>>* struct amd_ip_funcs - general hooks for managing amdgpu IP Blocks
>>* @name: Name of IP block
>
>

[PATCH 1/2] drm/amdgpu: Merge debug module parameters

2023-08-24 Thread André Almeida

Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 24 
 drivers/gpu/drm/amd/include/amd_shared.h |  9 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f5856b82605e..d53e4097acc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -194,6 +194,7 @@ int amdgpu_use_xgmi_p2p = 1;
 int amdgpu_vcnfw_log;
 int amdgpu_sg_display = -1; /* auto */
 int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE;
+uint amdgpu_debug_mask;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -938,6 +939,9 @@ module_param_named(user_partt_mode, amdgpu_user_partt_mode, 
uint, 0444);
 module_param(enforce_isolation, bool, 0444);
 MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between 
graphics and compute . enforce_isolation = on");
 
+MODULE_PARM_DESC(debug_mask, "debug options for amdgpu, disabled by default");
+module_param_named(debug_mask, amdgpu_debug_mask, uint, 0444);
+
 /* These devices are not supported by amdgpu.
  * They are supported by the mach64, r128, radeon drivers
  */
@@ -2871,6 +2875,24 @@ static struct pci_driver amdgpu_kms_pci_driver = {
.dev_groups = amdgpu_sysfs_groups,
 };
 
+static void amdgpu_init_debug_options(void)
+{
+   if (amdgpu_debug_mask & DEBUG_VERBOSE_EVICTIONS) {
+   pr_info("debug: eviction debug messages enabled\n");
+   debug_evictions = true;
+   }
+
+   if (amdgpu_debug_mask & DEBUG_VM) {
+   pr_info("debug: VM handling debug enabled\n");
+   amdgpu_vm_debug = true;
+   }
+
+   if (amdgpu_debug_mask & DEBUG_LARGEBAR) {
+   pr_info("debug: enabled simulating large-bar capability on 
non-large bar system\n");
+   debug_largebar = true;
+   }
+}
+
 static int __init amdgpu_init(void)
 {
int r;
@@ -2893,6 +2915,8 @@ static int __init amdgpu_init(void)
/* Ignore KFD init failures. Normal when CONFIG_HSA_AMD is not set. */
amdgpu_amdkfd_init();
 
+   amdgpu_init_debug_options();
+
/* let modprobe override vga console setting */
return pci_register_driver(_kms_pci_driver);
 
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index 67d7b7ee8a2a..6fa644c249a5 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -257,6 +257,15 @@ enum DC_DEBUG_MASK {
 
 enum amd_dpm_forced_level;
 
+/*
+ * amdgpu.debug module options. Are all disabled by default
+ */
+enum AMDGPU_DEBUG_MASK {
+   DEBUG_VERBOSE_EVICTIONS = (1 << 0), // 0x1
+   DEBUG_VM = (1 << 1),// 0x2
+   DEBUG_LARGEBAR = (1 << 2),  // 0x4
+};
+
 /**
  * struct amd_ip_funcs - general hooks for managing amdgpu IP Blocks
  * @name: Name of IP block
-- 
2.41.0

[PATCH 2/2] drm/amdgpu: Create an option to disable soft recovery

2023-08-24 Thread André Almeida

Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
 drivers/gpu/drm/amd/include/amd_shared.h | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4de074243c4d..8f4a93890345 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
 extern int amdgpu_lbpw;
 extern int amdgpu_compute_multipipe;
 extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
 extern int amdgpu_emu_mode;
 extern uint amdgpu_smu_memory_pool_size;
 extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d53e4097acc0..7d6c39b547cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
 int amdgpu_lbpw = -1;
 int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
 int amdgpu_emu_mode;
 uint amdgpu_smu_memory_pool_size;
 int amdgpu_smu_pptable_id = -1;
@@ -2891,6 +2892,11 @@ static void amdgpu_init_debug_options(void)
pr_info("debug: enabled simulating large-bar capability on 
non-large bar system\n");
debug_largebar = true;
}
+
+   if (amdgpu_debug_mask & DEBUG_DISABLE_GPU_SOFT_RECOVERY) {
+   pr_info("debug: soft reset for GPU recovery disabled\n");
+   amdgpu_soft_recovery = false;
+   }
 }
 
 static int __init amdgpu_init(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, 
unsigned int vmid,
   struct dma_fence *fence)
 {
unsigned long flags;
+   ktime_t deadline;
 
-   ktime_t deadline = ktime_add_us(ktime_get(), 1);
+   if (!amdgpu_soft_recovery)
+   return false;
+
+   deadline = ktime_add_us(ktime_get(), 1);
 
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || 
!fence)
return false;
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index 6fa644c249a5..afcbacce0a13 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -264,6 +264,7 @@ enum AMDGPU_DEBUG_MASK {
DEBUG_VERBOSE_EVICTIONS = (1 << 0), // 0x1
DEBUG_VM = (1 << 1),// 0x2
DEBUG_LARGEBAR = (1 << 2),  // 0x4
+   DEBUG_DISABLE_GPU_SOFT_RECOVERY = (1 << 3), // 0x8
 };
 
 /**
-- 
2.41.0

[PATCH 0/2] drm/amdgpu: Merge all debug module parameters

2023-08-24 Thread André Almeida

As suggested by Christian at [0], this patchset merges all debug modules
parameters and creates a new one for disabling soft recovery:

> Maybe we can overload the amdgpu_gpu_recovery module option with this. 
> Or even better merge all the developer module parameter into a 
> amdgpu_debug option. This way it should be pretty obvious that this 
> isn't meant to be used by someone who doesn't know how to use it.

[0] 
https://lore.kernel.org/dri-devel/55f69184-1aa2-55d6-4a10-1560d75c7...@amd.com/

André Almeida (2):
  drm/amdgpu: Merge debug module parameters
  drm/amdgpu: Create an option to disable soft recovery

 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 30 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  6 -
 drivers/gpu/drm/amd/include/amd_shared.h | 10 
 4 files changed, 46 insertions(+), 1 deletion(-)

-- 
2.41.0

Re: [PATCH v7] drm/doc: Document DRM device reset expectations

2023-08-23 Thread André Almeida


Hi Rodrigo,

Em 23/08/2023 14:31, Rodrigo Vivi escreveu:

On Fri, Aug 18, 2023 at 05:06:42PM -0300, André Almeida wrote:

Create a section that specifies how to deal with DRM device resets for
kernel and userspace drivers.

Signed-off-by: André Almeida 

---

v7 changes:
  - s/application/graphical API contex/ in the robustness part (Michel)
  - Grammar fixes (Randy)

v6: https://lore.kernel.org/lkml/20230815185710.159779-1-andrealm...@igalia.com/

v6 changes:
  - Due to substantial changes in the content, dropped Pekka's Acked-by
  - Grammar fixes (Randy)
  - Add paragraph about disabling device resets
  - Add note about integrating reset tracking in drm/sched
  - Add note that KMD should return failure for contexts affected by
resets and UMD should check for this
  - Add note about lack of consensus around what to do about non-robust
apps

v5: 
https://lore.kernel.org/dri-devel/20230627132323.115440-1-andrealm...@igalia.com/
---
  Documentation/gpu/drm-uapi.rst | 77 ++
  1 file changed, 77 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 65fb3036a580..3694bdb977f5 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -285,6 +285,83 @@ for GPU1 and GPU2 from different vendors, and a third 
handler for
  mmapped regular files. Threads cause additional pain with signal
  handling as well.
  
+Device reset

+
+
+The GPU stack is really complex and is prone to errors, from hardware bugs,
+faulty applications and everything in between the many layers. Some errors
+require resetting the device in order to make the device usable again. This
+section describes the expectations for DRM and usermode drivers when a
+device resets and how to propagate the reset status.
+
+Device resets can not be disabled without tainting the kernel, which can lead 
to
+hanging the entire kernel through shrinkers/mmu_notifiers. Userspace role in
+device resets is to propagate the message to the application and apply any
+special policy for blocking guilty applications, if any. Corollary is that
+debugging a hung GPU context require hardware support to be able to preempt 
such
+a GPU context while it's stopped.
+
+Kernel Mode Driver
+--
+
+The KMD is responsible for checking if the device needs a reset, and to perform
+it as needed. Usually a hang is detected when a job gets stuck executing. KMD
+should keep track of resets, because userspace can query any time about the
+reset status for a specific context. This is needed to propagate to the rest of
+the stack that a reset has happened. Currently, this is implemented by each
+driver separately, with no common DRM interface. Ideally this should be 
properly
+integrated at DRM scheduler to provide a common ground for all drivers. After a
+reset, KMD should reject new command submissions for affected contexts.


is there any consensus around what exactly 'affected contexts' might mean?
I see i915 pin-point only the context that was at execution with head pointing
at it and doesn't blame the queued ones, while on Xe it looks like we are
blaming all the queued context. Not sure what other drivers are doing for the
'affected contexts'.



"Affected contexts" is a generic term indeed, giving the differences 
from each driver as you already pointed out. amdgpu also tends to affect 
all queued contexts during a reset. This wording was used to fit how 
different drivers works.

Re: [PATCH v6 6/6] drm/doc: Define KMS atomic state set

2023-08-21 Thread André Almeida


Hi Michel,

Em 17/08/2023 07:37, Michel Dänzer escreveu:

On 8/15/23 20:57, André Almeida wrote:

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 


[...]


+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. [...]


During the hackfest in Brno, it was mentioned that a commit which re-sets the 
same FB_ID could actually have an effect with VRR: It could trigger scanout of 
the next frame before vertical blank has reached its maximum duration. Some 
kind of mechanism is required for this in order to allow user space to perform 
low frame rate compensation.



I believe the documentation already addresses that sending redundant 
information may not lead to the desired behavior during an async flip. 
Do you think adding a note about using the same FB_ID would be helpful?

[PATCH v7] drm/doc: Document DRM device reset expectations

2023-08-18 Thread André Almeida

Create a section that specifies how to deal with DRM device resets for
kernel and userspace drivers.

Signed-off-by: André Almeida 

---

v7 changes:
 - s/application/graphical API contex/ in the robustness part (Michel)
 - Grammar fixes (Randy)

v6: https://lore.kernel.org/lkml/20230815185710.159779-1-andrealm...@igalia.com/

v6 changes:
 - Due to substantial changes in the content, dropped Pekka's Acked-by
 - Grammar fixes (Randy)
 - Add paragraph about disabling device resets
 - Add note about integrating reset tracking in drm/sched
 - Add note that KMD should return failure for contexts affected by
   resets and UMD should check for this
 - Add note about lack of consensus around what to do about non-robust
   apps

v5: 
https://lore.kernel.org/dri-devel/20230627132323.115440-1-andrealm...@igalia.com/
---
 Documentation/gpu/drm-uapi.rst | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 65fb3036a580..3694bdb977f5 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -285,6 +285,83 @@ for GPU1 and GPU2 from different vendors, and a third 
handler for
 mmapped regular files. Threads cause additional pain with signal
 handling as well.
 
+Device reset
+
+
+The GPU stack is really complex and is prone to errors, from hardware bugs,
+faulty applications and everything in between the many layers. Some errors
+require resetting the device in order to make the device usable again. This
+section describes the expectations for DRM and usermode drivers when a
+device resets and how to propagate the reset status.
+
+Device resets can not be disabled without tainting the kernel, which can lead 
to
+hanging the entire kernel through shrinkers/mmu_notifiers. Userspace role in
+device resets is to propagate the message to the application and apply any
+special policy for blocking guilty applications, if any. Corollary is that
+debugging a hung GPU context require hardware support to be able to preempt 
such
+a GPU context while it's stopped.
+
+Kernel Mode Driver
+--
+
+The KMD is responsible for checking if the device needs a reset, and to perform
+it as needed. Usually a hang is detected when a job gets stuck executing. KMD
+should keep track of resets, because userspace can query any time about the
+reset status for a specific context. This is needed to propagate to the rest of
+the stack that a reset has happened. Currently, this is implemented by each
+driver separately, with no common DRM interface. Ideally this should be 
properly
+integrated at DRM scheduler to provide a common ground for all drivers. After a
+reset, KMD should reject new command submissions for affected contexts.
+
+User Mode Driver
+
+
+After command submission, UMD should check if the submission was accepted or
+rejected. After a reset, KMD should reject submissions, and UMD can issue an
+ioctl to the KMD to check the reset status, and this can be checked more often
+if the UMD requires it. After detecting a reset, UMD will then proceed to 
report
+it to the application using the appropriate API error code, as explained in the
+section below about robustness.
+
+Robustness
+--
+
+The only way to try to keep a graphical API context working after a reset is if
+it complies with the robustness aspects of the graphical API that it is using.
+
+Graphical APIs provide ways to applications to deal with device resets. 
However,
+there is no guarantee that the app will use such features correctly, and a
+userspace that doesn't support robust interfaces (like a non-robust
+OpenGL context or API without any robustness support like libva) leave the
+robustness handling entirely to the userspace driver. There is no strong
+community consensus on what the userspace driver should do in that case,
+since all reasonable approaches have some clear downsides.
+
+OpenGL
+~~
+
+Apps using OpenGL should use the available robust interfaces, like the
+extension ``GL_ARB_robustness`` (or ``GL_EXT_robustness`` for OpenGL ES). This
+interface tells if a reset has happened, and if so, all the context state is
+considered lost and the app proceeds by creating new ones. There's no consensus
+on what to do to if robustness is not in use.
+
+Vulkan
+~~
+
+Apps using Vulkan should check for ``VK_ERROR_DEVICE_LOST`` for submissions.
+This error code means, among other things, that a device reset has happened and
+it needs to recreate the contexts to keep going.
+
+Reporting causes of resets
+--
+
+Apart from propagating the reset through the stack so apps can recover, it's
+really useful for driver developers to learn more about what caused the reset 
in
+the first place. DRM devices should make use of devcoredump to store relevant
+information about the reset, so this information can be added to user bug
+reports.
+
 .. _drm_driver_ioctl:
 
 IOCTL Support

[PATCH v5 5/5] drm/amdgpu: Create version number for coredumps

2023-08-17 Thread André Almeida

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida 
---
v5: new patch
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 579b70a3cdab..e92c81ff27be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t 
offset,
p = drm_coredump_printer();
 
drm_printf(, " AMDGPU Device Coredump \n");
+   drm_printf(, "version: " AMDGPU_COREDUMP_VERSION "\n");
drm_printf(, "kernel: " UTS_RELEASE "\n");
drm_printf(, "module: " KBUILD_MODNAME "\n");
drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 01e8183ade4b..ec3a409ec509 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
 };
 
 #ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
 struct amdgpu_coredump_info {
struct amdgpu_device*adev;
struct amdgpu_task_info reset_task_info;
-- 
2.41.0

[PATCH v5 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file

2023-08-17 Thread André Almeida

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida 
---
v5: no change
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  9 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 78 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 76 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 10 +++
 4 files changed, 86 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 56d78ca6e917..b11187d153ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -781,15 +781,6 @@ struct amdgpu_mqd {
 #define AMDGPU_PRODUCT_NAME_LEN 64
 struct amdgpu_reset_domain;
 
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
-   struct amdgpu_device*adev;
-   struct amdgpu_task_info reset_task_info;
-   struct timespec64   reset_time;
-   boolreset_vram_lost;
-};
-#endif
-
 struct amdgpu_reset_info {
/* reset dump register */
u32 *reset_dump_reg_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 96975591841d..883953f2ae53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 
@@ -4799,82 +4797,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
return 0;
 }
 
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-   struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
-   size_t count, void *data, size_t datalen)
-{
-   struct drm_printer p;
-   struct amdgpu_coredump_info *coredump = data;
-   struct drm_print_iterator iter;
-   int i;
-
-   iter.data = buffer;
-   iter.offset = 0;
-   iter.start = offset;
-   iter.remain = count;
-
-   p = drm_coredump_printer();
-
-   drm_printf(, " AMDGPU Device Coredump \n");
-   drm_printf(, "kernel: " UTS_RELEASE "\n");
-   drm_printf(, "module: " KBUILD_MODNAME "\n");
-   drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
-   if (coredump->reset_task_info.pid)
-   drm_printf(, "process_name: %s PID: %d\n",
-  coredump->reset_task_info.process_name,
-  coredump->reset_task_info.pid);
-
-   if (coredump->reset_vram_lost)
-   drm_printf(, "VRAM is lost due to GPU reset!\n");
-   if (coredump->adev->reset_info.num_regs) {
-   drm_printf(, "AMDGPU register dumps:\nOffset: Value:\n");
-
-   for (i = 0; i < coredump->adev->reset_info.num_regs; i++)
-   drm_printf(, "0x%08x: 0x%08x\n",
-  
coredump->adev->reset_info.reset_dump_reg_list[i],
-  
coredump->adev->reset_info.reset_dump_reg_value[i]);
-   }
-
-   return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
-   kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-   struct amdgpu_reset_context *reset_context)
-{
-   struct amdgpu_coredump_info *coredump;
-   struct drm_device *dev = adev_to_drm(adev);
-
-   coredump = kzalloc(sizeof(*coredump), GFP_NOWAIT);
-
-   if (!coredump) {
-   DRM_ERROR("%s: failed to allocate memory for coredump\n", 
__func__);
-   return;
-   }
-
-   coredump->reset_vram_lost = vram_lost;
-
-   if (reset_context->job && reset_context->job->vm)
-   coredump->reset_task_info = reset_context->job->vm->task_info;
-
-   coredump->adev = adev;
-
-   ktime_get_ts64(>reset_time);
-
-   dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
- amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
 int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 struct amdgpu_reset_context *reset_context)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 5fed06ffcc6b..579b70a3cdab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
  *
  */
 
+#include 
+#include 
+
 #include "amdgpu_reset.h"
 #include "aldebaran.h"
 #include "sienna_cichlid.h"
@@ -167

[PATCH v5 3/5] drm/amdgpu: Encapsulate all device reset info

2023-08-17 Thread André Almeida

To better organize struct amdgpu_device, keep all reset information
related fields together in a separated struct.

Signed-off-by: André Almeida 
---
v5: new patch, as requested by Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 34 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 16 +-
 3 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0d560b713948..56d78ca6e917 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -781,6 +781,26 @@ struct amdgpu_mqd {
 #define AMDGPU_PRODUCT_NAME_LEN 64
 struct amdgpu_reset_domain;
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+   struct amdgpu_device*adev;
+   struct amdgpu_task_info reset_task_info;
+   struct timespec64   reset_time;
+   boolreset_vram_lost;
+};
+#endif
+
+struct amdgpu_reset_info {
+   /* reset dump register */
+   u32 *reset_dump_reg_list;
+   u32 *reset_dump_reg_value;
+   int num_regs;
+
+#ifdef CONFIG_DEV_COREDUMP
+   struct amdgpu_coredump_info *coredump_info;
+#endif
+};
+
 /*
  * Non-zero (true) if the GPU has VRAM. Zero (false) otherwise.
  */
@@ -1084,10 +1104,7 @@ struct amdgpu_device {
 
struct mutexbenchmark_mutex;
 
-   /* reset dump register */
-   uint32_t*reset_dump_reg_list;
-   uint32_t*reset_dump_reg_value;
-   int num_regs;
+   struct amdgpu_reset_inforeset_info;
 
boolscpm_enabled;
uint32_tscpm_status;
@@ -1100,15 +1117,6 @@ struct amdgpu_device {
uint32_taid_mask;
 };
 
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
-   struct amdgpu_device*adev;
-   struct amdgpu_task_info reset_task_info;
-   struct timespec64   reset_time;
-   boolreset_vram_lost;
-};
-#endif
-
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a4faea4fa0b5..3136a0774dd9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2016,8 +2016,8 @@ static ssize_t 
amdgpu_reset_dump_register_list_read(struct file *f,
if (ret)
return ret;
 
-   for (i = 0; i < adev->num_regs; i++) {
-   sprintf(reg_offset, "0x%x\n", adev->reset_dump_reg_list[i]);
+   for (i = 0; i < adev->reset_info.num_regs; i++) {
+   sprintf(reg_offset, "0x%x\n", 
adev->reset_info.reset_dump_reg_list[i]);
up_read(>reset_domain->sem);
if (copy_to_user(buf + len, reg_offset, strlen(reg_offset)))
return -EFAULT;
@@ -2074,9 +2074,9 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
if (ret)
goto error_free;
 
-   swap(adev->reset_dump_reg_list, tmp);
-   swap(adev->reset_dump_reg_value, new);
-   adev->num_regs = i;
+   swap(adev->reset_info.reset_dump_reg_list, tmp);
+   swap(adev->reset_info.reset_dump_reg_value, new);
+   adev->reset_info.num_regs = i;
up_write(>reset_domain->sem);
ret = size;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b5b879bcc5c9..96975591841d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4790,10 +4790,10 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
 
lockdep_assert_held(>reset_domain->sem);
 
-   for (i = 0; i < adev->num_regs; i++) {
-   adev->reset_dump_reg_value[i] = 
RREG32(adev->reset_dump_reg_list[i]);
-   trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list[i],
-adev->reset_dump_reg_value[i]);
+   for (i = 0; i < adev->reset_info.num_regs; i++) {
+   adev->reset_info.reset_dump_reg_value[i] = 
RREG32(adev->reset_info.reset_dump_reg_list[i]);
+   
trace_amdgpu_reset_reg_dumps(adev->reset_info.reset_dump_reg_list[i],
+
adev->reset_info.reset_dump_reg_value[i]);
}
 
return 0;
@@ -4831,13 +4831,13 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
 
if (coredump->reset_vram_lost)
drm_printf(, "VRAM is lost due to GPU reset!\n&qu

[PATCH v5 2/5] drm/amdgpu: Rework coredump to use memory dynamically

2023-08-17 Thread André Almeida

Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida 
---
v5: no change
v4: change kmalloc to kzalloc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 ++
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
uint32_t*reset_dump_reg_list;
uint32_t*reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-   struct amdgpu_task_info reset_task_info;
-   boolreset_vram_lost;
-   struct timespec64   reset_time;
-#endif
 
boolscpm_enabled;
uint32_tscpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
uint32_taid_mask;
 };
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+   struct amdgpu_device*adev;
+   struct amdgpu_task_info reset_task_info;
+   struct timespec64   reset_time;
+   boolreset_vram_lost;
+};
+#endif
+
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bf4781551f88..b5b879bcc5c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4799,12 +4799,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
return 0;
 }
 
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+   struct amdgpu_reset_context *reset_context)
+{
+}
+#else
 static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
size_t count, void *data, size_t datalen)
 {
struct drm_printer p;
-   struct amdgpu_device *adev = data;
+   struct amdgpu_coredump_info *coredump = data;
struct drm_print_iterator iter;
int i;
 
@@ -4818,21 +4823,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
drm_printf(, " AMDGPU Device Coredump \n");
drm_printf(, "kernel: " UTS_RELEASE "\n");
drm_printf(, "module: " KBUILD_MODNAME "\n");
-   drm_printf(, "time: %lld.%09ld\n", adev->reset_time.tv_sec, 
adev->reset_time.tv_nsec);
-   if (adev->reset_task_info.pid)
+   drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
+   if (coredump->reset_task_info.pid)
drm_printf(, "process_name: %s PID: %d\n",
-  adev->reset_task_info.process_name,
-  adev->reset_task_info.pid);
+  coredump->reset_task_info.process_name,
+  coredump->reset_task_info.pid);
 
-   if (adev->reset_vram_lost)
+   if (coredump->reset_vram_lost)
drm_printf(, "VRAM is lost due to GPU reset!\n");
-   if (adev->num_regs) {
+   if (coredump->adev->num_regs) {
drm_printf(, "AMDGPU register dumps:\nOffset: Value:\n");
 
-   for (i = 0; i < adev->num_regs; i++)
+   for (i = 0; i < coredump->adev->num_regs; i++)
drm_printf(, "0x%08x: 0x%08x\n",
-  adev->reset_dump_reg_list[i],
-  adev->reset_dump_reg_value[i]);
+  coredump->adev->reset_dump_reg_list[i],
+  coredump->adev->reset_dump_reg_value[i]);
}
 
return count - iter.remain;
@@ -4840,14 +4845,32 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
 
 static void amdgpu_devcoredump_free(void *data)
 {
+   kfree(data);
 }
 
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+   struct amdgpu_reset_context *reset_context)
 {
+   struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);
 
-   ktime_get_ts64(>reset_time);
-   dev_cored

[PATCH v5 0/5] drm/amdgpu: Rework coredump memory allocation

2023-08-17 Thread André Almeida

Hi,

The patches of this set are a rework to alloc devcoredump dynamically and to
move it to a better source file.

Thanks,
André

Changelog:

v4: 
https://lore.kernel.org/dri-devel/20230815195100.294458-1-andrealm...@igalia.com/
- New patch to encapsulate all reset info in a struct

v3: 
https://lore.kernel.org/dri-devel/20230810192330.198326-1-andrealm...@igalia.com/
- Changed from kmalloc to kzalloc
- Dropped "Create a module param to disable soft recovery" for now

v2: 
https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealm...@igalia.com/
- Drop the IB and ring patch
- Drop patch that limited information from kernel threads
- Add patch to move coredump to amdgpu_reset

v1: 
https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealm...@igalia.com/
 - Drop "Mark contexts guilty for causing soft recoveries" patch
 - Use GFP_NOWAIT for devcoredump allocatio

André Almeida (5):
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Rework coredump to use memory dynamically
  drm/amdgpu: Encapsulate all device reset info
  drm/amdgpu: Move coredump code to amdgpu_reset file
  drm/amdgpu: Create version number for coredumps

 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 21 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 75 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c   | 77 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h   | 13 
 5 files changed, 114 insertions(+), 82 deletions(-)

-- 
2.41.0

[PATCH v5 1/5] drm/amdgpu: Allocate coredump memory in a nonblocking way

2023-08-17 Thread André Almeida

During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
v5: no change
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index aa171db68639..bf4781551f88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4847,7 +4847,7 @@ static void amdgpu_reset_capture_coredumpm(struct 
amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
 
ktime_get_ts64(>reset_time);
-   dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+   dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
  amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif
-- 
2.41.0

Re: [PATCH v4 2/4] drm/amdgpu: Rework coredump to use memory dynamically

2023-08-17 Thread André Almeida





Em 17/08/2023 12:26, Shashank Sharma escreveu:


On 17/08/2023 17:17, André Almeida wrote:



Em 17/08/2023 12:04, Shashank Sharma escreveu:


On 17/08/2023 15:45, André Almeida wrote:

Hi Shashank,

Em 17/08/2023 03:41, Shashank Sharma escreveu:

Hello Andre,

On 15/08/2023 21:50, André Almeida wrote:

Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. 
This

will make it easier to further expand the logged information.

Signed-off-by: André Almeida 
---
v4: change kmalloc to kzalloc
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    | 14 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 
++

  2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
  uint32_t *reset_dump_reg_list;
  uint32_t    *reset_dump_reg_value;
  int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-    struct amdgpu_task_info reset_task_info;
-    bool    reset_vram_lost;
-    struct timespec64   reset_time;
-#endif
  bool    scpm_enabled;
  uint32_t    scpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
  uint32_t    aid_mask;
  };
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+    struct amdgpu_device    *adev;
+    struct amdgpu_task_info reset_task_info;
+    struct timespec64   reset_time;
+    bool    reset_vram_lost;
+};


The patch looks good to me in general, but I would recommend 
slightly different arrangement and segregation of GPU reset 
information.


Please consider a higher level structure adev->gpu_reset_info, and 
move everything related to reset dump info into that, including 
this new coredump_info structure, something like this:


struct amdgpu_reset_info {

 uint32_t *reset_dump_reg_list;

 uint32_t *reset_dump_reg_value;

 int num_regs;



Right, I can encapsulate there reset_dump members,


#ifdef CONFIG_DEV_COREDUMP

    struct amdgpu_coredump_info *coredump_info;/* keep this dynamic 
allocation */


but we don't need a pointer for amdgpu_coredump_info inside 
amdgpu_device or inside of amdgpu_device->gpu_reset_info, right?


I think it would be better if we keep all of the GPU reset related 
data in the same structure, so adev->gpu_reset_info->coredump_info 
sounds about right to me.




But after patch 2/4, we don't need to store a coredump_info pointer 
inside adev, this is what I meant. What would be the purpose of having 
this pointer? It's freed by amdgpu_devcoredump_free(), so we don't 
need to keep track of it.


Well, actually we are pulling in some 0parallel efforts on enhancing the 
GPU reset information, and we were planning to use the coredump info for 
some additional things. So if I have the coredump_info available (like 
reset_task_info and vram_lost) across a few functions in the driver with 
adev, it would make my job easy there :).


It seems dangerous to use an object with this limited lifetime to rely 
to read on. If you want to use it you will need to change 
amdgpu_devcoredump_free() to drop a reference or you will need to use it 
statically, which defeats the purpose of this patch. Anyway, I'll add it 
as you requested.




- Shashank




- Shashank





#endif

}


This will make sure that all the relevant information is at the 
same place.


- Shashank


   amdgpu_inc_vram_lost(tmp_adev);

Re: [PATCH v4 2/4] drm/amdgpu: Rework coredump to use memory dynamically

2023-08-17 Thread André Almeida





Em 17/08/2023 12:04, Shashank Sharma escreveu:


On 17/08/2023 15:45, André Almeida wrote:

Hi Shashank,

Em 17/08/2023 03:41, Shashank Sharma escreveu:

Hello Andre,

On 15/08/2023 21:50, André Almeida wrote:

Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida 
---
v4: change kmalloc to kzalloc
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    | 14 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 
++

  2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
  uint32_t    *reset_dump_reg_list;
  uint32_t    *reset_dump_reg_value;
  int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-    struct amdgpu_task_info reset_task_info;
-    bool    reset_vram_lost;
-    struct timespec64   reset_time;
-#endif
  bool    scpm_enabled;
  uint32_t    scpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
  uint32_t    aid_mask;
  };
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+    struct amdgpu_device    *adev;
+    struct amdgpu_task_info reset_task_info;
+    struct timespec64   reset_time;
+    bool    reset_vram_lost;
+};


The patch looks good to me in general, but I would recommend slightly 
different arrangement and segregation of GPU reset information.


Please consider a higher level structure adev->gpu_reset_info, and 
move everything related to reset dump info into that, including this 
new coredump_info structure, something like this:


struct amdgpu_reset_info {

 uint32_t *reset_dump_reg_list;

 uint32_t *reset_dump_reg_value;

 int num_regs;



Right, I can encapsulate there reset_dump members,


#ifdef CONFIG_DEV_COREDUMP

    struct amdgpu_coredump_info *coredump_info;/* keep this dynamic 
allocation */


but we don't need a pointer for amdgpu_coredump_info inside 
amdgpu_device or inside of amdgpu_device->gpu_reset_info, right?


I think it would be better if we keep all of the GPU reset related data 
in the same structure, so adev->gpu_reset_info->coredump_info sounds 
about right to me.




But after patch 2/4, we don't need to store a coredump_info pointer 
inside adev, this is what I meant. What would be the purpose of having 
this pointer? It's freed by amdgpu_devcoredump_free(), so we don't need 
to keep track of it.



- Shashank





#endif

}


This will make sure that all the relevant information is at the same 
place.


- Shashank


   amdgpu_inc_vram_lost(tmp_adev);

Re: [PATCH v4 2/4] drm/amdgpu: Rework coredump to use memory dynamically

2023-08-17 Thread André Almeida


Hi Shashank,

Em 17/08/2023 03:41, Shashank Sharma escreveu:

Hello Andre,

On 15/08/2023 21:50, André Almeida wrote:

Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida 
---
v4: change kmalloc to kzalloc
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    | 14 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 ++
  2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
  uint32_t    *reset_dump_reg_list;
  uint32_t    *reset_dump_reg_value;
  int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-    struct amdgpu_task_info reset_task_info;
-    bool    reset_vram_lost;
-    struct timespec64   reset_time;
-#endif
  bool    scpm_enabled;
  uint32_t    scpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
  uint32_t    aid_mask;
  };
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+    struct amdgpu_device    *adev;
+    struct amdgpu_task_info reset_task_info;
+    struct timespec64   reset_time;
+    bool    reset_vram_lost;
+};


The patch looks good to me in general, but I would recommend slightly 
different arrangement and segregation of GPU reset information.


Please consider a higher level structure adev->gpu_reset_info, and move 
everything related to reset dump info into that, including this new 
coredump_info structure, something like this:


struct amdgpu_reset_info {

     uint32_t *reset_dump_reg_list;

     uint32_t *reset_dump_reg_value;

     int num_regs;



Right, I can encapsulate there reset_dump members,


#ifdef CONFIG_DEV_COREDUMP

    struct amdgpu_coredump_info *coredump_info;/* keep this dynamic 
allocation */


but we don't need a pointer for amdgpu_coredump_info inside 
amdgpu_device or inside of amdgpu_device->gpu_reset_info, right?




#endif

}


This will make sure that all the relevant information is at the same place.

- Shashank


   amdgpu_inc_vram_lost(tmp_adev);

[PATCH v4 4/4] drm/amdgpu: Create version number for coredumps

2023-08-15 Thread André Almeida

Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 46c8d6ce349c..6696ff0a94af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t 
offset,
p = drm_coredump_printer();
 
drm_printf(, " AMDGPU Device Coredump \n");
+   drm_printf(, "version: " AMDGPU_COREDUMP_VERSION "\n");
drm_printf(, "kernel: " UTS_RELEASE "\n");
drm_printf(, "module: " KBUILD_MODNAME "\n");
drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 362954521721..7b6767ca8127 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
 };
 
 #ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
 struct amdgpu_coredump_info {
struct amdgpu_device*adev;
struct amdgpu_task_info reset_task_info;
-- 
2.41.0

[PATCH v4 3/4] drm/amdgpu: Move coredump code to amdgpu_reset file

2023-08-15 Thread André Almeida

Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  9 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 78 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 76 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 11 +++
 4 files changed, 87 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0d560b713948..314b06cddc39 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1100,15 +1100,6 @@ struct amdgpu_device {
uint32_taid_mask;
 };
 
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
-   struct amdgpu_device*adev;
-   struct amdgpu_task_info reset_task_info;
-   struct timespec64   reset_time;
-   boolreset_vram_lost;
-};
-#endif
-
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b5b879bcc5c9..9706f608723a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 
@@ -4799,82 +4797,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
return 0;
 }
 
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-   struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
-   size_t count, void *data, size_t datalen)
-{
-   struct drm_printer p;
-   struct amdgpu_coredump_info *coredump = data;
-   struct drm_print_iterator iter;
-   int i;
-
-   iter.data = buffer;
-   iter.offset = 0;
-   iter.start = offset;
-   iter.remain = count;
-
-   p = drm_coredump_printer();
-
-   drm_printf(, " AMDGPU Device Coredump \n");
-   drm_printf(, "kernel: " UTS_RELEASE "\n");
-   drm_printf(, "module: " KBUILD_MODNAME "\n");
-   drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
-   if (coredump->reset_task_info.pid)
-   drm_printf(, "process_name: %s PID: %d\n",
-  coredump->reset_task_info.process_name,
-  coredump->reset_task_info.pid);
-
-   if (coredump->reset_vram_lost)
-   drm_printf(, "VRAM is lost due to GPU reset!\n");
-   if (coredump->adev->num_regs) {
-   drm_printf(, "AMDGPU register dumps:\nOffset: Value:\n");
-
-   for (i = 0; i < coredump->adev->num_regs; i++)
-   drm_printf(, "0x%08x: 0x%08x\n",
-  coredump->adev->reset_dump_reg_list[i],
-  coredump->adev->reset_dump_reg_value[i]);
-   }
-
-   return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
-   kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-   struct amdgpu_reset_context *reset_context)
-{
-   struct amdgpu_coredump_info *coredump;
-   struct drm_device *dev = adev_to_drm(adev);
-
-   coredump = kzalloc(sizeof(*coredump), GFP_NOWAIT);
-
-   if (!coredump) {
-   DRM_ERROR("%s: failed to allocate memory for coredump\n", 
__func__);
-   return;
-   }
-
-   coredump->reset_vram_lost = vram_lost;
-
-   if (reset_context->job && reset_context->job->vm)
-   coredump->reset_task_info = reset_context->job->vm->task_info;
-
-   coredump->adev = adev;
-
-   ktime_get_ts64(>reset_time);
-
-   dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
- amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
 int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 struct amdgpu_reset_context *reset_context)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 5fed06ffcc6b..46c8d6ce349c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
  *
  */
 
+#include 
+#include 
+
 #include "amdgpu_reset.h"
 #include "aldebaran.h"
 #include "sienna_cichlid.h&quo

[PATCH v4 2/4] drm/amdgpu: Rework coredump to use memory dynamically

2023-08-15 Thread André Almeida

Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida 
---
v4: change kmalloc to kzalloc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 ++
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
uint32_t*reset_dump_reg_list;
uint32_t*reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-   struct amdgpu_task_info reset_task_info;
-   boolreset_vram_lost;
-   struct timespec64   reset_time;
-#endif
 
boolscpm_enabled;
uint32_tscpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
uint32_taid_mask;
 };
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+   struct amdgpu_device*adev;
+   struct amdgpu_task_info reset_task_info;
+   struct timespec64   reset_time;
+   boolreset_vram_lost;
+};
+#endif
+
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bf4781551f88..b5b879bcc5c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4799,12 +4799,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device 
*adev)
return 0;
 }
 
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+   struct amdgpu_reset_context *reset_context)
+{
+}
+#else
 static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
size_t count, void *data, size_t datalen)
 {
struct drm_printer p;
-   struct amdgpu_device *adev = data;
+   struct amdgpu_coredump_info *coredump = data;
struct drm_print_iterator iter;
int i;
 
@@ -4818,21 +4823,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
drm_printf(, " AMDGPU Device Coredump \n");
drm_printf(, "kernel: " UTS_RELEASE "\n");
drm_printf(, "module: " KBUILD_MODNAME "\n");
-   drm_printf(, "time: %lld.%09ld\n", adev->reset_time.tv_sec, 
adev->reset_time.tv_nsec);
-   if (adev->reset_task_info.pid)
+   drm_printf(, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, 
coredump->reset_time.tv_nsec);
+   if (coredump->reset_task_info.pid)
drm_printf(, "process_name: %s PID: %d\n",
-  adev->reset_task_info.process_name,
-  adev->reset_task_info.pid);
+  coredump->reset_task_info.process_name,
+  coredump->reset_task_info.pid);
 
-   if (adev->reset_vram_lost)
+   if (coredump->reset_vram_lost)
drm_printf(, "VRAM is lost due to GPU reset!\n");
-   if (adev->num_regs) {
+   if (coredump->adev->num_regs) {
drm_printf(, "AMDGPU register dumps:\nOffset: Value:\n");
 
-   for (i = 0; i < adev->num_regs; i++)
+   for (i = 0; i < coredump->adev->num_regs; i++)
drm_printf(, "0x%08x: 0x%08x\n",
-  adev->reset_dump_reg_list[i],
-  adev->reset_dump_reg_value[i]);
+  coredump->adev->reset_dump_reg_list[i],
+  coredump->adev->reset_dump_reg_value[i]);
}
 
return count - iter.remain;
@@ -4840,14 +4845,32 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, 
loff_t offset,
 
 static void amdgpu_devcoredump_free(void *data)
 {
+   kfree(data);
 }
 
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+   struct amdgpu_reset_context *reset_context)
 {
+   struct amdgpu_coredump_info *coredump;
struct drm_device *dev = adev_to_drm(adev);
 
-   ktime_get_ts64(>reset_time);
-   dev_coredumpm(dev->de

[PATCH v4 1/4] drm/amdgpu: Allocate coredump memory in a nonblocking way

2023-08-15 Thread André Almeida

During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.

Signed-off-by: André Almeida 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index aa171db68639..bf4781551f88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4847,7 +4847,7 @@ static void amdgpu_reset_capture_coredumpm(struct 
amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
 
ktime_get_ts64(>reset_time);
-   dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+   dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
  amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif
-- 
2.41.0

[PATCH v4 0/4] drm/amdgpu: Rework coredump memory allocation

2023-08-15 Thread André Almeida

Hi,

The patches of this set are a rework to alloc devcoredump dynamically and to
move it to a better source file.

Thanks,
André

Changelog:

v3: 
https://lore.kernel.org/dri-devel/20230810192330.198326-1-andrealm...@igalia.com/
- Changed from kmalloc to kzalloc
- Dropped "Create a module param to disable soft recovery" for now

v2: 
https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealm...@igalia.com/
- Drop the IB and ring patch
- Drop patch that limited information from kernel threads
- Add patch to move coredump to amdgpu_reset

v1: 
https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealm...@igalia.com/
 - Drop "Mark contexts guilty for causing soft recoveries" patch
 - Use GFP_NOWAIT for devcoredump allocation

André Almeida (4):
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Rework coredump to use memory dynamically
  drm/amdgpu: Move coredump code to amdgpu_reset file
  drm/amdgpu: Create version number for coredumps

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  5 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 77 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 14 
 4 files changed, 94 insertions(+), 69 deletions(-)

-- 
2.41.0

[PATCH v6 6/6] drm/doc: Define KMS atomic state set

2023-08-15 Thread André Almeida

From: Pekka Paalanen 

Specify how the atomic state is maintained between userspace and
kernel, plus the special case for async flips.

Signed-off-by: Pekka Paalanen 
Signed-off-by: André Almeida 
---
v5: Add note that not every redundant attribute will result in no-op
v4: total rework by Pekka
---
 Documentation/gpu/drm-uapi.rst | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 65fb3036a580..b91ccaddeeb9 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -486,3 +486,47 @@ and the CRTC index is its position in this array.
 
 .. kernel-doc:: include/uapi/drm/drm_mode.h
:internal:
+
+KMS atomic state
+
+
+An atomic commit can change multiple KMS properties in an atomic fashion,
+without ever applying intermediate or partial state changes.  Either the whole
+commit succeeds or fails, and it will never be applied partially. This is the
+fundamental improvement of the atomic API over the older non-atomic API which 
is
+referred to as the "legacy API".  Applying intermediate state could 
unexpectedly
+fail, cause visible glitches, or delay reaching the final state.
+
+An atomic commit can be flagged with DRM_MODE_ATOMIC_TEST_ONLY, which means the
+complete state change is validated but not applied.  Userspace should use this
+flag to validate any state change before asking to apply it. If validation 
fails
+for any reason, userspace should attempt to fall back to another, perhaps
+simpler, final state.  This allows userspace to probe for various 
configurations
+without causing visible glitches on screen and without the need to undo a
+probing change.
+
+The changes recorded in an atomic commit apply on top the current KMS state in
+the kernel. Hence, the complete new KMS state is the complete old KMS state 
with
+the committed property settings done on top. The kernel will try to avoid
+no-operation changes, so it is safe for userspace to send redundant property
+settings.  However, not every situation allows for no-op changes, due to the
+need to acquire locks for some attributes. Userspace needs to be aware that 
some
+redundant information might result in oversynchronization issues.  No-operation
+changes do not count towards actually needed changes, e.g.  setting MODE_ID to 
a
+different blob with identical contents as the current KMS state shall not be a
+modeset on its own.
+
+A "modeset" is a change in KMS state that might enable, disable, or temporarily
+disrupt the emitted video signal, possibly causing visible glitches on screen. 
A
+modeset may also take considerably more time to complete than other kinds of
+changes, and the video sink might also need time to adapt to the new signal
+properties. Therefore a modeset must be explicitly allowed with the flag
+DRM_MODE_ATOMIC_ALLOW_MODESET.  This in combination with
+DRM_MODE_ATOMIC_TEST_ONLY allows userspace to determine if a state change is
+likely to cause visible disruption on screen and avoid such changes when end
+users do not expect them.
+
+An atomic commit with the flag DRM_MODE_PAGE_FLIP_ASYNC is allowed to
+effectively change only the FB_ID property on any planes. No-operation changes
+are ignored as always. Changing any other property will cause the commit to be
+rejected.
-- 
2.41.0

[PATCH v6 5/6] drm: Refuse to async flip with atomic prop changes

2023-08-15 Thread André Almeida

Given that prop changes may lead to modesetting, which would defeat the
fast path of the async flip, refuse any atomic prop change for async
flips in atomic API. The only exceptions are the framebuffer ID to flip
to and the mode ID, that could be referring to an identical mode.

Signed-off-by: André Almeida 
---
v5: no changes
v4: new patch
---
 drivers/gpu/drm/drm_atomic_helper.c |  5 +++
 drivers/gpu/drm/drm_atomic_uapi.c   | 52 +++--
 drivers/gpu/drm/drm_crtc_internal.h |  2 +-
 drivers/gpu/drm/drm_mode_object.c   |  2 +-
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 292e38eb6218..b34e3104afd1 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -629,6 +629,11 @@ drm_atomic_helper_check_modeset(struct drm_device *dev,
WARN_ON(!drm_modeset_is_locked(>mutex));
 
if (!drm_mode_equal(_crtc_state->mode, 
_crtc_state->mode)) {
+   if (new_crtc_state->async_flip) {
+   drm_dbg_atomic(dev, "[CRTC:%d:%s] no mode 
changes allowed during async flip\n",
+  crtc->base.id, crtc->name);
+   return -EINVAL;
+   }
drm_dbg_atomic(dev, "[CRTC:%d:%s] mode changed\n",
   crtc->base.id, crtc->name);
new_crtc_state->mode_changed = true;
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index a15121e75a0a..6c423a7e8c7b 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1006,13 +1006,28 @@ int drm_atomic_connector_commit_dpms(struct 
drm_atomic_state *state,
return ret;
 }
 
+static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t 
prop_value,
+struct drm_property *prop)
+{
+   if (ret != 0 || old_val != prop_value) {
+   drm_dbg_atomic(prop->dev,
+  "[PROP:%d:%s] No prop can be changed during 
async flip\n",
+  prop->base.id, prop->name);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int drm_atomic_set_property(struct drm_atomic_state *state,
struct drm_file *file_priv,
struct drm_mode_object *obj,
struct drm_property *prop,
-   uint64_t prop_value)
+   uint64_t prop_value,
+   bool async_flip)
 {
struct drm_mode_object *ref;
+   uint64_t old_val;
int ret;
 
if (!drm_property_change_valid_get(prop, prop_value, ))
@@ -1029,6 +1044,13 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   if (async_flip) {
+   ret = drm_atomic_connector_get_property(connector, 
connector_state,
+   prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_connector_set_property(connector,
connector_state, file_priv,
prop, prop_value);
@@ -1037,6 +1059,7 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
case DRM_MODE_OBJECT_CRTC: {
struct drm_crtc *crtc = obj_to_crtc(obj);
struct drm_crtc_state *crtc_state;
+   struct drm_mode_config *config = >dev->mode_config;
 
crtc_state = drm_atomic_get_crtc_state(state, crtc);
if (IS_ERR(crtc_state)) {
@@ -1044,6 +1067,18 @@ int drm_atomic_set_property(struct drm_atomic_state 
*state,
break;
}
 
+   /*
+* We allow mode_id changes here for async flips, because we
+* check later on drm_atomic_helper_check_modeset() callers if
+* there are modeset changes or they are equal
+*/
+   if (async_flip && prop != config->prop_mode_id) {
+   ret = drm_atomic_crtc_get_property(crtc, crtc_state,
+  prop, _val);
+   ret = drm_atomic_check_prop_changes(ret, old_val, 
prop_value, prop);
+   break;
+   }
+
ret = drm_atomic_crtc_set_property(crtc,
crtc_state, prop, prop_value);
break;
@@ -1051,6 +1086,7 @@ int drm_atomic_set_property(struct drm_atomic_state 
*s

1 2 3 >

1 - 100 of 250 matches

Mail list logo