Re: [PATCH] MAINTAINERS: remove myself as a VKMS maintainer

2024-05-27 Thread Maíra Canal

On 5/25/24 11:26, Melissa Wen wrote:

I haven't been able to follow or review the work on the driver for some
time now and I don't see the situation improving anytime soon. I'd like
to continue being listed as a reviewer.

Signed-off-by: Melissa Wen 


Acked-by: Maíra Canal 

Thanks for all the good work you put into VKMS in the last couple of
years!

Best Regards,
- Maíra


---
  MAINTAINERS | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7d735037a383..79fe536355b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7027,10 +7027,10 @@ F:  drivers/gpu/drm/udl/
  
  DRM DRIVER FOR VIRTUAL KERNEL MODESETTING (VKMS)

  M:Rodrigo Siqueira 
-M: Melissa Wen 
  M:Maíra Canal 
  R:Haneen Mohammed 
  R:Daniel Vetter 
+R: Melissa Wen 
  L:dri-devel@lists.freedesktop.org
  S:Maintained
  T:git https://gitlab.freedesktop.org/drm/misc/kernel.git


Re: [PATCH v2 0/6] drm/v3d: Improve Performance Counters handling

2024-05-21 Thread Maíra Canal

Hi Jani,

On 5/21/24 08:07, Jani Nikula wrote:

On Mon, 20 May 2024, Maíra Canal  wrote:

On 5/12/24 19:23, Maíra Canal wrote:

Maíra Canal (6):
drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
drm/v3d: Different V3D versions can have different number of perfcnt
drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
drm/v3d: Create new IOCTL to expose performance counters information
drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
drm/v3d: Deprecate the use of the Performance Counters enum
   drivers/gpu/drm/v3d/v3d_drv.c |  11 +
   drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
   drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
   .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
   drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
   include/uapi/drm/v3d_drm.h|  48 
   6 files changed, 316 insertions(+), 3 deletions(-)
   create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h



Applied to drm-misc/drm-misc-next!


What compiler do you use? I'm hitting the same as kernel test robot [1]
with arm-linux-gnueabihf-gcc 12.2.0.


I use clang version 17.0.6.



In general, I don't think it's a great idea to put arrays in headers,
and then include them everywhere via v3d_drv.h. You're not just relying
on the compiler to optimize them away in compilation units where they're
not referenced (likely to happen), but also on the linker to deduplicate
the rodata (possible, but I'm not sure that it will happen).

I think you need to move the arrays to a .c file, and then either a) add
interfaces to access the arrays, or b) declare the arrays and make them
global. For the latter you also need to figure out how to expose the
size.


I'll write a patch to fix it. Sorry for the noise, I didn't notice
it with clang.

Best Regards,
- Maíra



BR,
Jani.


[1] https://lore.kernel.org/r/202405211137.huefklkg-...@intel.com




Re: [PATCH v2 0/6] drm/v3d: Improve Performance Counters handling

2024-05-20 Thread Maíra Canal

On 5/12/24 19:23, Maíra Canal wrote:

Maíra Canal (6):
   drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
   drm/v3d: Different V3D versions can have different number of perfcnt
   drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
   drm/v3d: Create new IOCTL to expose performance counters information
   drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
   drm/v3d: Deprecate the use of the Performance Counters enum
  drivers/gpu/drm/v3d/v3d_drv.c |  11 +
  drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
  drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
  .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
  drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
  include/uapi/drm/v3d_drm.h|  48 
  6 files changed, 316 insertions(+), 3 deletions(-)
  create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h



Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra


Re: [PATCH v7 11/17] drm/vkms: Remove useless drm_rotation_simplify

2024-05-16 Thread Maíra Canal

Hi Louis,

On 5/13/24 04:50, Louis Chauvet wrote:

As all the rotation are now supported by VKMS, this simplification does
not make sense anymore, so remove it.

Signed-off-by: Louis Chauvet 


I'd like to push all commits up to this point to drm-misc-next. Do you
see a problem with it? Reason: I'd like Melissa to take a look at the
YUV patches and patches 1 to 11 fix several composition errors.

Let me know your thoughts about it.

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_plane.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 8875bed76410..5a028ee96c91 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -115,12 +115,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
	frame_info->fb = fb;
	memcpy(&frame_info->map, &new_plane_state->data, sizeof(frame_info->map));
	drm_framebuffer_get(frame_info->fb);
-	frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 |
-						     DRM_MODE_ROTATE_90 |
-						     DRM_MODE_ROTATE_270 |
-						     DRM_MODE_REFLECT_X |
-						     DRM_MODE_REFLECT_Y);
-
+	frame_info->rotation = new_state->rotation;
  
  	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);

  }



[PATCH v2 6/6] drm/v3d: Deprecate the use of the Performance Counters enum

2024-05-12 Thread Maíra Canal
The Performance Counters enum was used to identify the index of each
performance counter and to provide the total number of performance
counters (V3D_PERFCNT_NUM). But this enum is only valid for V3D 4.2,
not for V3D 7.1.

As we implemented a new flexible structure to retrieve performance
counters information, we can deprecate this enum.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 include/uapi/drm/v3d_drm.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 0860ddb3d0b6..87fc5bb0a61e 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -603,6 +603,16 @@ struct drm_v3d_submit_cpu {
__u64 extensions;
 };
 
+/* The performance counter indices represented by this enum are deprecated
+ * and must no longer be used. These counters are only valid for V3D 4.2.
+ *
+ * In order to check for performance counter information,
+ * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER.
+ *
+ * Don't use V3D_PERFCNT_NUM to retrieve the maximum number of performance
+ * counters. You should use DRM_IOCTL_V3D_GET_PARAM with the following
+ * parameter: DRM_V3D_PARAM_MAX_PERF_COUNTERS.
+ */
 enum {
V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
V3D_PERFCNT_FEP_VALID_PRIMS,
-- 
2.44.0



[PATCH v2 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM

2024-05-12 Thread Maíra Canal
V3D_PERFCNT_NUM represents the maximum number of performance counters
for V3D 4.2, but not for V3D 7.1. This means that, if we use
V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.

Therefore, use the number of performance counters on V3D 7.1 as the
maximum number of counters. This will allow us to create arrays on the
stack with reasonable size. Note that userspace must use the value
provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.h   | 5 -
 drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 44cfddedebde..556cbb400ba0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,8 +351,11 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
+/* Maximum number of performance counters supported by any version of V3D */
+#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters)
+
 /* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \
+#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
  DRM_V3D_MAX_PERF_COUNTERS)
 
 struct v3d_performance_query {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7cd8c335cd9b..03df37a3acf5 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -490,7 +490,7 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query)
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_PERFCNT_NUM];
+   u64 counter_values[V3D_MAX_COUNTERS];
 
for (int i = 0; i < performance_query->nperfmons; i++) {
perfmon = v3d_perfmon_find(v3d_priv,
-- 
2.44.0



[PATCH v2 4/6] drm/v3d: Create new IOCTL to expose performance counters information

2024-05-12 Thread Maíra Canal
Userspace usually needs some information about the performance counters
available. Although we could replicate this information in the kernel
and user-space, let's use the kernel as the "single source of truth" to
avoid issues in the future (e.g. list of performance counters is updated
in user-space, but not in the kernel, generating invalid requests).

Therefore, create a new IOCTL to expose the performance counters
information, that is, name, category, and description.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c |  1 +
 drivers/gpu/drm/v3d/v3d_drv.h |  2 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c | 33 +++
 include/uapi/drm/v3d_drm.h| 37 +++
 4 files changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index d2c1d5053132..f7477488b1cc 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -211,6 +211,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, 
DRM_RENDER_ALLOW | DRM_AUTH),
+   DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, 
v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver v3d_drm_driver = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index bd1e38f7d10a..44cfddedebde 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -582,6 +582,8 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void 
*data,
  struct drm_file *file_priv);
 int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
 
 /* v3d_sysfs.c */
 int v3d_sysfs_init(struct device *dev);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index f268d9466c0f..73e2bb8bdb7f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -217,3 +217,36 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, 
void *data,
 
return ret;
 }
+
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct drm_v3d_perfmon_get_counter *req = data;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
+   const struct v3d_perf_counter_desc *counter;
+
+   for (int i = 0; i < ARRAY_SIZE(req->reserved); i++) {
+   if (req->reserved[i] != 0)
+   return -EINVAL;
+   }
+
+   /* Make sure that the counter ID is valid */
+   if (req->counter >= v3d->max_counters)
+   return -EINVAL;
+
+	if (v3d->ver >= 71) {
+		WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v71_performance_counters));
+		counter = &v3d_v71_performance_counters[req->counter];
+	} else if (v3d->ver >= 42) {
+		WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v42_performance_counters));
+		counter = &v3d_v42_performance_counters[req->counter];
+	} else {
+		return -EOPNOTSUPP;
+	}
+
+	strscpy(req->name, counter->name, sizeof(req->name));
+	strscpy(req->category, counter->category, sizeof(req->category));
+	strscpy(req->description, counter->description, sizeof(req->description));
+
+   return 0;
+}
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 215b01bb69c3..0860ddb3d0b6 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -42,6 +42,7 @@ extern "C" {
 #define DRM_V3D_PERFMON_DESTROY   0x09
 #define DRM_V3D_PERFMON_GET_VALUES0x0a
 #define DRM_V3D_SUBMIT_CPU0x0b
+#define DRM_V3D_PERFMON_GET_COUNTER   0x0c
 
 #define DRM_IOCTL_V3D_SUBMIT_CL   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
 #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
@@ -58,6 +59,8 @@ extern "C" {
 #define DRM_IOCTL_V3D_PERFMON_GET_VALUES  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_VALUES, \
   struct 
drm_v3d_perfmon_get_values)
 #define DRM_IOCTL_V3D_SUBMIT_CPU  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu)
+#define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_PERFMON_GET_COUNTER, \
+						   struct drm_v3d_perfmon_get_counter)

[PATCH v2 3/6] drm/v3d: Create a new V3D parameter for the maximum number of perfcnt

2024-05-12 Thread Maíra Canal
The maximum number of performance counters can change from version to
version and it's important for userspace to know this value, as it needs
to use the counters for performance queries. Therefore, expose the
maximum number of performance counters to userspace as a parameter.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 3 +++
 include/uapi/drm/v3d_drm.h| 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 6b9dd26df9fe..d2c1d5053132 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -94,6 +94,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
args->value = 1;
return 0;
+   case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
+   args->value = v3d->max_counters;
+   return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index dce1835eced4..215b01bb69c3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -286,6 +286,7 @@ enum drm_v3d_param {
DRM_V3D_PARAM_SUPPORTS_PERFMON,
DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE,
+   DRM_V3D_PARAM_MAX_PERF_COUNTERS,
 };
 
 struct drm_v3d_get_param {
-- 
2.44.0



[PATCH v2 2/6] drm/v3d: Different V3D versions can have different number of perfcnt

2024-05-12 Thread Maíra Canal
Currently, even though V3D 7.1 has 93 performance counters, it is not
possible to use counters beyond the first 87, as
`v3d_perfmon_create_ioctl()` rejects any counter index above that range
as invalid.

Therefore, create a device variable to expose the maximum
number of counters for a given V3D version and make
`v3d_perfmon_create_ioctl()` check this variable.

This commit fixes CTS failures in the performance queries tests
`dEQP-VK.query_pool.performance_query.*` [1]

Link: 
https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
 [1]
Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device")
Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 7 +++
 drivers/gpu/drm/v3d/v3d_drv.h | 5 +
 drivers/gpu/drm/v3d/v3d_perfmon.c | 3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..6b9dd26df9fe 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -294,6 +294,13 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
+   if (v3d->ver >= 71)
+   v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters);
+   else if (v3d->ver >= 42)
+   v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters);
+   else
+   v3d->max_counters = 0;
+
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
ret = PTR_ERR(v3d->reset);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 671375a3bb66..bd1e38f7d10a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,6 +104,11 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
+   /* Different revisions of V3D have different total number of performance
+* counters
+*/
+   unsigned int max_counters;
+
void __iomem *hub_regs;
void __iomem *core_regs[3];
void __iomem *bridge_regs;
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index e1be7368b87d..f268d9466c0f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -123,6 +123,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 {
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
struct drm_v3d_perfmon_create *req = data;
+   struct v3d_dev *v3d = v3d_priv->v3d;
struct v3d_perfmon *perfmon;
unsigned int i;
int ret;
@@ -134,7 +135,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= V3D_PERFCNT_NUM)
+   if (req->counters[i] >= v3d->max_counters)
return -EINVAL;
}
 
-- 
2.44.0



[PATCH v2 0/6] drm/v3d: Improve Performance Counters handling

2024-05-12 Thread Maíra Canal
This series has the intention to address two issues with Performance Counters
on V3D:

1. Update the number of Performance Counters for V3D 7.1 

V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87. Although the
series [1] enabled support for V3D 7.1, it didn’t replace the maximum number of
performance counters. This led to errors in user space as the Vulkan driver
updated the maximum number of performance counters, but the kernel didn’t. 

Currently, the user space can request values for performance counters that
are greater than 87 and the kernel will return an error instead of the values.
That’s why `dEQP-VK.query_pool.performance_query.*` currently fails on Mesa
CI [2]. This series intends to fix the `dEQP-VK.query_pool.performance_query.*`
fail.

2. Make the kernel able to provide the Performance Counter descriptions

Although all the management of the Performance Monitors is done through IOCTLs,
which means that the code is in the kernel, the performance counter descriptions
are in Mesa. This means two things: (#1) only Mesa has access to the 
descriptions
and (#2) we can have inconsistencies between the information provided by Mesa
and the kernel, as seen in the first issue addressed by this series.

To minimize the risk of inconsistencies, this series proposes to use the kernel
as a “single source of truth”. Therefore, if there are any changes to the
performance monitors, all the changes must be done only in the kernel. This
means that all information about the maximum number of performance counters and
all the descriptions will now be retrieved from the kernel. 

This series is coupled with a Mesa series [3] that enabled the use of the new
IOCTL. I appreciate any feedback from both the kernel and Mesa implementations.

[1] https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/
[2] 
https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
[3] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt

Best Regards,
- Maíra Canal

---

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240508143306.2435304-2-mca...@igalia.com/T/

* [5/6] s/DRM_V3D_PARAM_V3D_MAX_PERF_COUNTERS/DRM_V3D_PARAM_MAX_PERF_COUNTERS 
(Iago Toral)
* [6/6] Include a reference to the new DRM_V3D_PARAM_MAX_PERF_COUNTERS param 
(Iago Toral)
* Add Iago's R-b (Iago Toral)

Maíra Canal (6):
  drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
  drm/v3d: Different V3D versions can have different number of perfcnt
  drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
  drm/v3d: Create new IOCTL to expose performance counters information
  drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
  drm/v3d: Deprecate the use of the Performance Counters enum

 drivers/gpu/drm/v3d/v3d_drv.c |  11 +
 drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
 include/uapi/drm/v3d_drm.h|  48 
 6 files changed, 316 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

-- 
2.44.0



[PATCH v2 1/6] drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1

2024-05-12 Thread Maíra Canal
Add name, category and description for each one of the 93 performance
counters available on V3D.

Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93.
Therefore, there are two performance counters arrays. The index of the
performance counter for each V3D version is represented by its position
on the array.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.h |   2 +
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 2 files changed, 210 insertions(+)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..671375a3bb66 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#include "v3d_performance_counters.h"
+
 #include "uapi/drm/v3d_drm.h"
 
 struct clk;
diff --git a/drivers/gpu/drm/v3d/v3d_performance_counters.h b/drivers/gpu/drm/v3d/v3d_performance_counters.h
new file mode 100644
index ..72822205ebdc
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_performance_counters.h
@@ -0,0 +1,208 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (C) 2024 Raspberry Pi
+ */
+#ifndef V3D_PERFORMANCE_COUNTERS_H
+#define V3D_PERFORMANCE_COUNTERS_H
+
+/* Holds a description of a given performance counter. The index of a
+ * performance counter is given by its position in the arrays in this header.
+ */
+struct v3d_perf_counter_desc {
+   /* Category of the counter */
+   char category[32];
+
+   /* Name of the counter */
+   char name[64];
+
+   /* Description of the counter */
+   char description[256];
+};
+
+static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = {
+	{"CORE", "cycle-count", "[CORE] Cycle counter"},
+	{"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
+	{"CLE", "CLE-bin-thread-active-cycles", "[CLE] Bin thread active cycles"},
+	{"CLE", "CLE-render-thread-active-cycles", "[CLE] Render thread active cycles"},
+	{"CORE", "compute-active-cycles", "[CORE] Compute active cycles"},
+	{"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"},
+	{"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"},
+	{"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"},
+	{"FEP", "FEP-valid-quads", "[FEP] Valid quads"},
+	{"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"},
+	{"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"},
+	{"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any pixels passing the Z and stencil tests"},
+	{"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid pixels written to colour buffer"},
+	{"TLB", "TLB-partial-quads-written-to-color-buffer", "[TLB] Partial quads written to the colour buffer"},
+	{"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need clipping"},
+	{"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives discarded by being outside the viewport"},
+	{"PTB", "PTB-primitives-binned", "[PTB] Total primitives binned"},
+	{"PTB", "PTB-primitives-discarded-reversed", "[PTB] Primitives that are discarded because they are reversed"},
+	{"QPU", "QPU-total-instr-cache-hit", "[QPU] Total instruction cache hits for all slices"},
+	{"QPU", "QPU-total-instr-cache-miss", "[QPU] Total instruction cache misses for all slices"},
+	{"QPU", "QPU-total-uniform-cache-hit", "[QPU] Total uniforms cache hits for all slices"},
+	{"QPU", "QPU-total-uniform-cache-miss", "[QPU] Total uniforms cache misses for all slices"},
+	{"TMU", "TMU-active-cycles", "[TMU] Active cycles"},
+	{"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"},
+	{"TMU", "TMU-total-text-quads-access", "[TMU] Total texture cache access

[PATCH 0/6] drm/v3d: Improve Performance Counters handling

2024-05-08 Thread Maíra Canal
This series has the intention to address two issues with Performance Counters
on V3D:

1. Update the number of Performance Counters for V3D 7.1 

V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87. Although the
series [1] enabled support for V3D 7.1, it didn’t replace the maximum number of
performance counters. This led to errors in user space as the Vulkan driver
updated the maximum number of performance counters, but the kernel didn’t. 

Currently, the user space can request values for performance counters that
are greater than 87 and the kernel will return an error instead of the values.
That’s why `dEQP-VK.query_pool.performance_query.*` currently fails on Mesa
CI [2]. This series intends to fix the `dEQP-VK.query_pool.performance_query.*`
fail.

2. Make the kernel able to provide the Performance Counter descriptions

Although all the management of the Performance Monitors is done through IOCTLs,
which means that the code is in the kernel, the performance counter descriptions
are in Mesa. This means two things: (#1) only Mesa has access to the 
descriptions
and (#2) we can have inconsistencies between the information provided by Mesa
and the kernel, as seen in the first issue addressed by this series.

To minimize the risk of inconsistencies, this series proposes to use the kernel
as a “single source of truth”. Therefore, if there are any changes to the
performance monitors, all the changes must be done only in the kernel. This
means that all information about the maximum number of performance counters and
all the descriptions will now be retrieved from the kernel. 

This series is coupled with a Mesa series [3] that enabled the use of the new
IOCTL. I appreciate any feedback from both the kernel and Mesa implementations.

[1] https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/
[2] 
https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
[3] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt

Best Regards,
- Maíra Canal

Maíra Canal (6):
  drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
  drm/v3d: Different V3D versions can have different number of perfcnt
  drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
  drm/v3d: Create new IOCTL to expose performance counters information
  drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
  drm/v3d: Deprecate the use of the Performance Counters enum

 drivers/gpu/drm/v3d/v3d_drv.c |  11 +
 drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
 include/uapi/drm/v3d_drm.h|  44 
 6 files changed, 312 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

-- 
2.44.0



[PATCH 6/6] drm/v3d: Deprecate the use of the Performance Counters enum

2024-05-08 Thread Maíra Canal
The Performance Counters enum was used to identify the index of each
performance counter and to provide the total number of performance
counters (V3D_PERFCNT_NUM). But this enum is only valid for V3D 4.2,
not for V3D 7.1.

As we implemented a new flexible structure to retrieve performance
counters information, we can deprecate this enum.

Signed-off-by: Maíra Canal 
---
 include/uapi/drm/v3d_drm.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 0860ddb3d0b6..706b4dea1c45 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -603,6 +603,12 @@ struct drm_v3d_submit_cpu {
__u64 extensions;
 };
 
+/* The performance counter indices represented by this enum are deprecated
+ * and must no longer be used. These counters are only valid for V3D 4.2.
+ *
+ * In order to check for performance counter information,
+ * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER.
+ */
 enum {
V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
V3D_PERFCNT_FEP_VALID_PRIMS,
-- 
2.44.0



[PATCH 2/6] drm/v3d: Different V3D versions can have different number of perfcnt

2024-05-08 Thread Maíra Canal
Currently, even though V3D 7.1 has 93 performance counters, it is not
possible to use counters beyond the first 87, as
`v3d_perfmon_create_ioctl()` rejects any counter index above that range
as invalid.

Therefore, create a device variable to expose the maximum
number of counters for a given V3D version and make
`v3d_perfmon_create_ioctl()` check this variable.

This commit fixes CTS failures in the performance queries tests
(dEQP-VK.query_pool.performance_query.*) [1]

Link: 
https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
 [1]
Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device")
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 7 +++
 drivers/gpu/drm/v3d/v3d_drv.h | 5 +
 drivers/gpu/drm/v3d/v3d_perfmon.c | 3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..6b9dd26df9fe 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -294,6 +294,13 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
+   if (v3d->ver >= 71)
+   v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters);
+   else if (v3d->ver >= 42)
+   v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters);
+   else
+   v3d->max_counters = 0;
+
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
ret = PTR_ERR(v3d->reset);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 671375a3bb66..bd1e38f7d10a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,6 +104,11 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
+   /* Different revisions of V3D have different total number of performance
+* counters
+*/
+   unsigned int max_counters;
+
void __iomem *hub_regs;
void __iomem *core_regs[3];
void __iomem *bridge_regs;
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index e1be7368b87d..f268d9466c0f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -123,6 +123,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 {
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
struct drm_v3d_perfmon_create *req = data;
+   struct v3d_dev *v3d = v3d_priv->v3d;
struct v3d_perfmon *perfmon;
unsigned int i;
int ret;
@@ -134,7 +135,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= V3D_PERFCNT_NUM)
+   if (req->counters[i] >= v3d->max_counters)
return -EINVAL;
}
 
-- 
2.44.0
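
The probe-time selection above maps a hardware version to the size of its counter table. A minimal user-space sketch of that mapping (the 87/93 entry counts come from this series; the helper name and constants are made up):

```c
#include <assert.h>

/* Placeholder counts; per this series, the real tables in
 * v3d_performance_counters.h have 87 (V3D 4.2) and 93 (V3D 7.1) entries. */
#define V3D_V42_NUM_COUNTERS 87u
#define V3D_V71_NUM_COUNTERS 93u

/* Mirrors the probe-time selection: newest cores are checked first,
 * and anything older than V3D 4.2 exposes no counters. */
unsigned int v3d_max_counters(int ver)
{
	if (ver >= 71)
		return V3D_V71_NUM_COUNTERS;
	if (ver >= 42)
		return V3D_V42_NUM_COUNTERS;
	return 0;
}
```

Checking newer versions first keeps the fallthrough correct when future versions reuse the larger table.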



[PATCH 3/6] drm/v3d: Create a new V3D parameter for the maximum number of perfcnt

2024-05-08 Thread Maíra Canal
The maximum number of performance counters can change from version to
version and it's important for userspace to know this value, as it needs
to use the counters for performance queries. Therefore, expose the
maximum number of performance counters to userspace as a parameter.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 3 +++
 include/uapi/drm/v3d_drm.h| 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 6b9dd26df9fe..d2c1d5053132 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -94,6 +94,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
args->value = 1;
return 0;
+   case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
+   args->value = v3d->max_counters;
+   return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index dce1835eced4..215b01bb69c3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -286,6 +286,7 @@ enum drm_v3d_param {
DRM_V3D_PARAM_SUPPORTS_PERFMON,
DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE,
+   DRM_V3D_PARAM_MAX_PERF_COUNTERS,
 };
 
 struct drm_v3d_get_param {
-- 
2.44.0
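
The get-param path is a plain switch that either fills in a value or rejects unknown parameters. A sketch of the same dispatch shape (the enum values and names here are hypothetical, not the real drm_v3d_param entries):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical parameter IDs standing in for enum drm_v3d_param;
 * only the dispatch shape mirrors v3d_get_param_ioctl() above. */
enum sketch_param {
	SKETCH_PARAM_SUPPORTS_CPU_QUEUE,
	SKETCH_PARAM_MAX_PERF_COUNTERS,
};

struct sketch_dev {
	unsigned int max_counters;
};

/* Fills *value and returns 0 on success, -EINVAL for unknown params. */
int sketch_get_param(const struct sketch_dev *dev, int param,
		     unsigned long long *value)
{
	switch (param) {
	case SKETCH_PARAM_SUPPORTS_CPU_QUEUE:
		*value = 1;
		return 0;
	case SKETCH_PARAM_MAX_PERF_COUNTERS:
		*value = dev->max_counters;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Because unknown parameters return -EINVAL, userspace can probe for the new parameter on older kernels and fall back gracefully.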



[PATCH 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM

2024-05-08 Thread Maíra Canal
V3D_PERFCNT_NUM represents the maximum number of performance counters
for V3D 4.2, but not for V3D 7.1. This means that, if we use
V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.

Therefore, use the number of performance counters on V3D 7.1 as the
maximum number of counters. This will allow us to create arrays on the
stack with reasonable size. Note that userspace must use the value
provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS.


Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h   | 5 -
 drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 44cfddedebde..556cbb400ba0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,8 +351,11 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
+/* Maximum number of performance counters supported by any version of V3D */
+#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters)
+
 /* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \
+#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
  DRM_V3D_MAX_PERF_COUNTERS)
 
 struct v3d_performance_query {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7cd8c335cd9b..03df37a3acf5 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -490,7 +490,7 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, 
void *data, u32 quer
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_PERFCNT_NUM];
+   u64 counter_values[V3D_MAX_COUNTERS];
 
for (int i = 0; i < performance_query->nperfmons; i++) {
perfmon = v3d_perfmon_find(v3d_priv,
-- 
2.44.0
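
The V3D_MAX_PERFMONS computation above is a ceiling division of the counter-table size by the per-perfmon limit. A worked sketch (the per-perfmon limit of 32 is an assumption about DRM_V3D_MAX_PERF_COUNTERS, not confirmed by this excerpt):

```c
#include <assert.h>

/* DIV_ROUND_UP as defined in include/linux/math.h. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of perfmons needed to cover every counter; the 32 used in the
 * tests below is an assumed value of DRM_V3D_MAX_PERF_COUNTERS. */
unsigned int perfmons_needed(unsigned int max_counters,
			     unsigned int per_perfmon)
{
	return DIV_ROUND_UP(max_counters, per_perfmon);
}
```

With 93 counters (the V3D 7.1 table) and 32 counters per perfmon, three perfmons cover everything, which also bounds the `counter_values` stack array in v3d_sched.c.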



[PATCH 4/6] drm/v3d: Create new IOCTL to expose performance counters information

2024-05-08 Thread Maíra Canal
Userspace usually needs some information about the performance counters
available. Although we could replicate this information in the kernel
and user-space, let's use the kernel as the "single source of truth" to
avoid issues in the future (e.g. list of performance counters is updated
in user-space, but not in the kernel, generating invalid requests).

Therefore, create a new IOCTL to expose the performance counters
information, that is name, category, and description.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c |  1 +
 drivers/gpu/drm/v3d/v3d_drv.h |  2 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c | 33 +++
 include/uapi/drm/v3d_drm.h| 37 +++
 4 files changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index d2c1d5053132..f7477488b1cc 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -211,6 +211,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
	DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW),
	DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW),
	DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver v3d_drm_driver = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index bd1e38f7d10a..44cfddedebde 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -582,6 +582,8 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void 
*data,
  struct drm_file *file_priv);
 int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
 
 /* v3d_sysfs.c */
 int v3d_sysfs_init(struct device *dev);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index f268d9466c0f..73e2bb8bdb7f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -217,3 +217,36 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, 
void *data,
 
return ret;
 }
+
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct drm_v3d_perfmon_get_counter *req = data;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
+   const struct v3d_perf_counter_desc *counter;
+
+   for (int i = 0; i < ARRAY_SIZE(req->reserved); i++) {
+   if (req->reserved[i] != 0)
+   return -EINVAL;
+   }
+
+   /* Make sure that the counter ID is valid */
+   if (req->counter >= v3d->max_counters)
+   return -EINVAL;
+
+   if (v3d->ver >= 71) {
+   WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v71_performance_counters));
+   counter = &v3d_v71_performance_counters[req->counter];
+   } else if (v3d->ver >= 42) {
+   WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v42_performance_counters));
+   counter = &v3d_v42_performance_counters[req->counter];
+   } else {
+   return -EOPNOTSUPP;
+   }
+
+   strscpy(req->name, counter->name, sizeof(req->name));
+   strscpy(req->category, counter->category, sizeof(req->category));
+   strscpy(req->description, counter->description, sizeof(req->description));
+
+   return 0;
+}
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 215b01bb69c3..0860ddb3d0b6 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -42,6 +42,7 @@ extern "C" {
 #define DRM_V3D_PERFMON_DESTROY   0x09
 #define DRM_V3D_PERFMON_GET_VALUES0x0a
 #define DRM_V3D_SUBMIT_CPU0x0b
+#define DRM_V3D_PERFMON_GET_COUNTER   0x0c
 
 #define DRM_IOCTL_V3D_SUBMIT_CL   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
 #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
@@ -58,6 +59,8 @@ extern "C" {
 #define DRM_IOCTL_V3D_PERFMON_GET_VALUES  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_VALUES, \
   struct 
drm_v3d_perfmon_get_values)
 #define DRM_IOCTL_V3D_SUBMIT_CPU  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu)
+#define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_COUNTER, \
+  stru
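
The ioctl hunk above (truncated here) validates that all reserved fields are zero before doing anything else. That is a standard uapi extensibility pattern, sketched below with made-up names:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Rejecting nonzero reserved fields keeps the uapi extensible: new
 * fields can be added to the struct later, and old kernels reject any
 * future use with -EINVAL instead of silently ignoring the request. */
int check_reserved(const uint32_t *reserved, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (reserved[i] != 0)
			return -EINVAL;
	}
	return 0;
}
```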

[PATCH 1/6] drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1

2024-05-08 Thread Maíra Canal
Add name, category and description for each one of the 93 performance
counters available on V3D.

Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93.
Therefore, there are two performance counters arrays. The index of the
performance counter for each V3D version is represented by its position
on the array.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h |   2 +
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 2 files changed, 210 insertions(+)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..671375a3bb66 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#include "v3d_performance_counters.h"
+
 #include "uapi/drm/v3d_drm.h"
 
 struct clk;
diff --git a/drivers/gpu/drm/v3d/v3d_performance_counters.h 
b/drivers/gpu/drm/v3d/v3d_performance_counters.h
new file mode 100644
index ..72822205ebdc
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_performance_counters.h
@@ -0,0 +1,208 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (C) 2024 Raspberry Pi
+ */
+#ifndef V3D_PERFORMANCE_COUNTERS_H
+#define V3D_PERFORMANCE_COUNTERS_H
+
+/* Holds the description of a given performance counter. The index of a
+ * performance counter is given by its position in the arrays in
+ * v3d_performance_counters.h
+ */
+struct v3d_perf_counter_desc {
+   /* Category of the counter */
+   char category[32];
+
+   /* Name of the counter */
+   char name[64];
+
+   /* Description of the counter */
+   char description[256];
+};
+
+static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = {
+   {"CORE", "cycle-count", "[CORE] Cycle counter"},
+   {"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
+   {"CLE", "CLE-bin-thread-active-cycles", "[CLE] Bin thread active cycles"},
+   {"CLE", "CLE-render-thread-active-cycles", "[CLE] Render thread active cycles"},
+   {"CORE", "compute-active-cycles", "[CORE] Compute active cycles"},
+   {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"},
+   {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"},
+   {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"},
+   {"FEP", "FEP-valid-quads", "[FEP] Valid quads"},
+   {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"},
+   {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid pixels written to colour buffer"},
+   {"TLB", "TLB-partial-quads-written-to-color-buffer", "[TLB] Partial quads written to the colour buffer"},
+   {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need clipping"},
+   {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives discarded by being outside the viewport"},
+   {"PTB", "PTB-primitives-binned", "[PTB] Total primitives binned"},
+   {"PTB", "PTB-primitives-discarded-reversed", "[PTB] Primitives that are discarded because they are reversed"},
+   {"QPU", "QPU-total-instr-cache-hit", "[QPU] Total instruction cache hits for all slices"},
+   {"QPU", "QPU-total-instr-cache-miss", "[QPU] Total instruction cache misses for all slices"},
+   {"QPU", "QPU-total-uniform-cache-hit", "[QPU] Total uniforms cache hits for all slices"},
+   {"QPU", "QPU-total-uniform-cache-miss", "[QPU] Total uniforms cache misses for all slices"},
+   {"TMU", "TMU-active-cycles", "[TMU] Active cycles"},
+   {"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"},
+   {"TMU", "TMU-total-text-quads-access", "[TMU] Total texture cache accesses"},
+   {"TMU", 
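
The descriptor tables above (truncated in this archive) are indexed directly by counter ID. A user-space sketch of the same layout and bounds-checked lookup, using a two-entry excerpt of the table:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as struct v3d_perf_counter_desc above. */
struct perf_counter_desc {
	char category[32];
	char name[64];
	char description[256];
};

/* A two-entry excerpt of the table (the full tables have 87/93
 * entries); the counter ID is simply the array index. */
static const struct perf_counter_desc sample_counters[] = {
	{"CORE", "cycle-count", "[CORE] Cycle counter"},
	{"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
};

/* Returns NULL for out-of-range IDs, mirroring the ioctl's -EINVAL. */
const struct perf_counter_desc *lookup_counter(unsigned int id)
{
	if (id >= sizeof(sample_counters) / sizeof(sample_counters[0]))
		return NULL;
	return &sample_counters[id];
}
```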

Re: [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-29 Thread Maíra Canal

Hi Iago,

On 4/29/24 02:22, Iago Toral wrote:

Hi Maíra,

a question below:

On Sun, 28-04-2024 at 09:40 -0300, Maíra Canal wrote:

Although Big/Super Pages could appear naturally, it would be quite hard
to have 1MB or 64KB allocated contiguously by chance. Therefore, we can
force the creation of contiguously allocated large pages by using a
mountpoint with "huge=within_size" enabled.

Therefore, as V3D has a mountpoint with "huge=within_size" (if the user
has THP enabled), use this mountpoint for BO creation if available. This
will allow us to create large pages allocated contiguously and make use
of Big/Super Pages.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---



(...)


@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
     size_t unaligned_size)
  {
    struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
    struct v3d_bo *bo;
    int ret;
  
-	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->gemfs)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+     v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
    if (IS_ERR(shmem_obj))
    return ERR_CAST(shmem_obj);
    bo = to_v3d_bo(&shmem_obj->base);



if I read this correctly when we have THP we always allocate with that,
Even objects that are smaller than 64KB. I was wondering if there is
any benefit/downside to this or if the behavior for small allocations
is the same we had without the new mount point.


I'm assuming that your concern is related to memory pressure and memory
fragmentation.

As we are using `huge=within_size`, we only allocate a huge page if it
will be fully within `i_size`. With `huge=within_size`, we keep the
optimization for larger files without forcing smaller files to waste a
whole huge page. I don't understand `huge=within_size` in full detail,
but it can be verified that it keeps the system (even the RPi) from
going OOM. Although it is slightly less performant than `huge=always`
(used in v1), I believe it is a better fit for a system such as the RPi,
given its memory constraints.

Best Regards,
- Maíra



Iago


[PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages

2024-04-28 Thread Maíra Canal
Add a modparam for turning off Big/Super Pages, so that a user who
doesn't want Big/Super Pages enabled can disable them by setting the
modparam to false.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 7 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 5 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..1a6e01235df6 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,13 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+/* Only expose the `super_pages` modparam if THP is enabled. */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.");
+#endif
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..0ade02bb7209 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -11,6 +11,7 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
char huge_opt[] = "huge=within_size";
struct file_system_type *type;
struct vfsmount *gemfs;
+   extern bool super_pages;
 
/*
 * By creating our own shmemfs mountpoint, we can pass in
@@ -20,6 +21,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
goto err;
 
+   /* The user doesn't want to enable Super Pages */
+   if (!super_pages)
+   goto err;
+
type = get_fs_type("tmpfs");
if (!type)
goto err;
-- 
2.44.0
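
The effective gating in this patch is simple: the dedicated tmpfs mount is only attempted when THP is compiled in and the modparam is still at its default of true. A sketch of that predicate (plain booleans stand in for the kernel config option and the modparam):

```c
#include <assert.h>
#include <stdbool.h>

/* Gating sketch: mount the dedicated gemfs only when THP support is
 * compiled in AND the super_pages modparam was left enabled. The
 * function name and bool inputs are illustrative, not kernel API. */
bool should_mount_gemfs(bool thp_enabled, bool super_pages)
{
	return thp_enabled && super_pages;
}
```

If the mount is skipped, `v3d->gemfs` stays NULL and BO creation falls back to the default shmem path, so the driver degrades gracefully.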



[PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-28 Thread Maíra Canal
Although Big/Super Pages could appear naturally, it would be quite hard
to have 1MB or 64KB allocated contiguously by chance. Therefore, we can
force the creation of contiguously allocated large pages by using a
mountpoint with "huge=within_size" enabled.

Therefore, as V3D has a mountpoint with "huge=within_size" (if the user
has THP enabled), use this mountpoint for BO creation if available. This
will allow us to create large pages allocated contiguously and make use
of Big/Super Pages.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_bo.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..16ac26c31c6b 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;
 
/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);
 
+   if (!v3d->gemfs)
+   align = SZ_4K;
+   else if (obj->size >= SZ_1M)
+   align = SZ_1M;
+   else if (obj->size >= SZ_64K)
+   align = SZ_64K;
+   else
+   align = SZ_4K;
+
	spin_lock(&v3d->mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+align >> V3D_MMU_PAGE_SHIFT, 0, 0);
	spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, 
struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
int ret;
 
-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->gemfs)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+ v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
	bo = to_v3d_bo(&shmem_obj->base);
-- 
2.44.0
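
The alignment choice added to v3d_bo_create_finish() can be lifted out as a pure function: no gemfs means plain 4K alignment, otherwise the BO gets the largest page-size alignment it can completely fill. A sketch (SZ_* values match the kernel's; the helper name is made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SZ_4K  0x1000u
#define SZ_64K 0x10000u
#define SZ_1M  0x100000u

/* Mirrors the alignment choice in v3d_bo_create_finish() above:
 * without the THP-backed mount everything stays 4K-aligned; otherwise
 * pick the largest page size the BO can completely fill. */
unsigned int choose_align(bool have_gemfs, size_t size)
{
	if (!have_gemfs)
		return SZ_4K;
	if (size >= SZ_1M)
		return SZ_1M;
	if (size >= SZ_64K)
		return SZ_64K;
	return SZ_4K;
}
```

Aligning the GPU-VA node to the backing page size is what lets the PTE writer in the previous patch actually emit big/super-page entries.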



[PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs

2024-04-28 Thread Maíra Canal
The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that page table entries for all 4KB pages within a big/super
page must be correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.h |  1 +
 drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++-
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index e1f291db68de..3276eef280ef 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;
 
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
 
diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..2e0b31e373b2 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -25,9 +25,16 @@
  * superpage bit set.
  */
 #define V3D_PTE_SUPERPAGE BIT(31)
+#define V3D_PTE_BIGPAGE BIT(30)
 #define V3D_PTE_WRITEABLE BIT(29)
 #define V3D_PTE_VALID BIT(28)
 
+static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment)
+{
+   return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) &&
+   IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT);
+}
+
 static int v3d_mmu_flush_all(struct v3d_dev *v3d)
 {
int ret;
@@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
struct drm_gem_shmem_object *shmem_obj = >base;
struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
u32 page = bo->node.start;
-   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-   struct sg_dma_page_iter dma_iter;
-
-   for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
-   dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
-   u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
-   u32 pte = page_prot | page_address;
-   u32 i;
-
-   BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
-  BIT(24));
-   for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
-   v3d->pt[page++] = pte + i;
+   struct scatterlist *sgl;
+   unsigned int count;
+
+   for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) {
+   dma_addr_t dma_addr = sg_dma_address(sgl);
+   u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT;
+   unsigned int len = sg_dma_len(sgl);
+
+   while (len > 0) {
+   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
+   u32 page_address = page_prot | pfn;
+   unsigned int i, page_size;
+
+   BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24));
+
+   if (len >= SZ_1M && v3d_mmu_is_aligned(page, page_address, SZ_1M)) {
+   page_size = SZ_1M;
+   page_address |= V3D_PTE_SUPERPAGE;
+   } else if (len >= SZ_64K && v3d_mmu_is_aligned(page, page_address, SZ_64K)) {
+   page_size = SZ_64K;
+   page_address |= V3D_PTE_BIGPAGE;
+   } else {
+   page_size = SZ_4K;
+   }
+
+   for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) {
+   v3d->pt[page++] = page_address + i;
+   pfn++;
+   }
+
+   len -= page_size;
+   }
}
 
WARN_ON_ONCE(page - bo->node.start !=
-- 
2.44.0
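
The PTE layout used above packs a 4K page-frame number into the low bits and flags into bits 28-31; every 4K entry inside a 64K or 1M region carries the big/super-page flag. A small sketch of building one entry (bit positions taken from the patch; the helper itself is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define V3D_MMU_PAGE_SHIFT 12
#define V3D_PTE_SUPERPAGE (1u << 31)
#define V3D_PTE_BIGPAGE   (1u << 30)
#define V3D_PTE_WRITEABLE (1u << 29)
#define V3D_PTE_VALID     (1u << 28)

/* Builds one PTE for the 4K page at dma_addr, tagging it as part of a
 * 64K big page or 1M super page when requested. In the real code every
 * 4K entry inside the large page carries the same flag. */
uint32_t make_pte(uint64_t dma_addr, int big, int super)
{
	uint32_t pte = V3D_PTE_WRITEABLE | V3D_PTE_VALID |
		       (uint32_t)(dma_addr >> V3D_MMU_PAGE_SHIFT);

	if (super)
		pte |= V3D_PTE_SUPERPAGE;
	else if (big)
		pte |= V3D_PTE_BIGPAGE;
	return pte;
}
```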



[PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation

2024-04-28 Thread Maíra Canal
Currently, we are using an alignment of 128 KB to insert a node, which
ends up wasting memory, as we perform plenty of small BO allocations
(<= 4 KB). We require that allocations are aligned to 128 KB, so for any
allocation smaller than that, we are wasting the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 KB (~140 MB) and 3000 BOs of 400 KB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 KB, we can allocate up to ~1,000,000
BOs of 4 KB (~4 GB) and ~10,000 BOs of 400 KB (~4 GB). Moreover, by
performing benchmarks, we were able to attest that reducing the page
alignment to 4 KB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).

Therefore, this patch reduces the alignment of the node allocation to
4 KB, which will allow RPi users to use the whole 4 GB virtual address
space provided by the hardware. Also, this patch allows users to fully
run vkoverhead on the RPi 4/5, solving the issue reported in [1].
[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
	spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index cef2f82b7a75..e1f291db68de 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0
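
The capacity figures in the commit message follow from simple address-space arithmetic: each BO occupies its size rounded up to the node alignment, out of a 4 GB space. A sketch of that calculation (address-space math only; real usage is also bounded by physical memory):

```c
#include <assert.h>
#include <stdint.h>

/* How many fixed-size BOs fit in the 4 GB GPU address space when every
 * allocation is rounded up to the node alignment. Illustrative helper,
 * not driver code. */
uint64_t max_bos(uint64_t bo_size, uint64_t align)
{
	uint64_t slot = (bo_size + align - 1) / align * align;

	return (4ull << 30) / slot;
}
```

With 128 KB alignment, a 4 KB BO still consumes a 128 KB slot (32768 slots total); with 4 KB alignment the same space holds over a million 4 KB BOs.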



[PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint

2024-04-28 Thread Maíra Canal
Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that takes a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++
 include/drm/drm_gem_shmem_helper.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 13bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs 
= {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+  struct vfsmount *gemfs)
 {
struct drm_gem_shmem_object *shmem;
struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, 
bool private)
drm_gem_private_object_init(dev, obj, size);
shmem->map_wc = false; /* dma-buf mappings use always 
writecombine */
} else {
-   ret = drm_gem_object_init(dev, obj, size);
+   ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
}
if (ret) {
drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t 
size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size)
 {
-   return __drm_gem_shmem_create(dev, size, false);
+   return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs)
+{
+   return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
size_t size = PAGE_ALIGN(attach->dmabuf->size);
struct drm_gem_shmem_object *shmem;
 
-   shmem = __drm_gem_shmem_create(dev, size, true);
+   shmem = __drm_gem_shmem_create(dev, size, true, NULL);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h 
b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0



[PATCH v4 0/8] drm/v3d: Enable Big and Super Pages

2024-04-28 Thread Maíra Canal
 performance.
This indicates an enhancement in the baseline scenario, rather than any 
detriment
caused by v2. Additionally, I've included stats from v1 in the comparisons. Upon
scrutinizing the average FPS of v2 in contrast to v1, it becomes evident that v2
not only maintains the improvements but may even surpass them.

This version provides a much safer way to iterate through memory and doesn't
hold to the same limitations as v1. For example, v1 had a hard-coded hack that
only allowed a huge page to be created if the BO was bigger than 2MB. These
limitations are gone now.

This series also introduces changes in the GEM helpers, in order to enable V3D
to have a separate mount point for shmfs GEM objects. Any feedback from the
community about the changes in the GEM helpers is welcomed!

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240311100959.205545-1-mca...@igalia.com/

* [1/6] Add Iago's R-b to PATCH 1/5 (Iago Toral)
* [2/6] Create a new function `drm_gem_object_init_with_mnt()` to define the
shmfs mountpoint. Now, we don't touch a bunch of drivers, as
`drm_gem_object_init()` preserves its signature 
(Tvrtko Ursulin)
* [3/6] Use `huge=within_size` instead of `huge=always`, in order to avoid OOM.
This also allow us to move away from the 2MB hack. (Tvrtko Ursulin)
* [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral)
* [5/6] Create a separate patch to reduce the alignment of the node allocation
(Iago Toral)
* [6/6] Complete refactoring to the way that we iterate through the memory
(Tvrtko Ursulin)
* [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us 
misleading
data (Tvrtko Ursulin)
* [6/6] Use both Big Pages (64K) and Super Pages (1MB)

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mca...@igalia.com/T/

* [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin)
* [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin)
* [6/8] Now, PATCH 6/8 regards supporting big/super pages when writing out PTEs
(Tvrtko Ursulin)
* [6/8] s/page_address/pfn (Tvrtko Ursulin)
* [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be `unsigned 
int`
too (Tvrtko Ursulin)
* [6/8] `i` and `page_size` are `unsigned int` as well (Tvrtko Ursulin)
* [6/8] Move `i`, `page_prot` and `page_size` to the inner scope (Tvrtko 
Ursulin)
* [6/8] s/pte/page_address/ (Tvrtko Ursulin)
* [7/8] New patch: use gemfs/THP in BO creation if available
* [8/8] New patch: 
* [8/8] Don't expose the modparam `super_pages` unless 
CONFIG_TRANSPARENT_HUGEPAGE
is enabled (Tvrtko Ursulin)
* [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support
(Tvrtko Ursulin)

v3 -> v4: 
https://lore.kernel.org/dri-devel/20240421215309.660018-1-mca...@igalia.com/T/

* [5/8] Add Iago's R-b to PATCH 5/8 (Iago Toral)
* [6/8] Add Tvrtko's R-b to PATCH 6/8 (Tvrtko Ursulin)
* [7/8] Add Tvrtko's R-b to PATCH 7/8 (Tvrtko Ursulin)
* [8/8] Move `bool super_pages` to the guard (Tvrtko Ursulin)

Best Regards,
- Maíra

Maíra Canal (8):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Create a drm_gem_object_init_with_mnt() function
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Reduce the alignment of the node allocation
  drm/v3d: Support Big/Super Pages when writing out PTEs
  drm/v3d: Use gemfs/THP in BO creation if available
  drm/v3d: Add modparam for turning off Big/Super Pages

 drivers/gpu/drm/drm_gem.c  | 34 +++--
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +--
 drivers/gpu/drm/v3d/Makefile   |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c   | 21 ++-
 drivers/gpu/drm/v3d/v3d_drv.c  |  7 
 drivers/gpu/drm/v3d/v3d_drv.h  | 12 +-
 drivers/gpu/drm/v3d/v3d_gem.c  |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c| 51 +
 drivers/gpu/drm/v3d/v3d_mmu.c  | 52 +++---
 include/drm/drm_gem.h  |  3 ++
 include/drm/drm_gem_shmem_helper.h |  3 ++
 11 files changed, 195 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

-- 
2.44.0



[PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails

2024-04-28 Thread Maíra Canal
If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index da8faf3b9011..b3b76332f2c5 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -291,8 +291,9 @@ v3d_gem_init(struct drm_device *dev)
ret = v3d_sched_init(v3d);
if (ret) {
		drm_mm_takedown(&v3d->mm);
-   dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+   dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
  v3d->pt_paddr);
+   return ret;
}
 
return 0;
-- 
2.44.0



[PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function

2024-04-28 Thread Maíra Canal
For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our usecase.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem.c | 34 ++
 include/drm/drm_gem.h |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }
 
 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-   struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs)
 {
struct file *filp;
 
drm_gem_private_object_init(dev, obj, size);
 
-   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+   if (gemfs)
+   filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+VM_NORESERVE);
+   else
+   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
if (IS_ERR(filp))
return PTR_ERR(filp);
 
@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,
 
return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+   size_t size)
+{
+   return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);
 
 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
-- 
2.44.0



[PATCH v4 3/8] drm/v3d: Introduce gemfs

2024-04-28 Thread Maíra Canal
Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/Makefile|  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
v3d_trace_points.o \
v3d_sched.o \
v3d_sysfs.o \
-   v3d_submit.o
+   v3d_submit.o \
+   v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..cef2f82b7a75 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -131,6 +131,11 @@ struct v3d_dev {
struct drm_mm mm;
spinlock_t mm_lock;
 
+   /*
+* tmpfs instance used for shmem backed objects
+*/
+   struct vfsmount *gemfs;
+
struct work_struct overflow_mem_work;
 
struct v3d_bin_job *bin_job;
@@ -532,6 +537,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b3b76332f2c5..b1e681630ded 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -288,6 +288,8 @@ v3d_gem_init(struct drm_device *dev)
v3d_init_hw_state(v3d);
v3d_mmu_set_page_table(v3d);
 
+   v3d_gemfs_init(v3d);
+
ret = v3d_sched_init(v3d);
if (ret) {
		drm_mm_takedown(&v3d->mm);
@@ -305,6 +307,7 @@ v3d_gem_destroy(struct drm_device *dev)
struct v3d_dev *v3d = to_v3d_dev(dev);
 
v3d_sched_fini(v3d);
+   v3d_gemfs_fini(v3d);
 
/* Waiting for jobs to finish would need to be done before
 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index ..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include 
+#include 
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+   char huge_opt[] = "huge=within_size";
+   struct file_system_type *type;
+   struct vfsmount *gemfs;
+
+   /*
+* By creating our own shmemfs mountpoint, we can pass in
+* mount flags that better match our usecase. However, we
+* only do so on platforms which benefit from it.
+*/
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   goto err;
+
+   type = get_fs_type("tmpfs");
+   if (!type)
+   goto err;
+
+   gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+   if (IS_ERR(gemfs))
+   goto err;
+
+   v3d->gemfs = gemfs;
+   drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+   return;
+
+err:
+   v3d->gemfs = NULL;
+   drm_notice(&v3d->drm,
+	       "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+   if (v3d->gemfs)
+   kern_unmount(v3d->gemfs);
+}
-- 
2.44.0



Re: [PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-23 Thread Maíra Canal

On 4/23/24 04:05, Maxime Ripard wrote:

Hi,

On Mon, Apr 22, 2024 at 01:08:44PM -0300, Maíra Canal wrote:

@drm-misc maintainers, is there any chance you could backport commit
35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice") [1] to drm-
misc-next?

I would like to apply this series to drm-misc-next because it fixes
another issue with the GPU stats, but this series depends on commit
35f4f8c9fc97, as it has plenty of refactors on the GPU stats code.

Although I could theoretically apply this series in drm-misc-fixes, I
don't believe it would be ideal, as discussed in #dri-devel earlier
today.

[1] 
https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/35f4f8c9fc972248055096d63b782060e473311b


I just did the backmerge


Thanks Maxime! I just applied the series to drm-misc/drm-misc-next.

Thanks for drm-misc maintainers for the quick action!

Best Regards,
- Maíra



Maxime


Re: [PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-22 Thread Maíra Canal

Hi,

@drm-misc maintainers, is there any chance you could backport commit
35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice") [1] to drm-
misc-next?

I would like to apply this series to drm-misc-next because it fixes
another issue with the GPU stats, but this series depends on commit
35f4f8c9fc97, as it has plenty of refactors on the GPU stats code.

Although I could theoretically apply this series in drm-misc-fixes, I
don't believe it would be ideal, as discussed in #dri-devel earlier
today.

[1] 
https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/35f4f8c9fc972248055096d63b782060e473311b


Best Regards,
- Maíra

On 4/20/24 18:32, Maíra Canal wrote:

The first version of this series had the intention to fix two major
issues with the GPU stats:

1. We were incrementing `enabled_ns` twice by the end of each job.
2. There is a race-condition between the IRQ handler and the users

The first of the issues was already addressed and the fix was applied to
drm-misc-fixes. Now, what is left, addresses the second issue.

Apart from addressing this issue, this series improved the GPU stats
code as a whole. We reduced code repetition, creating functions to start and
update the GPU stats. This will likely reduce the odds of issue #1 happening again.

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/

- As the first patch was a bugfix, it was pushed to drm-misc-fixes.
- [1/4] Add Chema Casanova's R-b
- [2/4] s/jobs_sent/jobs_completed and add the reasoning in the commit message
(Chema Casanova)
- [2/4] Add Chema Casanova's and Tvrtko Ursulin's R-b
- [3/4] Call `local_clock()` only once, by adding a new parameter to the
`v3d_stats_update` function (Chema Casanova)
- [4/4] Move new line to the correct patch [2/4] (Tvrtko Ursulin)
- [4/4] Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin)

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240417011021.600889-1-mca...@igalia.com/T/

- [4/5] New patch: separates the code refactor from the race-condition fix 
(Tvrtko Ursulin)
- [5/5] s/interruption/interrupt (Tvrtko Ursulin)
- [5/5] s/matches/match (Tvrtko Ursulin)
- [5/5] Add Tvrtko Ursulin's R-b

Best Regards,
- Maíra

Maíra Canal (5):
   drm/v3d: Create two functions to update all GPU stats variables
   drm/v3d: Create a struct to store the GPU stats
   drm/v3d: Create function to update a set of GPU stats
   drm/v3d: Decouple stats calculation from printing
   drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

  drivers/gpu/drm/v3d/v3d_drv.c   | 33 
  drivers/gpu/drm/v3d/v3d_drv.h   | 30 ---
  drivers/gpu/drm/v3d/v3d_gem.c   |  9 ++--
  drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++---
  drivers/gpu/drm/v3d/v3d_sched.c | 94 +
  drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++---
  6 files changed, 109 insertions(+), 118 deletions(-)



[PATCH v3 8/8] drm/v3d: Add modparam for turning off Big/Super Pages

2024-04-21 Thread Maíra Canal
Add a modparam for turning off Big/Super Pages to make sure that if a
user doesn't want Big/Super Pages enabled, they can disable it by setting
the modparam to false.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 8 
 drivers/gpu/drm/v3d/v3d_gemfs.c | 5 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..bc8c8905112a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,14 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+bool super_pages = true;
+
+/* Only expose the `super_pages` modparam if THP is enabled. */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.");
+#endif
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..5fa08263cff2 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -11,6 +11,11 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
char huge_opt[] = "huge=within_size";
struct file_system_type *type;
struct vfsmount *gemfs;
+   extern bool super_pages;
+
+   /* The user doesn't want to enable Super Pages */
+   if (!super_pages)
+   goto err;
 
/*
 * By creating our own shmemfs mountpoint, we can pass in
-- 
2.44.0



[PATCH v3 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-21 Thread Maíra Canal
Although Big/Super Pages could appear naturally, it would be quite hard
for 1MB or 64KB of memory to end up allocated contiguously by chance.
Therefore, we can force the creation of large pages allocated contiguously
by using a mountpoint with "huge=within_size" enabled.

As V3D has a mountpoint with "huge=within_size" (if the user has THP
enabled), use this mountpoint for BO creation if available. This will
allow us to create large pages allocated contiguously and make use of
Big/Super Pages.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..16ac26c31c6b 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;
 
/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);
 
+   if (!v3d->gemfs)
+   align = SZ_4K;
+   else if (obj->size >= SZ_1M)
+   align = SZ_1M;
+   else if (obj->size >= SZ_64K)
+   align = SZ_64K;
+   else
+   align = SZ_4K;
+
	spin_lock(&v3d->mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 align >> V3D_MMU_PAGE_SHIFT, 0, 0);
	spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
int ret;
 
-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->gemfs)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+ v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
	bo = to_v3d_bo(&shmem_obj->base);
-- 
2.44.0



[PATCH v3 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs

2024-04-21 Thread Maíra Canal
The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within a big/super
page are correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h |  1 +
 drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++-
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 17236ee23490..79d8a1a059aa 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;
 
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
 
diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..2e0b31e373b2 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -25,9 +25,16 @@
  * superpage bit set.
  */
 #define V3D_PTE_SUPERPAGE BIT(31)
+#define V3D_PTE_BIGPAGE BIT(30)
 #define V3D_PTE_WRITEABLE BIT(29)
 #define V3D_PTE_VALID BIT(28)
 
+static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment)
+{
+   return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) &&
+   IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT);
+}
+
 static int v3d_mmu_flush_all(struct v3d_dev *v3d)
 {
int ret;
@@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
	struct drm_gem_shmem_object *shmem_obj = &bo->base;
struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
u32 page = bo->node.start;
-   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-   struct sg_dma_page_iter dma_iter;
-
-	for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
-		dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
-   u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
-   u32 pte = page_prot | page_address;
-   u32 i;
-
-   BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
-  BIT(24));
-   for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
-   v3d->pt[page++] = pte + i;
+   struct scatterlist *sgl;
+   unsigned int count;
+
+   for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) {
+   dma_addr_t dma_addr = sg_dma_address(sgl);
+   u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT;
+   unsigned int len = sg_dma_len(sgl);
+
+   while (len > 0) {
+   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
+   u32 page_address = page_prot | pfn;
+   unsigned int i, page_size;
+
+   BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24));
+
+			if (len >= SZ_1M && v3d_mmu_is_aligned(page, page_address, SZ_1M)) {
+				page_size = SZ_1M;
+				page_address |= V3D_PTE_SUPERPAGE;
+			} else if (len >= SZ_64K && v3d_mmu_is_aligned(page, page_address, SZ_64K)) {
+   page_size = SZ_64K;
+   page_address |= V3D_PTE_BIGPAGE;
+   } else {
+   page_size = SZ_4K;
+   }
+
+   for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) {
+   v3d->pt[page++] = page_address + i;
+   pfn++;
+   }
+
+   len -= page_size;
+   }
}
 
WARN_ON_ONCE(page - bo->node.start !=
-- 
2.44.0



[PATCH v3 4/8] drm/gem: Create shmem GEM object in a given mountpoint

2024-04-21 Thread Maíra Canal
Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that takes a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++
 include/drm/drm_gem_shmem_helper.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 13bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+  struct vfsmount *gemfs)
 {
struct drm_gem_shmem_object *shmem;
struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
drm_gem_private_object_init(dev, obj, size);
		shmem->map_wc = false; /* dma-buf mappings use always writecombine */
} else {
-   ret = drm_gem_object_init(dev, obj, size);
+   ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
}
if (ret) {
drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size)
 {
-   return __drm_gem_shmem_create(dev, size, false);
+   return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+							   size_t size,
+							   struct vfsmount *gemfs)
+{
+   return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
size_t size = PAGE_ALIGN(attach->dmabuf->size);
struct drm_gem_shmem_object *shmem;
 
-   shmem = __drm_gem_shmem_create(dev, size, true);
+   shmem = __drm_gem_shmem_create(dev, size, true, NULL);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+							   size_t size,
+							   struct vfsmount *gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0



[PATCH v3 5/8] drm/v3d: Reduce the alignment of the node allocation

2024-04-21 Thread Maíra Canal
Currently, we are using an alignment of 128 kB to insert a node, which
ends up wasting memory, as we perform plenty of small BO allocations
(<= 4 kB). We require that allocations are aligned to 128 kB, so for any
allocation smaller than that, we waste the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 kB, we can allocate up to 1000000 BOs
of 4 kB (~4 GB) and 10000 BOs of 400 kB (~4 GB). Moreover, by performing
benchmarks, we were able to attest that reducing the page alignment to
4 kB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).

Therefore, this patch reduces the alignment of the node allocation to
4 kB, which will allow RPi users to explore the whole 4 GB virtual
address space provided by the hardware. Also, this patch allows users to
fully run vkoverhead on the RPi 4/5, solving the issue reported in [1].

[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
	spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..17236ee23490 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0



[PATCH v3 0/8] drm/v3d: Enable Big and Super Pages

2024-04-21 Thread Maíra Canal
This also allows us to move away from the 2MB hack. (Tvrtko Ursulin)
* [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral)
* [5/6] Create a separate patch to reduce the alignment of the node allocation.
(Iago Toral)
* [6/6] Complete refactoring to the way that we iterate through the memory.
(Tvrtko Ursulin)
* [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us
misleading data. (Tvrtko Ursulin)
* [6/6] Use both Big Pages (64K) and Super Pages (1MB).

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mca...@igalia.com/T/

* [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin)
* [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin)
* [6/8] Now, PATCH 6/8 only adds support to big/super pages when writing out
PTEs. BO creation with THP and addition of modparam are moved to
other patches. (Tvrtko Ursulin)
* [6/8] s/page_address/pfn (Tvrtko Ursulin)
* [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be
`unsigned int` too. (Tvrtko Ursulin)
* [6/8] `i` and `page_size` are `unsigned int` as well. (Tvrtko Ursulin)
* [6/8] Move `i`, `page_prot` and `page_size` to the inner scope. (Tvrtko Ursulin)
* [6/8] s/pte/page_address/ (Tvrtko Ursulin)
* [7/8] New patch: Use gemfs/THP in BO creation if available (Tvrtko Ursulin)
* [8/8] New patch: Add modparam for turning off Big/Super Pages (Tvrtko Ursulin)
* [8/8] Don't expose the modparam `super_pages` unless CONFIG_TRANSPARENT_HUGEPAGE is enabled. (Tvrtko Ursulin)
* [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support.
(Tvrtko Ursulin)

Best Regards,
- Maíra

Maíra Canal (8):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Create a drm_gem_object_init_with_mnt() function
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Reduce the alignment of the node allocation
  drm/v3d: Support Big/Super Pages when writing out PTEs
  drm/v3d: Use gemfs/THP in BO creation if available
  drm/v3d: Add modparam for turning off Big/Super Pages

 drivers/gpu/drm/drm_gem.c  | 34 +++--
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +--
 drivers/gpu/drm/v3d/Makefile   |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c   | 21 ++-
 drivers/gpu/drm/v3d/v3d_drv.c  |  8 
 drivers/gpu/drm/v3d/v3d_drv.h  | 12 +-
 drivers/gpu/drm/v3d/v3d_gem.c  |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c| 51 +
 drivers/gpu/drm/v3d/v3d_mmu.c  | 52 +++---
 include/drm/drm_gem.h  |  3 ++
 include/drm/drm_gem_shmem_helper.h |  3 ++
 11 files changed, 196 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

-- 
2.44.0



[PATCH v3 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function

2024-04-21 Thread Maíra Canal
For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our usecase.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem.c | 34 ++
 include/drm/drm_gem.h |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }
 
 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-   struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs)
 {
struct file *filp;
 
drm_gem_private_object_init(dev, obj, size);
 
-   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+   if (gemfs)
+   filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+VM_NORESERVE);
+   else
+   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
if (IS_ERR(filp))
return PTR_ERR(filp);
 
@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,
 
return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+   size_t size)
+{
+   return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);
 
 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
-- 
2.44.0



[PATCH v3 3/8] drm/v3d: Introduce gemfs

2024-04-21 Thread Maíra Canal
Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/Makefile|  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
v3d_trace_points.o \
v3d_sched.o \
v3d_sysfs.o \
-   v3d_submit.o
+   v3d_submit.o \
+   v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..d2ce8222771a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -119,6 +119,11 @@ struct v3d_dev {
struct drm_mm mm;
spinlock_t mm_lock;
 
+   /*
+* tmpfs instance used for shmem backed objects
+*/
+   struct vfsmount *gemfs;
+
struct work_struct overflow_mem_work;
 
struct v3d_bin_job *bin_job;
@@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 66f4b78a6b2e..faefbe497e8d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
v3d_init_hw_state(v3d);
v3d_mmu_set_page_table(v3d);
 
+   v3d_gemfs_init(v3d);
+
ret = v3d_sched_init(v3d);
if (ret) {
	drm_mm_takedown(&v3d->mm);
@@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
struct v3d_dev *v3d = to_v3d_dev(dev);
 
v3d_sched_fini(v3d);
+   v3d_gemfs_fini(v3d);
 
/* Waiting for jobs to finish would need to be done before
 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index ..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include 
+#include 
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+   char huge_opt[] = "huge=within_size";
+   struct file_system_type *type;
+   struct vfsmount *gemfs;
+
+   /*
+* By creating our own shmemfs mountpoint, we can pass in
+* mount flags that better match our usecase. However, we
+* only do so on platforms which benefit from it.
+*/
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   goto err;
+
+   type = get_fs_type("tmpfs");
+   if (!type)
+   goto err;
+
+   gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+   if (IS_ERR(gemfs))
+   goto err;
+
+   v3d->gemfs = gemfs;
+   drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+   return;
+
+err:
+   v3d->gemfs = NULL;
+   drm_notice(&v3d->drm,
+  "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+   if (v3d->gemfs)
+   kern_unmount(v3d->gemfs);
+}
-- 
2.44.0



[PATCH v3 1/8] drm/v3d: Fix return if scheduler initialization fails

2024-04-21 Thread Maíra Canal
If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..66f4b78a6b2e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev)
ret = v3d_sched_init(v3d);
if (ret) {
	drm_mm_takedown(&v3d->mm);
-   dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+   dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
  v3d->pt_paddr);
+   return ret;
}
 
return 0;
-- 
2.44.0



[PATCH v3 4/5] drm/v3d: Decouple stats calculation from printing

2024-04-20 Thread Maíra Canal
Create a function to decouple the stats calculation from the printing.
This will be useful in the next step when we add a seqcount to protect
the stats.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 18 ++
 drivers/gpu/drm/v3d/v3d_drv.h   |  4 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 11 +++
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 52e3ba9df46f..2ec359ed2def 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -142,6 +142,15 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
kfree(v3d_priv);
 }
 
+void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
+  u64 *active_runtime, u64 *jobs_completed)
+{
+   *active_runtime = stats->enabled_ns;
+   if (stats->start_ns)
+   *active_runtime += timestamp - stats->start_ns;
+   *jobs_completed = stats->jobs_completed;
+}
+
 static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
struct v3d_file_priv *file_priv = file->driver_priv;
@@ -150,20 +159,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = &file_priv->stats[queue];
+   u64 active_runtime, jobs_completed;
 
+   v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed);
 
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
-  v3d_queue_to_string(queue),
-  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
-  : stats->enabled_ns);
+  v3d_queue_to_string(queue), active_runtime);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), stats->jobs_completed);
+  v3d_queue_to_string(queue), jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 5a198924d568..ff06dc1cc078 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -510,6 +510,10 @@ struct drm_gem_object *v3d_prime_import_sg_table(struct drm_device *dev,
 /* v3d_debugfs.c */
 void v3d_debugfs_init(struct drm_minor *minor);
 
+/* v3d_drv.c */
+void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
+  u64 *active_runtime, u64 *jobs_completed);
+
 /* v3d_fence.c */
 extern const struct dma_fence_ops v3d_fence_ops;
 struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue queue);
diff --git a/drivers/gpu/drm/v3d/v3d_sysfs.c b/drivers/gpu/drm/v3d/v3d_sysfs.c
index 6a8e7acc8b82..d610e355964f 100644
--- a/drivers/gpu/drm/v3d/v3d_sysfs.c
+++ b/drivers/gpu/drm/v3d/v3d_sysfs.c
@@ -15,18 +15,15 @@ gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
struct v3d_dev *v3d = to_v3d_dev(drm);
enum v3d_queue queue;
u64 timestamp = local_clock();
-   u64 active_runtime;
ssize_t len = 0;
 
len += sysfs_emit(buf, "queue\ttimestamp\tjobs\truntime\n");
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = &v3d->queue[queue].stats;
+   u64 active_runtime, jobs_completed;
 
-   if (stats->start_ns)
-   active_runtime = timestamp - stats->start_ns;
-   else
-   active_runtime = 0;
v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed);
 
/* Each line will display the queue name, timestamp, the number
 * of jobs sent to that queue and the runtime, as can be seem here:
@@ -40,9 +37,7 @@ gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
 */
len += sysfs_emit_at(buf, len, "%s\t%llu\t%llu\t%llu\n",
 v3d_queue_to_string(queue),
-timestamp,
-stats->jobs_completed,
-stats->enabled_ns + active_runtime);
+timestamp, jobs_completed, active_runtime);
}
 
return len;
-- 
2.44.0



[PATCH v3 5/5] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

2024-04-20 Thread Maíra Canal
In V3D, the conclusion of a job is indicated by a IRQ. When a job
finishes, then we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.

For example, on `gpu_stats_show()`, we could think about a scenario where
`v3d->queue[queue].start_ns != 0`, then an interrupt happens, we update
the value of `v3d->queue[queue].start_ns` to 0, we come back to
`gpu_stats_show()` to calculate `active_runtime` and now,
`active_runtime = timestamp`.

In this simple example, the user would see a spike in the queue usage,
that didn't match reality.

In order to address this issue properly, use a seqcount to protect read
and write sections of the code.

Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 14 ++
 drivers/gpu/drm/v3d/v3d_drv.h   |  7 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  1 +
 drivers/gpu/drm/v3d/v3d_sched.c |  7 +++
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 2ec359ed2def..28b7ddce7747 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -121,6 +121,7 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
  1, NULL);
 
memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
+   seqcount_init(&v3d_priv->stats[i].lock);
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -145,10 +146,15 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
 void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
   u64 *active_runtime, u64 *jobs_completed)
 {
-   *active_runtime = stats->enabled_ns;
-   if (stats->start_ns)
-   *active_runtime += timestamp - stats->start_ns;
-   *jobs_completed = stats->jobs_completed;
+   unsigned int seq;
+
+   do {
+   seq = read_seqcount_begin(&stats->lock);
+   *active_runtime = stats->enabled_ns;
+   if (stats->start_ns)
+   *active_runtime += timestamp - stats->start_ns;
+   *jobs_completed = stats->jobs_completed;
+   } while (read_seqcount_retry(&stats->lock, seq));
 }
 
 static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ff06dc1cc078..a2c516fe6d79 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -40,6 +40,13 @@ struct v3d_stats {
u64 start_ns;
u64 enabled_ns;
u64 jobs_completed;
+
+   /*
+* This seqcount is used to protect the access to the GPU stats
+* variables. It must be used as, while we are reading the stats,
+* IRQs can happen and the stats can be updated.
+*/
+   seqcount_t lock;
 };
 
 struct v3d_queue_state {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 0086081a9261..da8faf3b9011 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -251,6 +251,7 @@ v3d_gem_init(struct drm_device *dev)
 
queue->fence_context = dma_fence_context_alloc(1);
memset(&queue->stats, 0, sizeof(queue->stats));
+   seqcount_init(&queue->stats.lock);
}
 
spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index b9614944931c..7cd8c335cd9b 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -114,16 +114,23 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
struct v3d_stats *local_stats = &file->stats[queue];
u64 now = local_clock();
 
+   write_seqcount_begin(&local_stats->lock);
local_stats->start_ns = now;
+   write_seqcount_end(&local_stats->lock);
+
+   write_seqcount_begin(&global_stats->lock);
global_stats->start_ns = now;
+   write_seqcount_end(&global_stats->lock);
 }
 
 static void
 v3d_stats_update(struct v3d_stats *stats, u64 now)
 {
+   write_seqcount_begin(&stats->lock);
stats->enabled_ns += now - stats->start_ns;
stats->jobs_completed++;
stats->start_ns = 0;
+   write_seqcount_end(&stats->lock);
 }
 
 void
-- 
2.44.0



[PATCH v3 3/5] drm/v3d: Create function to update a set of GPU stats

2024-04-20 Thread Maíra Canal
Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index b6b5542c3fcf..b9614944931c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -118,6 +118,14 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
global_stats->start_ns = now;
 }
 
+static void
+v3d_stats_update(struct v3d_stats *stats, u64 now)
+{
+   stats->enabled_ns += now - stats->start_ns;
+   stats->jobs_completed++;
+   stats->start_ns = 0;
+}
+
 void
 v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
 {
@@ -127,13 +135,8 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
struct v3d_stats *local_stats = &file->stats[queue];
u64 now = local_clock();
 
-   local_stats->enabled_ns += now - local_stats->start_ns;
-   local_stats->jobs_completed++;
-   local_stats->start_ns = 0;
-
-   global_stats->enabled_ns += now - global_stats->start_ns;
-   global_stats->jobs_completed++;
-   global_stats->start_ns = 0;
+   v3d_stats_update(local_stats, now);
+   v3d_stats_update(global_stats, now);
 }
 
 static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
-- 
2.44.0



[PATCH v3 1/5] drm/v3d: Create two functions to update all GPU stats variables

2024-04-20 Thread Maíra Canal
Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment
`enabled_ns` twice").

Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call for `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.

Co-developed-by: Tvrtko Ursulin 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  1 +
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++--
 drivers/gpu/drm/v3d/v3d_sched.c | 80 +++--
 3 files changed, 40 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..ee3545226d7f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index ce6b2fb341d1..d469bda52c1a 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FLDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->bin_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_BIN];
-
-   file->jobs_sent[V3D_BIN]++;
-   v3d->queue[V3D_BIN].jobs_sent++;
-
-   file->start_ns[V3D_BIN] = 0;
-   v3d->queue[V3D_BIN].start_ns = 0;
-
-   file->enabled_ns[V3D_BIN] += runtime;
-   v3d->queue[V3D_BIN].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN);
trace_v3d_bcl_irq(>drm, fence->seqno);
dma_fence_signal(>base);
status = IRQ_HANDLED;
@@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FRDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->render_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
-
-   file->jobs_sent[V3D_RENDER]++;
-   v3d->queue[V3D_RENDER].jobs_sent++;
-
-   file->start_ns[V3D_RENDER] = 0;
-   v3d->queue[V3D_RENDER].start_ns = 0;
-
-   file->enabled_ns[V3D_RENDER] += runtime;
-   v3d->queue[V3D_RENDER].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER);
trace_v3d_rcl_irq(>drm, fence->seqno);
dma_fence_signal(>base);
status = IRQ_HANDLED;
@@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_CSDDONE(v3d->ver)) {
struct v3d_fence *fence =
to_v3d_fence(v3d->csd_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_CSD];
-
-   file->jobs_sent[V3D_CSD]++;
-   v3d->queue[V3D_CSD].jobs_sent++;
-
-   file->start_ns[V3D_CSD] = 0;
-   v3d->queue[V3D_CSD].start_ns = 0;
-
-   file->enabled_ns[V3D_CSD] += runtime;
-   v3d->queue[V3D_CSD].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD);
trace_v3d_csd_irq(>drm, fence->seqno);
dma_fence_signal(>base);
status = IRQ_HANDLED;
@@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg)
if (intsts & V3D_HUB_INT_TFUC) {
struct v3d_fence *fence =
to_v3d_fence(v3d->tfu_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_TFU];
-
-   file->jobs_sent[V3D_TFU]++;
-   v3d->queue[V3D_TFU].jobs_sent++;
-
-   file->start_ns[V3D_TFU] = 0;
-   v3d->queue[V3D_TFU].start_ns = 0;
-
-   file->enabled_ns[V3D_TFU] += runtime;
-   v3d->queue[V3D_TFU].enabled_ns += runtime;
 
+   v3d_job_u

[PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-20 Thread Maíra Canal
The first version of this series had the intention to fix two major
issues with the GPU stats:

1. We were incrementing `enabled_ns` twice by the end of each job.
2. There is a race-condition between the IRQ handler and the users

The first of the issues was already addressed and the fix was applied to
drm-misc-fixes. Now, what is left, addresses the second issue.

Apart from addressing this issue, this series improved the GPU stats
code as a whole. We reduced code repetition, creating functions to start and
update the GPU stats. This will likely reduce the odds of issue #1 happen again.

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/

- As the first patch was a bugfix, it was pushed to drm-misc-fixes.
- [1/4] Add Chema Casanova's R-b
- [2/4] s/jobs_sent/jobs_completed and add the reasoning in the commit message
(Chema Casanova)
- [2/4] Add Chema Casanova's and Tvrtko Ursulin's R-b
- [3/4] Call `local_clock()` only once, by adding a new parameter to the
`v3d_stats_update` function (Chema Casanova)
- [4/4] Move new line to the correct patch [2/4] (Tvrtko Ursulin)
- [4/4] Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin)

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240417011021.600889-1-mca...@igalia.com/T/

- [4/5] New patch: separates the code refactor from the race-condition fix (Tvrtko Ursulin)
- [5/5] s/interruption/interrupt (Tvrtko Ursulin)
- [5/5] s/matches/match (Tvrtko Ursulin)
- [5/5] Add Tvrtko Ursulin's R-b

Best Regards,
- Maíra

Maíra Canal (5):
  drm/v3d: Create two functions to update all GPU stats variables
  drm/v3d: Create a struct to store the GPU stats
  drm/v3d: Create function to update a set of GPU stats
  drm/v3d: Decouple stats calculation from printing
  drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

 drivers/gpu/drm/v3d/v3d_drv.c   | 33 
 drivers/gpu/drm/v3d/v3d_drv.h   | 30 ---
 drivers/gpu/drm/v3d/v3d_gem.c   |  9 ++--
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++---
 drivers/gpu/drm/v3d/v3d_sched.c | 94 +
 drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++---
 6 files changed, 109 insertions(+), 118 deletions(-)

-- 
2.44.0



[PATCH v3 2/5] drm/v3d: Create a struct to store the GPU stats

2024-04-20 Thread Maíra Canal
This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.

Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better express the semantics of
the variable, as we are only accounting jobs that have been completed.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 15 +++
 drivers/gpu/drm/v3d/v3d_drv.h   | 18 ++
 drivers/gpu/drm/v3d/v3d_gem.c   |  8 
 drivers/gpu/drm/v3d/v3d_sched.c | 20 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++
 5 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..52e3ba9df46f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -115,14 +115,12 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
v3d_priv->v3d = v3d;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d_priv->enabled_ns[i] = 0;
-   v3d_priv->start_ns[i] = 0;
-   v3d_priv->jobs_sent[i] = 0;
-
sched = &v3d->queue[i].sched;
drm_sched_entity_init(&v3d_priv->sched_entity[i],
  DRM_SCHED_PRIORITY_NORMAL, &sched,
  1, NULL);
+
+   memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -151,20 +149,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
enum v3d_queue queue;
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+   struct v3d_stats *stats = &file_priv->stats[queue];
+
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
   v3d_queue_to_string(queue),
-  file_priv->start_ns[queue] ? file_priv->enabled_ns[queue]
- + timestamp - file_priv->start_ns[queue]
- : file_priv->enabled_ns[queue]);
+  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
+  : stats->enabled_ns);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), file_priv->jobs_sent[queue]);
+  v3d_queue_to_string(queue), stats->jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ee3545226d7f..5a198924d568 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue)
return "UNKNOWN";
 }
 
+struct v3d_stats {
+   u64 start_ns;
+   u64 enabled_ns;
+   u64 jobs_completed;
+};
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
u64 fence_context;
u64 emit_seqno;
 
-   u64 start_ns;
-   u64 enabled_ns;
-   u64 jobs_sent;
+   /* Stores the GPU stats for this queue in the global context. */
+   struct v3d_stats stats;
 };
 
 /* Performance monitor object. The perfmon lifetime is controlled by userspace
@@ -188,11 +193,8 @@ struct v3d_file_priv {
 
struct drm_sched_entity sched_entity[V3D_MAX_QUEUES];
 
-   u64 start_ns[V3D_MAX_QUEUES];
-
-   u64 enabled_ns[V3D_MAX_QUEUES];
-
-   u64 jobs_sent[V3D_MAX_QUEUES];
+   /* Stores the GPU stats for a specific queue for this fd. */
+   struct v3d_stats stats[V3D_MAX_QUEUES];
 };
 
 struct v3d_bo {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..0086081a9261 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -247,10 +247,10 @@ v3d_gem_init(struct drm_device *dev)
int ret, i;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   v3d->queue[i].start_ns = 0;
-   v3d->queue[i].enabled_ns = 0;
-   v3d->queue[i].jobs_sent = 0;
+   struct v3d_queue_state *queue = &v3d->queue[i];
+
+   queue->fence_context = dma_fence_context_alloc(1);
+   memset(&queue->stats, 0, sizeof(queue->stats));
}

[PATCH v2 4/4] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

2024-04-16 Thread Maíra Canal
In V3D, the conclusion of a job is indicated by a IRQ. When a job
finishes, then we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.

For example, on `gpu_stats_show()`, we could think about a scenario where
`v3d->queue[queue].start_ns != 0`, then an interruption happens, we update
the value of `v3d->queue[queue].start_ns` to 0, we come back to
`gpu_stats_show()` to calculate `active_runtime` and now,
`active_runtime = timestamp`.

In this simple example, the user would see a spike in the queue usage,
that didn't matches reality.

In order to address this issue properly, use a seqcount to protect read
and write sections of the code.

Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 10 ++
 drivers/gpu/drm/v3d/v3d_drv.h   | 21 +
 drivers/gpu/drm/v3d/v3d_gem.c   |  7 +--
 drivers/gpu/drm/v3d/v3d_sched.c |  7 +++
 drivers/gpu/drm/v3d/v3d_sysfs.c | 11 +++
 5 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 52e3ba9df46f..cf15fa142968 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -121,6 +121,7 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
  1, NULL);
 
memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
+   seqcount_init(&v3d_priv->stats[i].lock);
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -150,20 +151,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = &file_priv->stats[queue];
+   u64 active_runtime, jobs_completed;
+
+   v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed);
 
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
-  v3d_queue_to_string(queue),
-  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
-  : stats->enabled_ns);
+  v3d_queue_to_string(queue), active_runtime);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), stats->jobs_completed);
+  v3d_queue_to_string(queue), jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 5a198924d568..5211df7c7317 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -40,8 +40,29 @@ struct v3d_stats {
u64 start_ns;
u64 enabled_ns;
u64 jobs_completed;
+
+   /*
+* This seqcount is used to protect the access to the GPU stats
+* variables. It must be used as, while we are reading the stats,
+* IRQs can happen and the stats can be updated.
+*/
+   seqcount_t lock;
 };
 
+static inline void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
+u64 *active_runtime, u64 *jobs_completed)
+{
+   unsigned int seq;
+
+   do {
+   seq = read_seqcount_begin(&stats->lock);
+   *active_runtime = stats->enabled_ns;
+   if (stats->start_ns)
+   *active_runtime += timestamp - stats->start_ns;
+   *jobs_completed = stats->jobs_completed;
+   } while (read_seqcount_retry(&stats->lock, seq));
+}
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index d14589d3ae6c..da8faf3b9011 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -247,8 +247,11 @@ v3d_gem_init(struct drm_device *dev)
int ret, i;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats));
+   struct v3d_queue_state *queue = &v3d->queue[i];
+
+   queue->fence_context = dma_fence_context_alloc(1);
+   memset(&queue->stats, 0, sizeof(queue->stats));
+   seqcount_init(&queue->stats.lock);
}
 
spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/g

[PATCH v2 3/4] drm/v3d: Create function to update a set of GPU stats

2024-04-16 Thread Maíra Canal
Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index b6b5542c3fcf..b9614944931c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -118,6 +118,14 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
global_stats->start_ns = now;
 }
 
+static void
+v3d_stats_update(struct v3d_stats *stats, u64 now)
+{
+   stats->enabled_ns += now - stats->start_ns;
+   stats->jobs_completed++;
+   stats->start_ns = 0;
+}
+
 void
 v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
 {
@@ -127,13 +135,8 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
struct v3d_stats *local_stats = &file->stats[queue];
u64 now = local_clock();
 
-   local_stats->enabled_ns += now - local_stats->start_ns;
-   local_stats->jobs_completed++;
-   local_stats->start_ns = 0;
-
-   global_stats->enabled_ns += now - global_stats->start_ns;
-   global_stats->jobs_completed++;
-   global_stats->start_ns = 0;
+   v3d_stats_update(local_stats, now);
+   v3d_stats_update(global_stats, now);
 }
 
 static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
-- 
2.44.0



[PATCH v2 2/4] drm/v3d: Create a struct to store the GPU stats

2024-04-16 Thread Maíra Canal
This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.

Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better express the semantics of
the variable, as we are only accounting jobs that have been completed.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 15 +++
 drivers/gpu/drm/v3d/v3d_drv.h   | 18 ++
 drivers/gpu/drm/v3d/v3d_gem.c   |  4 +---
 drivers/gpu/drm/v3d/v3d_sched.c | 20 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++
 5 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..52e3ba9df46f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -115,14 +115,12 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
v3d_priv->v3d = v3d;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d_priv->enabled_ns[i] = 0;
-   v3d_priv->start_ns[i] = 0;
-   v3d_priv->jobs_sent[i] = 0;
-
sched = &v3d->queue[i].sched;
drm_sched_entity_init(&v3d_priv->sched_entity[i],
  DRM_SCHED_PRIORITY_NORMAL, &sched,
  1, NULL);
+
+   memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -151,20 +149,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
enum v3d_queue queue;
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+		struct v3d_stats *stats = &file_priv->stats[queue];
+
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
   v3d_queue_to_string(queue),
-			   file_priv->start_ns[queue] ? file_priv->enabled_ns[queue]
-						      + timestamp - file_priv->start_ns[queue]
-						      : file_priv->enabled_ns[queue]);
+			   stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
+					   : stats->enabled_ns);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-			   v3d_queue_to_string(queue), file_priv->jobs_sent[queue]);
+			   v3d_queue_to_string(queue), stats->jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ee3545226d7f..5a198924d568 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue)
return "UNKNOWN";
 }
 
+struct v3d_stats {
+   u64 start_ns;
+   u64 enabled_ns;
+   u64 jobs_completed;
+};
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
u64 fence_context;
u64 emit_seqno;
 
-   u64 start_ns;
-   u64 enabled_ns;
-   u64 jobs_sent;
+   /* Stores the GPU stats for this queue in the global context. */
+   struct v3d_stats stats;
 };
 
 /* Performance monitor object. The perform lifetime is controlled by userspace
@@ -188,11 +193,8 @@ struct v3d_file_priv {
 
struct drm_sched_entity sched_entity[V3D_MAX_QUEUES];
 
-   u64 start_ns[V3D_MAX_QUEUES];
-
-   u64 enabled_ns[V3D_MAX_QUEUES];
-
-   u64 jobs_sent[V3D_MAX_QUEUES];
+   /* Stores the GPU stats for a specific queue for this fd. */
+   struct v3d_stats stats[V3D_MAX_QUEUES];
 };
 
 struct v3d_bo {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..d14589d3ae6c 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -248,9 +248,7 @@ v3d_gem_init(struct drm_device *dev)
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   v3d->queue[i].start_ns = 0;
-   v3d->queue[i].enabled_ns = 0;
-   v3d->queue[i].jobs_sent = 0;
+		memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats));
}
 
	spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 8ca61bcd4b1
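The fdinfo hunk in this patch encodes the busy-time rule its comment describes: a queue with a job in flight reports the accumulated time plus the currently elapsed slice. A plain-C restatement of that rule (illustrative only, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Restatement of the fdinfo accounting expression quoted above:
 * if a job is running (start_ns != 0), report enabled_ns plus the
 * elapsed part of the current job; otherwise enabled_ns alone. */
struct v3d_stats {
	uint64_t start_ns;   /* 0 when no job is running */
	uint64_t enabled_ns; /* accumulated busy time */
};

static uint64_t busy_ns(const struct v3d_stats *st, uint64_t now)
{
	return st->start_ns ? st->enabled_ns + now - st->start_ns
			    : st->enabled_ns;
}
```

This is why, as the comment in the diff notes, time spent in a job that is aborted by a GPU reset never lands in `enabled_ns`: `start_ns` is simply zeroed without closing the interval.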

[PATCH v2 0/4] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-16 Thread Maíra Canal
The first version of this series had the intention to fix two major
issues with the GPU stats:

1. We were incrementing `enabled_ns` twice by the end of each job.
2. There is a race-condition between the IRQ handler and the users

The first of the issues was already addressed and the fix was applied to
drm-misc-fixes. Now, what is left, addresses the second issue.

Apart from addressing this issue, this series improves the GPU stats
code as a whole. We reduced code repetition by creating functions
to start and update the GPU stats. This will likely reduce the odds of
issue #1 happening again.

Best Regards,
- Maíra

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/

* As the first patch was a bugfix, it was pushed to drm-misc-fixes.
* [1/4]: Add Chema Casanova's R-b
* [2/4]: s/jobs_sent/jobs_completed and add the reasoning in the commit
message (Chema Casanova)
* [2/4]: Add Chema Casanova's and Tvrtko Ursulin's R-b
* [3/4]: Call `local_clock()` only once, by adding a new parameter to the
`v3d_stats_update` function (Chema Casanova)
* [4/4]: Move new line to the correct patch (2/4) (Tvrtko Ursulin)
* [4/4]: Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin)

Maíra Canal (4):
  drm/v3d: Create two functions to update all GPU stats variables
  drm/v3d: Create a struct to store the GPU stats
  drm/v3d: Create function to update a set of GPU stats
  drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

 drivers/gpu/drm/v3d/v3d_drv.c   | 19 +++
 drivers/gpu/drm/v3d/v3d_drv.h   | 40 +++---
 drivers/gpu/drm/v3d/v3d_gem.c   |  9 ++--
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++---
 drivers/gpu/drm/v3d/v3d_sched.c | 94 +
 drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++---
 6 files changed, 105 insertions(+), 118 deletions(-)

-- 
2.44.0
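The `seqcount_t` change in 4/4 can be pictured with this userspace analogue (this is *not* the kernel's `seqcount_t` API, just the same idea with C11 atomics): the writer makes the sequence odd while it touches the counters, and readers retry until they observe the same even sequence before and after copying, giving a consistent snapshot without ever blocking the IRQ-side writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct gpu_stats {
	atomic_uint seq;         /* even: stable, odd: writer active */
	uint64_t enabled_ns;
	uint64_t jobs_completed;
};

/* Single writer (the IRQ handler in the driver's case). */
static void stats_write(struct gpu_stats *s, uint64_t delta_ns)
{
	atomic_fetch_add_explicit(&s->seq, 1u, memory_order_release); /* -> odd */
	s->enabled_ns += delta_ns;
	s->jobs_completed++;
	atomic_fetch_add_explicit(&s->seq, 1u, memory_order_release); /* -> even */
}

/* Readers (sysfs/fdinfo) retry instead of taking a lock. */
static void stats_read(struct gpu_stats *s, uint64_t *ns, uint64_t *jobs)
{
	unsigned int start;

	do {
		do {
			start = atomic_load_explicit(&s->seq, memory_order_acquire);
		} while (start & 1u); /* writer in progress, wait */
		*ns = s->enabled_ns;
		*jobs = s->jobs_completed;
	} while (atomic_load_explicit(&s->seq, memory_order_acquire) != start);
}
```

The fit for this driver is that there is exactly one writer per queue and reads are rare, so making readers retry is much cheaper than making the interrupt handler take a lock.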



[PATCH v2 1/4] drm/v3d: Create two functions to update all GPU stats variables

2024-04-16 Thread Maíra Canal
Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very error-prone,
as we can see in commit 35f4f8c9fc97 ("drm/v3d: Don't increment
`enabled_ns` twice").

Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.

Co-developed-by: Tvrtko Ursulin 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  1 +
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++--
 drivers/gpu/drm/v3d/v3d_sched.c | 80 +++--
 3 files changed, 40 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..ee3545226d7f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index ce6b2fb341d1..d469bda52c1a 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FLDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->bin_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_BIN];
-
-   file->jobs_sent[V3D_BIN]++;
-   v3d->queue[V3D_BIN].jobs_sent++;
-
-   file->start_ns[V3D_BIN] = 0;
-   v3d->queue[V3D_BIN].start_ns = 0;
-
-   file->enabled_ns[V3D_BIN] += runtime;
-   v3d->queue[V3D_BIN].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN);
 		trace_v3d_bcl_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FRDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->render_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
-
-   file->jobs_sent[V3D_RENDER]++;
-   v3d->queue[V3D_RENDER].jobs_sent++;
-
-   file->start_ns[V3D_RENDER] = 0;
-   v3d->queue[V3D_RENDER].start_ns = 0;
-
-   file->enabled_ns[V3D_RENDER] += runtime;
-   v3d->queue[V3D_RENDER].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER);
 		trace_v3d_rcl_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_CSDDONE(v3d->ver)) {
struct v3d_fence *fence =
to_v3d_fence(v3d->csd_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_CSD];
-
-   file->jobs_sent[V3D_CSD]++;
-   v3d->queue[V3D_CSD].jobs_sent++;
-
-   file->start_ns[V3D_CSD] = 0;
-   v3d->queue[V3D_CSD].start_ns = 0;
-
-   file->enabled_ns[V3D_CSD] += runtime;
-   v3d->queue[V3D_CSD].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD);
 		trace_v3d_csd_irq(&v3d->drm, fence->seqno);
 		dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg)
if (intsts & V3D_HUB_INT_TFUC) {
struct v3d_fence *fence =
to_v3d_fence(v3d->tfu_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_TFU];
-
-   file->jobs_sent[V3D_TFU]++;
-   v3d->queue[V3D_TFU].jobs_sent++;
-
-   file->start_ns[V3D_TFU] = 0;
-   v3d->queue[V3D_TFU].start_ns = 0;
-
-   file->enabled_ns[V3D_TFU] += runtime;
-   v3d->queue[V3D_TFU].enabled_ns += runtime;
 
+   v3d_job_u

Re: [PATCH v2 20/43] drm/vkms: Use fbdev-shmem

2024-04-16 Thread Maíra Canal

On 4/10/24 10:02, Thomas Zimmermann wrote:

Implement fbdev emulation with fbdev-shmem. Avoids the overhead of
fbdev-generic's additional shadow buffering. No functional changes.

Signed-off-by: Thomas Zimmermann 


Acked-by: Maíra Canal 

Best Regards,
- Maíra


Cc: Rodrigo Siqueira 
Cc: Melissa Wen 
Cc: "Maíra Canal" 
Cc: Haneen Mohammed 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/vkms/vkms_drv.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
index dd0af086e7fa9..8dc9dc13896e9 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.c
+++ b/drivers/gpu/drm/vkms/vkms_drv.c
@@ -17,7 +17,7 @@
  #include 
  #include 
  #include 
-#include <drm/drm_fbdev_generic.h>
+#include <drm/drm_fbdev_shmem.h>
  #include 
  #include 
  #include 
@@ -223,7 +223,7 @@ static int vkms_create(struct vkms_config *config)
if (ret)
goto out_devres;
  
-	drm_fbdev_generic_setup(&vkms_device->drm, 0);
+	drm_fbdev_shmem_setup(&vkms_device->drm, 0);
  
  	return 0;
  


Re: [PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-16 Thread Maíra Canal

On 4/16/24 02:30, Stefan Wahren wrote:

Hi Maíra,

On 16.04.24 at 03:02, Maíra Canal wrote:

On 4/15/24 13:54, Andre Przywara wrote:

On Mon, 15 Apr 2024 13:00:39 -0300
Maíra Canal  wrote:

Hi,


RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.


So I think Krzysztof's initial comment still stands: What does that
patch
actually change? If I build those DTBs as of now, none of them has a
status property in the v3d node. Which means it's enabled:
https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter2-devicetree-basics.rst#status

So adding an explicit 'status = "okay";' doesn't make a difference.

What do I miss here?


As mentioned by Stefan in the last version, in Raspberry Pi OS, there is
a systemd script which tries to check for the V3D driver (/usr/lib
/systemd/scripts/gldriver_test.sh). Within the first check, "raspi-
config nonint is_kms" is called, which always seems to fail. What
"raspi-config" does is check if
/proc/device-tree/soc/v3d@7ec00000/status is equal to "okay". As
/proc/device-tree/soc/v3d@7ec00000/status doesn't exist, it returns
false.

Yes, but I also mentioned that the V3D driver starts without this patch.
The commit message of this patch suggests this is a DT issue, which it is not.

I haven't had the time to update my SD card to Bookworm yet. Does the issue
still exist with this version?


I'm using a 32-bit kernel and the recommended OS for 32-bit is Bullseye.
But I checked the Bookworm code and indeed, Bookworm doesn't check
the device tree [1].

I'm thinking about sending a patch to the Bullseye branch to fix this
issue.

[1] 
https://github.com/RPi-Distro/raspi-config/blob/966ed3fecc159ff3e69a774d74bfd716c04dafff/raspi-config#L128


Best Regards,
- Maíra



I'll see if I can improve the userspace tool by just checking if the
folder /proc/device-tree/soc/v3d@7ec00000/ exists.
Thanks for the explanation!

Best Regards,
- Maíra



Cheers,
Andre


Signed-off-by: Maíra Canal 
---

v1 -> v2:
https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/

* As mentioned by Krzysztof, enabling should be done in last place of
override/extend. Therefore, I'm disabling V3D in the common dtsi
and enabling in the last place of extend, i.e. the RPi DTS files.

  arch/arm/boot/dts/broadcom/bcm2835-common.dtsi  | 1 +
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts    | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts    | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 
  15 files changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..69e34831de51 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec00000 {
  compatible = "brcm,bcm2835-v3d";
  reg = <0x7ec00000 0x1000>;
  interrupts = <1 10>;
+    status = "disabled";
  };
    vc4: gpu {
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
index 069b48272aa5..495ab1dfd2ce 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
@@ -128,3 +128,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
index 2726c00431e8..4634d88ce3af 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
@@ -121,3 +121,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broad
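The devicetree rule this thread keeps coming back to, namely that a node without a `status` property counts as enabled, can be sketched as a userspace check. The helper below is hypothetical, not raspi-config code; per the DT spec, an absent property, `"okay"`, or the deprecated `"ok"` mean enabled, anything else (e.g. `"disabled"`) means not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check for a node like
 * /proc/device-tree/soc/v3d@7ec00000: takes the path of the node's
 * "status" file and applies the DT-spec default. DT property strings
 * are NUL-terminated, which the zeroed buffer absorbs. */
static int node_enabled(const char *status_path)
{
	char buf[32] = {0};
	FILE *f = fopen(status_path, "rb");
	size_t n;

	if (!f)
		return 1; /* absent status property == enabled */
	n = fread(buf, 1, sizeof(buf) - 1, f);
	(void)n;
	fclose(f);
	return strcmp(buf, "okay") == 0 || strcmp(buf, "ok") == 0;
}
```

A tool written this way would report the mainline DTB (no `status` property at all) as enabled, instead of failing like the Bullseye-era `raspi-config nonint is_kms` did.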

Re: [PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-15 Thread Maíra Canal

On 4/15/24 13:54, Andre Przywara wrote:

On Mon, 15 Apr 2024 13:00:39 -0300
Maíra Canal  wrote:

Hi,


RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.


So I think Krzysztof's initial comment still stands: What does that patch
actually change? If I build those DTBs as of now, none of them has a
status property in the v3d node. Which means it's enabled:
https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter2-devicetree-basics.rst#status
So adding an explicit 'status = "okay";' doesn't make a difference.

What do I miss here?


As mentioned by Stefan in the last version, in Raspberry Pi OS, there is
a systemd script which tries to check for the V3D driver (/usr/lib
/systemd/scripts/gldriver_test.sh). Within the first check, "raspi-
config nonint is_kms" is called, which always seems to fail. What
"raspi-config" does is check if
/proc/device-tree/soc/v3d@7ec00000/status is equal to "okay". As
/proc/device-tree/soc/v3d@7ec00000/status doesn't exist, it returns false.


I'll see if I can improve the userspace tool by just checking if the
folder /proc/device-tree/soc/v3d@7ec00000/ exists.

Thanks for the explanation!

Best Regards,
- Maíra



Cheers,
Andre


Signed-off-by: Maíra Canal 
---

v1 -> v2: 
https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/

* As mentioned by Krzysztof, enabling should be done in last place of
override/extend. Therefore, I'm disabling V3D in the common dtsi
and enabling in the last place of extend, i.e. the RPi DTS files.

  arch/arm/boot/dts/broadcom/bcm2835-common.dtsi  | 1 +
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts| 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts| 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 
  15 files changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..69e34831de51 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec00000 {
compatible = "brcm,bcm2835-v3d";
	reg = <0x7ec00000 0x1000>;
interrupts = <1 10>;
+   status = "disabled";
};
  
  		vc4: gpu {

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
index 069b48272aa5..495ab1dfd2ce 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
@@ -128,3 +128,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
index 2726c00431e8..4634d88ce3af 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
@@ -121,3 +121,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
index c57b999a4520..45fa0f6851fc 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
@@ -130,3 +130,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
index ae6d3a9586ab..c1dac5d704aa 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
@@ -121,3 +121,7 

Re: [PATCH 1/5] drm/v3d: Don't increment `enabled_ns` twice

2024-04-15 Thread Maíra Canal

On 4/3/24 17:24, Maíra Canal wrote:

The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
introduced the calculation of global GPU stats. To do so, it used
the already existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d:
Implement show_fdinfo() callback for GPU usage stats"). While adding
the ability to calculate global GPU stats, the author forgot to delete the
existing increment.

Currently, the value of `enabled_ns` is incremented twice by the end of
the job, when it should be added just once. Therefore, delete the
leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage
stats on sysfs").

Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 


As this patch is an isolated bugfix and it was reviewed by two
developers, I'm applying it to drm-misc/drm-misc-fixes.

I'll address the feedback for the rest of the series later and send a
v2.

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_irq.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index 2e04f6cb661e..ce6b2fb341d1 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -105,7 +105,6 @@ v3d_irq(int irq, void *arg)
		struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_BIN];
  
-		file->enabled_ns[V3D_BIN] += local_clock() - file->start_ns[V3D_BIN];

file->jobs_sent[V3D_BIN]++;
v3d->queue[V3D_BIN].jobs_sent++;
  
@@ -126,7 +125,6 @@ v3d_irq(int irq, void *arg)

		struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
  
-		file->enabled_ns[V3D_RENDER] += local_clock() - file->start_ns[V3D_RENDER];

file->jobs_sent[V3D_RENDER]++;
v3d->queue[V3D_RENDER].jobs_sent++;
  
@@ -147,7 +145,6 @@ v3d_irq(int irq, void *arg)

		struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_CSD];
  
-		file->enabled_ns[V3D_CSD] += local_clock() - file->start_ns[V3D_CSD];

file->jobs_sent[V3D_CSD]++;
v3d->queue[V3D_CSD].jobs_sent++;
  
@@ -195,7 +192,6 @@ v3d_hub_irq(int irq, void *arg)

		struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_TFU];
  
-		file->enabled_ns[V3D_TFU] += local_clock() - file->start_ns[V3D_TFU];

file->jobs_sent[V3D_TFU]++;
v3d->queue[V3D_TFU].jobs_sent++;
  


Re: [PATCH] dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FS

2024-04-15 Thread Maíra Canal

Hi Tvrtko,

On 4/1/24 10:21, Tvrtko Ursulin wrote:


On 01/04/2024 13:45, Christian König wrote:

Am 01.04.24 um 14:39 schrieb Tvrtko Ursulin:


On 29/03/2024 00:00, T.J. Mercier wrote:
On Thu, Mar 28, 2024 at 7:53 AM Tvrtko Ursulin wrote:


From: Tvrtko Ursulin 

There is no point in compiling in the list and mutex operations which are
only used from the dma-buf debugfs code, if debugfs is not compiled in.

Put the code in question behind some kconfig guards and so save some text
and maybe even a pointer per object at runtime when not enabled.

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: T.J. Mercier 


Thanks!

How would patches to dma-buf be typically landed? Via what tree I mean? drm-misc-next?


That should go through drm-misc-next.

And feel free to add Reviewed-by: Christian König  as well.


Thanks!

Maarten, if I got it right you are handling the next drm-misc-next pull -
could you merge this one please?


Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra



Regards,

Tvrtko


Re: [PATCH] drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_buffer

2024-04-15 Thread Maíra Canal

On 4/15/24 12:19, Jocelyn Falempe wrote:

Hi,

You're right, I messed up the rename, and I mostly test on x86, where I
don't build the imx driver.


Reviewed-by: Jocelyn Falempe 

Best regards,



Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra


[PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-15 Thread Maíra Canal
RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.

Signed-off-by: Maíra Canal 
---

v1 -> v2: 
https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/

* As mentioned by Krzysztof, enabling should be done in last place of
override/extend. Therefore, I'm disabling V3D in the common dtsi
and enabling in the last place of extend, i.e. the RPi DTS files.

 arch/arm/boot/dts/broadcom/bcm2835-common.dtsi  | 1 +
 arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts| 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts| 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 
 arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 
 15 files changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..69e34831de51 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec00000 {
compatible = "brcm,bcm2835-v3d";
	reg = <0x7ec00000 0x1000>;
interrupts = <1 10>;
+   status = "disabled";
};
 
vc4: gpu {
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
index 069b48272aa5..495ab1dfd2ce 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
@@ -128,3 +128,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
index 2726c00431e8..4634d88ce3af 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
@@ -121,3 +121,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
index c57b999a4520..45fa0f6851fc 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
@@ -130,3 +130,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
index ae6d3a9586ab..c1dac5d704aa 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
@@ -121,3 +121,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts
index 72764be75a79..72ca31f2a7d6 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts
@@ -115,3 +115,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts
index 3f9d198ac3ab..881a07d2f28f 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts
@@ -95,3 +95,7 @@ &uart0 {
 	pinctrl-0 = <&uart0_gpio14>;
 	status = "okay";
 };
+
+&v3d {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts
index 1f0b163e400c..1c7324067442 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts
@@ -134,6 

[PATCH] drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_buffer

2024-04-15 Thread Maíra Canal
On version 11, Thomas suggested changing the name of the function and
this request was applied on version 12, which is the version that
landed. Although the name of the function changed in the C file, it
didn't change in the header file, leading to a compilation error such
as:

drivers/gpu/drm/imx/ipuv3/ipuv3-plane.c:780:24: error: use of undeclared
identifier 'drm_fb_dma_get_scanout_buffer'; did you mean 
'drm_panic_gem_get_scanout_buffer'?
  780 | .get_scanout_buffer = drm_fb_dma_get_scanout_buffer,
  |   ^
  |   drm_panic_gem_get_scanout_buffer
./include/drm/drm_fb_dma_helper.h:23:5: note: 'drm_panic_gem_get_scanout_buffer'
declared here
   23 | int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane,
  | ^
1 error generated.


Fixes: 879b3b6511fe ("drm/fb_dma: Add generic get_scanout_buffer() for drm_panic")
Signed-off-by: Maíra Canal 
---
 include/drm/drm_fb_dma_helper.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/drm/drm_fb_dma_helper.h b/include/drm/drm_fb_dma_helper.h
index 61f24c2aba2f..c950732c6d36 100644
--- a/include/drm/drm_fb_dma_helper.h
+++ b/include/drm/drm_fb_dma_helper.h
@@ -6,6 +6,7 @@
 
 struct drm_device;
 struct drm_framebuffer;
+struct drm_plane;
 struct drm_plane_state;
 struct drm_scanout_buffer;
 
@@ -20,8 +21,8 @@ void drm_fb_dma_sync_non_coherent(struct drm_device *drm,
  struct drm_plane_state *old_state,
  struct drm_plane_state *state);
 
-int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane,
-struct drm_scanout_buffer *sb);
+int drm_fb_dma_get_scanout_buffer(struct drm_plane *plane,
+ struct drm_scanout_buffer *sb);
 
 #endif
 
-- 
2.44.0



Re: [PATCH] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-14 Thread Maíra Canal

Hi Phil,

On 4/14/24 15:43, Phil Elwell wrote:

Hello all,

On Fri, 12 Apr 2024 at 18:17, Stefan Wahren  wrote:


Hi Maíra,

[add Phil & Dave]

On 12.04.24 at 15:25, Maíra Canal wrote:

RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.

thanks for trying to improve the combination of Raspberry Pi OS + mainline
kernel. I think I'm able to reproduce the issue with a Raspberry Pi 3 B+
on Buster.


Buster? We launched Buster with 4.19 and ended on 5.10. We've moved
onto Bookworm now. A lot has changed in that time...


 From the kernel side everything looks good:

[   11.054833] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[   11.055118] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
[   11.055340] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4])
[   11.055521] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops
vc4_crtc_ops [vc4])
[   11.055695] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops
vc4_crtc_ops [vc4])
[   11.055874] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops
vc4_crtc_ops [vc4])
[   11.056020] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
[   11.063277] Bluetooth: hci0: BCM4345C0
'brcm/BCM4345C0.raspberrypi,3-model-b-plus.hcd' Patch
[   11.070466] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[   11.174803] Console: switching to colour frame buffer device 240x75
[   11.205125] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

But in Raspberry Pi OS there is a systemd script which is trying to
check for the V3D driver (/usr/lib/systemd/scripts/gldriver_test.sh).
Within the first check, "raspi-config nonint is_kms" is called, which
always seems to fail. If I run strace on this command, it seems to check
for /proc/device-tree/soc/v3d@7ec00000/status, which doesn't exist in
the mainline device tree.

Maybe there is a chance to improve the userspace tool?


...such as the raspi-config tool, which now always succeeds for is_kms.



I'm using Raspberry Pi OS Bulleye with the raspi-config tool on version
20231012~bulleye. I can still reproduce this issue when using a upstream
kernel.

I ran `sudo apt upgrade`, but a new version of the raspi-config tool
didn't appeared.

Best Regards,
- Maíra


Phil



Signed-off-by: Maíra Canal 
---

I decided to add the status property to the `bcm2835-common.dtsi`, but
there are two other options:

1. To add the status property to the `bcm2835-rpi-common.dtsi` file
2. To add the status property to each individual RPi model, e.g.
`bcm2837-rpi-3-b.dts`.

Let me know which option is more suitable, and if `bcm2835-common.dtsi`
is not the best option, I can send a v2.

Best Regards,
- Maíra

   arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 +
   1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..851a6bce1939 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
   compatible = "brcm,bcm2835-v3d";
   reg = <0x7ec0 0x1000>;
   interrupts = <1 10>;
+ status = "okay";
   };

   vc4: gpu {




[PATCH] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-12 Thread Maíra Canal
RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.

Signed-off-by: Maíra Canal 
---

I decided to add the status property to the `bcm2835-common.dtsi`, but
there are two other options:

1. To add the status property to the `bcm2835-rpi-common.dtsi` file
2. To add the status property to each individual RPi model, e.g.
`bcm2837-rpi-3-b.dts`.

Let me know which option is more suitable, and if `bcm2835-common.dtsi`
is not the best option, I can send a v2.

Best Regards,
- Maíra

 arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..851a6bce1939 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
compatible = "brcm,bcm2835-v3d";
reg = <0x7ec0 0x1000>;
interrupts = <1 10>;
+   status = "okay";
};
 
vc4: gpu {
-- 
2.44.0



Re: [PATCH v6 3/8] drm/ci: uprev IGT and update testlist

2024-04-09 Thread Maíra Canal

On 4/9/24 12:15, Dmitry Baryshkov wrote:

On Tue, Apr 09, 2024 at 07:22:38PM +0530, Vignesh Raman wrote:

Hi Maíra,

On 09/04/24 15:10, Maíra Canal wrote:

On 4/9/24 05:13, Vignesh Raman wrote:

Uprev IGT and add amd, v3d, vc4 and vgem specific tests to
testlist and skip driver-specific tests in *-skips.txt.
Also add testlist to the MAINTAINERS file and update xfails.

A better approach would be to stop vendoring the testlist
into the kernel and instead use testlist from the IGT build
to ensure we do not miss renamed or newly added tests.
This implementation is planned for the future.


How problematic would it be to just do this in this series, instead
of adding a huge testlist that we need to keep in sync with IGT?


Is it okay if these changes are submitted in another patch series, to avoid
delaying the current one? There are patches, like vkms, which are
blocked due to the mesa uprev patch. We would also need to rerun all jobs
and update the xfails with the new testlist. In the next series, we could uprev
IGT to the latest version, use the testlist from the build, and remove the one
in drm-ci. We can also test with the latest kernel. I will work on this.
Please let me know your thoughts.


As we have to rebase/retest anyway, I think it makes more sense to land
the from-IGT-test-list approach first, fixing it for the devices that are
currently present, and to land the rest afterwards. As for the IGT uprev,
we have been waiting for that for quite a while (I think I even sent a
patch a while ago) in order to fix test failures on drm/msm.


Agreed on that.

Best Regards
- Maíra





Regards,
Vignesh



Best Regards,
- Maíra



Acked-by: Helen Koike 
Signed-off-by: Vignesh Raman 
---

v3:
    - New patch in series to uprev IGT and update testlist.

v4:
    - Add testlists to the MAINTAINERS file and remove amdgpu xfails
changes.

v5:
    - Keep single testlist and update xfails. Skip driver specific tests.

v6:
    - Update xfails.

---
   MAINTAINERS   |   8 +
   drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
   drivers/gpu/drm/ci/testlist.txt   | 321 ++
   .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  25 +-
   .../drm/ci/xfails/amdgpu-stoney-flakes.txt    |  10 +-
   .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  23 +-
   drivers/gpu/drm/ci/xfails/i915-amly-fails.txt |   1 +
   drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   9 +-
   drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-cml-fails.txt  |   1 +
   drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   7 +
   drivers/gpu/drm/ci/xfails/i915-glk-fails.txt  |   2 +-
   drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   2 +
   drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |   1 +
   drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   9 +-
   .../drm/ci/xfails/mediatek-mt8173-fails.txt   |   3 -
   .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   6 +
   .../drm/ci/xfails/mediatek-mt8183-fails.txt   |   1 +
   .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   5 +
   .../gpu/drm/ci/xfails/meson-g12b-fails.txt    |   1 +
   .../gpu/drm/ci/xfails/meson-g12b-skips.txt    |   5 +
   .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   5 +
   .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   8 +-
   .../msm-sc7180-trogdor-kingoftown-skips.txt   |   6 +
   ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   6 +
   .../gpu/drm/ci/xfails/msm-sdm845-skips.txt    |   6 +
   .../drm/ci/xfails/rockchip-rk3288-fails.txt   |   1 +
   .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   8 +-
   .../drm/ci/xfails/rockchip-rk3399-fails.txt   |   1 +
   .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   6 +
   .../drm/ci/xfails/virtio_gpu-none-fails.txt   |  15 +
   .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   9 +-
   35 files changed, 532 insertions(+), 17 deletions(-)
   create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bc7e122a094..f7d0040a6c21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1665,6 +1665,7 @@ L:    dri-devel@lists.freedesktop.org
   S:    Supported
   T:    git git://anongit.freedesktop.org/drm/drm-misc
   F:    Documentation/gpu/panfrost.rst
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/panfrost/
   F:    include/uapi/drm/panfrost_drm.h
@@ -6753,6 +6754,7 @@ S:    Maintained
   B:    https://gitlab.freedesktop.org/drm/msm/-/issues
   T:    git https://gitlab.freedesktop.org/drm/msm.git
   F:    Documentation/devicetree/bindings/display/msm/
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers

Re: [PATCH v6 3/8] drm/ci: uprev IGT and update testlist

2024-04-09 Thread Maíra Canal

On 4/9/24 05:13, Vignesh Raman wrote:

Uprev IGT and add amd, v3d, vc4 and vgem specific tests to
testlist and skip driver-specific tests in *-skips.txt.
Also add testlist to the MAINTAINERS file and update xfails.

A better approach would be to stop vendoring the testlist
into the kernel and instead use testlist from the IGT build
to ensure we do not miss renamed or newly added tests.
This implementation is planned for the future.


How problematic would it be to just do this in this series, instead
of adding a huge testlist that we need to keep in sync with IGT?

Best Regards,
- Maíra



Acked-by: Helen Koike 
Signed-off-by: Vignesh Raman 
---

v3:
   - New patch in series to uprev IGT and update testlist.

v4:
   - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes.

v5:
   - Keep single testlist and update xfails. Skip driver specific tests.

v6:
   - Update xfails.

---
  MAINTAINERS   |   8 +
  drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
  drivers/gpu/drm/ci/testlist.txt   | 321 ++
  .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  25 +-
  .../drm/ci/xfails/amdgpu-stoney-flakes.txt|  10 +-
  .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  23 +-
  drivers/gpu/drm/ci/xfails/i915-amly-fails.txt |   1 +
  drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   9 +-
  drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-cml-fails.txt  |   1 +
  drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   7 +
  drivers/gpu/drm/ci/xfails/i915-glk-fails.txt  |   2 +-
  drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   2 +
  drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |   1 +
  drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   9 +-
  .../drm/ci/xfails/mediatek-mt8173-fails.txt   |   3 -
  .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   6 +
  .../drm/ci/xfails/mediatek-mt8183-fails.txt   |   1 +
  .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   5 +
  .../gpu/drm/ci/xfails/meson-g12b-fails.txt|   1 +
  .../gpu/drm/ci/xfails/meson-g12b-skips.txt|   5 +
  .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   5 +
  .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   8 +-
  .../msm-sc7180-trogdor-kingoftown-skips.txt   |   6 +
  ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   6 +
  .../gpu/drm/ci/xfails/msm-sdm845-skips.txt|   6 +
  .../drm/ci/xfails/rockchip-rk3288-fails.txt   |   1 +
  .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   8 +-
  .../drm/ci/xfails/rockchip-rk3399-fails.txt   |   1 +
  .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   6 +
  .../drm/ci/xfails/virtio_gpu-none-fails.txt   |  15 +
  .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   9 +-
  35 files changed, 532 insertions(+), 17 deletions(-)
  create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bc7e122a094..f7d0040a6c21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1665,6 +1665,7 @@ L:dri-devel@lists.freedesktop.org
  S:Supported
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/gpu/panfrost.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/panfrost/
  F:include/uapi/drm/panfrost_drm.h
  
@@ -6753,6 +6754,7 @@ S:	Maintained

  B:https://gitlab.freedesktop.org/drm/msm/-/issues
  T:git https://gitlab.freedesktop.org/drm/msm.git
  F:Documentation/devicetree/bindings/display/msm/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/msm*
  F:drivers/gpu/drm/msm/
  F:include/uapi/drm/msm_drm.h
@@ -7047,6 +7049,7 @@ T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
  F:Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
  F:Documentation/gpu/meson.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/meson*
  F:drivers/gpu/drm/meson/
  
@@ -7160,6 +7163,7 @@ L:	dri-devel@lists.freedesktop.org

  L:linux-media...@lists.infradead.org (moderated for non-subscribers)
  S:Supported
  F:Documentation/devicetree/bindings/display/mediatek/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/mediatek*
  F:drivers/gpu/drm/mediatek/
  F:drivers/phy/mediatek/phy-mtk-dp.c
@@ -7211,6 +7215,7 @@ L:dri-devel@lists.freedesktop.org
  S:Maintained
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/devicetree/bindings/display/rockchip/
+F: drivers/gpu/drm/ci/testlist.txt
  F:

Re: [PATCH 5/5] drm/vkms: Use drm_crtc_vblank_crtc()

2024-04-08 Thread Maíra Canal

On 4/8/24 16:06, Ville Syrjala wrote:

From: Ville Syrjälä 

Replace the open coded drm_crtc_vblank_crtc() with the real
thing.

Cc: Rodrigo Siqueira 
Cc: Melissa Wen 
Cc: "Maíra Canal" 
Cc: Haneen Mohammed 
Cc: Daniel Vetter 
Signed-off-by: Ville Syrjälä 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_crtc.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 61e500b8c9da..40b4d084e3ce 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -61,9 +61,7 @@ static enum hrtimer_restart vkms_vblank_simulate(struct 
hrtimer *timer)
  
  static int vkms_enable_vblank(struct drm_crtc *crtc)

  {
-   struct drm_device *dev = crtc->dev;
-   unsigned int pipe = drm_crtc_index(crtc);
-   struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
+   struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc);
struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
  
	drm_calc_timestamping_constants(crtc, &crtc->mode);

@@ -88,10 +86,9 @@ static bool vkms_get_vblank_timestamp(struct drm_crtc *crtc,
  bool in_vblank_irq)
  {
struct drm_device *dev = crtc->dev;
-   unsigned int pipe = crtc->index;
struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev);
	struct vkms_output *output = &vkmsdev->output;
-   struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
+   struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc);
  
  	if (!READ_ONCE(vblank->enabled)) {

*vblank_time = ktime_get();


Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob

2024-04-08 Thread Maíra Canal

On 4/4/24 18:30, Adrián Larumbe wrote:

On 04.04.2024 11:31, Maíra Canal wrote:

On 4/4/24 11:00, Adrián Larumbe wrote:

This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose
the total GPU usage stats on sysfs"). The point is making broader GPU
occupancy numbers available through the sysfs interface, so that for every
job slot, its number of processed jobs and total processing time are
displayed.


Shouldn't we make this sysfs interface a generic DRM interface?
Something that would be standard for all drivers and that we could
integrate into gputop in the future.


I think the best way to generalise this sysfs knob would be to create a DRM
class attribute somewhere in drivers/gpu/drm/drm_sysfs.c and then adding a new
function to 'struct drm_driver' that would return a structure with the relevant
information (execution units and their names, number of processed jobs, etc).


This is exactly what I was thinking about.



What that information would exactly be is up to debate, I guess, since different
drivers might be interested in showing different bits of information.


I believe we can start with the requirements of V3D and Panfrost and
then expand from there.




Laying that down is important because the sysfs file would become part of the
device class API.


My PoV: it is important, but not completely tragic if we don't get it
perfect. Just like fdinfo, which is evolving.



I might come up with a new RFC patch series that does precisely that, at least
for v3d and Panfrost, and maybe other people could pitch in with the sort of
things they'd like to see for other drivers?


Yeah, this would be a great idea. Please, CC me on this series.

Best Regards,
- Maíra



Cheers,
Adrian


Best Regards,
- Maíra



Cc: Boris Brezillon 
Cc: Christopher Healy 
Signed-off-by: Adrián Larumbe 
---
   drivers/gpu/drm/panfrost/panfrost_device.h |  5 +++
   drivers/gpu/drm/panfrost/panfrost_drv.c| 49 --
   drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++-
   drivers/gpu/drm/panfrost/panfrost_job.h|  3 ++
   4 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index cffcb0ac7c11..1d343351c634 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -169,6 +169,11 @@ struct panfrost_engine_usage {
unsigned long long cycles[NUM_JOB_SLOTS];
   };
+struct panfrost_slot_usage {
+   u64 enabled_ns;
+   u64 jobs_sent;
+};
+
   struct panfrost_file_priv {
struct panfrost_device *pfdev;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ef9f6c0716d5..6afcde66270f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -8,6 +8,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -524,6 +525,10 @@ static const struct drm_ioctl_desc 
panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW),
   };
+static const char * const engine_names[] = {
+   "fragment", "vertex-tiler", "compute-only"
+};
+
   static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
 struct panfrost_file_priv *panfrost_priv,
 struct drm_printer *p)
@@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct 
panfrost_device *pfdev,
 *   job spent on the GPU.
 */
-   static const char * const engine_names[] = {
-   "fragment", "vertex-tiler", "compute-only"
-   };
-
BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
@@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev,
   static DEVICE_ATTR_RW(profiling);
+static ssize_t
+gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct panfrost_device *pfdev = dev_get_drvdata(dev);
+   struct panfrost_slot_usage stats;
+   u64 timestamp = local_clock();
+   ssize_t len = 0;
+   unsigned int i;
+
+   BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
+
+	len += sysfs_emit(buf, "queue\ttimestamp\tjobs\truntime\n");
+	len += sysfs_emit_at(buf, len, "----------------------------------------\n");
+
+   for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
+
+   stats = get_slot_stats(pfdev, i);
+
+   /*
+* Each line will display the slot name, timestamp, the number
+* of jobs handled by that engine and runtime, as shown below:
+*
+	 * queue	timestamp	jobs	runtime
+	 * ----------------------------------------

[PATCH v2 6/6] drm/v3d: Enable big and super pages

2024-04-05 Thread Maíra Canal
The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within the
big/super page are correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.
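
The iteration described above can be sketched roughly as follows (an
illustrative approximation, not the literal v3d_mmu_insert_ptes() from the
series; the variable names here are assumptions):

```c
/* Illustrative sketch: map one 64 KB big page (16 PTEs) or 1 MB super
 * page (256 PTEs). Every 4 KB entry points into the same physically
 * contiguous region and carries an attribute marking the larger page,
 * so the MMU can use a single TLB entry for the whole range.
 */
	u32 page = bo->node.start;	/* first 4 KB page of the GPU VA */
	u32 pfn = paddr >> V3D_MMU_PAGE_SHIFT;	/* contiguous backing pages */
	unsigned int npages = page_size >> V3D_MMU_PAGE_SHIFT;	/* 16 or 256 */
	unsigned int i;

	for (i = 0; i < npages; i++)
		v3d->pt[page++] = pte_attrs | (pfn + i);
```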

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c| 21 +--
 drivers/gpu/drm/v3d/v3d_drv.c   |  8 ++
 drivers/gpu/drm/v3d/v3d_drv.h   |  2 ++
 drivers/gpu/drm/v3d/v3d_gemfs.c |  6 +
 drivers/gpu/drm/v3d/v3d_mmu.c   | 46 ++---
 5 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..cfe82232886a 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;

/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);

+   if (!v3d->super_pages)
+   align = SZ_4K;
+   else if (obj->size >= SZ_1M)
+   align = SZ_1M;
+   else if (obj->size >= SZ_64K)
+   align = SZ_64K;
+   else
+   align = SZ_4K;
+
	spin_lock(&v3d->mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+align >> V3D_MMU_PAGE_SHIFT, 0, 0);
	spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, 
struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
int ret;

-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->super_pages)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+ v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
	bo = to_v3d_bo(&shmem_obj->base);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..3dbd29560be4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,12 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0

+static bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. Note: \
+  To enable Super Pages, you need support to \
+  enable THP.");
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
@@ -308,6 +314,8 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
return -ENOMEM;
}

+   v3d->super_pages = super_pages;
+
ret = v3d_gem_init(drm);
if (ret)
goto dma_free;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 17236ee23490..0a7aacf51164 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;

 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)

 #define V3D_MAX_QUEUES (V3D_CPU + 1)

@@ -121,6 +122,7 @@ struct v3d_dev {
 * tmpfs instance used for shmem backed objects
 */
struct vfsmount *gemfs;
+   bool super_pages;

struct work_struct overflow_mem_work;

diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..7ee55b32c36e 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -12,6 +12,10 @@ void v3d_gemfs_

[PATCH v2 5/6] drm/v3d: Reduce the alignment of the node allocation

2024-04-05 Thread Maíra Canal
Currently, we are using an alignment of 128 kB to insert a node, which
ends up wasting memory, as we perform plenty of small BO allocations
(<= 4 kB). We require that allocations are aligned to 128 kB, so for any
allocation smaller than that, we waste the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 kB, we can allocate up to 1,000,000 BOs
of 4 kB (~4 GB) and 10,000 BOs of 400 kB (~4 GB). Moreover, by performing
benchmarks, we were able to attest that reducing the page alignment to
4 kB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).

Therefore, this patch reduces the alignment of the node allocation to 4
kB, which will allow RPi users to explore the whole 4GB virtual
address space provided by the hardware. Also, this patch allow users to
fully run vkoverhead in the RPi 4/5, solving the issue reported in [1].

[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
	spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..17236ee23490 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0



[PATCH v2 4/6] drm/gem: Create shmem GEM object in a given mountpoint

2024-04-05 Thread Maíra Canal
Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that takes a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++
 include/drm/drm_gem_shmem_helper.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 13bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs 
= {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+  struct vfsmount *gemfs)
 {
struct drm_gem_shmem_object *shmem;
struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, 
bool private)
drm_gem_private_object_init(dev, obj, size);
shmem->map_wc = false; /* dma-buf mappings use always 
writecombine */
} else {
-   ret = drm_gem_object_init(dev, obj, size);
+   ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
}
if (ret) {
drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t 
size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size)
 {
-   return __drm_gem_shmem_create(dev, size, false);
+   return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs)
+{
+   return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
size_t size = PAGE_ALIGN(attach->dmabuf->size);
struct drm_gem_shmem_object *shmem;
 
-   shmem = __drm_gem_shmem_create(dev, size, true);
+   shmem = __drm_gem_shmem_create(dev, size, true, NULL);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h 
b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0



[PATCH v2 3/6] drm/v3d: Introduce gemfs

2024-04-05 Thread Maíra Canal
Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/Makefile|  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
v3d_trace_points.o \
v3d_sched.o \
v3d_sysfs.o \
-   v3d_submit.o
+   v3d_submit.o \
+   v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..d2ce8222771a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -119,6 +119,11 @@ struct v3d_dev {
struct drm_mm mm;
spinlock_t mm_lock;
 
+   /*
+* tmpfs instance used for shmem backed objects
+*/
+   struct vfsmount *gemfs;
+
struct work_struct overflow_mem_work;
 
struct v3d_bin_job *bin_job;
@@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 66f4b78a6b2e..faefbe497e8d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
v3d_init_hw_state(v3d);
v3d_mmu_set_page_table(v3d);
 
+   v3d_gemfs_init(v3d);
+
ret = v3d_sched_init(v3d);
if (ret) {
		drm_mm_takedown(&v3d->mm);
@@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
struct v3d_dev *v3d = to_v3d_dev(dev);
 
v3d_sched_fini(v3d);
+   v3d_gemfs_fini(v3d);
 
/* Waiting for jobs to finish would need to be done before
 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index ..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include <linux/fs.h>
+#include <linux/mount.h>
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+   char huge_opt[] = "huge=within_size";
+   struct file_system_type *type;
+   struct vfsmount *gemfs;
+
+   /*
+* By creating our own shmemfs mountpoint, we can pass in
+* mount flags that better match our usecase. However, we
+* only do so on platforms which benefit from it.
+*/
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   goto err;
+
+   type = get_fs_type("tmpfs");
+   if (!type)
+   goto err;
+
+   gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+   if (IS_ERR(gemfs))
+   goto err;
+
+   v3d->gemfs = gemfs;
+   drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+   return;
+
+err:
+   v3d->gemfs = NULL;
+   drm_notice(&v3d->drm,
+      "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+   if (v3d->gemfs)
+   kern_unmount(v3d->gemfs);
+}
-- 
2.44.0



[PATCH v2 2/6] drm/gem: Create a drm_gem_object_init_with_mnt() function

2024-04-05 Thread Maíra Canal
For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our use case.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/drm_gem.c | 34 ++
 include/drm/drm_gem.h |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }

 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-   struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs)
 {
struct file *filp;

drm_gem_private_object_init(dev, obj, size);

-   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+   if (gemfs)
+   filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+VM_NORESERVE);
+   else
+   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
if (IS_ERR(filp))
return PTR_ERR(filp);

@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,

return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+   size_t size)
+{
+   return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);

 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
--
2.44.0



[PATCH v2 0/6] drm/v3d: Enable Big and Super Pages

2024-04-05 Thread Maíra Canal
 performance.
This indicates an enhancement in the baseline scenario, rather than any
detriment caused by v2. Additionally, I've included stats from v1 in the
comparisons. Upon scrutinizing the average FPS of v2 in contrast to v1, it
becomes evident that v2 not only maintains the improvements but may even
surpass them.

This version provides a much safer way to iterate through memory and doesn't
suffer from the same limitations as v1. For example, v1 had a hard-coded hack
that only allowed a huge page to be created if the BO was bigger than 2MB.
These limitations are gone now.

This series also introduces changes in the GEM helpers, in order to enable V3D
to have a separate mount point for shmfs GEM objects. Any feedback from the
community about the changes in the GEM helpers is welcome!

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240311100959.205545-1-mca...@igalia.com/

* [1/6] Add Iago's R-b to PATCH 1/5 (Iago Toral)
* [2/6] Create a new function `drm_gem_object_init_with_mnt()` to define the
shmfs mountpoint. Now, we don't touch a bunch of drivers, as
`drm_gem_object_init()` preserves its signature (Tvrtko Ursulin)
* [3/6] Use `huge=within_size` instead of `huge=always`, in order to avoid OOM.
This also allows us to move away from the 2MB hack. (Tvrtko Ursulin)
* [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral)
* [5/6] Create a separate patch to reduce the alignment of the node
allocation (Iago Toral)
* [6/6] Complete refactoring to the way that we iterate through the
memory (Tvrtko Ursulin)
* [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us
misleading data (Tvrtko Ursulin)
* [6/6] Use both Big Pages (64K) and Super Pages (1MB)

Best Regards,
- Maíra

Maíra Canal (6):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Create a drm_gem_object_init_with_mnt() function
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Reduce the alignment of the node allocation
  drm/v3d: Enable big and super pages

 drivers/gpu/drm/drm_gem.c  | 34 +++--
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +--
 drivers/gpu/drm/v3d/Makefile   |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c   | 21 ++-
 drivers/gpu/drm/v3d/v3d_drv.c  |  8 
 drivers/gpu/drm/v3d/v3d_drv.h  | 13 ++-
 drivers/gpu/drm/v3d/v3d_gem.c  |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c| 52 ++
 drivers/gpu/drm/v3d/v3d_mmu.c  | 46 ++-
 include/drm/drm_gem.h  |  3 ++
 include/drm/drm_gem_shmem_helper.h |  3 ++
 11 files changed, 195 insertions(+), 24 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

--
2.44.0



[PATCH v2 1/6] drm/v3d: Fix return if scheduler initialization fails

2024-04-05 Thread Maíra Canal
If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..66f4b78a6b2e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev)
ret = v3d_sched_init(v3d);
if (ret) {
		drm_mm_takedown(&v3d->mm);
-   dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+   dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
  v3d->pt_paddr);
+   return ret;
}
 
return 0;
-- 
2.44.0



Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob

2024-04-04 Thread Maíra Canal

On 4/4/24 11:00, Adrián Larumbe wrote:

This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose
the total GPU usage stats on sysfs"). The point is making broader GPU
occupancy numbers available through the sysfs interface, so that for every
job slot, its number of processed jobs and total processing time are
displayed.


Shouldn't we make this sysfs interface a generic DRM interface?
Something that would be standard for all drivers and that we could
integrate into gputop in the future.

Best Regards,
- Maíra



Cc: Boris Brezillon 
Cc: Christopher Healy 
Signed-off-by: Adrián Larumbe 
---
  drivers/gpu/drm/panfrost/panfrost_device.h |  5 +++
  drivers/gpu/drm/panfrost/panfrost_drv.c| 49 --
  drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++-
  drivers/gpu/drm/panfrost/panfrost_job.h|  3 ++
  4 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index cffcb0ac7c11..1d343351c634 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -169,6 +169,11 @@ struct panfrost_engine_usage {
unsigned long long cycles[NUM_JOB_SLOTS];
  };
  
+struct panfrost_slot_usage {

+   u64 enabled_ns;
+   u64 jobs_sent;
+};
+
  struct panfrost_file_priv {
struct panfrost_device *pfdev;
  
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c

index ef9f6c0716d5..6afcde66270f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -8,6 +8,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -524,6 +525,10 @@ static const struct drm_ioctl_desc 
panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW),
  };
  
+static const char * const engine_names[] = {

+   "fragment", "vertex-tiler", "compute-only"
+};
+
  static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
 struct panfrost_file_priv *panfrost_priv,
 struct drm_printer *p)
@@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct 
panfrost_device *pfdev,
 *   job spent on the GPU.
 */
  
-	static const char * const engine_names[] = {

-   "fragment", "vertex-tiler", "compute-only"
-   };
-
BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
  
  	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {

@@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev,
  
  static DEVICE_ATTR_RW(profiling);
  
+static ssize_t

+gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct panfrost_device *pfdev = dev_get_drvdata(dev);
+   struct panfrost_slot_usage stats;
+   u64 timestamp = local_clock();
+   ssize_t len = 0;
+   unsigned int i;
+
+   BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
+
+   len += sysfs_emit(buf, "queue        timestamp        jobs    runtime\n");
+   len += sysfs_emit_at(buf, len, "-------------------------------------------------\n");
+
+   for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
+
+   stats = get_slot_stats(pfdev, i);
+
+   /*
+* Each line will display the slot name, timestamp, the number
+* of jobs handled by that engine and runtime, as shown below:
+*
+    * queue        timestamp        jobs    runtime
+    * -------------------------------------------------
+    * fragment     12252943467507   638     1184747640
+    * vertex-tiler 12252943467507   636     121663838
+*
+*/
+   len += sysfs_emit_at(buf, len, "%-13s%-17llu%-12llu%llu\n",
+engine_names[i],
+timestamp,
+stats.jobs_sent,
+stats.enabled_ns);
+   }
+
+   return len;
+}
+static DEVICE_ATTR_RO(gpu_stats);
+
  static struct attribute *panfrost_attrs[] = {
	&dev_attr_profiling.attr,
+	&dev_attr_gpu_stats.attr,
NULL,
  };
  
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c

index a61ef0af9a4e..4c779e6f4cb0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -31,6 +31,8 @@ struct panfrost_queue_state {
struct drm_gpu_scheduler sched;
u64 fence_context;
u64 emit_seqno;
+
+   struct panfrost_slot_usage stats;
  };
  
  struct panfrost_job_slot {

@@ -160,15 +162,20 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int 
slot)
  
  	WARN_ON(!job);

if (job->is_profiled) {
+   u64 job_time = 

[PATCH 5/5] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

2024-04-03 Thread Maíra Canal
In V3D, the conclusion of a job is indicated by an IRQ. When a job
finishes, we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.

For example, in `gpu_stats_show()`, we could have a scenario where
`v3d->queue[queue].start_ns != 0`; then an interrupt happens, the value of
`v3d->queue[queue].start_ns` is updated to 0, and when we come back to
`gpu_stats_show()` to calculate `active_runtime`, we end up with
`active_runtime = timestamp`.

In this simple example, the user would see a spike in the queue usage
that doesn't match reality.

In order to address this issue properly, use rw-locks to protect read
and write sections of the code.

Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 16 
 drivers/gpu/drm/v3d/v3d_drv.h   |  7 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  7 +--
 drivers/gpu/drm/v3d/v3d_sched.c |  9 +
 drivers/gpu/drm/v3d/v3d_sysfs.c | 16 
 5 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index cbb62be18aa5..60437718786c 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -119,7 +119,9 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
		drm_sched_entity_init(&v3d_priv->sched_entity[i],
				      DRM_SCHED_PRIORITY_NORMAL, &sched,
				      1, NULL);
+
		memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
+		rwlock_init(&v3d_priv->stats[i].rw_lock);
}

v3d_perfmon_open_file(v3d_priv);
@@ -149,20 +151,26 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)

for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = _priv->stats[queue];
+   u64 active_time, jobs_sent;
+   unsigned long flags;
+
+   read_lock_irqsave(&stats->rw_lock, flags);
+   active_time = stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
+                                 : stats->enabled_ns;
+   jobs_sent = stats->jobs_sent;
+   read_unlock_irqrestore(&stats->rw_lock, flags);

/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
-  v3d_queue_to_string(queue),
-  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
-  : stats->enabled_ns);
+  v3d_queue_to_string(queue), active_time);

/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), stats->jobs_sent);
+  v3d_queue_to_string(queue), jobs_sent);
}
 }

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 0117593976ed..8fde2623f763 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -40,6 +40,13 @@ struct v3d_stats {
u64 start_ns;
u64 enabled_ns;
u64 jobs_sent;
+
+   /*
+* This lock is used to protect the access to the GPU stats variables.
+    * It must be used because, while we are reading the stats, an IRQ
+    * can happen and update the stats.
+*/
+   rwlock_t rw_lock;
 };

 struct v3d_queue_state {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index d14589d3ae6c..439088724a51 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -247,8 +247,11 @@ v3d_gem_init(struct drm_device *dev)
int ret, i;

for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats));
+   struct v3d_queue_state *queue = &v3d->queue[i];
+
+   queue->fence_context = dma_fence_context_alloc(1);
+   memset(&queue->stats, 0, sizeof(queue->stats));
+   rwlock_init(&queue->stats.rw_lock);
}

	spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 754107b80f67..640de6768b15 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -

[PATCH 4/5] drm/v3d: Create function to update a set of GPU stats

2024-04-03 Thread Maíra Canal
Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that updates this whole set
of GPU stats.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index ea5f5a84b55b..754107b80f67 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -118,6 +118,16 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue 
queue)
global_stats->start_ns = now;
 }
 
+static void
+v3d_stats_update(struct v3d_stats *stats)
+{
+   u64 now = local_clock();
+
+   stats->enabled_ns += now - stats->start_ns;
+   stats->jobs_sent++;
+   stats->start_ns = 0;
+}
+
 void
 v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
 {
@@ -125,15 +135,9 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
	struct v3d_file_priv *file = job->file->driver_priv;
	struct v3d_stats *global_stats = &v3d->queue[queue].stats;
	struct v3d_stats *local_stats = &file->stats[queue];
-   u64 now = local_clock();
-
-   local_stats->enabled_ns += now - local_stats->start_ns;
-   local_stats->jobs_sent++;
-   local_stats->start_ns = 0;
 
-   global_stats->enabled_ns += now - global_stats->start_ns;
-   global_stats->jobs_sent++;
-   global_stats->start_ns = 0;
+   v3d_stats_update(local_stats);
+   v3d_stats_update(global_stats);
 }
 
 static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
-- 
2.44.0



[PATCH 3/5] drm/v3d: Create a struct to store the GPU stats

2024-04-03 Thread Maíra Canal
This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 14 ++
 drivers/gpu/drm/v3d/v3d_drv.h   | 18 ++
 drivers/gpu/drm/v3d/v3d_gem.c   |  4 +---
 drivers/gpu/drm/v3d/v3d_sched.c | 20 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++
 5 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..cbb62be18aa5 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -115,14 +115,11 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
v3d_priv->v3d = v3d;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d_priv->enabled_ns[i] = 0;
-   v3d_priv->start_ns[i] = 0;
-   v3d_priv->jobs_sent[i] = 0;
-
		sched = &v3d->queue[i].sched;
		drm_sched_entity_init(&v3d_priv->sched_entity[i],
				      DRM_SCHED_PRIORITY_NORMAL, &sched,
				      1, NULL);
+		memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -151,20 +148,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
enum v3d_queue queue;
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+		struct v3d_stats *stats = &file_priv->stats[queue];
+
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
   v3d_queue_to_string(queue),
-		   file_priv->start_ns[queue] ? file_priv->enabled_ns[queue]
-					      + timestamp - file_priv->start_ns[queue]
-					      : file_priv->enabled_ns[queue]);
+		   stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
+				   : stats->enabled_ns);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-		   v3d_queue_to_string(queue), file_priv->jobs_sent[queue]);
+  v3d_queue_to_string(queue), stats->jobs_sent);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ee3545226d7f..0117593976ed 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue 
queue)
return "UNKNOWN";
 }
 
+struct v3d_stats {
+   u64 start_ns;
+   u64 enabled_ns;
+   u64 jobs_sent;
+};
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
u64 fence_context;
u64 emit_seqno;
 
-   u64 start_ns;
-   u64 enabled_ns;
-   u64 jobs_sent;
+   /* Stores the GPU stats for this queue in the global context. */
+   struct v3d_stats stats;
 };
 
 /* Performance monitor object. The perform lifetime is controlled by userspace
@@ -188,11 +193,8 @@ struct v3d_file_priv {
 
struct drm_sched_entity sched_entity[V3D_MAX_QUEUES];
 
-   u64 start_ns[V3D_MAX_QUEUES];
-
-   u64 enabled_ns[V3D_MAX_QUEUES];
-
-   u64 jobs_sent[V3D_MAX_QUEUES];
+   /* Stores the GPU stats for a specific queue for this fd. */
+   struct v3d_stats stats[V3D_MAX_QUEUES];
 };
 
 struct v3d_bo {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..d14589d3ae6c 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -248,9 +248,7 @@ v3d_gem_init(struct drm_device *dev)
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   v3d->queue[i].start_ns = 0;
-   v3d->queue[i].enabled_ns = 0;
-   v3d->queue[i].jobs_sent = 0;
+		memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats));
}
 
spin_lock_init(>mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 8ca61bcd4b1c..ea5f5a84b55b 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -110,10 +110,12 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
 {
struct v3d_dev *v3d = job->v3d;
struct v3d_file_priv *file = job->file->driver_priv;
+ 

[PATCH 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-03 Thread Maíra Canal
This series addresses two major issues in the GPU stats:

1. Currently, we are incrementing `enabled_ns` twice by the end of each job.
2. There is a race condition between the IRQ handler and the users.

Apart from addressing these issues, this series improves the GPU stats
code as a whole. We reduced code repetition by creating functions
to start and update the GPU stats. This will likely reduce the odds of
issue #1 happening again.

Note that I incrementally improved the code, creating small atomic
commits to ease the reviewing process. Also, I separated the first
patch, which has the fix for issue #1, in order to keep the fix separate
from the code improvements.

Issue #1 is addressed in the first patch, while issue #2 is
addressed in the last patch. Patches #2 to #4 are code improvements.

Best Regards,
- Maíra

Maíra Canal (5):
  drm/v3d: Don't increment `enabled_ns` twice
  drm/v3d: Create two functions to update all GPU stats variables
  drm/v3d: Create a struct to store the GPU stats
  drm/v3d: Create function to update a set of GPU stats
  drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

 drivers/gpu/drm/v3d/v3d_drv.c   | 24 +---
 drivers/gpu/drm/v3d/v3d_drv.h   | 26 ++---
 drivers/gpu/drm/v3d/v3d_gem.c   |  9 +--
 drivers/gpu/drm/v3d/v3d_irq.c   | 52 ++
 drivers/gpu/drm/v3d/v3d_sched.c | 97 ++---
 drivers/gpu/drm/v3d/v3d_sysfs.c | 18 +++---
 6 files changed, 104 insertions(+), 122 deletions(-)

--
2.44.0



[PATCH 1/5] drm/v3d: Don't increment `enabled_ns` twice

2024-04-03 Thread Maíra Canal
The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
introduced the calculation of global GPU stats. To do so, it reused
the already existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d:
Implement show_fdinfo() callback for GPU usage stats"). While adding the
global GPU stats calculation, the author forgot to delete the old
`enabled_ns` update.

Currently, the value of `enabled_ns` is incremented twice by the end of
each job, when it should be added just once. Therefore, delete the
leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage
stats on sysfs").

Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_irq.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index 2e04f6cb661e..ce6b2fb341d1 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -105,7 +105,6 @@ v3d_irq(int irq, void *arg)
		struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
		u64 runtime = local_clock() - file->start_ns[V3D_BIN];
 
-		file->enabled_ns[V3D_BIN] += local_clock() - file->start_ns[V3D_BIN];
file->jobs_sent[V3D_BIN]++;
v3d->queue[V3D_BIN].jobs_sent++;
 
@@ -126,7 +125,6 @@ v3d_irq(int irq, void *arg)
		struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
		u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
 
-		file->enabled_ns[V3D_RENDER] += local_clock() - file->start_ns[V3D_RENDER];
file->jobs_sent[V3D_RENDER]++;
v3d->queue[V3D_RENDER].jobs_sent++;
 
@@ -147,7 +145,6 @@ v3d_irq(int irq, void *arg)
		struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
		u64 runtime = local_clock() - file->start_ns[V3D_CSD];
 
-		file->enabled_ns[V3D_CSD] += local_clock() - file->start_ns[V3D_CSD];
file->jobs_sent[V3D_CSD]++;
v3d->queue[V3D_CSD].jobs_sent++;
 
@@ -195,7 +192,6 @@ v3d_hub_irq(int irq, void *arg)
		struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
		u64 runtime = local_clock() - file->start_ns[V3D_TFU];
 
-		file->enabled_ns[V3D_TFU] += local_clock() - file->start_ns[V3D_TFU];
file->jobs_sent[V3D_TFU]++;
v3d->queue[V3D_TFU].jobs_sent++;
 
-- 
2.44.0



[PATCH 2/5] drm/v3d: Create two functions to update all GPU stats variables

2024-04-03 Thread Maíra Canal
Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see in the previous commit.

Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when the job starts.

Co-developed-by: Tvrtko Ursulin 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  1 +
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++--
 drivers/gpu/drm/v3d/v3d_sched.c | 80 +++--
 3 files changed, 40 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..ee3545226d7f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index ce6b2fb341d1..d469bda52c1a 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FLDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->bin_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_BIN];
-
-   file->jobs_sent[V3D_BIN]++;
-   v3d->queue[V3D_BIN].jobs_sent++;
-
-   file->start_ns[V3D_BIN] = 0;
-   v3d->queue[V3D_BIN].start_ns = 0;
-
-   file->enabled_ns[V3D_BIN] += runtime;
-   v3d->queue[V3D_BIN].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN);
		trace_v3d_bcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(>base);
status = IRQ_HANDLED;
@@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FRDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->render_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
-
-   file->jobs_sent[V3D_RENDER]++;
-   v3d->queue[V3D_RENDER].jobs_sent++;
-
-   file->start_ns[V3D_RENDER] = 0;
-   v3d->queue[V3D_RENDER].start_ns = 0;
-
-   file->enabled_ns[V3D_RENDER] += runtime;
-   v3d->queue[V3D_RENDER].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER);
		trace_v3d_rcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(>base);
status = IRQ_HANDLED;
@@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_CSDDONE(v3d->ver)) {
struct v3d_fence *fence =
to_v3d_fence(v3d->csd_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_CSD];
-
-   file->jobs_sent[V3D_CSD]++;
-   v3d->queue[V3D_CSD].jobs_sent++;
-
-   file->start_ns[V3D_CSD] = 0;
-   v3d->queue[V3D_CSD].start_ns = 0;
-
-   file->enabled_ns[V3D_CSD] += runtime;
-   v3d->queue[V3D_CSD].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD);
		trace_v3d_csd_irq(&v3d->drm, fence->seqno);
dma_fence_signal(>base);
status = IRQ_HANDLED;
@@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg)
if (intsts & V3D_HUB_INT_TFUC) {
struct v3d_fence *fence =
to_v3d_fence(v3d->tfu_job->base.irq_fence);
-		struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_TFU];
-
-   file->jobs_sent[V3D_TFU]++;
-   v3d->queue[V3D_TFU].jobs_sent++;
-
-   file->start_ns[V3D_TFU] = 0;
-   v3d->queue[V3D_TFU].start_ns = 0;
-
-   file->enabled_ns[V3D_TFU] += runtime;
-   v3d->queue[V3D_TFU].enabled_ns += runtime;
 
+		v3d_job_update_stats(&v3d->tfu_job->base, V3D_TFU);
		trace_v3d_tfu_irq(&v3d->drm, fence->seq

Re: [PATCH v5 03/10] drm/ci: uprev IGT and update testlist

2024-04-02 Thread Maíra Canal

On 4/2/24 06:41, Dmitry Baryshkov wrote:

On Tue, Apr 02, 2024 at 12:35:17PM +0530, Vignesh Raman wrote:

Hi Maíra,

On 01/04/24 22:33, Maíra Canal wrote:

On 4/1/24 03:12, Vignesh Raman wrote:

Uprev IGT and add amd, v3d, vc4 and vgem specific tests to
testlist and skip driver-specific tests. Also add testlist
to the MAINTAINERS file and update xfails.

Signed-off-by: Vignesh Raman 
---

v3:
    - New patch in series to uprev IGT and update testlist.

v4:
    - Add testlists to the MAINTAINERS file and remove amdgpu xfails
changes.

v5:
    - Keep single testlist and update xfails. Skip driver specific tests.


Looks a bit odd to me to have a single testlist with the specific tests
in it. We will need to skip the specific tests on all *-skips.txt. Could
you justify this choice in the commit message?


The reason for choosing this option was a suggestion from Dmitry,
https://www.spinics.net/lists/dri-devel/msg437901.html


My suggestion was to stop vendoring the test list into the kernel and to
always use a test list from IGT. Otherwise it is very easy to miss
renamed or freshly added tests.



This makes much more sense to me.

Best Regards,
- Maíra


Also to keep it similar to IGT which has a single testlist. I will add this
justification in the commit message.

Regards,
Vignesh


Best Regards,
- Maíra



---
   MAINTAINERS   |   8 +
   drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
   drivers/gpu/drm/ci/testlist.txt   | 321 ++
   .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  25 +-
   .../drm/ci/xfails/amdgpu-stoney-flakes.txt    |  10 +-
   .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  23 +-
   drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   9 +-
   drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   7 +
   drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   9 +-
   .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   6 +
   .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   6 +
   .../gpu/drm/ci/xfails/meson-g12b-skips.txt    |   6 +
   .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   5 +
   .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   8 +-
   .../msm-sc7180-trogdor-kingoftown-skips.txt   |   6 +
   ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   6 +
   .../gpu/drm/ci/xfails/msm-sdm845-skips.txt    |   6 +
   .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   9 +-
   .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   7 +
   .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   9 +-
   24 files changed, 511 insertions(+), 13 deletions(-)
   create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bc7e122a094..f7d0040a6c21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1665,6 +1665,7 @@ L:    dri-devel@lists.freedesktop.org
   S:    Supported
   T:    git git://anongit.freedesktop.org/drm/drm-misc
   F:    Documentation/gpu/panfrost.rst
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/panfrost/
   F:    include/uapi/drm/panfrost_drm.h
@@ -6753,6 +6754,7 @@ S:    Maintained
   B:    https://gitlab.freedesktop.org/drm/msm/-/issues
   T:    git https://gitlab.freedesktop.org/drm/msm.git
   F:    Documentation/devicetree/bindings/display/msm/
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/ci/xfails/msm*
   F:    drivers/gpu/drm/msm/
   F:    include/uapi/drm/msm_drm.h
@@ -7047,6 +7049,7 @@ T:    git
git://anongit.freedesktop.org/drm/drm-misc
   F:
Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
   F:    Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
   F:    Documentation/gpu/meson.rst
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/ci/xfails/meson*
   F:    drivers/gpu/drm/meson/
@@ -7160,6 +7163,7 @@ L:    dri-devel@lists.freedesktop.org
   L:    linux-media...@lists.infradead.org (moderated for
non-subscribers)
   S:    Supported
   F:    Documentation/devicetree/bindings/display/mediatek/
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/ci/xfails/mediatek*
   F:    drivers/gpu/drm/mediatek/
   F:    drivers/phy/mediatek/phy-mtk-dp.c
@@ -7211,6 +7215,7 @@ L:    dri-devel@lists.freedesktop.org
   S:    Maintained
   T:    git git://anongit.freedesktop.org/drm/drm-misc
   F:    Documentation/devicetree/bindings/display/rockchip/
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/ci/xfails/rockchip*
   F:    drivers/gpu/drm/rockchip/
@@ -10739,6 +10744,7 @@ C:    irc://irc.oftc.net/intel-gfx
   T:    git git

Re: [PATCH v5 03/10] drm/ci: uprev IGT and update testlist

2024-04-01 Thread Maíra Canal

On 4/1/24 03:12, Vignesh Raman wrote:

Uprev IGT and add amd, v3d, vc4 and vgem specific tests to
testlist and skip driver-specific tests. Also add testlist
to the MAINTAINERS file and update xfails.

Signed-off-by: Vignesh Raman 
---

v3:
   - New patch in series to uprev IGT and update testlist.

v4:
   - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes.

v5:
   - Keep single testlist and update xfails. Skip driver specific tests.


Looks a bit odd to me to have a single testlist with the specific tests
in it. We will need to skip the specific tests on all *-skips.txt. Could
you justify this choice in the commit message?

Best Regards,
- Maíra



---
  MAINTAINERS   |   8 +
  drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
  drivers/gpu/drm/ci/testlist.txt   | 321 ++
  .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  25 +-
  .../drm/ci/xfails/amdgpu-stoney-flakes.txt|  10 +-
  .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  23 +-
  drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   9 +-
  drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   7 +
  drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   9 +-
  .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   6 +
  .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   6 +
  .../gpu/drm/ci/xfails/meson-g12b-skips.txt|   6 +
  .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   5 +
  .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   8 +-
  .../msm-sc7180-trogdor-kingoftown-skips.txt   |   6 +
  ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   6 +
  .../gpu/drm/ci/xfails/msm-sdm845-skips.txt|   6 +
  .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   9 +-
  .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   7 +
  .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   9 +-
  24 files changed, 511 insertions(+), 13 deletions(-)
  create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bc7e122a094..f7d0040a6c21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1665,6 +1665,7 @@ L:dri-devel@lists.freedesktop.org
  S:Supported
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/gpu/panfrost.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/panfrost/
  F:include/uapi/drm/panfrost_drm.h
  
@@ -6753,6 +6754,7 @@ S:	Maintained

  B:https://gitlab.freedesktop.org/drm/msm/-/issues
  T:git https://gitlab.freedesktop.org/drm/msm.git
  F:Documentation/devicetree/bindings/display/msm/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/msm*
  F:drivers/gpu/drm/msm/
  F:include/uapi/drm/msm_drm.h
@@ -7047,6 +7049,7 @@ T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
  F:Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
  F:Documentation/gpu/meson.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/meson*
  F:drivers/gpu/drm/meson/
  
@@ -7160,6 +7163,7 @@ L:	dri-devel@lists.freedesktop.org

  L:linux-media...@lists.infradead.org (moderated for non-subscribers)
  S:Supported
  F:Documentation/devicetree/bindings/display/mediatek/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/mediatek*
  F:drivers/gpu/drm/mediatek/
  F:drivers/phy/mediatek/phy-mtk-dp.c
@@ -7211,6 +7215,7 @@ L:dri-devel@lists.freedesktop.org
  S:Maintained
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/devicetree/bindings/display/rockchip/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/rockchip*
  F:drivers/gpu/drm/rockchip/
  
@@ -10739,6 +10744,7 @@ C:	irc://irc.oftc.net/intel-gfx

  T:git git://anongit.freedesktop.org/drm-intel
  F:Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
  F:Documentation/gpu/i915.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/i915*
  F:drivers/gpu/drm/i915/
  F:include/drm/i915*
@@ -18255,6 +18261,7 @@ C:  irc://irc.oftc.net/radeon
  T:git https://gitlab.freedesktop.org/agd5f/linux.git
  F:Documentation/gpu/amdgpu/
  F:drivers/gpu/drm/amd/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/amd*
  F:drivers/gpu/drm/radeon/
  F:include/uapi/drm/amdgpu_drm.h
@@ -23303,6 +23310,7 @@ L:  dri-devel@lists.freedesktop.org
  L:virtualizat...@lists.linux.dev
  

Re: [PATCH v5 10/10] drm/ci: add tests on vkms

2024-04-01 Thread Maíra Canal

On 4/1/24 03:12, Vignesh Raman wrote:

Add job that runs igt on top of vkms.

Signed-off-by: Vignesh Raman 
Acked-by: Jessica Zhang 
Tested-by: Jessica Zhang 
Acked-by: Maxime Ripard 
Signed-off-by: Helen Koike 


Acked-by: Maíra Canal 

Best Regards,
- Maíra


---

v4:
   - New patch in the series.
 
https://lore.kernel.org/lkml/20240201065346.801038-1-vignesh.ra...@collabora.com/

v5:
   - No changes.

---
  MAINTAINERS   |  2 ++
  drivers/gpu/drm/ci/build.sh   |  1 -
  drivers/gpu/drm/ci/gitlab-ci.yml  |  3 +-
  drivers/gpu/drm/ci/igt_runner.sh  |  6 ++--
  drivers/gpu/drm/ci/image-tags.yml |  2 +-
  drivers/gpu/drm/ci/test.yml   | 24 +-
  drivers/gpu/drm/ci/x86_64.config  |  1 +
  .../drm/ci/xfails/virtio_gpu-none-fails.txt   |  1 -
  drivers/gpu/drm/ci/xfails/vkms-none-fails.txt | 33 +++
  .../gpu/drm/ci/xfails/vkms-none-flakes.txt| 20 +++
  drivers/gpu/drm/ci/xfails/vkms-none-skips.txt | 23 +
  11 files changed, 108 insertions(+), 8 deletions(-)
  create mode 100644 drivers/gpu/drm/ci/xfails/vkms-none-fails.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/vkms-none-flakes.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/vkms-none-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 333704ceefb6..c78c825508ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6962,6 +6962,8 @@ L:dri-devel@lists.freedesktop.org
  S:Maintained
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/gpu/vkms.rst
+F: drivers/gpu/drm/ci/testlist.txt
+F: drivers/gpu/drm/ci/xfails/vkms*
  F:drivers/gpu/drm/vkms/
  
  DRM DRIVER FOR VIRTUALBOX VIRTUAL GPU

diff --git a/drivers/gpu/drm/ci/build.sh b/drivers/gpu/drm/ci/build.sh
index 8a3baa003904..95493df9cdc2 100644
--- a/drivers/gpu/drm/ci/build.sh
+++ b/drivers/gpu/drm/ci/build.sh
@@ -156,7 +156,6 @@ fi
  
  mkdir -p artifacts/install/lib

  mv install/* artifacts/install/.
-rm -rf artifacts/install/modules
  ln -s common artifacts/install/ci-common
  cp .config artifacts/${CI_JOB_NAME}_config
  
diff --git a/drivers/gpu/drm/ci/gitlab-ci.yml b/drivers/gpu/drm/ci/gitlab-ci.yml

index 5b5d4a324659..df762d03533f 100644
--- a/drivers/gpu/drm/ci/gitlab-ci.yml
+++ b/drivers/gpu/drm/ci/gitlab-ci.yml
@@ -114,6 +114,7 @@ stages:
- panfrost
- powervr
- virtio-gpu
+  - software-driver
  
  # YAML anchors for rule conditions

  # 
@@ -269,4 +270,4 @@ sanity:
  
  # Jobs that need to pass before spending hardware resources on further testing

  .required-for-hardware-jobs:
-  needs: []
\ No newline at end of file
+  needs: []
diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index ce6e22369d4d..c89acb974645 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -20,10 +20,10 @@ cat /sys/kernel/debug/dri/*/state
  set -e
  
  case "$DRIVER_NAME" in

-amdgpu)
+amdgpu|vkms)
  # Cannot use HWCI_KERNEL_MODULES as at that point we don't have the 
module in /lib
-mv /install/modules/lib/modules/* /lib/modules/.
-modprobe amdgpu
+mv /install/modules/lib/modules/* /lib/modules/. || true
+modprobe --first-time $DRIVER_NAME
  ;;
  esac
  
diff --git a/drivers/gpu/drm/ci/image-tags.yml b/drivers/gpu/drm/ci/image-tags.yml

index cf07c3e09b8c..bf861ab8b9c2 100644
--- a/drivers/gpu/drm/ci/image-tags.yml
+++ b/drivers/gpu/drm/ci/image-tags.yml
@@ -4,7 +4,7 @@ variables:
 DEBIAN_BASE_TAG: "${CONTAINER_TAG}"
  
 DEBIAN_X86_64_BUILD_IMAGE_PATH: "debian/x86_64_build"

-   DEBIAN_BUILD_TAG: "2023-10-08-config"
+   DEBIAN_BUILD_TAG: "2024-01-29-vkms"
  
 KERNEL_ROOTFS_TAG: "2023-10-06-amd"

 PKG_REPO_REV: "67f2c46b"
diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 8c90ae5a51e6..8fed797a26b9 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -411,7 +411,7 @@ panfrost:g12b:
  - .panfrost-gpu
  
  virtio_gpu:none:

-  stage: virtio-gpu
+  stage: software-driver
variables:
  CROSVM_GALLIUM_DRIVER: llvmpipe
  DRIVER_NAME: virtio_gpu
@@ -431,3 +431,25 @@ virtio_gpu:none:
  - debian/x86_64_test-gl
  - testing:x86_64
  - igt:x86_64
+
+vkms:none:
+  stage: software-driver
+  variables:
+DRIVER_NAME: vkms
+GPU_VERSION: none
+  extends:
+- .test-gl
+- .test-rules
+  tags:
+- kvm
+  script:
+- ln -sf $CI_PROJECT_DIR/install /install
+- mv install/bzImage /lava-files/bzImage
+- mkdir -p /lib/modules
+- mkdir -p $CI_PROJECT_DIR/results
+- ln -sf $CI_PROJECT_DIR/results /results
+- ./install/crosvm-runner.sh ./install/igt_runner.sh
+  needs:
+- debian/x86_64_test-gl
+- testing:x86_64
+- igt:x86_64
diff --git a/drivers/gpu/drm/

Re: [PATCH v5 04/16] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

2024-03-27 Thread Maíra Canal

On 3/26/24 12:56, Louis Chauvet wrote:

Le 25/03/24 - 10:56, Maíra Canal a écrit :

On 3/13/24 14:44, Louis Chauvet wrote:

Introduce two typedefs: pixel_read_t and pixel_write_t. They allow the
compiler to check that the passed functions take the correct arguments.
Such typedefs will help ensure consistency across the code base if
these prototypes are ever updated.

Rename input/output variables in a consistent way between read_line and
write_line.

A warning has been added in get_pixel_*_function to alert when an
unsupported pixel format is requested. As those formats are checked
before the atomic_update callbacks, it should never happen.

Document those typedefs.

Signed-off-by: Louis Chauvet 
---
   drivers/gpu/drm/vkms/vkms_drv.h |  23 ++-
   drivers/gpu/drm/vkms/vkms_formats.c | 124 
+---
   drivers/gpu/drm/vkms/vkms_formats.h |   4 +-
   drivers/gpu/drm/vkms/vkms_plane.c   |   2 +-
   4 files changed, 95 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 18086423a3a7..4bfc62d26f08 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -53,12 +53,31 @@ struct line_buffer {
struct pixel_argb_u16 *pixels;
   };
   
+/**

+ * typedef pixel_write_t - These functions are used to read a pixel from a
+ * `struct pixel_argb_u16*`, convert it in a specific format and write it in 
the @dst_pixels
+ * buffer.


Your brief description looks a bit big to me. Also, take a look at the
cross-references docs [1].


Is this description sufficient?

typedef pixel_write_t - Convert a pixel from a &struct pixel_argb_u16
into a specific format


Yeah.

Best Regards,
- Maíra

  

[1]
https://docs.kernel.org/doc-guide/kernel-doc.html#highlights-and-cross-references


+ *
+ * @out_pixel: destination address to write the pixel
+ * @in_pixel: pixel to write
+ */
+typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel);
+
   struct vkms_writeback_job {
struct iosys_map data[DRM_FORMAT_MAX_PLANES];
struct vkms_frame_info wb_frame_info;
-   void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+   pixel_write_t pixel_write;
   };
   
+/**

+ * typedef pixel_read_t - These functions are used to read a pixel in the 
source frame,
+ * convert it to `struct pixel_argb_u16` and write it to @out_pixel.


Same.


typedef pixel_read_t - Read a pixel and convert it to a &struct
pixel_argb_u16
  

+ *
+ * @in_pixel: Pointer to the pixel to read
+ * @out_pixel: Pointer to write the converted pixel


s/Pointer/pointer


Fixed in v6.


+ */
+typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+
   /**
* vkms_plane_state - Driver specific plane state
* @base: base plane state
@@ -69,7 +88,7 @@ struct vkms_writeback_job {
   struct vkms_plane_state {
struct drm_shadow_plane_state base;
struct vkms_frame_info *frame_info;
-   void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+   pixel_read_t pixel_read;
   };
   
   struct vkms_plane {

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 6e3dc8682ff9..55a4365d21a4 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -76,7 +76,7 @@ static int get_x_position(const struct vkms_frame_info 
*frame_info, int limit, i
* They are used in the `vkms_compose_row` function to handle multiple 
formats.
*/
   
-static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
   {
/*
 * The 257 is the "conversion ratio". This number is obtained by the
@@ -84,48 +84,48 @@ static void ARGB_to_argb_u16(u8 *src_pixels, struct 
pixel_argb_u16 *out_pixe
 * the best color value in a pixel format with more possibilities.
 * A similar idea applies to others RGB color conversions.
 */
-   out_pixel->a = (u16)src_pixels[3] * 257;
-   out_pixel->r = (u16)src_pixels[2] * 257;
-   out_pixel->g = (u16)src_pixels[1] * 257;
-   out_pixel->b = (u16)src_pixels[0] * 257;
+   out_pixel->a = (u16)in_pixel[3] * 257;
+   out_pixel->r = (u16)in_pixel[2] * 257;
+   out_pixel->g = (u16)in_pixel[1] * 257;
+   out_pixel->b = (u16)in_pixel[0] * 257;
   }
   
-static void XRGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
   {
	out_pixel->a = (u16)0xffff;
-   out_pixel->r = (u16)src_pixels[2] * 257;
-   out_pixel->g = (u16)src_pixels[1] * 257;
-   out_pixel->b = (u16)src_pixels[0] * 257;
+   out_pixel->r = (u16)in_pixel[2] * 257;
+   out_pixel->g = (u16)in_pixel[1] * 257;
+   out_pixel->b = (u16)in_pixel[0] *

Re: [PATCH v2 05/14] drm: Suppress intentional warning backtraces in scaling unit tests

2024-03-25 Thread Maíra Canal

On 3/25/24 16:24, Guenter Roeck wrote:

Hi,

On Mon, Mar 25, 2024 at 04:05:06PM -0300, Maíra Canal wrote:

Hi Guenter,

On 3/25/24 14:52, Guenter Roeck wrote:

The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests
intentionally trigger warning backtraces by providing bad parameters to
the tested functions. What is tested is the return value, not the existence
of a warning backtrace. Suppress the backtraces to avoid clogging the
kernel log.

Tested-by: Linux Kernel Functional Testing 
Acked-by: Dan Carpenter 
Signed-off-by: Guenter Roeck 
---
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags

   drivers/gpu/drm/tests/drm_rect_test.c | 6 ++
   1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_rect_test.c 
b/drivers/gpu/drm/tests/drm_rect_test.c
index 76332cd2ead8..75614cb4deb5 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -406,22 +406,28 @@ KUNIT_ARRAY_PARAM(drm_rect_scale, drm_rect_scale_cases, 
drm_rect_scale_case_desc
   static void drm_test_rect_calc_hscale(struct kunit *test)
   {
+   DEFINE_SUPPRESSED_WARNING(drm_calc_scale);
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
+   START_SUPPRESSED_WARNING(drm_calc_scale);


I'm not sure if it's obvious only to me, but it would be nice
to have a comment here noting that we provide bad parameters in
some test cases.


Sure. Something like this ?

 /*
  * drm_rect_calc_hscale() generates a warning backtrace whenever bad
  * parameters are passed to it. This affects all unit tests with an
  * error code in expected_scaling_factor.
  */



Yeah, perfect. With that, feel free to add my

Acked-by: Maíra Canal 

Best Regards,
- Maíra


Thanks,
Guenter


Re: [PATCH v2 05/14] drm: Suppress intentional warning backtraces in scaling unit tests

2024-03-25 Thread Maíra Canal

Hi Guenter,

On 3/25/24 14:52, Guenter Roeck wrote:

The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests
intentionally trigger warning backtraces by providing bad parameters to
the tested functions. What is tested is the return value, not the existence
of a warning backtrace. Suppress the backtraces to avoid clogging the
kernel log.

Tested-by: Linux Kernel Functional Testing 
Acked-by: Dan Carpenter 
Signed-off-by: Guenter Roeck 
---
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags

  drivers/gpu/drm/tests/drm_rect_test.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_rect_test.c 
b/drivers/gpu/drm/tests/drm_rect_test.c
index 76332cd2ead8..75614cb4deb5 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -406,22 +406,28 @@ KUNIT_ARRAY_PARAM(drm_rect_scale, drm_rect_scale_cases, 
drm_rect_scale_case_desc
  
  static void drm_test_rect_calc_hscale(struct kunit *test)

  {
+   DEFINE_SUPPRESSED_WARNING(drm_calc_scale);
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
  
+	START_SUPPRESSED_WARNING(drm_calc_scale);


I'm not sure if it's obvious only to me, but it would be nice
to have a comment here noting that we provide bad parameters in
some test cases.

Best Regards,
- Maíra


scaling_factor = drm_rect_calc_hscale(&params->src, &params->dst,
				      params->min_range, params->max_range);
+   END_SUPPRESSED_WARNING(drm_calc_scale);
  
  	KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);

  }
  
  static void drm_test_rect_calc_vscale(struct kunit *test)

  {
+   DEFINE_SUPPRESSED_WARNING(drm_calc_scale);
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
  
+	START_SUPPRESSED_WARNING(drm_calc_scale);

scaling_factor = drm_rect_calc_vscale(&params->src, &params->dst,
				      params->min_range, params->max_range);
+   END_SUPPRESSED_WARNING(drm_calc_scale);
  
  	KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);

  }


Re: [PATCH v5 14/16] drm/vkms: Create KUnit tests for YUV conversions

2024-03-25 Thread Maíra Canal

On 3/13/24 14:45, Louis Chauvet wrote:

From: Arthur Grillo 

Create KUnit tests to test the conversion between YUV and RGB. Test each
conversion and range combination with some common colors.

The code used to compute the expected result can be found in comment.

Signed-off-by: Arthur Grillo 
[Louis Chauvet:
- fix minor formating issues (whitespace, double line)
- change expected alpha from 0x to 0x
- adapt to the new get_conversion_matrix usage
- apply the changes from Arthur
- move struct pixel_yuv_u8 to the test itself]


Again, a Co-developed-by tag might be more appropriate.


Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/Kconfig  |  15 ++
  drivers/gpu/drm/vkms/Makefile |   1 +
  drivers/gpu/drm/vkms/tests/.kunitconfig   |   4 +
  drivers/gpu/drm/vkms/tests/Makefile   |   3 +
  drivers/gpu/drm/vkms/tests/vkms_format_test.c | 230 ++
  drivers/gpu/drm/vkms/vkms_formats.c   |   7 +-
  drivers/gpu/drm/vkms/vkms_formats.h   |   4 +
  7 files changed, 262 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vkms/Kconfig b/drivers/gpu/drm/vkms/Kconfig
index b9ecdebecb0b..9b0e1940c14f 100644
--- a/drivers/gpu/drm/vkms/Kconfig
+++ b/drivers/gpu/drm/vkms/Kconfig
@@ -13,3 +13,18 @@ config DRM_VKMS
  a VKMS.
  
  	  If M is selected the module will be called vkms.

+
+config DRM_VKMS_KUNIT_TESTS
+   tristate "Tests for VKMS" if !KUNIT_ALL_TESTS


"KUnit tests for VKMS"


+   depends on DRM_VKMS && KUNIT
+   default KUNIT_ALL_TESTS
+   help
+ This builds unit tests for VKMS. This option is not useful for
+ distributions or general kernels, but only for kernel
+ developers working on VKMS.
+
+ For more information on KUnit and unit tests in general,
+ please refer to the KUnit documentation in
+ Documentation/dev-tools/kunit/.
+
+ If in doubt, say "N".
\ No newline at end of file
diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 1b28a6a32948..8d3e46dde635 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -9,3 +9,4 @@ vkms-y := \
vkms_writeback.o
  
  obj-$(CONFIG_DRM_VKMS) += vkms.o

+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += tests/
diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig 
b/drivers/gpu/drm/vkms/tests/.kunitconfig
new file mode 100644
index ..70e378228cbd
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/.kunitconfig
@@ -0,0 +1,4 @@
+CONFIG_KUNIT=y
+CONFIG_DRM=y
+CONFIG_DRM_VKMS=y
+CONFIG_DRM_VKMS_KUNIT_TESTS=y
diff --git a/drivers/gpu/drm/vkms/tests/Makefile 
b/drivers/gpu/drm/vkms/tests/Makefile
new file mode 100644
index ..2d1df668569e
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += vkms_format_test.o
diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c 
b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
new file mode 100644
index ..0954d606e44a
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include 
+
+#include 
+#include 
+#include 
+
+#include "../../drm_crtc_internal.h"
+
+#include "../vkms_drv.h"
+#include "../vkms_formats.h"
+
+#define TEST_BUFF_SIZE 50
+
+struct pixel_yuv_u8 {
+   u8 y, u, v;
+};
+
+struct yuv_u8_to_argb_u16_case {
+   enum drm_color_encoding encoding;
+   enum drm_color_range range;
+   size_t n_colors;
+   struct format_pair {
+   char *name;
+   struct pixel_yuv_u8 yuv;
+   struct pixel_argb_u16 argb;
+   } colors[TEST_BUFF_SIZE];
+};
+
+/*
+ * The YUV color representations were acquired via the colour python framework.
+ * Below are the function calls used for generating each case.
+ *
+ * for more information got to the docs:


s/for/For


+ * https://colour.readthedocs.io/en/master/generated/colour.RGB_to_YCbCr.html
+ */
+static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
+   /*
+* colour.RGB_to_YCbCr(,
+* K=colour.WEIGHTS_YCBCR["ITU-R BT.601"],
+* in_bits = 16,
+* in_legal = False,
+* in_int = True,
+* out_bits = 8,
+* out_legal = False,
+* out_int = True)
+*/


I feel that this Python code is kind of polluting the test cases.


+   {
+   .encoding = DRM_COLOR_YCBCR_BT601,
+   .range = DRM_COLOR_YCBCR_FULL_RANGE,
+   .n_colors = 6,
+   .colors = {
+   { "white", { 0xff, 0x80, 0x80 }, { 0x, 0x, 
0x, 0x }},
+   { "gray",  { 0x80, 0x80, 0x80 }, { 0x, 0x8080, 
0x8080, 0x8080 }},
+   { 

Re: [PATCH v5 11/16] drm/vkms: Add YUV support

2024-03-25 Thread Maíra Canal

On 3/13/24 14:45, Louis Chauvet wrote:

From: Arthur Grillo 

Add support for the YUV formats below:

- NV12/NV16/NV24
- NV21/NV61/NV42
- YUV420/YUV422/YUV444
- YVU420/YVU422/YVU444

The conversion from YUV to RGB is done with fixed-point arithmetic, using
32.32 fixed-point numbers and the drm_fixed helpers.

To do the conversion, a specific matrix must be used for each color range
(DRM_COLOR_*_RANGE) and encoding (DRM_COLOR_*). This matrix is stored in
the `conversion_matrix` struct, along with the specific y_offset needed.
This matrix is queried only once, in `vkms_plane_atomic_update` and
stored in a `vkms_plane_state`. Those conversion matrices of each
encoding and range were obtained by rounding the values of the original
conversion matrices multiplied by 2^32. This is done to avoid the use of
floating point operations.

The same reading function is used for YUV and YVU formats. As the only
difference between those two categories of formats is the field order, a
simple swap of the conversion matrix columns allows using the same function.

Signed-off-by: Arthur Grillo 
[Louis Chauvet:
- Adapted Arthur's work
- Implemented the read_line_t callbacks for yuv
- add struct conversion_matrix
- remove struct pixel_yuv_u8
- update the commit message
- Merge the modifications from Arthur]


A Co-developed-by tag would be more appropriate.


Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_drv.h |  22 ++
  drivers/gpu/drm/vkms/vkms_formats.c | 431 
  drivers/gpu/drm/vkms/vkms_formats.h |   4 +
  drivers/gpu/drm/vkms/vkms_plane.c   |  17 +-
  4 files changed, 473 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 23e1d247468d..f3116084de5a 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -99,6 +99,27 @@ typedef void (*pixel_read_line_t)(const struct 
vkms_plane_state *plane, int x_st
  int y_start, enum pixel_read_direction 
direction, int count,
  struct pixel_argb_u16 out_pixel[]);
  
+/**

+ * CONVERSION_MATRIX_FLOAT_DEPTH - Number of digits after the point for 
conversion matrix values
+ */
+#define CONVERSION_MATRIX_FLOAT_DEPTH 32
+
+/**
+ * struct conversion_matrix - Matrix to use for a specific encoding and range
+ *
+ * @matrix: Conversion matrix from yuv to rgb. The matrix is stored in a 
row-major manner and is
+ * used to compute rgb values from yuv values:
+ * [[r],[g],[b]] = @matrix * [[y],[u],[v]]
+ *   OR for yvu formats:
+ * [[r],[g],[b]] = @matrix * [[y],[v],[u]]
+ *  The values of the matrix are fixed floats, 32.CONVERSION_MATRIX_FLOAT_DEPTH 
> + * @y_offest: Offset to apply on the y value.


s/y_offest/y_offset


+ */
+struct conversion_matrix {
+   s64 matrix[3][3];
+   s64 y_offset;
+};
+
  /**
   * vkms_plane_state - Driver specific plane state
   * @base: base plane state
@@ -110,6 +131,7 @@ struct vkms_plane_state {
struct drm_shadow_plane_state base;
struct vkms_frame_info *frame_info;
pixel_read_line_t pixel_read_line;
+   struct conversion_matrix *conversion_matrix;


Add @conversion_matrix on the kernel-doc from the struct
vkms_plane_state.


  };
  
  struct vkms_plane {

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 1449a0e6c706..edbf4b321b91 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -105,6 +105,44 @@ static int get_step_next_block(struct drm_framebuffer *fb, 
enum pixel_read_direc
return 0;
  }
  
+/**

+ * get_subsampling() - Get the subsampling divisor value on a specific 
direction


Where are the arguments?


+ */
+static int get_subsampling(const struct drm_format_info *format,
+  enum pixel_read_direction direction)
+{
+   switch (direction) {
+   case READ_BOTTOM_TO_TOP:
+   case READ_TOP_TO_BOTTOM:
+   return format->vsub;
+   case READ_RIGHT_TO_LEFT:
+   case READ_LEFT_TO_RIGHT:
+   return format->hsub;
+   }
+   WARN_ONCE(true, "Invalid direction for pixel reading: %d\n", direction);
+   return 1;
+}
+
+/**
+ * get_subsampling_offset() - An offset for keeping the chroma siting 
consistent regardless of
+ * x_start and y_start values


Same.


+ */
+static int get_subsampling_offset(enum pixel_read_direction direction, int 
x_start, int y_start)
+{
+   switch (direction) {
+   case READ_BOTTOM_TO_TOP:
+   return -y_start - 1;
+   case READ_TOP_TO_BOTTOM:
+   return y_start;
+   case READ_RIGHT_TO_LEFT:
+   return -x_start - 1;
+   case READ_LEFT_TO_RIGHT:
+   return x_start;
+   }
+   WARN_ONCE(true, "Invalid direction for pixel reading: %d\n", direction);
+   return 0;
+}
+
  /*
   * The following  functions take pixel data (a, r, g, b, pixel, ...), convert 
them to 

Re: [PATCH v5 10/16] drm/vkms: Re-introduce line-per-line composition algorithm

2024-03-25 Thread Maíra Canal

On 3/13/24 14:45, Louis Chauvet wrote:

Re-introduce a line-by-line composition algorithm for each pixel format.
This improves performance by not requiring an indirection per pixel
read. This patch is focused on the readability of the code.

Line-by-line composition was introduced by [1] but rewritten back to a
pixel-by-pixel algorithm in [2]. At the time, nobody noticed the impact
on performance, and it was merged.

This patch is almost a revert of [2], but in addition efforts have been
made to increase readability and maintainability of the rotation handling.
The blend function is now divided in two parts:
- Transformation of coordinates from the output referential to the source
referential
- Line conversion and blending

Most of the complexity of the rotation management is avoided by using
drm_rect_* helpers. The remaining complexity is around the clipping, to
avoid reading/writing outside source/destination buffers.

The pixel conversion is now done line-by-line, so the read_pixel_t
callback was replaced with a read_pixel_line_t callback. This way the
indirection is only required once per line per plane, instead of once
per pixel per plane.

The read_line_t callbacks are very similar for most pixel formats, but the
duplication is required to avoid a performance impact. Some helpers for
color conversion were introduced to avoid code repetition:
- *_to_argb_u16: perform color conversions. They should be inlined by the
   compiler, and they are used to avoid repetition between multiple variants
   of the same format (argb/xrgb and maybe in the future for formats like
   bgr formats).

This new algorithm was tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations of planes)
- kms_rotation (for all rotations and formats combinations) [3]
The performance gain was measured with:
- kms_fb_stress


Could you tell us what was the performance gain?



[1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
  new formats")
  
https://lore.kernel.org/all/20220905190811.25024-7-igormtorre...@gmail.com/
[2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
  functionality")
  https://lore.kernel.org/all/20230418130525.128733-2-mca...@igalia.com/
[3]:

Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_composer.c | 167 +++--
  drivers/gpu/drm/vkms/vkms_drv.h  |  27 ++--
  drivers/gpu/drm/vkms/vkms_formats.c  | 236 ++-
  drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
  drivers/gpu/drm/vkms/vkms_plane.c|   5 +-
  5 files changed, 292 insertions(+), 145 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
b/drivers/gpu/drm/vkms/vkms_composer.c
index 989bcf59f375..5d78c33dbf41 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer 
*stage_buffer,
struct line_buffer *output_buffer, int x_start, 
int pixel_count)
  {
struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
-   const struct pixel_argb_u16 *in = stage_buffer->pixels;
+   const struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
  
  	for (int i = 0; i < pixel_count; i++) {

out[i].a = (u16)0xffff;
@@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer 
*stage_buffer,
}
  }
  
-static int get_y_pos(struct vkms_frame_info *frame_info, int y)

-{
-   if (frame_info->rotation & DRM_MODE_REFLECT_Y)
-   return drm_rect_height(&frame_info->rotated) - y - 1;
-
-   switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
-   case DRM_MODE_ROTATE_90:
-   return frame_info->rotated.x2 - y - 1;
-   case DRM_MODE_ROTATE_270:
-   return y + frame_info->rotated.x1;
-   default:
-   return y;
-   }
-}
-
-static bool check_limit(struct vkms_frame_info *frame_info, int pos)
-{
-   if (drm_rotation_90_or_270(frame_info->rotation)) {
-   if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
-   return true;
-   } else {
-   if (pos >= frame_info->rotated.y1 && pos < 
frame_info->rotated.y2)
-   return true;
-   }
-
-   return false;
-}
  
  static void fill_background(const struct pixel_argb_u16 *background_color,

struct line_buffer *output_buffer)
@@ -215,34 +188,146 @@ static void blend(struct vkms_writeback_job *wb,
  {
struct vkms_plane_state **plane = crtc_state->active_planes;
u32 n_active_planes = crtc_state->num_active_planes;
-   int y_pos, x_dst, x_limit;
  
const struct pixel_argb_u16 background_color = { .a = 0xffff };
  
-	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;

+   int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
+   int crtc_x_limit = 

Re: [PATCH v5 09/16] drm/vkms: Introduce pixel_read_direction enum

2024-03-25 Thread Maíra Canal

On 3/13/24 14:45, Louis Chauvet wrote:

The pixel_read_direction enum is useful to describe the reading direction
in a plane. It avoids using the rotation property of DRM, which is not
practical for knowing the direction of reading.
This patch also introduces two helpers, one to compute the
pixel_read_direction from the DRM rotation property, and one to compute
the step, in bytes, between two successive pixels in a specific direction.

Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_composer.c | 36 
  drivers/gpu/drm/vkms/vkms_drv.h  | 11 +++
  drivers/gpu/drm/vkms/vkms_formats.c  | 30 ++
  3 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
b/drivers/gpu/drm/vkms/vkms_composer.c
index 9254086f23ff..989bcf59f375 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -159,6 +159,42 @@ static void apply_lut(const struct vkms_crtc_state 
*crtc_state, struct line_buff
}
  }
  
+/**

+ * direction_for_rotation() - Get the correct reading direction for a given 
rotation
+ *
+ * This function will use the @rotation setting of a source plane to compute 
the reading
+ * direction in this plane which corresponds to a "left to right writing" in 
the CRTC.
+ * For example, if the buffer is reflected on X axis, the pixel must be read 
from right to left
+ * to be written from left to right on the CRTC.
+ *
+ * @rotation: Rotation to analyze. It corresponds to the field 
@frame_info.rotation.


A bit unusual to see arguments after the description.


+ */
+static enum pixel_read_direction direction_for_rotation(unsigned int rotation)
+{
+   if (rotation & DRM_MODE_ROTATE_0) {
+   if (rotation & DRM_MODE_REFLECT_X)
+   return READ_RIGHT_TO_LEFT;
+   else
+   return READ_LEFT_TO_RIGHT;
+   } else if (rotation & DRM_MODE_ROTATE_90) {
+   if (rotation & DRM_MODE_REFLECT_Y)
+   return READ_BOTTOM_TO_TOP;
+   else
+   return READ_TOP_TO_BOTTOM;
+   } else if (rotation & DRM_MODE_ROTATE_180) {
+   if (rotation & DRM_MODE_REFLECT_X)
+   return READ_LEFT_TO_RIGHT;
+   else
+   return READ_RIGHT_TO_LEFT;
+   } else if (rotation & DRM_MODE_ROTATE_270) {
+   if (rotation & DRM_MODE_REFLECT_Y)
+   return READ_TOP_TO_BOTTOM;
+   else
+   return READ_BOTTOM_TO_TOP;
+   }
+   return READ_LEFT_TO_RIGHT;
+}
+
  /**
   * blend - blend the pixels from all planes and compute crc
   * @wb: The writeback frame buffer metadata
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 3ead8b39af4a..985e7a92b7bc 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -69,6 +69,17 @@ struct vkms_writeback_job {
pixel_write_t pixel_write;
  };
  
+/**

+ * enum pixel_read_direction - Enum used internally by VKMS to represent a 
reading direction in a
+ * plane.
+ */
+enum pixel_read_direction {
+   READ_BOTTOM_TO_TOP,
+   READ_TOP_TO_BOTTOM,
+   READ_RIGHT_TO_LEFT,
+   READ_LEFT_TO_RIGHT
+};
+
  /**
   * typedef pixel_read_t - These functions are used to read a pixel in the 
source frame,
   * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 649d75d05b1f..743b6fd06db5 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -75,6 +75,36 @@ static void packed_pixels_addr(const struct vkms_frame_info 
*frame_info,
*addr = (u8 *)frame_info->map[0].vaddr + offset;
  }
  
+/**

+ * get_step_next_block() - Common helper to compute the correct step value 
between each pixel block
+ * to read in a certain direction.
+ *
+ * As the returned offset is the number of bytes between two consecutive 
blocks in a direction,
+ * the caller may have to read multiple pixels before using the next one (for 
example, to read from
+ * left to right in a DRM_FORMAT_R1 plane, each block contains 8 pixels, so 
the step must be used
+ * only every 8 pixels).
+ *
+ * @fb: Framebuffer to iterate on
+ * @direction: Direction of the reading
+ * @plane_index: Plane to get the step from


Same.

Best Regards,
- Maíra


+ */
+static int get_step_next_block(struct drm_framebuffer *fb, enum 
pixel_read_direction direction,
+  int plane_index)
+{
+   switch (direction) {
+   case READ_LEFT_TO_RIGHT:
+   return fb->format->char_per_block[plane_index];
+   case READ_RIGHT_TO_LEFT:
+   return -fb->format->char_per_block[plane_index];
+   case READ_TOP_TO_BOTTOM:
+   return (int)fb->pitches[plane_index];
+   case READ_BOTTOM_TO_TOP:
+   

Re: [PATCH v5 06/16] drm/vkms: Use const for input pointers in pixel_read and pixel_write functions

2024-03-25 Thread Maíra Canal

On 3/13/24 14:45, Louis Chauvet wrote:

As the pixel_read and pixel_write functions should never modify the input
buffer, mark those pointers const.

Signed-off-by: Louis Chauvet 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_drv.h |  4 ++--
  drivers/gpu/drm/vkms/vkms_formats.c | 24 
  2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 4bfc62d26f08..3ead8b39af4a 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -61,7 +61,7 @@ struct line_buffer {
   * @out_pixel: destination address to write the pixel
   * @in_pixel: pixel to write
   */
-typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel);
+typedef void (*pixel_write_t)(u8 *out_pixel, const struct pixel_argb_u16 
*in_pixel);
  
  struct vkms_writeback_job {

struct iosys_map data[DRM_FORMAT_MAX_PLANES];
@@ -76,7 +76,7 @@ struct vkms_writeback_job {
   * @in_pixel: Pointer to the pixel to read
   * @out_pixel: Pointer to write the converted pixel
   */
-typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel);
  
  /**

   * vkms_plane_state - Driver specific plane state
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index b57d85b8b935..b2f8dfc26c35 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -76,7 +76,7 @@ static int get_x_position(const struct vkms_frame_info 
*frame_info, int limit, i
   * They are used in the `vkms_compose_row` function to handle multiple 
formats.
   */
  
-static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)

+static void ARGB_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
/*
 * The 257 is the "conversion ratio". This number is obtained by the
@@ -90,7 +90,7 @@ static void ARGB_to_argb_u16(u8 *in_pixel, struct 
pixel_argb_u16 *out_pixel)
out_pixel->b = (u16)in_pixel[0] * 257;
  }
  
-static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)

+static void XRGB_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
out_pixel->a = (u16)0xffff;
out_pixel->r = (u16)in_pixel[2] * 257;
@@ -98,7 +98,7 @@ static void XRGB_to_argb_u16(u8 *in_pixel, struct 
pixel_argb_u16 *out_pixel)
out_pixel->b = (u16)in_pixel[0] * 257;
  }
  
-static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)

+static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
u16 *pixel = (u16 *)in_pixel;
  
@@ -108,7 +108,7 @@ static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pi

out_pixel->b = le16_to_cpu(pixel[0]);
  }
  
-static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)

+static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
u16 *pixel = (u16 *)in_pixel;
  
@@ -118,7 +118,7 @@ static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pi

out_pixel->b = le16_to_cpu(pixel[0]);
  }
  
-static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)

+static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
u16 *pixel = (u16 *)in_pixel;
  
@@ -143,7 +143,7 @@ static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)

   * It is used to avoid null pointer to be used as a function. In theory, this 
function should
   * never be called, except if you found a bug in the driver/DRM core.
   */
-static void black_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static void black_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
out_pixel->a = (u16)0xffff;
out_pixel->r = 0;
@@ -189,7 +189,7 @@ void vkms_compose_row(struct line_buffer *stage_buffer, 
struct vkms_plane_state
   * They are used in the `vkms_writeback_row` to convert and store a pixel 
from the src_buffer to
   * the writeback buffer.
   */
-static void argb_u16_to_ARGB(u8 *out_pixel, struct pixel_argb_u16 
*in_pixel)
+static void argb_u16_to_ARGB(u8 *out_pixel, const struct pixel_argb_u16 
*in_pixel)
  {
/*
 * This sequence below is important because the format's byte order is
@@ -207,7 +207,7 @@ static void argb_u16_to_ARGB(u8 *out_pixel, struct 
pixel_argb_u16 *in_pixel)
out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
  }
  
-static void argb_u16_to_XRGB(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)

+static void argb_u16_to_XRGB(u8 *out_pixel, const struct pixel_argb_u16 
*in_pixel)
  {
out_pixel[3] = 0xff;
   

Re: [PATCH v5 05/16] drm/vkms: Add dummy pixel_read/pixel_write callbacks to avoid NULL pointers

2024-03-25 Thread Maíra Canal

On 3/13/24 14:44, Louis Chauvet wrote:

Introduce two callbacks which do nothing. They are used in place of NULL
and avoid a kernel oops if such a NULL pointer is called.

If those callbacks are used, it means that there is a mismatch between
what formats are announced by atomic_check and what is really supported by
atomic_update.

Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_formats.c | 43 +++--
  1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 55a4365d21a4..b57d85b8b935 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -136,6 +136,21 @@ static void RGB565_to_argb_u16(u8 *in_pixel, struct 
pixel_argb_u16 *out_pixel)
out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
  }
  
+/**

+ * black_to_argb_u16() - pixel_read callback which always reads black
+ *
+ * This callback is used when an invalid format is requested for plane reading.
+ * It is used to avoid a null pointer being used as a function. In theory, this 
function should
+ * never be called, unless you have found a bug in the driver/DRM core.
+ */
+static void black_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+{
+   out_pixel->a = (u16)0xffff;
+   out_pixel->r = 0;
+   out_pixel->g = 0;
+   out_pixel->b = 0;
+}
+
  /**
   * vkms_compose_row - compose a single row of a plane
   * @stage_buffer: output line with the composed pixels
@@ -238,6 +253,16 @@ static void argb_u16_to_RGB565(u8 *out_pixel, struct 
pixel_argb_u16 *in_pixel)
*pixel = cpu_to_le16(r << 11 | g << 5 | b);
  }
  
+/**

+ * argb_u16_to_nothing() - pixel_write callback with no effect
+ *
+ * This callback is used when an invalid format is requested for writeback.
+ * It is used to avoid a null pointer being used as a function. In theory, this 
should never
+ * happen, unless there is a bug in the driver.
+ */
+static void argb_u16_to_nothing(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+{}
+
  /**
   * Generic loop for all supported writeback format. It is executed just after 
the blending to
   * write a line in the writeback buffer.
@@ -261,8 +286,8 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
  
  /**

   * Retrieve the correct read_pixel function for a specific format.
- * The returned pointer is NULL for unsupported pixel formats. The caller must 
ensure that the
- * pointer is valid before using it in a vkms_plane_state.
+ * If the format is not supported by VKMS a warn is emitted and a dummy "always 
read black"


"If the format is not supported by VKMS, a warning is emitted and a 
dummy "always read black"..."



+ * function is returned.
   *
   * @format: DRM_FORMAT_* value for which to obtain a conversion function (see 
[drm_fourcc.h])
   */
@@ -285,18 +310,21 @@ pixel_read_t get_pixel_read_function(u32 format)
 * format must:
 * - Be listed in vkms_formats in vkms_plane.c
 * - Have a pixel_read callback defined here
+*
+*To avoid a kernel crash, a dummy "always read black" function 
is used. It means
+* that during the composition, this plane will always be black.
 */
WARN(true,
 "Pixel format %p4cc is not supported by VKMS planes. This is a 
kernel bug, atomic check must forbid this configuration.\n",
&format);
-   return (pixel_read_t)NULL;
+   return &black_to_argb_u16;
}
  }
  
  /**

   * Retrieve the correct write_pixel function for a specific format.
- * The returned pointer is NULL for unsupported pixel formats. The caller must 
ensure that the
- * pointer is valid before using it in a vkms_writeback_job.
+ * If the format is not supported by VKMS a warn is emitted and a dummy "don't do 
anything"


"If the format is not supported by VKMS, a warning is emitted and a 
dummy "don't do anything"..."


Best Regards,
- Maíra


+ * function is returned.
   *
   * @format: DRM_FORMAT_* value for which to obtain a conversion function (see 
[drm_fourcc.h])
   */
@@ -319,10 +347,13 @@ pixel_write_t get_pixel_write_function(u32 format)
 * format must:
 * - Be listed in vkms_wb_formats in vkms_writeback.c
 * - Have a pixel_write callback defined here
+*
+*To avoid a kernel crash, a dummy "don't do anything" function 
is used. It means
+* that the resulting writeback buffer is not composed and can 
contain any values.
 */
WARN(true,
 "Pixel format %p4cc is not supported by VKMS writeback. This is 
a kernel bug, atomic check must forbid this configuration.\n",
&format);
-   return (pixel_write_t)NULL;
+   return &argb_u16_to_nothing;
}
  }



Re: [PATCH v5 04/16] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

2024-03-25 Thread Maíra Canal

On 3/13/24 14:44, Louis Chauvet wrote:

Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
compiler to check if the passed functions take the correct arguments.
Such typedefs will help ensure consistency across the code base in
case of updates to these prototypes.

Rename input/output variables in a consistent way between read_line and
write_line.

A warning has been added in get_pixel_*_function to alert when an unsupported
pixel format is requested. As those formats are checked before the
atomic_update callbacks, it should never happen.

Document those typedefs.

Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_drv.h |  23 ++-
  drivers/gpu/drm/vkms/vkms_formats.c | 124 +---
  drivers/gpu/drm/vkms/vkms_formats.h |   4 +-
  drivers/gpu/drm/vkms/vkms_plane.c   |   2 +-
  4 files changed, 95 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 18086423a3a7..4bfc62d26f08 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -53,12 +53,31 @@ struct line_buffer {
struct pixel_argb_u16 *pixels;
  };
  
+/**

+ * typedef pixel_write_t - These functions are used to read a pixel from a
+ * `struct pixel_argb_u16*`, convert it in a specific format and write it in 
the @dst_pixels
+ * buffer.


Your brief description looks a bit big to me. Also, take a look at the 
cross-references docs [1].


[1] 
https://docs.kernel.org/doc-guide/kernel-doc.html#highlights-and-cross-references



+ *
+ * @out_pixel: destination address to write the pixel
+ * @in_pixel: pixel to write
+ */
+typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel);
+
  struct vkms_writeback_job {
struct iosys_map data[DRM_FORMAT_MAX_PLANES];
struct vkms_frame_info wb_frame_info;
-   void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+   pixel_write_t pixel_write;
  };
  
+/**

+ * typedef pixel_read_t - These functions are used to read a pixel in the 
source frame,
+ * convert it to `struct pixel_argb_u16` and write it to @out_pixel.


Same.


+ *
+ * @in_pixel: Pointer to the pixel to read
+ * @out_pixel: Pointer to write the converted pixel


s/Pointer/pointer


+ */
+typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+
  /**
   * vkms_plane_state - Driver specific plane state
   * @base: base plane state
@@ -69,7 +88,7 @@ struct vkms_writeback_job {
  struct vkms_plane_state {
struct drm_shadow_plane_state base;
struct vkms_frame_info *frame_info;
-   void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+   pixel_read_t pixel_read;
  };
  
  struct vkms_plane {

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 6e3dc8682ff9..55a4365d21a4 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -76,7 +76,7 @@ static int get_x_position(const struct vkms_frame_info 
*frame_info, int limit, i
   * They are used in the `vkms_compose_row` function to handle multiple 
formats.
   */
  
-static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
/*
 * The 257 is the "conversion ratio". This number is obtained by the
@@ -84,48 +84,48 @@ static void ARGB_to_argb_u16(u8 *src_pixels, struct 
pixel_argb_u16 *out_pixe
 * the best color value in a pixel format with more possibilities.
 * A similar idea applies to others RGB color conversions.
 */
-   out_pixel->a = (u16)src_pixels[3] * 257;
-   out_pixel->r = (u16)src_pixels[2] * 257;
-   out_pixel->g = (u16)src_pixels[1] * 257;
-   out_pixel->b = (u16)src_pixels[0] * 257;
+   out_pixel->a = (u16)in_pixel[3] * 257;
+   out_pixel->r = (u16)in_pixel[2] * 257;
+   out_pixel->g = (u16)in_pixel[1] * 257;
+   out_pixel->b = (u16)in_pixel[0] * 257;
  }
  
-static void XRGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
out_pixel->a = (u16)0xffff;
-   out_pixel->r = (u16)src_pixels[2] * 257;
-   out_pixel->g = (u16)src_pixels[1] * 257;
-   out_pixel->b = (u16)src_pixels[0] * 257;
+   out_pixel->r = (u16)in_pixel[2] * 257;
+   out_pixel->g = (u16)in_pixel[1] * 257;
+   out_pixel->b = (u16)in_pixel[0] * 257;
  }
  
-static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
-   u16 *pixels = (u16 *)src_pixels;
+   u16 *pixel = (u16 *)in_pixel;
  
-	out_pixel->a = le16_to_cpu(pixels[3]);

-   out_pixel->r = le16_to_cpu(pixels[2]);
-   out_pixel->g = le16_to_cpu(pixels[1]);
-   

Re: [PATCH v5 03/16] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

2024-03-25 Thread Maíra Canal

On 3/13/24 14:44, Louis Chauvet wrote:

Add some documentation on pixel conversion functions.
Update outdated comments for pixel_write functions.

Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_composer.c |  7 
  drivers/gpu/drm/vkms/vkms_drv.h  | 13 
  drivers/gpu/drm/vkms/vkms_formats.c  | 62 ++--
  3 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
b/drivers/gpu/drm/vkms/vkms_composer.c
index c6d9b4a65809..da0651a94c9b 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -189,6 +189,13 @@ static void blend(struct vkms_writeback_job *wb,
  
  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
  
+	/*

+* The planes are composed line-by-line to avoid heavy memory usage. It 
is a necessary
+* complexity to avoid poor blending performance.
+*
+* The function vkms_compose_row is used to read a line, 
pixel-by-pixel, into the staging
+* buffer.
+*/
for (size_t y = 0; y < crtc_y_limit; y++) {
fill_background(&background_color, output_buffer);
  
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h

index b4b357447292..18086423a3a7 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -25,6 +25,17 @@
  
  #define VKMS_LUT_SIZE 256
  
+/**

+ * struct vkms_frame_info - structure to store the state of a frame
+ *
+ * @fb: backing drm framebuffer
+ * @src: source rectangle of this frame in the source framebuffer
+ * @dst: destination rectangle in the crtc buffer
+ * @map: see drm_shadow_plane_state@data
+ * @rotation: rotation applied to the source.
+ *
+ * @src and @dst should have the same size modulo the rotation.
+ */
  struct vkms_frame_info {
struct drm_framebuffer *fb;
struct drm_rect src, dst;
@@ -52,6 +63,8 @@ struct vkms_writeback_job {
   * vkms_plane_state - Driver specific plane state


It should be "* struct vkms_plane_state - Driver specific plane state".


   * @base: base plane state
   * @frame_info: data required for composing computation
+ * @pixel_read: function to read a pixel in this plane. The creator of a 
vkms_plane_state must
+ * ensure that this pointer is valid
   */
  struct vkms_plane_state {
struct drm_shadow_plane_state base;
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 172830a3936a..6e3dc8682ff9 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -9,6 +9,18 @@
  
  #include "vkms_formats.h"
  
+/**

+ * pixel_offset() - Get the offset of the pixel at coordinates x/y in the 
first plane
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x coordinate of the wanted pixel in the buffer
+ * @y: The y coordinate of the wanted pixel in the buffer
+ *
+ * The caller must ensure that the framebuffer associated with this request 
uses a pixel format
+ * where block_h == block_w == 1.
+ * If this requirement is not fulfilled, the resulting offset can point to 
another pixel or
+ * outside of the buffer.
+ */
  static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, 
int y)
  {
struct drm_framebuffer *fb = frame_info->fb;
@@ -17,18 +29,22 @@ static size_t pixel_offset(const struct vkms_frame_info 
*frame_info, int x, int
  + (x * fb->format->cpp[0]);
  }
  
-/*

- * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
+/**
+ * packed_pixels_addr() - Get the pointer to the block containing the pixel at 
the given
+ * coordinates
   *
   * @frame_info: Buffer metadata
- * @x: The x(width) coordinate of the 2D buffer
- * @y: The y(Heigth) coordinate of the 2D buffer
+ * @x: The x(width) coordinate inside the plane
+ * @y: The y(height) coordinate inside the plane


I would add a space after x and y.


   *
   * Takes the information stored in the frame_info, a pair of coordinates, and
   * returns the address of the first color channel.
   * This function assumes the channels are packed together, i.e. a color 
channel
   * comes immediately after another in the memory. And therefore, this function
   * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ *
+ * The caller must ensure that the framebuffer associated with this request 
uses a pixel format
+ * where block_h == block_w == 1, otherwise the returned pointer can be 
outside the buffer.
   */
  static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
int x, int y)
@@ -53,6 +69,13 @@ static int get_x_position(const struct vkms_frame_info 
*frame_info, int limit, i
return x;
  }
  
+/*

+ * The following  functions take pixel data from the buffer and convert them 
to the format


Double-spacing.


+ * ARGB16161616 in out_pixel.
+ *
+ * They are used in the `vkms_compose_row` function to 

Re: [PATCH v5 02/16] drm/vkms: Use drm_frame directly

2024-03-25 Thread Maíra Canal

On 3/13/24 14:44, Louis Chauvet wrote:

From: Arthur Grillo 

Remove intermediary variables and access the variables directly from
drm_frame. These changes should be a no-op.

Signed-off-by: Arthur Grillo 
Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_drv.h   |  3 ---
  drivers/gpu/drm/vkms/vkms_formats.c   | 12 +++-
  drivers/gpu/drm/vkms/vkms_plane.c |  3 ---
  drivers/gpu/drm/vkms/vkms_writeback.c |  5 -
  4 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 8f5710debb1e..b4b357447292 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -31,9 +31,6 @@ struct vkms_frame_info {
struct drm_rect rotated;
struct iosys_map map[DRM_FORMAT_MAX_PLANES];
unsigned int rotation;
-   unsigned int offset;
-   unsigned int pitch;
-   unsigned int cpp;
  };
  
  struct pixel_argb_u16 {

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 36046b12f296..172830a3936a 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -11,8 +11,10 @@
  
  static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)

  {
-   return frame_info->offset + (y * frame_info->pitch)
- + (x * frame_info->cpp);
+   struct drm_framebuffer *fb = frame_info->fb;
+
+   return fb->offsets[0] + (y * fb->pitches[0])
+ + (x * fb->format->cpp[0]);


Nitpicking: Could this be packed into a single line?

Anyway,

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


  }
  
  /*

@@ -131,12 +133,12 @@ void vkms_compose_row(struct line_buffer *stage_buffer, 
struct vkms_plane_state
u8 *src_pixels = get_packed_src_addr(frame_info, y);
int limit = min_t(size_t, drm_rect_width(&frame_info->dst), 
stage_buffer->n_pixels);
  
-	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->cpp) {

+   for (size_t x = 0; x < limit; x++, src_pixels += 
frame_info->fb->format->cpp[0]) {
int x_pos = get_x_position(frame_info, limit, x);
  
  		if (drm_rotation_90_or_270(frame_info->rotation))

src_pixels = get_packed_src_addr(frame_info, x + 
frame_info->rotated.y1)
-   + frame_info->cpp * y;
+   + frame_info->fb->format->cpp[0] * y;
  
plane->pixel_read(src_pixels, &out_pixels[x_pos]);

}
@@ -223,7 +225,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), 
src_buffer->n_pixels);
  
-	for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->cpp)

+   for (size_t x = 0; x < x_limit; x++, dst_pixels += 
frame_info->fb->format->cpp[0])
wb->pixel_write(dst_pixels, &in_pixels[x]);
  }
  
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c

index 5a8d295e65f2..21b5adfb44aa 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -125,9 +125,6 @@ static void vkms_plane_atomic_update(struct drm_plane 
*plane,
drm_rect_rotate(_info->rotated, 
drm_rect_width(_info->rotated),
drm_rect_height(_info->rotated), 
frame_info->rotation);
  
-	frame_info->offset = fb->offsets[0];

-   frame_info->pitch = fb->pitches[0];
-   frame_info->cpp = fb->format->cpp[0];
vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
  }
  
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c

index bc724cbd5e3a..c8582df1f739 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -149,11 +149,6 @@ static void vkms_wb_atomic_commit(struct drm_connector 
*conn,
crtc_state->active_writeback = active_wb;
crtc_state->wb_pending = true;
spin_unlock_irq(&out->composer_lock);
-
-   wb_frame_info->offset = fb->offsets[0];
-   wb_frame_info->pitch = fb->pitches[0];
-   wb_frame_info->cpp = fb->format->cpp[0];
-
drm_writeback_queue_job(wb_conn, connector_state);
active_wb->pixel_write = get_pixel_write_function(wb_format);
drm_rect_init(_frame_info->src, 0, 0, crtc_width, crtc_height);



Re: [PATCH v5 01/16] drm/vkms: Code formatting

2024-03-25 Thread Maíra Canal

On 3/13/24 14:44, Louis Chauvet wrote:

A few no-op changes to remove double spaces and fix wrong alignments.

Signed-off-by: Louis Chauvet 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_composer.c | 10 +-
  drivers/gpu/drm/vkms/vkms_crtc.c |  6 ++
  drivers/gpu/drm/vkms/vkms_drv.c  |  3 +--
  drivers/gpu/drm/vkms/vkms_plane.c|  8 
  4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
b/drivers/gpu/drm/vkms/vkms_composer.c
index e7441b227b3c..c6d9b4a65809 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -96,7 +96,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t)
s64 a_fp = drm_int2fixp(a);
s64 b_fp = drm_int2fixp(b);
  
-	s64 delta = drm_fixp_mul(b_fp - a_fp,  t);

+   s64 delta = drm_fixp_mul(b_fp - a_fp, t);
  
  	return drm_fixp2int(a_fp + delta);

  }
@@ -302,8 +302,8 @@ static int compose_active_planes(struct vkms_writeback_job 
*active_wb,
  void vkms_composer_worker(struct work_struct *work)
  {
struct vkms_crtc_state *crtc_state = container_of(work,
-   struct vkms_crtc_state,
-   composer_work);
+ struct 
vkms_crtc_state,
+ composer_work);
struct drm_crtc *crtc = crtc_state->base.crtc;
struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
@@ -328,7 +328,7 @@ void vkms_composer_worker(struct work_struct *work)
crtc_state->gamma_lut.base = (struct drm_color_lut 
*)crtc->state->gamma_lut->data;
crtc_state->gamma_lut.lut_length =
crtc->state->gamma_lut->length / sizeof(struct 
drm_color_lut);
-		max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length  - 1);
+		max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1);
crtc_state->gamma_lut.channel_value2index_ratio = 
drm_fixp_div(max_lut_index_fp,
   
u16_max_fp);
  
@@ -367,7 +367,7 @@ void vkms_composer_worker(struct work_struct *work)

drm_crtc_add_crc_entry(crtc, true, frame_start++, );
  }
  
-static const char * const pipe_crc_sources[] = {"auto"};

+static const char *const pipe_crc_sources[] = { "auto" };
  
  const char *const *vkms_get_crc_sources(struct drm_crtc *crtc,

size_t *count)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 61e500b8c9da..7586ae2e1dd3 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -191,8 +191,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc,
return ret;
  
  	drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) {

-		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state,
-								  plane);
+		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane);
WARN_ON(!plane_state);
  
  		if (!plane_state->visible)

@@ -208,8 +207,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc,
  
  	i = 0;

drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) {
-		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state,
-								  plane);
+		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane);
  
  		if (!plane_state->visible)

continue;
diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
index dd0af086e7fa..83e6c9b9ff46 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.c
+++ b/drivers/gpu/drm/vkms/vkms_drv.c
@@ -81,8 +81,7 @@ static void vkms_atomic_commit_tail(struct drm_atomic_state 
*old_state)
drm_atomic_helper_wait_for_flip_done(dev, old_state);
  
  	for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) {

-		struct vkms_crtc_state *vkms_state =
-			to_vkms_crtc_state(old_crtc_state);
+		struct vkms_crtc_state *vkms_state = to_vkms_crtc_state(old_crtc_state);
  
  		flush_work(_state->composer_work);

}
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c 
b/drivers/gpu/drm/vkms/vkms_plane.c
index e5c625ab8e3e..5a8d295e65f2 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -117,10 +117,10 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,

Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()

2024-03-18 Thread Maíra Canal

Not that the CC list wasn't big enough, but I'm adding MM folks
in the CC list.

On 3/18/24 11:04, Christian König wrote:

Am 18.03.24 um 14:28 schrieb Maíra Canal:

Hi Christian,

On 3/18/24 10:10, Christian König wrote:

Am 18.03.24 um 13:42 schrieb Maíra Canal:

Hi Christian,

On 3/12/24 10:48, Christian König wrote:

Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:


On 12/03/2024 10:37, Christian König wrote:

Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:


On 12/03/2024 10:23, Christian König wrote:

Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:


On 12/03/2024 08:59, Christian König wrote:

Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:


Hi Maira,

On 11/03/2024 10:05, Maíra Canal wrote:
For some applications, such as using huge pages, we might want to have a
different mountpoint, for which we pass in mount flags that better match
our use case.

Therefore, add a new parameter to drm_gem_object_init() that allows us to
define the tmpfs mountpoint where the GEM object will be created. If
this parameter is NULL, then we fall back to shmem_file_setup().


One strategy for reducing churn, and so the number of 
drivers this patch touches, could be to add a lower level 
drm_gem_object_init() (which takes vfsmount, call it 
__drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
make drm_gem_object_init() call that one with a NULL argument.


I would even go a step further into the other direction. The 
shmem backed GEM object is just some special handling as far 
as I can see.


So I would rather suggest to rename all drm_gem_* function 
which only deal with the shmem backed GEM object into 
drm_gem_shmem_*.


That makes sense although it would be very churny. I at least 
would be on the fence regarding the cost vs benefit.


Yeah, it should clearly not be part of this patch here.



Also the explanation why a different mount point helps with 
something isn't very satisfying.


Not satisfying as you think it is not detailed enough to say the
driver wants to use huge pages for performance? Or not satisfying as
you question why huge pages would help?


That huge pages are beneficial is clear to me, but I'm missing 
the connection why a different mount point helps with using 
huge pages.


Ah right, same as in i915, one needs to mount a tmpfs instance 
passing huge=within_size or huge=always option. Default is 
'never', see man 5 tmpfs.


Thanks for the explanation, I wasn't aware of that.

Mhm, shouldn't we always use huge pages? Is there a reason for a 
DRM device to not use huge pages with the shmem backend?


AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
back then the understanding was within_size may overallocate, 
meaning there would be some space wastage, until the memory 
pressure makes the thp code split the trailing huge page. I 
haven't checked if that still applies.


Other than that I don't know if some drivers/platforms could have 
problems if they have some limitations or hardcoded assumptions 
when they iterate the sg list.


Yeah, that was the whole point behind my question. As far as I can 
see this isn't driver specific, but platform specific.


I might be wrong here, but I think we should then probably not have 
that handling in each individual driver, but rather centralized in 
the DRM code.


I don't see a point in enabling THP for all shmem drivers. A huge page
is only useful if the driver is going to use it. On V3D, for example,
I only need huge pages because I need the memory contiguously allocated
to implement Super Pages. Otherwise, if we don't have the Super Pages
support implemented in the driver, I would be creating memory pressure
without any performance gain.


Well that's the point I'm disagreeing with. THP doesn't seem to 
create much extra memory pressure for this use case.


As far as I can see background for the option is that files in tmpfs 
usually have a varying size, so it usually isn't beneficial to 
allocate a huge page just to find that the shmem file is much smaller 
than what's needed.


But GEM objects have a fixed size, so we know up front whether we need 4KiB
or 1GiB and can therefore directly allocate huge pages if they are
available and the object is large enough to back them with.


If the memory pressure is so high that we don't have huge pages 
available the shmem code falls back to standard pages anyway.


The matter is: how do we define the point where the memory pressure is 
high?


Well as driver developers/maintainers we simply don't do that. This is 
the job of the shmem code.



For example, notice that in this implementation of Super Pages
for the V3D driver, I only use a Super Page if the BO is bigger than 
2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
available for the GPU. If I created huge pages for every BO allocation 
(and initially, I tried that), I would end up with hangs in some 
applications.


Yeah, that is what I meant with the trivial optimisation to the shmem 
code. Essentially when you have

Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()

2024-03-18 Thread Maíra Canal

On 3/18/24 10:28, Maíra Canal wrote:

Hi Christian,

On 3/18/24 10:10, Christian König wrote:

Am 18.03.24 um 13:42 schrieb Maíra Canal:

Hi Christian,

On 3/12/24 10:48, Christian König wrote:

Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:


On 12/03/2024 10:37, Christian König wrote:

Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:


On 12/03/2024 10:23, Christian König wrote:

Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:


On 12/03/2024 08:59, Christian König wrote:

Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:


Hi Maira,

On 11/03/2024 10:05, Maíra Canal wrote:
For some applications, such as using huge pages, we might want to have a
different mountpoint, for which we pass in mount flags that better match
our use case.

Therefore, add a new parameter to drm_gem_object_init() that allows us to
define the tmpfs mountpoint where the GEM object will be created. If
this parameter is NULL, then we fall back to shmem_file_setup().


One strategy for reducing churn, and so the number of drivers 
this patch touches, could be to add a lower level 
drm_gem_object_init() (which takes vfsmount, call it 
__drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
make drm_gem_object_init() call that one with a NULL argument.


I would even go a step further into the other direction. The 
shmem backed GEM object is just some special handling as far 
as I can see.


So I would rather suggest to rename all drm_gem_* function 
which only deal with the shmem backed GEM object into 
drm_gem_shmem_*.


That makes sense although it would be very churny. I at least 
would be on the fence regarding the cost vs benefit.


Yeah, it should clearly not be part of this patch here.



Also the explanation why a different mount point helps with 
something isn't very satisfying.


Not satisfying as you think it is not detailed enough to say the
driver wants to use huge pages for performance? Or not satisfying as
you question why huge pages would help?


That huge pages are beneficial is clear to me, but I'm missing 
the connection why a different mount point helps with using huge 
pages.


Ah right, same as in i915, one needs to mount a tmpfs instance 
passing huge=within_size or huge=always option. Default is 
'never', see man 5 tmpfs.


Thanks for the explanation, I wasn't aware of that.

Mhm, shouldn't we always use huge pages? Is there a reason for a 
DRM device to not use huge pages with the shmem backend?


AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
back then the understanding was within_size may overallocate, 
meaning there would be some space wastage, until the memory 
pressure makes the thp code split the trailing huge page. I haven't 
checked if that still applies.


Other than that I don't know if some drivers/platforms could have 
problems if they have some limitations or hardcoded assumptions 
when they iterate the sg list.


Yeah, that was the whole point behind my question. As far as I can 
see this isn't driver specific, but platform specific.


I might be wrong here, but I think we should then probably not have 
that handling in each individual driver, but rather centralized in 
the DRM code.


I don't see a point in enabling THP for all shmem drivers. A huge page
is only useful if the driver is going to use it. On V3D, for example,
I only need huge pages because I need the memory contiguously allocated
to implement Super Pages. Otherwise, if we don't have the Super Pages
support implemented in the driver, I would be creating memory pressure
without any performance gain.


Well that's the point I'm disagreeing with. THP doesn't seem to create 
much extra memory pressure for this use case.


As far as I can see background for the option is that files in tmpfs 
usually have a varying size, so it usually isn't beneficial to 
allocate a huge page just to find that the shmem file is much smaller 
than what's needed.


But GEM objects have a fixed size, so we know up front whether we need 4KiB
or 1GiB and can therefore directly allocate huge pages if they are
available and the object is large enough to back them with.


If the memory pressure is so high that we don't have huge pages 
available the shmem code falls back to standard pages anyway.


The matter is: how do we define the point where the memory pressure is 
high? For example, notice that in this implementation of Super Pages
for the V3D driver, I only use a Super Page if the BO is bigger than 
2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
available for the GPU. If I created huge pages for every BO allocation 
(and initially, I tried that), I would end up with hangs in some 
applications.


At least, for V3D, I wouldn't like to see THP being used for all the 
allocations. But, we have maintainers of other drivers in the CC.


Okay, I'm thinking about a compromise. What if we create a gemfs
mountpoint in the DRM core and every time we init an object, we can
choose if we will use huge pages or not. Therefore,
drm_gem_sh

Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()

2024-03-18 Thread Maíra Canal

Hi Christian,

On 3/18/24 10:10, Christian König wrote:

Am 18.03.24 um 13:42 schrieb Maíra Canal:

Hi Christian,

On 3/12/24 10:48, Christian König wrote:

Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:


On 12/03/2024 10:37, Christian König wrote:

Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:


On 12/03/2024 10:23, Christian König wrote:

Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:


On 12/03/2024 08:59, Christian König wrote:

Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:


Hi Maira,

On 11/03/2024 10:05, Maíra Canal wrote:
For some applications, such as using huge pages, we might want to have a
different mountpoint, for which we pass in mount flags that better match
our use case.

Therefore, add a new parameter to drm_gem_object_init() that allows us to
define the tmpfs mountpoint where the GEM object will be created. If
this parameter is NULL, then we fall back to shmem_file_setup().


One strategy for reducing churn, and so the number of drivers 
this patch touches, could be to add a lower level 
drm_gem_object_init() (which takes vfsmount, call it 
__drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
make drm_gem_object_init() call that one with a NULL argument.


I would even go a step further into the other direction. The 
shmem backed GEM object is just some special handling as far as 
I can see.


So I would rather suggest to rename all drm_gem_* function 
which only deal with the shmem backed GEM object into 
drm_gem_shmem_*.


That makes sense although it would be very churny. I at least 
would be on the fence regarding the cost vs benefit.


Yeah, it should clearly not be part of this patch here.



Also the explanation why a different mount point helps with 
something isn't very satisfying.


Not satisfying as you think it is not detailed enough to say the
driver wants to use huge pages for performance? Or not satisfying as
you question why huge pages would help?


That huge pages are beneficial is clear to me, but I'm missing 
the connection why a different mount point helps with using huge 
pages.


Ah right, same as in i915, one needs to mount a tmpfs instance 
passing huge=within_size or huge=always option. Default is 
'never', see man 5 tmpfs.


Thanks for the explanation, I wasn't aware of that.

Mhm, shouldn't we always use huge pages? Is there a reason for a 
DRM device to not use huge pages with the shmem backend?


AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
back then the understanding was within_size may overallocate, 
meaning there would be some space wastage, until the memory pressure 
makes the thp code split the trailing huge page. I haven't checked 
if that still applies.


Other than that I don't know if some drivers/platforms could have 
problems if they have some limitations or hardcoded assumptions when 
they iterate the sg list.


Yeah, that was the whole point behind my question. As far as I can 
see this isn't driver specific, but platform specific.


I might be wrong here, but I think we should then probably not have 
that handling in each individual driver, but rather centralized in 
the DRM code.


I don't see a point in enabling THP for all shmem drivers. A huge page
is only useful if the driver is going to use it. On V3D, for example,
I only need huge pages because I need the memory contiguously allocated
to implement Super Pages. Otherwise, if we don't have the Super Pages
support implemented in the driver, I would be creating memory pressure
without any performance gain.


Well that's the point I'm disagreeing with. THP doesn't seem to create 
much extra memory pressure for this use case.


As far as I can see background for the option is that files in tmpfs 
usually have a varying size, so it usually isn't beneficial to allocate 
a huge page just to find that the shmem file is much smaller than what's 
needed.


But GEM objects have a fixed size, so we know up front whether we need 4KiB or
1GiB and can therefore directly allocate huge pages if they are
available and the object is large enough to back them with.


If the memory pressure is so high that we don't have huge pages 
available the shmem code falls back to standard pages anyway.


The matter is: how do we define the point where the memory pressure is 
high? For example, notice that in this implementation of Super Pages
for the V3D driver, I only use a Super Page if the BO is bigger than 
2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
available for the GPU. If I created huge pages for every BO allocation 
(and initially, I tried that), I would end up with hangs in some 
applications.


At least, for V3D, I wouldn't like to see THP being used for all the 
allocations. But, we have maintainers of other drivers in the CC.


Best Regards,
- Maíra



So THP is almost always beneficial for GEM even if the driver doesn't 
actually need it. The only potential case I can think of which might not 
be handled gracefully is the tail pages, e.g. h

Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()

2024-03-18 Thread Maíra Canal

Hi Christian,

On 3/12/24 10:48, Christian König wrote:

Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:


On 12/03/2024 10:37, Christian König wrote:

Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:


On 12/03/2024 10:23, Christian König wrote:

Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:


On 12/03/2024 08:59, Christian König wrote:

Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:


Hi Maira,

On 11/03/2024 10:05, Maíra Canal wrote:
For some applications, such as using huge pages, we might want 
to have a
different mountpoint, for which we pass in mount flags that 
better match

our usecase.

Therefore, add a new parameter to drm_gem_object_init() that 
allow us to
define the tmpfs mountpoint where the GEM object will be 
created. If

this parameter is NULL, then we fallback to shmem_file_setup().


One strategy for reducing churn, and so the number of drivers 
this patch touches, could be to add a lower level 
drm_gem_object_init() (which takes vfsmount, call it 
__drm_gem_object_init(), or drm__gem_object_init_mnt(), and make 
drm_gem_object_init() call that one with a NULL argument.


I would even go a step further into the other direction. The 
shmem backed GEM object is just some special handling as far as I 
can see.


So I would rather suggest to rename all drm_gem_* function which 
only deal with the shmem backed GEM object into drm_gem_shmem_*.


That makes sense although it would be very churny. I at least 
would be on the fence regarding the cost vs benefit.


Yeah, it should clearly not be part of this patch here.



Also the explanation why a different mount point helps with 
something isn't very satisfying.


Not satisfying as you think it is not detailed enough to say the
driver wants to use huge pages for performance? Or not satisfying as
you question why huge pages would help?


That huge pages are beneficial is clear to me, but I'm missing the 
connection why a different mount point helps with using huge pages.


Ah right, same as in i915, one needs to mount a tmpfs instance 
passing huge=within_size or huge=always option. Default is 'never', 
see man 5 tmpfs.


Thanks for the explanation, I wasn't aware of that.

Mhm, shouldn't we always use huge pages? Is there a reason for a DRM 
device to not use huge pages with the shmem backend?


AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back 
then the understanding was within_size may overallocate, meaning there 
would be some space wastage, until the memory pressure makes the thp 
code split the trailing huge page. I haven't checked if that still 
applies.


Other than that I don't know if some drivers/platforms could have 
problems if they have some limitations or hardcoded assumptions when 
they iterate the sg list.


Yeah, that was the whole point behind my question. As far as I can see 
this isn't driver specific, but platform specific.


I might be wrong here, but I think we should then probably not have that 
handling in each individual driver, but rather centralized in the DRM code.


I don't see a point in enabling THP for all shmem drivers. A huge page
is only useful if the driver is going to use it. On V3D, for example,
I only need huge pages because I need the memory contiguously allocated
to implement Super Pages. Otherwise, if we don't have the Super Pages
support implemented in the driver, I would be creating memory pressure
without any performance gain.

Best Regards,
- Maíra



Regards,
Christian.




The Cc is plenty large so perhaps someone else will have additional
information. :)


Regards,

Tvrtko



I mean it would make this patch here even smaller.

Regards,
Christian.




Regards,

Tvrtko






Re: [PATCH v2] drm: Fix drm_fixp2int_round() making it add 0.5

2024-03-17 Thread Maíra Canal

Hi Melissa,

On 3/17/24 14:50, Melissa Wen wrote:

On 03/16, Arthur Grillo wrote:

As well noted by Pekka[1], the rounding of drm_fixp2int_round() is wrong.
To round a number, you need to add 0.5 to it and floor the result;
drm_fixp2int_round() is adding 0.076 instead. Make it add 0.5.

[1]: 
https://lore.kernel.org/all/20240301135327.22efe0dd.pekka.paala...@collabora.com/

Fixes: 8b25320887d7 ("drm: Add fixed-point helper to get rounded integer 
values")
Suggested-by: Pekka Paalanen 
Reviewed-by: Harry Wentland 
Signed-off-by: Arthur Grillo 


Great, thanks!

Reviewed-by: Melissa Wen 

I'll apply to drm-misc-next.


Shouldn't this patch be applied in drm-misc-fixes?

Best Regards,
- Maíra



Melissa


---
Changes in v2:
- Add Fixes tag (Melissa Wen)
- Remove DRM_FIXED_POINT_HALF (Melissa Wen)
- Link to v1: 
https://lore.kernel.org/all/20240306-louis-vkms-conv-v1-1-5bfe7d129...@riseup.net/
---
  include/drm/drm_fixed.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/drm/drm_fixed.h b/include/drm/drm_fixed.h
index 0c9f917a4d4b..81572d32db0c 100644
--- a/include/drm/drm_fixed.h
+++ b/include/drm/drm_fixed.h
@@ -71,7 +71,6 @@ static inline u32 dfixed_div(fixed20_12 A, fixed20_12 B)
  }
  
  #define DRM_FIXED_POINT		32

-#define DRM_FIXED_POINT_HALF   16
  #define DRM_FIXED_ONE (1ULL << DRM_FIXED_POINT)
  #define DRM_FIXED_DECIMAL_MASK(DRM_FIXED_ONE - 1)
  #define DRM_FIXED_DIGITS_MASK (~DRM_FIXED_DECIMAL_MASK)
@@ -90,7 +89,7 @@ static inline int drm_fixp2int(s64 a)
  
  static inline int drm_fixp2int_round(s64 a)

  {
-   return drm_fixp2int(a + (1 << (DRM_FIXED_POINT_HALF - 1)));
+   return drm_fixp2int(a + DRM_FIXED_ONE / 2);
  }
  
  static inline int drm_fixp2int_ceil(s64 a)


---
base-commit: f89632a9e5fa6c4787c14458cd42a9ef42025434
change-id: 20240315-drm_fixed-c680ba078ecb

Best regards,
--
Arthur Grillo 



[PATCH 5/5] drm/v3d: Enable super pages

2024-03-11 Thread Maíra Canal
The V3D MMU also supports 1MB pages, called super pages. In order to
set a 1MB page in the MMU, we need to make sure that the page table
entries for all 4KB pages within a super page are correctly configured.

Therefore, if the BO is larger than 2MB, we allocate it in a separate
mountpoint that uses THP. This will allow us to create a contiguous
memory region to create our super pages. In order to place the page
table entries in the MMU, we iterate over the 256 4KB pages and insert
the PTE.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c| 19 +--
 drivers/gpu/drm/v3d/v3d_drv.c   |  7 +++
 drivers/gpu/drm/v3d/v3d_drv.h   |  6 --
 drivers/gpu/drm/v3d/v3d_gemfs.c |  6 ++
 drivers/gpu/drm/v3d/v3d_mmu.c   | 24 ++--
 5 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..cb8e49a33be7 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;

/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,9 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);

+   bo->huge_pages = (obj->size >= SZ_2M && v3d->super_pages);
+   align = bo->huge_pages ? SZ_1M : SZ_4K;
+
spin_lock(>mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +114,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(>mm, >node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 
0, 0);
+align >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(>mm_lock);
if (ret)
return ret;
@@ -130,10 +134,21 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, 
struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
+   size_t size;
int ret;

-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   size = PAGE_ALIGN(unaligned_size);
+
+   /* To avoid memory fragmentation, we only use THP if the BO is bigger
+* than two Super Pages (1MB).
+*/
+   if (size >= SZ_2M && v3d->super_pages)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, size, 
v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
bo = to_v3d_bo(_obj->base);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..96f4d8227407 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,11 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0

+static bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. Note: to enable Super Pages, you need support for THP.");
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
@@ -308,6 +313,8 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
return -ENOMEM;
}

+   v3d->super_pages = super_pages;
+
ret = v3d_gem_init(drm);
if (ret)
goto dma_free;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..795087663739 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,9 +17,8 @@ struct clk;
 struct platform_device;
 struct reset_control;

-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)

 #define V3D_MAX_QUEUES (V3D_CPU + 1)

@@ -123,6 +122,7 @@ struct v3d_dev {
 * tmpfs instance used for shmem backed objects
 */
struct vfsmount *gemfs;
+   bool super_pages;

struct work_struct overflow_mem_work;

@@ -211,6 +211,8 @@ struct v3d_bo {
struct list_head unref_head;

void *vaddr;
+
+   bool huge_pages;
 };

 static inline struct v3d_bo *
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 8518b7da6f73..bcde3138f555 100644
--- a/drivers/gpu/drm/v
