Re: [PATCH] MAINTAINERS: remove myself as a VKMS maintainer
On 5/25/24 11:26, Melissa Wen wrote:
> I haven't been able to follow or review the work on the driver for some
> time now and I don't see the situation improving anytime soon. I'd like
> to continue being listed as a reviewer.
>
> Signed-off-by: Melissa Wen

Acked-by: Maíra Canal

Thanks for all the good work you put into VKMS in the last couple of years!

Best Regards,
- Maíra

> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7d735037a383..79fe536355b0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7027,10 +7027,10 @@ F:	drivers/gpu/drm/udl/
>  DRM DRIVER FOR VIRTUAL KERNEL MODESETTING (VKMS)
>  M:	Rodrigo Siqueira
> -M:	Melissa Wen
>  M:	Maíra Canal
>  R:	Haneen Mohammed
>  R:	Daniel Vetter
> +R:	Melissa Wen
>  L:	dri-devel@lists.freedesktop.org
>  S:	Maintained
>  T:	git https://gitlab.freedesktop.org/drm/misc/kernel.git
Re: [PATCH v2 0/6] drm/v3d: Improve Performance Counters handling
Hi Jani,

On 5/21/24 08:07, Jani Nikula wrote:
> On Mon, 20 May 2024, Maíra Canal wrote:
>> On 5/12/24 19:23, Maíra Canal wrote:
>>> Maíra Canal (6):
>>>   drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
>>>   drm/v3d: Different V3D versions can have different number of perfcnt
>>>   drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
>>>   drm/v3d: Create new IOCTL to expose performance counters information
>>>   drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
>>>   drm/v3d: Deprecate the use of the Performance Counters enum
>>>
>>>  drivers/gpu/drm/v3d/v3d_drv.c                 |  11 +
>>>  drivers/gpu/drm/v3d/v3d_drv.h                 |  14 +-
>>>  drivers/gpu/drm/v3d/v3d_perfmon.c             |  36 ++-
>>>  .../gpu/drm/v3d/v3d_performance_counters.h    | 208 ++
>>>  drivers/gpu/drm/v3d/v3d_sched.c               |   2 +-
>>>  include/uapi/drm/v3d_drm.h                    |  48
>>>  6 files changed, 316 insertions(+), 3 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h
>>
>> Applied to drm-misc/drm-misc-next!
>
> What compiler do you use? I'm hitting the same as kernel test robot [1]
> with arm-linux-gnueabihf-gcc 12.2.0.

I use clang version 17.0.6.

> In general, I don't think it's a great idea to put arrays in headers, and
> then include it everywhere via v3d_drv.h. You're not just relying on the
> compiler to optimize it away in compilation units where it's not
> referenced (likely to happen), but also for the linker to deduplicate
> rodata (possible, but I'm not sure that it will happen).
>
> I think you need to move the arrays to a .c file, and then either a) add
> interfaces to access the arrays, or b) declare the arrays and make them
> global. For the latter you also need to figure out how to expose the
> size.

I'll write a patch to fix it. Sorry for the disturbance, I didn't notice
it with clang.

Best Regards,
- Maíra

> BR,
> Jani.
>
> [1] https://lore.kernel.org/r/202405211137.huefklkg-...@intel.com
Re: [PATCH v2 0/6] drm/v3d: Improve Performance Counters handling
On 5/12/24 19:23, Maíra Canal wrote:
> Maíra Canal (6):
>   drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
>   drm/v3d: Different V3D versions can have different number of perfcnt
>   drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
>   drm/v3d: Create new IOCTL to expose performance counters information
>   drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
>   drm/v3d: Deprecate the use of the Performance Counters enum
>
>  drivers/gpu/drm/v3d/v3d_drv.c                 |  11 +
>  drivers/gpu/drm/v3d/v3d_drv.h                 |  14 +-
>  drivers/gpu/drm/v3d/v3d_perfmon.c             |  36 ++-
>  .../gpu/drm/v3d/v3d_performance_counters.h    | 208 ++
>  drivers/gpu/drm/v3d/v3d_sched.c               |   2 +-
>  include/uapi/drm/v3d_drm.h                    |  48
>  6 files changed, 316 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra
Re: [PATCH v7 11/17] drm/vkms: Remove useless drm_rotation_simplify
Hi Louis,

On 5/13/24 04:50, Louis Chauvet wrote:
> As all the rotations are now supported by VKMS, this simplification does
> not make sense anymore, so remove it.
>
> Signed-off-by: Louis Chauvet

I'd like to push all commits up to this point to drm-misc-next. Do you
see a problem with it?

Reason: I'd like Melissa to take a look at the YUV patches, and patches 1
to 11 fix several composition errors.

Let me know your thoughts about it.

Best Regards,
- Maíra

> ---
>  drivers/gpu/drm/vkms/vkms_plane.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 8875bed76410..5a028ee96c91 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -115,12 +115,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	frame_info->fb = fb;
>  	memcpy(&frame_info->map, &vkms_plane_state->data, sizeof(frame_info->map));
>  	drm_framebuffer_get(frame_info->fb);
> -	frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 |
> -						     DRM_MODE_ROTATE_90 |
> -						     DRM_MODE_ROTATE_270 |
> -						     DRM_MODE_REFLECT_X |
> -						     DRM_MODE_REFLECT_Y);
> -
> +	frame_info->rotation = new_state->rotation;
>  	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);
>  }
[PATCH v2 6/6] drm/v3d: Deprecate the use of the Performance Counters enum
The Performance Counters enum used to identify the index of each performance counter and provide the total number of performance counters (V3D_PERFCNT_NUM). But, this enum is only valid for V3D 4.2, not for V3D 7.1. As we implemented a new flexible structure to retrieve performance counters information, we can deprecate this enum. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- include/uapi/drm/v3d_drm.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index 0860ddb3d0b6..87fc5bb0a61e 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -603,6 +603,16 @@ struct drm_v3d_submit_cpu { __u64 extensions; }; +/* The performance counters index represented by this enum are deprecated and + * must no longer be used. These counters are only valid for V3D 4.2. + * + * In order to check for performance counter information, + * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER. + * + * Don't use V3D_PERFCNT_NUM to retrieve the maximum number of performance + * counters. You should use DRM_IOCTL_V3D_GET_PARAM with the following + * parameter: DRM_V3D_PARAM_MAX_PERF_COUNTERS. + */ enum { V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS, V3D_PERFCNT_FEP_VALID_PRIMS, -- 2.44.0
[PATCH v2 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
V3D_PERFCNT_NUM represents the maximum number of performance counters for V3D 4.2, but not for V3D 7.1. This means that, if we use V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1. Therefore, use the number of performance counters on V3D 7.1 as the maximum number of counters. This will allow us to create arrays on the stack with reasonable size. Note that userspace must use the value provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_drv.h | 5 - drivers/gpu/drm/v3d/v3d_sched.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 44cfddedebde..556cbb400ba0 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,8 +351,11 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; +/* Maximum number of performance counters supported by any version of V3D */ +#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters) + /* Number of perfmons required to handle all supported performance counters */ -#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \ +#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ DRM_V3D_MAX_PERF_COUNTERS) struct v3d_performance_query { diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 7cd8c335cd9b..03df37a3acf5 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -490,7 +490,7 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 quer struct v3d_file_priv *v3d_priv = job->base.file->driver_priv; struct v3d_dev *v3d = job->base.v3d; struct v3d_perfmon *perfmon; - u64 counter_values[V3D_PERFCNT_NUM]; + u64 counter_values[V3D_MAX_COUNTERS]; for (int i = 0; i < performance_query->nperfmons; i++) { perfmon = v3d_perfmon_find(v3d_priv, -- 2.44.0
[PATCH v2 4/6] drm/v3d: Create new IOCTL to expose performance counters information
Userspace usually needs some information about the performance counters available. Although we could replicate this information in the kernel and user-space, let's use the kernel as the "single source of truth" to avoid issues in the future (e.g. list of performance counters is updated in user-space, but not in the kernel, generating invalid requests). Therefore, create a new IOCTL to expose the performance counters information, that is name, category, and description. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_drv.c | 1 + drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_perfmon.c | 33 +++ include/uapi/drm/v3d_drm.h| 37 +++ 4 files changed, 73 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index d2c1d5053132..f7477488b1cc 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -211,6 +211,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = { DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH), + DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW), }; static const struct drm_driver v3d_drm_driver = { diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index bd1e38f7d10a..44cfddedebde 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -582,6 +582,8 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); +int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); /* v3d_sysfs.c */ int v3d_sysfs_init(struct device *dev); diff --git 
a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index f268d9466c0f..73e2bb8bdb7f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -217,3 +217,36 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data, return ret; } + +int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_v3d_perfmon_get_counter *req = data; + struct v3d_dev *v3d = to_v3d_dev(dev); + const struct v3d_perf_counter_desc *counter; + + for (int i = 0; i < ARRAY_SIZE(req->reserved); i++) { + if (req->reserved[i] != 0) + return -EINVAL; + } + + /* Make sure that the counter ID is valid */ + if (req->counter >= v3d->max_counters) + return -EINVAL; + + if (v3d->ver >= 71) { + WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v71_performance_counters)); + counter = &v3d_v71_performance_counters[req->counter]; + } else if (v3d->ver >= 42) { + WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v42_performance_counters)); + counter = &v3d_v42_performance_counters[req->counter]; + } else { + return -EOPNOTSUPP; + } + + strscpy(req->name, counter->name, sizeof(req->name)); + strscpy(req->category, counter->category, sizeof(req->category)); + strscpy(req->description, counter->description, sizeof(req->description)); + + return 0; +} diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index 215b01bb69c3..0860ddb3d0b6 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -42,6 +42,7 @@ extern "C" { #define DRM_V3D_PERFMON_DESTROY 0x09 #define DRM_V3D_PERFMON_GET_VALUES 0x0a #define DRM_V3D_SUBMIT_CPU 0x0b +#define DRM_V3D_PERFMON_GET_COUNTER 0x0c #define DRM_IOCTL_V3D_SUBMIT_CL DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl) #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo) @@ -58,6 +59,8 @@ extern "C" { #define DRM_IOCTL_V3D_PERFMON_GET_VALUES DRM_IOWR(DRM_COMMAND_BASE +
DRM_V3D_PERFMON_GET_VALUES, \ struct drm_v3d_perfmon_get_values) #define DRM_IOCTL_V3D_SUBMIT_CPU DRM_IOW(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu) +#define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_PERFMON_GET_COUNTER, \
[PATCH v2 3/6] drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
The maximum number of performance counters can change from version to version and it's important for userspace to know this value, as it needs to use the counters for performance queries. Therefore, expose the maximum number of performance counters to userspace as a parameter. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_drv.c | 3 +++ include/uapi/drm/v3d_drm.h| 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 6b9dd26df9fe..d2c1d5053132 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -94,6 +94,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE: args->value = 1; return 0; + case DRM_V3D_PARAM_MAX_PERF_COUNTERS: + args->value = v3d->max_counters; + return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); return -EINVAL; diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index dce1835eced4..215b01bb69c3 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -286,6 +286,7 @@ enum drm_v3d_param { DRM_V3D_PARAM_SUPPORTS_PERFMON, DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT, DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE, + DRM_V3D_PARAM_MAX_PERF_COUNTERS, }; struct drm_v3d_get_param { -- 2.44.0
[PATCH v2 2/6] drm/v3d: Different V3D versions can have different number of perfcnt
Currently, even though V3D 7.1 has 93 performance counters, it is not possible to create counters bigger than 87, as `v3d_perfmon_create_ioctl()` understands that counters bigger than 87 are invalid. Therefore, create a device variable to expose the maximum number of counters for a given V3D version and make `v3d_perfmon_create_ioctl()` check this variable. This commit fixes CTS failures in the performance queries tests `dEQP-VK.query_pool.performance_query.*` [1] Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [1] Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device") Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_drv.c | 7 +++ drivers/gpu/drm/v3d/v3d_drv.h | 5 + drivers/gpu/drm/v3d/v3d_perfmon.c | 3 ++- 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 28b7ddce7747..6b9dd26df9fe 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -294,6 +294,13 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ + if (v3d->ver >= 71) + v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters); + else if (v3d->ver >= 42) + v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters); + else + v3d->max_counters = 0; + v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { ret = PTR_ERR(v3d->reset); diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 671375a3bb66..bd1e38f7d10a 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,6 +104,11 @@ struct v3d_dev { int ver; bool single_irq_line; + /* Different revisions of V3D have different total number of performance +* counters +*/ + unsigned int max_counters; + void __iomem *hub_regs; void 
__iomem *core_regs[3]; void __iomem *bridge_regs; diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index e1be7368b87d..f268d9466c0f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -123,6 +123,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, { struct v3d_file_priv *v3d_priv = file_priv->driver_priv; struct drm_v3d_perfmon_create *req = data; + struct v3d_dev *v3d = v3d_priv->v3d; struct v3d_perfmon *perfmon; unsigned int i; int ret; @@ -134,7 +135,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. */ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= V3D_PERFCNT_NUM) + if (req->counters[i] >= v3d->max_counters) return -EINVAL; } -- 2.44.0
[PATCH v2 0/6] drm/v3d: Improve Performance Counters handling
This series has the intention to address two issues with Performance Counters on V3D: 1. Update the number of Performance Counters for V3D 7.1 V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87. Although the series [1] enabled support for V3D 7.1, it didn’t replace the maximum number of performance counters. This led to errors in user space as the Vulkan driver updated the maximum number of performance counters, but the kernel didn’t. Currently, the user space can request values for performance counters that are greater than 87 and the kernel will return an error instead of the values. That’s why `dEQP-VK.query_pool.performance_query.*` currently fails on Mesa CI [2]. This series intends to fix the `dEQP-VK.query_pool.performance_query.*` fail. 2. Make the kernel able to provide the Performance Counter descriptions Although all the management of the Performance Monitors is done through IOCTLs, which means that the code is in the kernel, the performance counter descriptions are in Mesa. This means two things: (#1) only Mesa has access to the descriptions and (#2) we can have inconsistencies between the information provided by Mesa and the kernel, as seen in the first issue addressed by this series. To minimize the risk of inconsistencies, this series proposes to use the kernel as a “single source of truth”. Therefore, if there are any changes to the performance monitors, all the changes must be done only in the kernel. This means that all information about the maximum number of performance counters and all the descriptions will now be retrieved from the kernel. This series is coupled with a Mesa series [3] that enabled the use of the new IOCTL. I appreciate any feedback from both the kernel and Mesa implementations. 
[1] https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/ [2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [3] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt Best Regards, - Maíra Canal --- v1 -> v2: https://lore.kernel.org/dri-devel/20240508143306.2435304-2-mca...@igalia.com/T/ * [5/6] s/DRM_V3D_PARAM_V3D_MAX_PERF_COUNTERS/DRM_V3D_PARAM_MAX_PERF_COUNTERS (Iago Toral) * [6/6] Include a reference to the new DRM_V3D_PARAM_MAX_PERF_COUNTERS param (Iago Toral) * Add Iago's R-b (Iago Toral) Maíra Canal (6): drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1 drm/v3d: Different V3D versions can have different number of perfcnt drm/v3d: Create a new V3D parameter for the maximum number of perfcnt drm/v3d: Create new IOCTL to expose performance counters information drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM drm/v3d: Deprecate the use of the Performance Counters enum drivers/gpu/drm/v3d/v3d_drv.c | 11 + drivers/gpu/drm/v3d/v3d_drv.h | 14 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 36 ++- .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++ drivers/gpu/drm/v3d/v3d_sched.c | 2 +- include/uapi/drm/v3d_drm.h| 48 6 files changed, 316 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h -- 2.44.0
[PATCH v2 1/6] drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
Add name, category and description for each one of the 93 performance counters available on V3D. Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93. Therefore, there are two performance counters arrays. The index of the performance counter for each V3D version is represented by its position on the array. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_drv.h | 2 + .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++ 2 files changed, 210 insertions(+) create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index a2c516fe6d79..671375a3bb66 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -11,6 +11,8 @@ #include #include +#include "v3d_performance_counters.h" + #include "uapi/drm/v3d_drm.h" struct clk; diff --git a/drivers/gpu/drm/v3d/v3d_performance_counters.h b/drivers/gpu/drm/v3d/v3d_performance_counters.h new file mode 100644 index ..72822205ebdc --- /dev/null +++ b/drivers/gpu/drm/v3d/v3d_performance_counters.h @@ -0,0 +1,208 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Copyright (C) 2024 Raspberry Pi + */ +#ifndef V3D_PERFORMANCE_COUNTERS_H +#define V3D_PERFORMANCE_COUNTERS_H + +/* Holds a description of a given performance counter. 
The index of performance + * counter is given by the array on v3d_performance_counter.h + */ +struct v3d_perf_counter_desc { + /* Category of the counter */ + char category[32]; + + /* Name of the counter */ + char name[64]; + + /* Description of the counter */ + char description[256]; +}; + +static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { + {"CORE", "cycle-count", "[CORE] Cycle counter"}, + {"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"}, + {"CLE", "CLE-bin-thread-active-cycles", "[CLE] Bin thread active cycles"}, + {"CLE", "CLE-render-thread-active-cycles", "[CLE] Render thread active cycles"}, + {"CORE", "compute-active-cycles", "[CORE] Compute active cycles"}, + {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"}, + {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"}, + {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"}, + {"FEP", "FEP-valid-quads", "[FEP] Valid quads"}, + {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"}, + {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"}, + {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any pixels passing the Z and stencil tests"}, + {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid pixels written to colour buffer"}, + {"TLB", "TLB-partial-quads-written-to-color-buffer", "[TLB] Partial quads written to the colour buffer"}, + {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need clipping"}, + {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives discarded by being outside the viewport"}, + {"PTB", "PTB-primitives-binned", "[PTB] Total primitives binned"}, + {"PTB", "PTB-primitives-discarded-reversed", 
"[PTB] Primitives that are discarded because they are reversed"}, + {"QPU", "QPU-total-instr-cache-hit", "[QPU] Total instruction cache hits for all slices"}, + {"QPU", "QPU-total-instr-cache-miss", "[QPU] Total instruction cache misses for all slices"}, + {"QPU", "QPU-total-uniform-cache-hit", "[QPU] Total uniforms cache hits for all slices"}, + {"QPU", "QPU-total-uniform-cache-miss", "[QPU] Total uniforms cache misses for all slices"}, + {"TMU", "TMU-active-cycles", "[TMU] Active cycles"}, + {"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"}, + {"TMU", "TMU-total-text-quads-access", "[TMU] Total texture cache access
[PATCH 0/6] drm/v3d: Improve Performance Counters handling
This series has the intention to address two issues with Performance Counters on V3D: 1. Update the number of Performance Counters for V3D 7.1 V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87. Although the series [1] enabled support for V3D 7.1, it didn’t replace the maximum number of performance counters. This led to errors in user space as the Vulkan driver updated the maximum number of performance counters, but the kernel didn’t. Currently, the user space can request values for performance counters that are greater than 87 and the kernel will return an error instead of the values. That’s why `dEQP-VK.query_pool.performance_query.*` currently fails on Mesa CI [2]. This series intends to fix the `dEQP-VK.query_pool.performance_query.*` fail. 2. Make the kernel able to provide the Performance Counter descriptions Although all the management of the Performance Monitors is done through IOCTLs, which means that the code is in the kernel, the performance counter descriptions are in Mesa. This means two things: (#1) only Mesa has access to the descriptions and (#2) we can have inconsistencies between the information provided by Mesa and the kernel, as seen in the first issue addressed by this series. To minimize the risk of inconsistencies, this series proposes to use the kernel as a “single source of truth”. Therefore, if there are any changes to the performance monitors, all the changes must be done only in the kernel. This means that all information about the maximum number of performance counters and all the descriptions will now be retrieved from the kernel. This series is coupled with a Mesa series [3] that enabled the use of the new IOCTL. I appreciate any feedback from both the kernel and Mesa implementations. 
[1] https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/ [2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [3] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt Best Regards, - Maíra Canal Maíra Canal (6): drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1 drm/v3d: Different V3D versions can have different number of perfcnt drm/v3d: Create a new V3D parameter for the maximum number of perfcnt drm/v3d: Create new IOCTL to expose performance counters information drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM drm/v3d: Deprecate the use of the Performance Counters enum drivers/gpu/drm/v3d/v3d_drv.c | 11 + drivers/gpu/drm/v3d/v3d_drv.h | 14 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 36 ++- .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++ drivers/gpu/drm/v3d/v3d_sched.c | 2 +- include/uapi/drm/v3d_drm.h| 44 6 files changed, 312 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h -- 2.44.0
[PATCH 6/6] drm/v3d: Deprecate the use of the Performance Counters enum
The Performance Counters enum used to identify the index of each performance counter and provide the total number of performance counters (V3D_PERFCNT_NUM). But, this enum is only valid for V3D 4.2, not for V3D 7.1. As we implemented a new flexible structure to retrieve performance counters information, we can deprecate this enum. Signed-off-by: Maíra Canal --- include/uapi/drm/v3d_drm.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index 0860ddb3d0b6..706b4dea1c45 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -603,6 +603,12 @@ struct drm_v3d_submit_cpu { __u64 extensions; }; +/* The performance counters index represented by this enum are deprecated and + * must no longer be used. These counters are only valid for V3D 4.2. + * + * In order to check for performance counter information, + * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER. + */ enum { V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS, V3D_PERFCNT_FEP_VALID_PRIMS, -- 2.44.0
[PATCH 2/6] drm/v3d: Different V3D versions can have different number of perfcnt
Currently, even though V3D 7.1 has 93 performance counters, it is not possible to create counters bigger than 87, as `v3d_perfmon_create_ioctl()` understands that counters bigger than 87 are invalid. Therefore, create a device variable to expose the maximum number of counters for a given V3D version and make `v3d_perfmon_create_ioctl()` check this variable. This commit fixes CTS failures in the performance queries tests (dEQP-VK.query_pool.performance_query.*) [1] Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [1] Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device") Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 7 +++ drivers/gpu/drm/v3d/v3d_drv.h | 5 + drivers/gpu/drm/v3d/v3d_perfmon.c | 3 ++- 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 28b7ddce7747..6b9dd26df9fe 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -294,6 +294,13 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ + if (v3d->ver >= 71) + v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters); + else if (v3d->ver >= 42) + v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters); + else + v3d->max_counters = 0; + v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { ret = PTR_ERR(v3d->reset); diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 671375a3bb66..bd1e38f7d10a 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,6 +104,11 @@ struct v3d_dev { int ver; bool single_irq_line; + /* Different revisions of V3D have different total number of performance +* counters +*/ + unsigned int max_counters; + void __iomem *hub_regs; void __iomem *core_regs[3]; void 
__iomem *bridge_regs; diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index e1be7368b87d..f268d9466c0f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -123,6 +123,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, { struct v3d_file_priv *v3d_priv = file_priv->driver_priv; struct drm_v3d_perfmon_create *req = data; + struct v3d_dev *v3d = v3d_priv->v3d; struct v3d_perfmon *perfmon; unsigned int i; int ret; @@ -134,7 +135,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. */ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= V3D_PERFCNT_NUM) + if (req->counters[i] >= v3d->max_counters) return -EINVAL; } -- 2.44.0
[PATCH 3/6] drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
The maximum number of performance counters can change from version to version and it's important for userspace to know this value, as it needs to use the counters for performance queries. Therefore, expose the maximum number of performance counters to userspace as a parameter. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 3 +++ include/uapi/drm/v3d_drm.h| 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 6b9dd26df9fe..d2c1d5053132 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -94,6 +94,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE: args->value = 1; return 0; + case DRM_V3D_PARAM_MAX_PERF_COUNTERS: + args->value = v3d->max_counters; + return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); return -EINVAL; diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index dce1835eced4..215b01bb69c3 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -286,6 +286,7 @@ enum drm_v3d_param { DRM_V3D_PARAM_SUPPORTS_PERFMON, DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT, DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE, + DRM_V3D_PARAM_MAX_PERF_COUNTERS, }; struct drm_v3d_get_param { -- 2.44.0
[PATCH 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
V3D_PERFCNT_NUM represents the maximum number of performance counters for V3D 4.2, but not for V3D 7.1. This means that, if we use V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1. Therefore, use the number of performance counters on V3D 7.1 as the maximum number of counters. This will allow us to create arrays on the stack with reasonable size. Note that userspace must use the value provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.h | 5 - drivers/gpu/drm/v3d/v3d_sched.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 44cfddedebde..556cbb400ba0 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,8 +351,11 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; +/* Maximum number of performance counters supported by any version of V3D */ +#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters) + /* Number of perfmons required to handle all supported performance counters */ -#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \ +#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ DRM_V3D_MAX_PERF_COUNTERS) struct v3d_performance_query { diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 7cd8c335cd9b..03df37a3acf5 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -490,7 +490,7 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 quer struct v3d_file_priv *v3d_priv = job->base.file->driver_priv; struct v3d_dev *v3d = job->base.v3d; struct v3d_perfmon *perfmon; - u64 counter_values[V3D_PERFCNT_NUM]; + u64 counter_values[V3D_MAX_COUNTERS]; for (int i = 0; i < performance_query->nperfmons; i++) { perfmon = v3d_perfmon_find(v3d_priv, -- 2.44.0
[PATCH 4/6] drm/v3d: Create new IOCTL to expose performance counters information
Userspace usually needs some information about the performance counters available. Although we could replicate this information in the kernel and user-space, let's use the kernel as the "single source of truth" to avoid issues in the future (e.g. list of performance counters is updated in user-space, but not in the kernel, generating invalid requests). Therefore, create a new IOCTL to expose the performance counters information, that is name, category, and description. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 1 + drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_perfmon.c | 33 +++ include/uapi/drm/v3d_drm.h| 37 +++ 4 files changed, 73 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index d2c1d5053132..f7477488b1cc 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -211,6 +211,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = { DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH), + DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW), }; static const struct drm_driver v3d_drm_driver = { diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index bd1e38f7d10a..44cfddedebde 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -582,6 +582,8 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); +int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); /* v3d_sysfs.c */ int v3d_sysfs_init(struct device *dev); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c index f268d9466c0f..73e2bb8bdb7f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -217,3 +217,36 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data, return ret; } + +int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_v3d_perfmon_get_counter *req = data; + struct v3d_dev *v3d = to_v3d_dev(dev); + const struct v3d_perf_counter_desc *counter; + + for (int i = 0; i < ARRAY_SIZE(req->reserved); i++) { + if (req->reserved[i] != 0) + return -EINVAL; + } + + /* Make sure that the counter ID is valid */ + if (req->counter >= v3d->max_counters) + return -EINVAL; + + if (v3d->ver >= 71) { + WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v71_performance_counters)); + counter = &v3d_v71_performance_counters[req->counter]; + } else if (v3d->ver >= 42) { + WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v42_performance_counters)); + counter = &v3d_v42_performance_counters[req->counter]; + } else { + return -EOPNOTSUPP; + } + + strscpy(req->name, counter->name, sizeof(req->name)); + strscpy(req->category, counter->category, sizeof(req->category)); + strscpy(req->description, counter->description, sizeof(req->description)); + + return 0; +} diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index 215b01bb69c3..0860ddb3d0b6 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -42,6 +42,7 @@ extern "C" { #define DRM_V3D_PERFMON_DESTROY 0x09 #define DRM_V3D_PERFMON_GET_VALUES0x0a #define DRM_V3D_SUBMIT_CPU0x0b +#define DRM_V3D_PERFMON_GET_COUNTER 0x0c #define DRM_IOCTL_V3D_SUBMIT_CL DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl) #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo) @@ -58,6 +59,8 @@ extern "C" { #define DRM_IOCTL_V3D_PERFMON_GET_VALUES DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_PERFMON_GET_VALUES, \ struct 
drm_v3d_perfmon_get_values) #define DRM_IOCTL_V3D_SUBMIT_CPU DRM_IOW(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu) +#define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_PERFMON_GET_COUNTER, \ + stru
[PATCH 1/6] drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
Add name, category and description for each one of the 93 performance counters available on V3D. Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93. Therefore, there are two performance counters arrays. The index of the performance counter for each V3D version is represented by its position on the array. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.h | 2 + .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++ 2 files changed, 210 insertions(+) create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index a2c516fe6d79..671375a3bb66 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -11,6 +11,8 @@ #include #include +#include "v3d_performance_counters.h" + #include "uapi/drm/v3d_drm.h" struct clk; diff --git a/drivers/gpu/drm/v3d/v3d_performance_counters.h b/drivers/gpu/drm/v3d/v3d_performance_counters.h new file mode 100644 index ..72822205ebdc --- /dev/null +++ b/drivers/gpu/drm/v3d/v3d_performance_counters.h @@ -0,0 +1,208 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Copyright (C) 2024 Raspberry Pi + */ +#ifndef V3D_PERFORMANCE_COUNTERS_H +#define V3D_PERFORMANCE_COUNTERS_H + +/* Holds a description of a given performance counter. 
The index of performance + * counter is given by the array on v3d_performance_counter.h + */ +struct v3d_perf_counter_desc { + /* Category of the counter */ + char category[32]; + + /* Name of the counter */ + char name[64]; + + /* Description of the counter */ + char description[256]; +}; + +static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { + {"CORE", "cycle-count", "[CORE] Cycle counter"}, + {"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"}, + {"CLE", "CLE-bin-thread-active-cycles", "[CLE] Bin thread active cycles"}, + {"CLE", "CLE-render-thread-active-cycles", "[CLE] Render thread active cycles"}, + {"CORE", "compute-active-cycles", "[CORE] Compute active cycles"}, + {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"}, + {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"}, + {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"}, + {"FEP", "FEP-valid-quads", "[FEP] Valid quads"}, + {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"}, + {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"}, + {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any pixels passing the Z and stencil tests"}, + {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid pixels written to colour buffer"}, + {"TLB", "TLB-partial-quads-written-to-color-buffer", "[TLB] Partial quads written to the colour buffer"}, + {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need clipping"}, + {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives discarded by being outside the viewport"}, + {"PTB", "PTB-primitives-binned", "[PTB] Total primitives binned"}, + {"PTB", "PTB-primitives-discarded-reversed", 
"[PTB] Primitives that are discarded because they are reversed"}, + {"QPU", "QPU-total-instr-cache-hit", "[QPU] Total instruction cache hits for all slices"}, + {"QPU", "QPU-total-instr-cache-miss", "[QPU] Total instruction cache misses for all slices"}, + {"QPU", "QPU-total-uniform-cache-hit", "[QPU] Total uniforms cache hits for all slices"}, + {"QPU", "QPU-total-uniform-cache-miss", "[QPU] Total uniforms cache misses for all slices"}, + {"TMU", "TMU-active-cycles", "[TMU] Active cycles"}, + {"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"}, + {"TMU", "TMU-total-text-quads-access", "[TMU] Total texture cache accesses"}, + {"TMU",
Re: [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available
Hi Iago, On 4/29/24 02:22, Iago Toral wrote: Hi Maíra, a question below: On Sun, 2024-04-28 at 09:40 -0300, Maíra Canal wrote: Although Big/Super Pages could appear naturally, it would be quite hard to have 1MB or 64KB allocated contiguously naturally. Therefore, we can force the creation of large pages allocated contiguously by using a mountpoint with "huge=within_size" enabled. Therefore, as V3D has a mountpoint with "huge=within_size" (if user has THP enabled), use this mountpoint for BO creation if available. This will allow us to create large pages allocated contiguously and make use of Big/Super Pages. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- (...) @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv, size_t unaligned_size) { struct drm_gem_shmem_object *shmem_obj; + struct v3d_dev *v3d = to_v3d_dev(dev); struct v3d_bo *bo; int ret; - shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + /* Let the user opt out of allocating the BOs with THP */ + if (v3d->gemfs) + shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size, + v3d->gemfs); + else + shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + if (IS_ERR(shmem_obj)) return ERR_CAST(shmem_obj); bo = to_v3d_bo(&shmem_obj->base); If I read this correctly, when we have THP we always allocate with that, even objects that are smaller than 64KB. I was wondering if there is any benefit/downside to this or if the behavior for small allocations is the same we had without the new mount point. I'm assuming that your concern is related to memory pressure and memory fragmentation. As we are using `huge=within_size`, we only allocate a huge page if it will be fully within `i_size`. When using `huge=within_size`, we can optimize the performance for smaller files without forcing larger files to also use huge pages. 
I don't understand `huge=within_size` in full detail, but it is possible to verify that it keeps the system (even the RPi) from going OOM. Although it is slightly less performant than `huge=always` (used in v1), I believe it is better suited for a system such as the RPi, given its memory constraints. Best Regards, - Maíra Iago
[PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages
Add a modparam for turning off Big/Super Pages, so that a user who doesn't want Big/Super Pages enabled can disable them by setting the modparam to false. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 7 +++ drivers/gpu/drm/v3d/v3d_gemfs.c | 5 + 2 files changed, 12 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 28b7ddce7747..1a6e01235df6 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -36,6 +36,13 @@ #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 +/* Only expose the `super_pages` modparam if THP is enabled. */ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +bool super_pages = true; +module_param_named(super_pages, super_pages, bool, 0400); +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support."); +#endif + static int v3d_get_param_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c index 31cf5bd11e39..0ade02bb7209 100644 --- a/drivers/gpu/drm/v3d/v3d_gemfs.c +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c @@ -11,6 +11,7 @@ void v3d_gemfs_init(struct v3d_dev *v3d) char huge_opt[] = "huge=within_size"; struct file_system_type *type; struct vfsmount *gemfs; + extern bool super_pages; /* * By creating our own shmemfs mountpoint, we can pass in @@ -20,6 +21,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d) if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) goto err; + /* The user doesn't want to enable Super Pages */ + if (!super_pages) + goto err; + type = get_fs_type("tmpfs"); if (!type) goto err; -- 2.44.0
[PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available
Although Big/Super Pages could appear naturally, it would be quite hard to have 1MB or 64KB allocated contiguously naturally. Therefore, we can force the creation of large pages allocated contiguously by using a mountpoint with "huge=within_size" enabled. As V3D now has a mountpoint with "huge=within_size" (if the user has THP enabled), use this mountpoint for BO creation if available. This will allow us to create large pages allocated contiguously and make use of Big/Super Pages. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_bo.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index 79e31c5299b1..16ac26c31c6b 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) struct v3d_dev *v3d = to_v3d_dev(obj->dev); struct v3d_bo *bo = to_v3d_bo(obj); struct sg_table *sgt; + u64 align; int ret; /* So far we pin the BO in the MMU for its lifetime, so use @@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj) if (IS_ERR(sgt)) return PTR_ERR(sgt); + if (!v3d->gemfs) + align = SZ_4K; + else if (obj->size >= SZ_1M) + align = SZ_1M; + else if (obj->size >= SZ_64K) + align = SZ_64K; + else + align = SZ_4K; + spin_lock(&v3d->mm_lock); /* Allocate the object's space in the GPU's page tables. 
* Inserting PTEs will happen later, but the offset is for the @@ -110,7 +120,7 @@ */ ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node, obj->size >> V3D_MMU_PAGE_SHIFT, -SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0); +align >> V3D_MMU_PAGE_SHIFT, 0, 0); spin_unlock(&v3d->mm_lock); if (ret) return ret; @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv, size_t unaligned_size) { struct drm_gem_shmem_object *shmem_obj; + struct v3d_dev *v3d = to_v3d_dev(dev); struct v3d_bo *bo; int ret; - shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + /* Let the user opt out of allocating the BOs with THP */ + if (v3d->gemfs) + shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size, + v3d->gemfs); + else + shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + if (IS_ERR(shmem_obj)) return ERR_CAST(shmem_obj); bo = to_v3d_bo(&shmem_obj->base); -- 2.44.0
[PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs
The V3D MMU also supports 64KB and 1MB pages, called big and super pages, respectively. In order to set a 64KB page or 1MB page in the MMU, we need to make sure that the page table entries for all 4KB pages within the big/super page are correctly configured. In order to create a big/super page, we need a contiguous memory region. That's why we use a separate mountpoint with THP enabled. In order to place the page table entries in the MMU, we iterate over the 16 4KB pages (for big pages) or 256 4KB pages (for super pages) and insert the PTE. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_drv.h | 1 + drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++- 2 files changed, 40 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index e1f291db68de..3276eef280ef 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -18,6 +18,7 @@ struct platform_device; struct reset_control; #define V3D_MMU_PAGE_SHIFT 12 +#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) #define V3D_MAX_QUEUES (V3D_CPU + 1) diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c index 14f3af40d6f6..2e0b31e373b2 100644 --- a/drivers/gpu/drm/v3d/v3d_mmu.c +++ b/drivers/gpu/drm/v3d/v3d_mmu.c @@ -25,9 +25,16 @@ * superpage bit set. 
*/ #define V3D_PTE_SUPERPAGE BIT(31) +#define V3D_PTE_BIGPAGE BIT(30) #define V3D_PTE_WRITEABLE BIT(29) #define V3D_PTE_VALID BIT(28) +static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment) +{ + return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) && + IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT); +} + static int v3d_mmu_flush_all(struct v3d_dev *v3d) { int ret; @@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo) struct drm_gem_shmem_object *shmem_obj = &bo->base; struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev); u32 page = bo->node.start; - u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID; - struct sg_dma_page_iter dma_iter; - - for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) { - dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter); - u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT; - u32 pte = page_prot | page_address; - u32 i; - - BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >= - BIT(24)); - for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++) - v3d->pt[page++] = pte + i; + struct scatterlist *sgl; + unsigned int count; + + for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) { + dma_addr_t dma_addr = sg_dma_address(sgl); + u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT; + unsigned int len = sg_dma_len(sgl); + + while (len > 0) { + u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID; + u32 page_address = page_prot | pfn; + unsigned int i, page_size; + + BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24)); + + if (len >= SZ_1M && v3d_mmu_is_aligned(page, page_address, SZ_1M)) { + page_size = SZ_1M; + page_address |= V3D_PTE_SUPERPAGE; + } else if (len >= SZ_64K && v3d_mmu_is_aligned(page, page_address, SZ_64K)) { + page_size = SZ_64K; + page_address |= V3D_PTE_BIGPAGE; + } else { + page_size = SZ_4K; + } + + for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) { + v3d->pt[page++] = page_address + i; + pfn++; + } + + len -= page_size; + } } WARN_ON_ONCE(page - bo->node.start != -- 2.44.0
[PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation
Currently, we are using an alignment of 128 kB to insert a node, which ends up wasting memory as we perform plenty of small BO allocations (<= 4 kB). We require that allocations are aligned to 128 kB, so for any allocation smaller than that, we waste the difference. This implies that we cannot effectively use the whole 4 GB address space available for the GPU in the RPi 4. Currently, we can allocate up to 32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1,3 GB). This can be quite limiting for applications that have a high memory requirement, such as vkoverhead [1]. By reducing the page alignment to 4 kB, we can allocate up to ~1,000,000 BOs of 4 kB (~4 GB) and ~10,000 BOs of 400 kB (~4 GB). Moreover, by performing benchmarks, we were able to attest that reducing the page alignment to 4 kB can provide a general performance improvement in OpenGL applications (e.g. glmark2). Therefore, this patch reduces the alignment of the node allocation to 4 kB, which will allow RPi users to use the whole 4 GB virtual address space provided by the hardware. Also, this patch allows users to fully run vkoverhead on the RPi 4/5, solving the issue reported in [1]. 
[1] https://github.com/zmike/vkoverhead/issues/14 Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_bo.c | 2 +- drivers/gpu/drm/v3d/v3d_drv.h | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index a07ede668cc1..79e31c5299b1 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) */ ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node, obj->size >> V3D_MMU_PAGE_SHIFT, -GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0); +SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0); spin_unlock(&v3d->mm_lock); if (ret) return ret; diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index cef2f82b7a75..e1f291db68de 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -17,8 +17,6 @@ struct clk; struct platform_device; struct reset_control; -#define GMP_GRANULARITY (128 * 1024) - #define V3D_MMU_PAGE_SHIFT 12 #define V3D_MAX_QUEUES (V3D_CPU + 1) -- 2.44.0
[PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint
Create a function `drm_gem_shmem_create_with_mnt()`, similar to `drm_gem_shmem_create()`, that takes a mountpoint as an argument. This function creates a shmem GEM object in the given tmpfs mountpoint, which is useful for drivers that maintain a special mountpoint with custom mount flags. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++ include/drm/drm_gem_shmem_helper.h | 3 +++ 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..10b7c4c769a3 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = { }; static struct drm_gem_shmem_object * -__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) +__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private, + struct vfsmount *gemfs) { struct drm_gem_shmem_object *shmem; struct drm_gem_object *obj; @@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) drm_gem_private_object_init(dev, obj, size); shmem->map_wc = false; /* dma-buf mappings use always writecombine */ } else { - ret = drm_gem_object_init(dev, obj, size); + ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs); } if (ret) { drm_gem_private_object_fini(obj); @@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) */ struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size) { - return __drm_gem_shmem_create(dev, size, false); + return __drm_gem_shmem_create(dev, size, false, NULL); } EXPORT_SYMBOL_GPL(drm_gem_shmem_create); +/** + * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a + * given mountpoint + * @dev: DRM device + * @size: Size of the object to allocate + * @gemfs: tmpfs mount where the GEM object 
will be created + * + * This function creates a shmem GEM object in a given tmpfs mountpoint. + * + * Returns: + * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative + * error code on failure. + */ +struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev, + size_t size, + struct vfsmount *gemfs) +{ + return __drm_gem_shmem_create(dev, size, false, gemfs); +} +EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt); + /** * drm_gem_shmem_free - Free resources associated with a shmem GEM object * @shmem: shmem GEM object to free @@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev, size_t size = PAGE_ALIGN(attach->dmabuf->size); struct drm_gem_shmem_object *shmem; - shmem = __drm_gem_shmem_create(dev, size, true); + shmem = __drm_gem_shmem_create(dev, size, true, NULL); if (IS_ERR(shmem)) return ERR_CAST(shmem); diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h index efbc9f27312b..d22e3fb53631 100644 --- a/include/drm/drm_gem_shmem_helper.h +++ b/include/drm/drm_gem_shmem_helper.h @@ -97,6 +97,9 @@ struct drm_gem_shmem_object { container_of(obj, struct drm_gem_shmem_object, base) struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size); +struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev, + size_t size, + struct vfsmount *gemfs); void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem); void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem); -- 2.44.0
[PATCH v4 0/8] drm/v3d: Enable Big and Super Pages
performance. This indicates an enhancement in the baseline scenario, rather than any detriment caused by v2. Additionally, I've included stats from v1 in the comparisons. Upon scrutinizing the average FPS of v2 in contrast to v1, it becomes evident that v2 not only maintains the improvements but may even surpass them. This version provides a much safer way to iterate through memory and doesn't hold to the same limitations as v1. For example, v1 had a hard-coded hack that only allowed a huge page to be created if the BO was bigger than 2MB. These limitations are gone now. This series also introduces changes in the GEM helpers, in order to enable V3D to have a separate mount point for shmfs GEM objects. Any feedback from the community about the changes in the GEM helpers is welcomed! v1 -> v2: https://lore.kernel.org/dri-devel/20240311100959.205545-1-mca...@igalia.com/ * [1/6] Add Iago's R-b to PATCH 1/5 (Iago Toral) * [2/6] Create a new function `drm_gem_object_init_with_mnt()` to define the shmfs mountpoint. Now, we don't touch a bunch of drivers, as `drm_gem_object_init()` preserves its signature (Tvrtko Ursulin) * [3/6] Use `huge=within_size` instead of `huge=always`, in order to avoid OOM. This also allow us to move away from the 2MB hack. 
(Tvrtko Ursulin) * [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral) * [5/6] Create a separate patch to reduce the alignment of the node allocation (Iago Toral) * [6/6] Complete refactoring to the way that we iterate through the memory (Tvrtko Ursulin) * [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us misleading data (Tvrtko Ursulin) * [6/6] Use both Big Pages (64K) and Super Pages (1MB) v2 -> v3: https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mca...@igalia.com/T/ * [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin) * [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin) * [6/8] Now, PATCH 6/8 regards supporting big/super pages when writing out PTEs (Tvrtko Ursulin) * [6/8] s/page_address/pfn (Tvrtko Ursulin) * [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be `unsigned int` too (Tvrtko Ursulin) * [6/8] `i` and `page_size` are `unsigned int` as well (Tvrtko Ursulin) * [6/8] Move `i`, `page_prot` and `page_size` to the inner scope (Tvrtko Ursulin) * [6/8] s/pte/page_address/ (Tvrtko Ursulin) * [7/8] New patch: use gemfs/THP in BO creation if available * [8/8] New patch: * [8/8] Don't expose the modparam `super_pages` unless CONFIG_TRANSPARENT_HUGEPAGE is enabled (Tvrtko Ursulin) * [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support (Tvrtko Ursulin) v3 -> v4: https://lore.kernel.org/dri-devel/20240421215309.660018-1-mca...@igalia.com/T/ * [5/8] Add Iago's R-b to PATCH 5/8 (Iago Toral) * [6/8] Add Tvrtko's R-b to PATCH 6/8 (Tvrtko Ursulin) * [7/8] Add Tvrtko's R-b to PATCH 7/8 (Tvrtko Ursulin) * [8/8] Move `bool super_pages` to the guard (Tvrtko Ursulin) Best Regards, - Maíra Maíra Canal (8): drm/v3d: Fix return if scheduler initialization fails drm/gem: Create a drm_gem_object_init_with_mnt() function drm/v3d: Introduce gemfs drm/gem: Create shmem GEM object in a given mountpoint drm/v3d: Reduce the alignment of the node allocation drm/v3d: Support Big/Super Pages when writing out PTEs drm/v3d: Use 
gemfs/THP in BO creation if available drm/v3d: Add modparam for turning off Big/Super Pages drivers/gpu/drm/drm_gem.c | 34 +++-- drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +-- drivers/gpu/drm/v3d/Makefile | 3 +- drivers/gpu/drm/v3d/v3d_bo.c | 21 ++- drivers/gpu/drm/v3d/v3d_drv.c | 7 drivers/gpu/drm/v3d/v3d_drv.h | 12 +- drivers/gpu/drm/v3d/v3d_gem.c | 6 ++- drivers/gpu/drm/v3d/v3d_gemfs.c| 51 + drivers/gpu/drm/v3d/v3d_mmu.c | 52 +++--- include/drm/drm_gem.h | 3 ++ include/drm/drm_gem_shmem_helper.h | 3 ++ 11 files changed, 195 insertions(+), 27 deletions(-) create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c -- 2.44.0
[PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails
If the scheduler initialization fails, GEM initialization must fail as well. Therefore, if `v3d_sched_init()` fails, free the DMA memory allocated and return the error value in `v3d_gem_init()`. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_gem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index da8faf3b9011..b3b76332f2c5 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -291,8 +291,9 @@ v3d_gem_init(struct drm_device *dev) ret = v3d_sched_init(v3d); if (ret) { drm_mm_takedown(&v3d->mm); - dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt, + dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt, v3d->pt_paddr); + return ret; } return 0; -- 2.44.0
[PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function
For some applications, such as applications that use huge pages, we might want to have a different mountpoint, for which we pass mount flags that better match our usecase. Therefore, create a new function `drm_gem_object_init_with_mnt()` that allows us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fall back to `shmem_file_setup()`. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/drm_gem.c | 34 ++ include/drm/drm_gem.h | 3 +++ 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index d4bbc5d109c8..74ebe68e3d61 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev) } /** - * drm_gem_object_init - initialize an allocated shmem-backed GEM object + * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM + * object in a given shmfs mountpoint + * + * @dev: drm_device the object should be initialized for + * @obj: drm_gem_object to initialize + * @size: object size + * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use + * the usual tmpfs mountpoint (`shm_mnt`). + * + * Initialize an already allocated GEM object of the specified size with + * shmfs backing store. 
*/ -int drm_gem_object_init(struct drm_device *dev, - struct drm_gem_object *obj, size_t size) +int drm_gem_object_init_with_mnt(struct drm_device *dev, +struct drm_gem_object *obj, size_t size, +struct vfsmount *gemfs) { struct file *filp; drm_gem_private_object_init(dev, obj, size); - filp = shmem_file_setup("drm mm object", size, VM_NORESERVE); + if (gemfs) + filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size, +VM_NORESERVE); + else + filp = shmem_file_setup("drm mm object", size, VM_NORESERVE); + if (IS_ERR(filp)) return PTR_ERR(filp); @@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev, return 0; } +EXPORT_SYMBOL(drm_gem_object_init_with_mnt); + +/** + * drm_gem_object_init - initialize an allocated shmem-backed GEM object + * @dev: drm_device the object should be initialized for + * @obj: drm_gem_object to initialize + * @size: object size + * + * Initialize an already allocated GEM object of the specified size with + * shmfs backing store. + */ +int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj, + size_t size) +{ + return drm_gem_object_init_with_mnt(dev, obj, size, NULL); +} EXPORT_SYMBOL(drm_gem_object_init); /** diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bae4865b2101..2ebf6e10cc44 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj); void drm_gem_object_free(struct kref *kref); int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj, size_t size); +int drm_gem_object_init_with_mnt(struct drm_device *dev, +struct drm_gem_object *obj, size_t size, +struct vfsmount *gemfs); void drm_gem_private_object_init(struct drm_device *dev, struct drm_gem_object *obj, size_t size); void drm_gem_private_object_fini(struct drm_gem_object *obj); -- 2.44.0
[PATCH v4 3/8] drm/v3d: Introduce gemfs
Create a separate "tmpfs" kernel mount for V3D. This will allow us to move away from the shmemfs `shm_mnt` and gives the flexibility to do things like set our own mount options. Here, the interest is to use "huge=", which should allow us to enable the use of THP for our shmem-backed objects. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/Makefile| 3 ++- drivers/gpu/drm/v3d/v3d_drv.h | 9 +++ drivers/gpu/drm/v3d/v3d_gem.c | 3 +++ drivers/gpu/drm/v3d/v3d_gemfs.c | 46 + 4 files changed, 60 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile index b7d673f1153b..fcf710926057 100644 --- a/drivers/gpu/drm/v3d/Makefile +++ b/drivers/gpu/drm/v3d/Makefile @@ -13,7 +13,8 @@ v3d-y := \ v3d_trace_points.o \ v3d_sched.o \ v3d_sysfs.o \ - v3d_submit.o + v3d_submit.o \ + v3d_gemfs.o v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index a2c516fe6d79..cef2f82b7a75 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -131,6 +131,11 @@ struct v3d_dev { struct drm_mm mm; spinlock_t mm_lock; + /* +* tmpfs instance used for shmem backed objects +*/ + struct vfsmount *gemfs; + struct work_struct overflow_mem_work; struct v3d_bin_job *bin_job; @@ -532,6 +537,10 @@ void v3d_reset(struct v3d_dev *v3d); void v3d_invalidate_caches(struct v3d_dev *v3d); void v3d_clean_caches(struct v3d_dev *v3d); +/* v3d_gemfs.c */ +void v3d_gemfs_init(struct v3d_dev *v3d); +void v3d_gemfs_fini(struct v3d_dev *v3d); + /* v3d_submit.c */ void v3d_job_cleanup(struct v3d_job *job); void v3d_job_put(struct v3d_job *job); diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index b3b76332f2c5..b1e681630ded 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -288,6 +288,8 @@ v3d_gem_init(struct drm_device *dev) v3d_init_hw_state(v3d); 
v3d_mmu_set_page_table(v3d); + v3d_gemfs_init(v3d); + ret = v3d_sched_init(v3d); if (ret) { drm_mm_takedown(&v3d->mm); @@ -305,6 +307,7 @@ v3d_gem_destroy(struct drm_device *dev) struct v3d_dev *v3d = to_v3d_dev(dev); v3d_sched_fini(v3d); + v3d_gemfs_fini(v3d); /* Waiting for jobs to finish would need to be done before * unregistering V3D. diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c new file mode 100644 index ..31cf5bd11e39 --- /dev/null +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* Copyright (C) 2024 Raspberry Pi */ + +#include +#include + +#include "v3d_drv.h" + +void v3d_gemfs_init(struct v3d_dev *v3d) +{ + char huge_opt[] = "huge=within_size"; + struct file_system_type *type; + struct vfsmount *gemfs; + + /* +* By creating our own shmemfs mountpoint, we can pass in +* mount flags that better match our usecase. However, we +* only do so on platforms which benefit from it. +*/ + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) + goto err; + + type = get_fs_type("tmpfs"); + if (!type) + goto err; + + gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt); + if (IS_ERR(gemfs)) + goto err; + + v3d->gemfs = gemfs; + drm_info(&v3d->drm, "Using Transparent Hugepages\n"); + + return; + +err: + v3d->gemfs = NULL; + drm_notice(&v3d->drm, + "Transparent Hugepage support is recommended for optimal performance on this platform!\n"); +} + +void v3d_gemfs_fini(struct v3d_dev *v3d) +{ + if (v3d->gemfs) + kern_unmount(v3d->gemfs); +} -- 2.44.0
Re: [PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition
On 4/23/24 04:05, Maxime Ripard wrote: Hi, On Mon, Apr 22, 2024 at 01:08:44PM -0300, Maíra Canal wrote: @drm-misc maintainers, is there any chance you could backport commit 35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice") [1] to drm-misc-next? I would like to apply this series to drm-misc-next because it fixes another issue with the GPU stats, but this series depends on commit 35f4f8c9fc97, as it has plenty of refactors on the GPU stats code. Although I could theoretically apply this series in drm-misc-fixes, I don't believe it would be ideal, as discussed in #dri-devel earlier today. [1] https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/35f4f8c9fc972248055096d63b782060e473311b I just did the backmerge Thanks Maxime! I just applied the series to drm-misc/drm-misc-next. Thanks to the drm-misc maintainers for the quick action! Best Regards, - Maíra Maxime
Re: [PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition
Hi, @drm-misc maintainers, is there any chance you could backport commit 35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice") [1] to drm-misc-next? I would like to apply this series to drm-misc-next because it fixes another issue with the GPU stats, but this series depends on commit 35f4f8c9fc97, as it has plenty of refactors on the GPU stats code. Although I could theoretically apply this series in drm-misc-fixes, I don't believe it would be ideal, as discussed in #dri-devel earlier today. [1] https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/35f4f8c9fc972248055096d63b782060e473311b Best Regards, - Maíra On 4/20/24 18:32, Maíra Canal wrote: The first version of this series had the intention to fix two major issues with the GPU stats: 1. We were incrementing `enabled_ns` twice by the end of each job. 2. There is a race-condition between the IRQ handler and the users The first of the issues was already addressed and the fix was applied to drm-misc-fixes. Now, what is left, addresses the second issue. Apart from addressing this issue, this series improved the GPU stats code as a whole. We reduced code repetition, creating functions to start and update the GPU stats. This will likely reduce the odds of issue #1 happening again. v1 -> v2: https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/ - As the first patch was a bugfix, it was pushed to drm-misc-fixes. 
- [1/4] Add Chema Casanova's R-b - [2/4] s/jobs_sent/jobs_completed and add the reasoning in the commit message (Chema Casanova) - [2/4] Add Chema Casanova's and Tvrtko Ursulin's R-b - [3/4] Call `local_clock()` only once, by adding a new parameter to the `v3d_stats_update` function (Chema Casanova) - [4/4] Move new line to the correct patch [2/4] (Tvrtko Ursulin) - [4/4] Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin) v2 -> v3: https://lore.kernel.org/dri-devel/20240417011021.600889-1-mca...@igalia.com/T/ - [4/5] New patch: separates the code refactor from the race-condition fix (Tvrtko Ursulin) - [5/5] s/interruption/interrupt (Tvrtko Ursulin) - [5/5] s/matches/match (Tvrtko Ursulin) - [5/5] Add Tvrtko Ursulin's R-b Best Regards, - Maíra Maíra Canal (5): drm/v3d: Create two functions to update all GPU stats variables drm/v3d: Create a struct to store the GPU stats drm/v3d: Create function to update a set of GPU stats drm/v3d: Decouple stats calculation from printing drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler drivers/gpu/drm/v3d/v3d_drv.c | 33 drivers/gpu/drm/v3d/v3d_drv.h | 30 --- drivers/gpu/drm/v3d/v3d_gem.c | 9 ++-- drivers/gpu/drm/v3d/v3d_irq.c | 48 ++--- drivers/gpu/drm/v3d/v3d_sched.c | 94 + drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++--- 6 files changed, 109 insertions(+), 118 deletions(-)
[PATCH v3 8/8] drm/v3d: Add modparam for turning off Big/Super Pages
Add a modparam for turning off Big/Super Pages so that, if a user doesn't want Big/Super Pages enabled, they can disable them by setting the modparam to false. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 8 drivers/gpu/drm/v3d/v3d_gemfs.c | 5 + 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 3debf37e7d9b..bc8c8905112a 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -36,6 +36,14 @@ #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 +bool super_pages = true; + +/* Only expose the `super_pages` modparam if THP is enabled. */ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +module_param_named(super_pages, super_pages, bool, 0400); +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support."); +#endif + static int v3d_get_param_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c index 31cf5bd11e39..5fa08263cff2 100644 --- a/drivers/gpu/drm/v3d/v3d_gemfs.c +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c @@ -11,6 +11,11 @@ void v3d_gemfs_init(struct v3d_dev *v3d) { char huge_opt[] = "huge=within_size"; struct file_system_type *type; struct vfsmount *gemfs; + extern bool super_pages; + + /* The user doesn't want to enable Super Pages */ + if (!super_pages) + goto err; /* * By creating our own shmemfs mountpoint, we can pass in -- 2.44.0
[PATCH v3 7/8] drm/v3d: Use gemfs/THP in BO creation if available
Although Big/Super Pages could appear naturally, it is quite unlikely that 1MB or 64KB of memory would happen to be allocated contiguously. Therefore, we can force the creation of contiguously allocated large pages by using a mountpoint with "huge=within_size" enabled. As V3D has a mountpoint with "huge=within_size" (if the user has THP enabled), use this mountpoint for BO creation if available. This will allow us to create contiguously allocated large pages and make use of Big/Super Pages. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_bo.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index 79e31c5299b1..16ac26c31c6b 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) struct v3d_dev *v3d = to_v3d_dev(obj->dev); struct v3d_bo *bo = to_v3d_bo(obj); struct sg_table *sgt; + u64 align; int ret; /* So far we pin the BO in the MMU for its lifetime, so use @@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj) if (IS_ERR(sgt)) return PTR_ERR(sgt); + if (!v3d->gemfs) + align = SZ_4K; + else if (obj->size >= SZ_1M) + align = SZ_1M; + else if (obj->size >= SZ_64K) + align = SZ_64K; + else + align = SZ_4K; + spin_lock(&v3d->mm_lock); /* Allocate the object's space in the GPU's page tables. 
* Inserting PTEs will happen later, but the offset is for the @@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) */ ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node, obj->size >> V3D_MMU_PAGE_SHIFT, -SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0); +align >> V3D_MMU_PAGE_SHIFT, 0, 0); spin_unlock(&v3d->mm_lock); if (ret) return ret; @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv, size_t unaligned_size) { struct drm_gem_shmem_object *shmem_obj; + struct v3d_dev *v3d = to_v3d_dev(dev); struct v3d_bo *bo; int ret; - shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + /* Let the user opt out of allocating the BOs with THP */ + if (v3d->gemfs) + shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size, + v3d->gemfs); + else + shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + if (IS_ERR(shmem_obj)) return ERR_CAST(shmem_obj); bo = to_v3d_bo(&shmem_obj->base); -- 2.44.0
[PATCH v3 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs
The V3D MMU also supports 64KB and 1MB pages, called big and super pages, respectively. In order to set a 64KB page or 1MB page in the MMU, we need to make sure that the page table entries for all 4KB pages within a big/super page are correctly configured. In order to create a big/super page, we need a contiguous memory region. That's why we use a separate mountpoint with THP enabled. In order to place the page table entries in the MMU, we iterate over the 16 4KB pages (for big pages) or 256 4KB pages (for super pages) and insert the PTE. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.h | 1 + drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++- 2 files changed, 40 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 17236ee23490..79d8a1a059aa 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -18,6 +18,7 @@ struct platform_device; struct reset_control; #define V3D_MMU_PAGE_SHIFT 12 +#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) #define V3D_MAX_QUEUES (V3D_CPU + 1) diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c index 14f3af40d6f6..2e0b31e373b2 100644 --- a/drivers/gpu/drm/v3d/v3d_mmu.c +++ b/drivers/gpu/drm/v3d/v3d_mmu.c @@ -25,9 +25,16 @@ * superpage bit set. 
*/ #define V3D_PTE_SUPERPAGE BIT(31) +#define V3D_PTE_BIGPAGE BIT(30) #define V3D_PTE_WRITEABLE BIT(29) #define V3D_PTE_VALID BIT(28) +static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment) +{ + return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) && + IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT); +} + static int v3d_mmu_flush_all(struct v3d_dev *v3d) { int ret; @@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo) struct drm_gem_shmem_object *shmem_obj = &bo->base; struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev); u32 page = bo->node.start; - u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID; - struct sg_dma_page_iter dma_iter; - - for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) { - dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter); - u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT; - u32 pte = page_prot | page_address; - u32 i; - - BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >= - BIT(24)); - for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++) - v3d->pt[page++] = pte + i; + struct scatterlist *sgl; + unsigned int count; + + for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) { + dma_addr_t dma_addr = sg_dma_address(sgl); + u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT; + unsigned int len = sg_dma_len(sgl); + + while (len > 0) { + u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID; + u32 page_address = page_prot | pfn; + unsigned int i, page_size; + + BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24)); + + if (len >= SZ_1M && v3d_mmu_is_aligned(page, page_address, SZ_1M)) { + page_size = SZ_1M; + page_address |= V3D_PTE_SUPERPAGE; + } else if (len >= SZ_64K && v3d_mmu_is_aligned(page, page_address, SZ_64K)) { + page_size = SZ_64K; + page_address |= V3D_PTE_BIGPAGE; + } else { + page_size = SZ_4K; + } + + for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) { + v3d->pt[page++] = page_address + i; + pfn++; + } + + len -= page_size; + } } WARN_ON_ONCE(page - bo->node.start != -- 2.44.0
[PATCH v3 4/8] drm/gem: Create shmem GEM object in a given mountpoint
Create a function `drm_gem_shmem_create_with_mnt()`, similar to `drm_gem_shmem_create()`, that has a mountpoint as an argument. This function will create a shmem GEM object in a given tmpfs mountpoint. This function will be useful for drivers that have a special mountpoint with flags enabled. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++ include/drm/drm_gem_shmem_helper.h | 3 +++ 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..10b7c4c769a3 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = { }; static struct drm_gem_shmem_object * -__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) +__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private, + struct vfsmount *gemfs) { struct drm_gem_shmem_object *shmem; struct drm_gem_object *obj; @@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) drm_gem_private_object_init(dev, obj, size); shmem->map_wc = false; /* dma-buf mappings use always writecombine */ } else { - ret = drm_gem_object_init(dev, obj, size); + ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs); } if (ret) { drm_gem_private_object_fini(obj); @@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) */ struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size) { - return __drm_gem_shmem_create(dev, size, false); + return __drm_gem_shmem_create(dev, size, false, NULL); } EXPORT_SYMBOL_GPL(drm_gem_shmem_create); +/** + * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a + * given mountpoint + * @dev: DRM device + * @size: Size of the object to allocate + * @gemfs: tmpfs mount where the GEM object 
will be created + * + * This function creates a shmem GEM object in a given tmpfs mountpoint. + * + * Returns: + * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative + * error code on failure. + */ +struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev, + size_t size, + struct vfsmount *gemfs) +{ + return __drm_gem_shmem_create(dev, size, false, gemfs); +} +EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt); + /** * drm_gem_shmem_free - Free resources associated with a shmem GEM object * @shmem: shmem GEM object to free @@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev, size_t size = PAGE_ALIGN(attach->dmabuf->size); struct drm_gem_shmem_object *shmem; - shmem = __drm_gem_shmem_create(dev, size, true); + shmem = __drm_gem_shmem_create(dev, size, true, NULL); if (IS_ERR(shmem)) return ERR_CAST(shmem); diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h index efbc9f27312b..d22e3fb53631 100644 --- a/include/drm/drm_gem_shmem_helper.h +++ b/include/drm/drm_gem_shmem_helper.h @@ -97,6 +97,9 @@ struct drm_gem_shmem_object { container_of(obj, struct drm_gem_shmem_object, base) struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size); +struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev, + size_t size, + struct vfsmount *gemfs); void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem); void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem); -- 2.44.0
[PATCH v3 5/8] drm/v3d: Reduce the alignment of the node allocation
Currently, we are using an alignment of 128 kB to insert a node, which ends up wasting memory, as we perform plenty of small BO allocations (<= 4 kB). We require that allocations are aligned to 128 kB, so for any allocation smaller than that, we waste the difference. This implies that we cannot effectively use the whole 4 GB address space available for the GPU in the RPi 4. Currently, we can allocate up to 32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be quite limiting for applications that have a high memory requirement, such as vkoverhead [1]. By reducing the page alignment to 4 kB, we can allocate up to 100 BOs of 4 kB (~4 GB) and 1 BOs of 400 kB (~4 GB). Moreover, by performing benchmarks, we were able to attest that reducing the page alignment to 4 kB can provide a general performance improvement in OpenGL applications (e.g. glmark2). Therefore, this patch reduces the alignment of the node allocation to 4 kB, which will allow RPi users to explore the whole 4 GB virtual address space provided by the hardware. Also, this patch allows users to fully run vkoverhead on the RPi 4/5, solving the issue reported in [1]. 
[1] https://github.com/zmike/vkoverhead/issues/14 Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_bo.c | 2 +- drivers/gpu/drm/v3d/v3d_drv.h | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index a07ede668cc1..79e31c5299b1 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) */ ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node, obj->size >> V3D_MMU_PAGE_SHIFT, -GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0); +SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0); spin_unlock(&v3d->mm_lock); if (ret) return ret; diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index d2ce8222771a..17236ee23490 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -17,8 +17,6 @@ struct clk; struct platform_device; struct reset_control; -#define GMP_GRANULARITY (128 * 1024) - #define V3D_MMU_PAGE_SHIFT 12 #define V3D_MAX_QUEUES (V3D_CPU + 1) -- 2.44.0
[PATCH v3 0/8] drm/v3d: Enable Big and Super Pages
This also allows us to move away from the 2MB hack. (Tvrtko Ursulin) * [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral) * [5/6] Create a separate patch to reduce the alignment of the node allocation. (Iago Toral) * [6/6] Complete refactoring to the way that we iterate through the memory. (Tvrtko Ursulin) * [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us misleading data. (Tvrtko Ursulin) * [6/6] Use both Big Pages (64K) and Super Pages (1MB). v2 -> v3: https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mca...@igalia.com/T/ * [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin) * [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin) * [6/8] Now, PATCH 6/8 only adds support to big/super pages when writing out PTEs. BO creation with THP and addition of modparam are moved to other patches. (Tvrtko Ursulin) * [6/8] s/page_address/pfn (Tvrtko Ursulin) * [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be `unsigned int` too. (Tvrtko Ursulin) * [6/8] `i` and `page_size` are `unsigned int` as well. (Tvrtko Ursulin) * [6/8] Move `i`, `page_prot` and `page_size` to the inner scope. (Tvrtko Ursulin) * [6/8] s/pte/page_address/ (Tvrtko Ursulin) * [7/8] New patch: Use gemfs/THP in BO creation if available (Tvrtko Ursulin) * [8/8] New patch: Add modparam for turning off Big/Super Pages (Tvrtko Ursulin) * [8/8] Don't expose the modparam `super_pages` unless CONFIG_TRANSPARENT_HUGEPAGE is enabled. (Tvrtko Ursulin) * [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support. 
(Tvrtko Ursulin) Best Regards, - Maíra Maíra Canal (8): drm/v3d: Fix return if scheduler initialization fails drm/gem: Create a drm_gem_object_init_with_mnt() function drm/v3d: Introduce gemfs drm/gem: Create shmem GEM object in a given mountpoint drm/v3d: Reduce the alignment of the node allocation drm/v3d: Support Big/Super Pages when writing out PTEs drm/v3d: Use gemfs/THP in BO creation if available drm/v3d: Add modparam for turning off Big/Super Pages drivers/gpu/drm/drm_gem.c | 34 +++-- drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +-- drivers/gpu/drm/v3d/Makefile | 3 +- drivers/gpu/drm/v3d/v3d_bo.c | 21 ++- drivers/gpu/drm/v3d/v3d_drv.c | 8 drivers/gpu/drm/v3d/v3d_drv.h | 12 +- drivers/gpu/drm/v3d/v3d_gem.c | 6 ++- drivers/gpu/drm/v3d/v3d_gemfs.c| 51 + drivers/gpu/drm/v3d/v3d_mmu.c | 52 +++--- include/drm/drm_gem.h | 3 ++ include/drm/drm_gem_shmem_helper.h | 3 ++ 11 files changed, 196 insertions(+), 27 deletions(-) create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c -- 2.44.0
[PATCH v3 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function
For some applications, such as applications that use huge pages, we might want to have a different mountpoint, for which we pass mount flags that better match our usecase. Therefore, create a new function `drm_gem_object_init_with_mnt()` that allows us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fall back to `shmem_file_setup()`. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/drm_gem.c | 34 ++ include/drm/drm_gem.h | 3 +++ 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index d4bbc5d109c8..74ebe68e3d61 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev) } /** - * drm_gem_object_init - initialize an allocated shmem-backed GEM object + * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM + * object in a given shmfs mountpoint + * * @dev: drm_device the object should be initialized for * @obj: drm_gem_object to initialize * @size: object size + * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use + * the usual tmpfs mountpoint (`shm_mnt`). * * Initialize an already allocated GEM object of the specified size with * shmfs backing store. 
*/ -int drm_gem_object_init(struct drm_device *dev, - struct drm_gem_object *obj, size_t size) +int drm_gem_object_init_with_mnt(struct drm_device *dev, +struct drm_gem_object *obj, size_t size, +struct vfsmount *gemfs) { struct file *filp; drm_gem_private_object_init(dev, obj, size); - filp = shmem_file_setup("drm mm object", size, VM_NORESERVE); + if (gemfs) + filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size, +VM_NORESERVE); + else + filp = shmem_file_setup("drm mm object", size, VM_NORESERVE); + if (IS_ERR(filp)) return PTR_ERR(filp); @@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev, return 0; } +EXPORT_SYMBOL(drm_gem_object_init_with_mnt); + +/** + * drm_gem_object_init - initialize an allocated shmem-backed GEM object + * @dev: drm_device the object should be initialized for + * @obj: drm_gem_object to initialize + * @size: object size + * + * Initialize an already allocated GEM object of the specified size with + * shmfs backing store. + */ +int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj, + size_t size) +{ + return drm_gem_object_init_with_mnt(dev, obj, size, NULL); +} EXPORT_SYMBOL(drm_gem_object_init); /** diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bae4865b2101..2ebf6e10cc44 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj); void drm_gem_object_free(struct kref *kref); int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj, size_t size); +int drm_gem_object_init_with_mnt(struct drm_device *dev, +struct drm_gem_object *obj, size_t size, +struct vfsmount *gemfs); void drm_gem_private_object_init(struct drm_device *dev, struct drm_gem_object *obj, size_t size); void drm_gem_private_object_fini(struct drm_gem_object *obj); -- 2.44.0
[PATCH v3 3/8] drm/v3d: Introduce gemfs
Create a separate "tmpfs" kernel mount for V3D. This will allow us to move away from the shmemfs `shm_mnt` and gives the flexibility to do things like set our own mount options. Here, the interest is to use "huge=", which should allow us to enable the use of THP for our shmem-backed objects. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/Makefile| 3 ++- drivers/gpu/drm/v3d/v3d_drv.h | 9 +++ drivers/gpu/drm/v3d/v3d_gem.c | 3 +++ drivers/gpu/drm/v3d/v3d_gemfs.c | 46 + 4 files changed, 60 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile index b7d673f1153b..fcf710926057 100644 --- a/drivers/gpu/drm/v3d/Makefile +++ b/drivers/gpu/drm/v3d/Makefile @@ -13,7 +13,8 @@ v3d-y := \ v3d_trace_points.o \ v3d_sched.o \ v3d_sysfs.o \ - v3d_submit.o + v3d_submit.o \ + v3d_gemfs.o v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 1950c723dde1..d2ce8222771a 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -119,6 +119,11 @@ struct v3d_dev { struct drm_mm mm; spinlock_t mm_lock; + /* +* tmpfs instance used for shmem backed objects +*/ + struct vfsmount *gemfs; + struct work_struct overflow_mem_work; struct v3d_bin_job *bin_job; @@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d); void v3d_invalidate_caches(struct v3d_dev *v3d); void v3d_clean_caches(struct v3d_dev *v3d); +/* v3d_gemfs.c */ +void v3d_gemfs_init(struct v3d_dev *v3d); +void v3d_gemfs_fini(struct v3d_dev *v3d); + /* v3d_submit.c */ void v3d_job_cleanup(struct v3d_job *job); void v3d_job_put(struct v3d_job *job); diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index 66f4b78a6b2e..faefbe497e8d 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev) v3d_init_hw_state(v3d); 
v3d_mmu_set_page_table(v3d); + v3d_gemfs_init(v3d); + ret = v3d_sched_init(v3d); if (ret) { drm_mm_takedown(&v3d->mm); @@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev) struct v3d_dev *v3d = to_v3d_dev(dev); v3d_sched_fini(v3d); + v3d_gemfs_fini(v3d); /* Waiting for jobs to finish would need to be done before * unregistering V3D. diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c new file mode 100644 index ..31cf5bd11e39 --- /dev/null +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* Copyright (C) 2024 Raspberry Pi */ + +#include +#include + +#include "v3d_drv.h" + +void v3d_gemfs_init(struct v3d_dev *v3d) +{ + char huge_opt[] = "huge=within_size"; + struct file_system_type *type; + struct vfsmount *gemfs; + + /* +* By creating our own shmemfs mountpoint, we can pass in +* mount flags that better match our usecase. However, we +* only do so on platforms which benefit from it. +*/ + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) + goto err; + + type = get_fs_type("tmpfs"); + if (!type) + goto err; + + gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt); + if (IS_ERR(gemfs)) + goto err; + + v3d->gemfs = gemfs; + drm_info(&v3d->drm, "Using Transparent Hugepages\n"); + + return; + +err: + v3d->gemfs = NULL; + drm_notice(&v3d->drm, + "Transparent Hugepage support is recommended for optimal performance on this platform!\n"); +} + +void v3d_gemfs_fini(struct v3d_dev *v3d) +{ + if (v3d->gemfs) + kern_unmount(v3d->gemfs); +} -- 2.44.0
[PATCH v3 1/8] drm/v3d: Fix return if scheduler initialization fails
If the scheduler initialization fails, GEM initialization must fail as well. Therefore, if `v3d_sched_init()` fails, free the DMA memory allocated and return the error value in `v3d_gem_init()`. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_gem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index afc565078c78..66f4b78a6b2e 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev) ret = v3d_sched_init(v3d); if (ret) { drm_mm_takedown(&v3d->mm); - dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt, + dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt, v3d->pt_paddr); + return ret; } return 0; -- 2.44.0
[PATCH v3 4/5] drm/v3d: Decouple stats calculation from printing
Create a function to decouple the stats calculation from the printing. This will be useful in the next step when we add a seqcount to protect the stats. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 18 ++ drivers/gpu/drm/v3d/v3d_drv.h | 4 drivers/gpu/drm/v3d/v3d_sysfs.c | 11 +++ 3 files changed, 21 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 52e3ba9df46f..2ec359ed2def 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -142,6 +142,15 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file) kfree(v3d_priv); } +void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp, + u64 *active_runtime, u64 *jobs_completed) +{ + *active_runtime = stats->enabled_ns; + if (stats->start_ns) + *active_runtime += timestamp - stats->start_ns; + *jobs_completed = stats->jobs_completed; +} + static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) { struct v3d_file_priv *file_priv = file->driver_priv; @@ -150,20 +159,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { struct v3d_stats *stats = &file_priv->stats[queue]; + u64 active_runtime, jobs_completed; + + v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed); /* Note that, in case of a GPU reset, the time spent during an * attempt of executing the job is not computed in the runtime. */ drm_printf(p, "drm-engine-%s: \t%llu ns\n", - v3d_queue_to_string(queue), - stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns - : stats->enabled_ns); + v3d_queue_to_string(queue), active_runtime); /* Note that we only count jobs that completed. Therefore, jobs * that were resubmitted due to a GPU reset are not computed.
*/ drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", - v3d_queue_to_string(queue), stats->jobs_completed); + v3d_queue_to_string(queue), jobs_completed); } } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 5a198924d568..ff06dc1cc078 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -510,6 +510,10 @@ struct drm_gem_object *v3d_prime_import_sg_table(struct drm_device *dev, /* v3d_debugfs.c */ void v3d_debugfs_init(struct drm_minor *minor); +/* v3d_drv.c */ +void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp, + u64 *active_runtime, u64 *jobs_completed); + /* v3d_fence.c */ extern const struct dma_fence_ops v3d_fence_ops; struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue queue); diff --git a/drivers/gpu/drm/v3d/v3d_sysfs.c b/drivers/gpu/drm/v3d/v3d_sysfs.c index 6a8e7acc8b82..d610e355964f 100644 --- a/drivers/gpu/drm/v3d/v3d_sysfs.c +++ b/drivers/gpu/drm/v3d/v3d_sysfs.c @@ -15,18 +15,15 @@ gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf) struct v3d_dev *v3d = to_v3d_dev(drm); enum v3d_queue queue; u64 timestamp = local_clock(); - u64 active_runtime; ssize_t len = 0; len += sysfs_emit(buf, "queue\ttimestamp\tjobs\truntime\n"); for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { struct v3d_stats *stats = &v3d->queue[queue].stats; + u64 active_runtime, jobs_completed; - if (stats->start_ns) - active_runtime = timestamp - stats->start_ns; - else - active_runtime = 0; + v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed); /* Each line will display the queue name, timestamp, the number * of jobs sent to that queue and the runtime, as can be seen here: @@ -40,9 +37,7 @@ gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf) */ len += sysfs_emit_at(buf, len, "%s\t%llu\t%llu\t%llu\n", v3d_queue_to_string(queue), -timestamp, -stats->jobs_completed, -stats->enabled_ns + active_runtime); +timestamp, jobs_completed, active_runtime); } return
len; -- 2.44.0
[PATCH v3 5/5] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler
In V3D, the conclusion of a job is indicated by an IRQ. When a job finishes, then we update the local and the global GPU stats of that queue. But, while the GPU stats are being updated, a user might be reading the stats from sysfs or fdinfo. For example, on `gpu_stats_show()`, we could think about a scenario where `v3d->queue[queue].start_ns != 0`, then an interrupt happens, we update the value of `v3d->queue[queue].start_ns` to 0, we come back to `gpu_stats_show()` to calculate `active_runtime` and now, `active_runtime = timestamp`. In this simple example, the user would see a spike in the queue usage, that didn't match reality. In order to address this issue properly, use a seqcount to protect read and write sections of the code. Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats") Reported-by: Tvrtko Ursulin Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_drv.c | 14 ++ drivers/gpu/drm/v3d/v3d_drv.h | 7 +++ drivers/gpu/drm/v3d/v3d_gem.c | 1 + drivers/gpu/drm/v3d/v3d_sched.c | 7 +++ 4 files changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 2ec359ed2def..28b7ddce7747 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -121,6 +121,7 @@ v3d_open(struct drm_device *dev, struct drm_file *file) 1, NULL); memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i])); + seqcount_init(&v3d_priv->stats[i].lock); } v3d_perfmon_open_file(v3d_priv); @@ -145,10 +146,15 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file) void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp, u64 *active_runtime, u64 *jobs_completed) { - *active_runtime = stats->enabled_ns; - if (stats->start_ns) - *active_runtime += timestamp - stats->start_ns; - *jobs_completed = stats->jobs_completed; + unsigned int seq; + + do { + seq = read_seqcount_begin(&stats->lock); + *active_runtime = stats->enabled_ns; + if (stats->start_ns) +
*active_runtime += timestamp - stats->start_ns; + *jobs_completed = stats->jobs_completed; + } while (read_seqcount_retry(&stats->lock, seq)); } static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index ff06dc1cc078..a2c516fe6d79 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -40,6 +40,13 @@ struct v3d_stats { u64 start_ns; u64 enabled_ns; u64 jobs_completed; + + /* +* This seqcount is used to protect the access to the GPU stats +* variables. It must be used as, while we are reading the stats, +* IRQs can happen and the stats can be updated. +*/ + seqcount_t lock; }; struct v3d_queue_state { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index 0086081a9261..da8faf3b9011 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -251,6 +251,7 @@ v3d_gem_init(struct drm_device *dev) queue->fence_context = dma_fence_context_alloc(1); memset(&queue->stats, 0, sizeof(queue->stats)); + seqcount_init(&queue->stats.lock); } spin_lock_init(&v3d->mm_lock); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index b9614944931c..7cd8c335cd9b 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -114,16 +114,23 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_stats *local_stats = &file->stats[queue]; u64 now = local_clock(); + write_seqcount_begin(&local_stats->lock); local_stats->start_ns = now; + write_seqcount_end(&local_stats->lock); + + write_seqcount_begin(&global_stats->lock); global_stats->start_ns = now; + write_seqcount_end(&global_stats->lock); } static void v3d_stats_update(struct v3d_stats *stats, u64 now) { + write_seqcount_begin(&stats->lock); stats->enabled_ns += now - stats->start_ns; stats->jobs_completed++; stats->start_ns = 0; + write_seqcount_end(&stats->lock); } void -- 2.44.0
[PATCH v3 3/5] drm/v3d: Create function to update a set of GPU stats
Given a set of GPU stats, that is, a `struct v3d_stats` related to a queue in a given context, create a function that can update this set of GPU stats. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin Reviewed-by: Jose Maria Casanova Crespo --- drivers/gpu/drm/v3d/v3d_sched.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index b6b5542c3fcf..b9614944931c 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -118,6 +118,14 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) global_stats->start_ns = now; } +static void +v3d_stats_update(struct v3d_stats *stats, u64 now) +{ + stats->enabled_ns += now - stats->start_ns; + stats->jobs_completed++; + stats->start_ns = 0; +} + void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) { @@ -127,13 +135,8 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_stats *local_stats = &file->stats[queue]; u64 now = local_clock(); - local_stats->enabled_ns += now - local_stats->start_ns; - local_stats->jobs_completed++; - local_stats->start_ns = 0; - - global_stats->enabled_ns += now - global_stats->start_ns; - global_stats->jobs_completed++; - global_stats->start_ns = 0; + v3d_stats_update(local_stats, now); + v3d_stats_update(global_stats, now); } static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job) -- 2.44.0
[PATCH v3 1/5] drm/v3d: Create two functions to update all GPU stats variables
Currently, we manually perform all operations to update the GPU stats variables. Apart from the code repetition, this is very prone to errors, as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice"). Therefore, create two functions to manage updating all GPU stats variables. Now, the jobs only need to call `v3d_job_update_stats()` when the job is done and `v3d_job_start_stats()` when starting the job. Co-developed-by: Tvrtko Ursulin Signed-off-by: Tvrtko Ursulin Signed-off-by: Maíra Canal Reviewed-by: Jose Maria Casanova Crespo --- drivers/gpu/drm/v3d/v3d_drv.h | 1 + drivers/gpu/drm/v3d/v3d_irq.c | 48 ++-- drivers/gpu/drm/v3d/v3d_sched.c | 80 +++-- 3 files changed, 40 insertions(+), 89 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 1950c723dde1..ee3545226d7f 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo); void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ +void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c index ce6b2fb341d1..d469bda52c1a 100644 --- a/drivers/gpu/drm/v3d/v3d_irq.c +++ b/drivers/gpu/drm/v3d/v3d_irq.c @@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_FLDONE) { struct v3d_fence *fence = to_v3d_fence(v3d->bin_job->base.irq_fence); - struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_BIN]; - - file->jobs_sent[V3D_BIN]++; - v3d->queue[V3D_BIN].jobs_sent++; - - file->start_ns[V3D_BIN] = 0; - v3d->queue[V3D_BIN].start_ns = 0; - - file->enabled_ns[V3D_BIN] += runtime; - v3d->queue[V3D_BIN].enabled_ns += runtime; + v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN); trace_v3d_bcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_FRDONE) { struct v3d_fence *fence = to_v3d_fence(v3d->render_job->base.irq_fence); - struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_RENDER]; - - file->jobs_sent[V3D_RENDER]++; - v3d->queue[V3D_RENDER].jobs_sent++; - - file->start_ns[V3D_RENDER] = 0; - v3d->queue[V3D_RENDER].start_ns = 0; - - file->enabled_ns[V3D_RENDER] += runtime; - v3d->queue[V3D_RENDER].enabled_ns += runtime; + v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER); trace_v3d_rcl_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_CSDDONE(v3d->ver)) { struct v3d_fence *fence = to_v3d_fence(v3d->csd_job->base.irq_fence); - struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_CSD]; - - file->jobs_sent[V3D_CSD]++; - v3d->queue[V3D_CSD].jobs_sent++; - - file->start_ns[V3D_CSD] = 0; - v3d->queue[V3D_CSD].start_ns = 0; - - file->enabled_ns[V3D_CSD] += runtime; - v3d->queue[V3D_CSD].enabled_ns += runtime; + v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD); trace_v3d_csd_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg) if (intsts & V3D_HUB_INT_TFUC) { struct v3d_fence *fence = to_v3d_fence(v3d->tfu_job->base.irq_fence); - struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_TFU]; - - file->jobs_sent[V3D_TFU]++; - v3d->queue[V3D_TFU].jobs_sent++; - - file->start_ns[V3D_TFU] = 0; - v3d->queue[V3D_TFU].start_ns = 0; - - file->enabled_ns[V3D_TFU] += runtime; - v3d->queue[V3D_TFU].enabled_ns += runtime; + v3d_job_update_stats(&v3d->tfu_job->base, V3D_TFU);
[PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition
The first version of this series had the intention to fix two major issues with the GPU stats: 1. We were incrementing `enabled_ns` twice by the end of each job. 2. There is a race-condition between the IRQ handler and the users The first of the issues was already addressed and the fix was applied to drm-misc-fixes. Now, what is left, addresses the second issue. Apart from addressing this issue, this series improved the GPU stats code as a whole. We reduced code repetition, creating functions to start and update the GPU stats. This will likely reduce the odds of issue #1 happen again. v1 -> v2: https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/ - As the first patch was a bugfix, it was pushed to drm-misc-fixes. - [1/4] Add Chema Casanova's R-b - [2/4] s/jobs_sent/jobs_completed and add the reasoning in the commit message (Chema Casanova) - [2/4] Add Chema Casanova's and Tvrtko Ursulin's R-b - [3/4] Call `local_clock()` only once, by adding a new parameter to the `v3d_stats_update` function (Chema Casanova) - [4/4] Move new line to the correct patch [2/4] (Tvrtko Ursulin) - [4/4] Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin) v2 -> v3: https://lore.kernel.org/dri-devel/20240417011021.600889-1-mca...@igalia.com/T/ - [4/5] New patch: separates the code refactor from the race-condition fix (Tvrtko Ursulin) - [5/5] s/interruption/interrupt (Tvrtko Ursulin) - [5/5] s/matches/match (Tvrtko Ursulin) - [5/5] Add Tvrtko Ursulin's R-b Best Regards, - Maíra Maíra Canal (5): drm/v3d: Create two functions to update all GPU stats variables drm/v3d: Create a struct to store the GPU stats drm/v3d: Create function to update a set of GPU stats drm/v3d: Decouple stats calculation from printing drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler drivers/gpu/drm/v3d/v3d_drv.c | 33 drivers/gpu/drm/v3d/v3d_drv.h | 30 --- drivers/gpu/drm/v3d/v3d_gem.c | 9 ++-- drivers/gpu/drm/v3d/v3d_irq.c | 48 ++--- 
drivers/gpu/drm/v3d/v3d_sched.c | 94 + drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++--- 6 files changed, 109 insertions(+), 118 deletions(-) -- 2.44.0
[PATCH v3 2/5] drm/v3d: Create a struct to store the GPU stats
This will make it easier to instantiate the GPU stats variables and it will create a structure where we can store all the variables that refer to GPU stats. Note that, when we created the struct `v3d_stats`, we renamed `jobs_sent` to `jobs_completed`. This better expresses the semantics of the variable, as we are only accounting jobs that have been completed. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin Reviewed-by: Jose Maria Casanova Crespo --- drivers/gpu/drm/v3d/v3d_drv.c | 15 +++ drivers/gpu/drm/v3d/v3d_drv.h | 18 ++ drivers/gpu/drm/v3d/v3d_gem.c | 8 drivers/gpu/drm/v3d/v3d_sched.c | 20 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++ 5 files changed, 39 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 3debf37e7d9b..52e3ba9df46f 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -115,14 +115,12 @@ v3d_open(struct drm_device *dev, struct drm_file *file) v3d_priv->v3d = v3d; for (i = 0; i < V3D_MAX_QUEUES; i++) { - v3d_priv->enabled_ns[i] = 0; - v3d_priv->start_ns[i] = 0; - v3d_priv->jobs_sent[i] = 0; - sched = &v3d->queue[i].sched; drm_sched_entity_init(&v3d_priv->sched_entity[i], DRM_SCHED_PRIORITY_NORMAL, &sched, 1, NULL); + + memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i])); } v3d_perfmon_open_file(v3d_priv); @@ -151,20 +149,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) enum v3d_queue queue; for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { + struct v3d_stats *stats = &file_priv->stats[queue]; + /* Note that, in case of a GPU reset, the time spent during an * attempt of executing the job is not computed in the runtime. */ drm_printf(p, "drm-engine-%s: \t%llu ns\n", v3d_queue_to_string(queue), - file_priv->start_ns[queue] ? file_priv->enabled_ns[queue] - + timestamp - file_priv->start_ns[queue] - : file_priv->enabled_ns[queue]); + stats->start_ns ?
stats->enabled_ns + timestamp - stats->start_ns + : stats->enabled_ns); /* Note that we only count jobs that completed. Therefore, jobs * that were resubmitted due to a GPU reset are not computed. */ drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", - v3d_queue_to_string(queue), file_priv->jobs_sent[queue]); + v3d_queue_to_string(queue), stats->jobs_completed); } } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index ee3545226d7f..5a198924d568 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue) return "UNKNOWN"; } +struct v3d_stats { + u64 start_ns; + u64 enabled_ns; + u64 jobs_completed; +}; + struct v3d_queue_state { struct drm_gpu_scheduler sched; u64 fence_context; u64 emit_seqno; - u64 start_ns; - u64 enabled_ns; - u64 jobs_sent; + /* Stores the GPU stats for this queue in the global context. */ + struct v3d_stats stats; }; /* Performance monitor object. The perfmon lifetime is controlled by userspace @@ -188,11 +193,8 @@ struct v3d_file_priv { struct drm_sched_entity sched_entity[V3D_MAX_QUEUES]; - u64 start_ns[V3D_MAX_QUEUES]; - - u64 enabled_ns[V3D_MAX_QUEUES]; - - u64 jobs_sent[V3D_MAX_QUEUES]; + /* Stores the GPU stats for a specific queue for this fd. */ + struct v3d_stats stats[V3D_MAX_QUEUES]; }; struct v3d_bo { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index afc565078c78..0086081a9261 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -247,10 +247,10 @@ v3d_gem_init(struct drm_device *dev) int ret, i; for (i = 0; i < V3D_MAX_QUEUES; i++) { - v3d->queue[i].fence_context = dma_fence_context_alloc(1); - v3d->queue[i].start_ns = 0; - v3d->queue[i].enabled_ns = 0; - v3d->queue[i].jobs_sent = 0; + struct v3d_queue_state *queue = &v3d->queue[i]; + + queue->fence_context = dma_fence_context_alloc(1); + memset(&queue->stats, 0, sizeof(queue->stats)); }
[PATCH v2 4/4] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler
In V3D, the conclusion of a job is indicated by an IRQ. When a job finishes, then we update the local and the global GPU stats of that queue. But, while the GPU stats are being updated, a user might be reading the stats from sysfs or fdinfo. For example, on `gpu_stats_show()`, we could think about a scenario where `v3d->queue[queue].start_ns != 0`, then an interruption happens, we update the value of `v3d->queue[queue].start_ns` to 0, we come back to `gpu_stats_show()` to calculate `active_runtime` and now, `active_runtime = timestamp`. In this simple example, the user would see a spike in the queue usage, that didn't matches reality. In order to address this issue properly, use a seqcount to protect read and write sections of the code. Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats") Reported-by: Tvrtko Ursulin Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 10 ++ drivers/gpu/drm/v3d/v3d_drv.h | 21 + drivers/gpu/drm/v3d/v3d_gem.c | 7 +-- drivers/gpu/drm/v3d/v3d_sched.c | 7 +++ drivers/gpu/drm/v3d/v3d_sysfs.c | 11 +++ 5 files changed, 42 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 52e3ba9df46f..cf15fa142968 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -121,6 +121,7 @@ v3d_open(struct drm_device *dev, struct drm_file *file) 1, NULL); memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i])); + seqcount_init(&v3d_priv->stats[i].lock); } v3d_perfmon_open_file(v3d_priv); @@ -150,20 +151,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { struct v3d_stats *stats = &file_priv->stats[queue]; + u64 active_runtime, jobs_completed; + + v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed); /* Note that, in case of a GPU reset, the time spent during an * attempt of executing the job is not computed in the runtime.
*/ drm_printf(p, "drm-engine-%s: \t%llu ns\n", - v3d_queue_to_string(queue), - stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns - : stats->enabled_ns); + v3d_queue_to_string(queue), active_runtime); /* Note that we only count jobs that completed. Therefore, jobs * that were resubmitted due to a GPU reset are not computed. */ drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", - v3d_queue_to_string(queue), stats->jobs_completed); + v3d_queue_to_string(queue), jobs_completed); } } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 5a198924d568..5211df7c7317 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -40,8 +40,29 @@ struct v3d_stats { u64 start_ns; u64 enabled_ns; u64 jobs_completed; + + /* +* This seqcount is used to protect the access to the GPU stats +* variables. It must be used as, while we are reading the stats, +* IRQs can happen and the stats can be updated. +*/ + seqcount_t lock; }; +static inline void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp, +u64 *active_runtime, u64 *jobs_completed) +{ + unsigned int seq; + + do { + seq = read_seqcount_begin(&stats->lock); + *active_runtime = stats->enabled_ns; + if (stats->start_ns) + *active_runtime += timestamp - stats->start_ns; + *jobs_completed = stats->jobs_completed; + } while (read_seqcount_retry(&stats->lock, seq)); +} + struct v3d_queue_state { struct drm_gpu_scheduler sched; diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index d14589d3ae6c..da8faf3b9011 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -247,8 +247,11 @@ v3d_gem_init(struct drm_device *dev) int ret, i; for (i = 0; i < V3D_MAX_QUEUES; i++) { - v3d->queue[i].fence_context = dma_fence_context_alloc(1); - memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats)); + struct v3d_queue_state *queue = &v3d->queue[i]; + + queue->fence_context = dma_fence_context_alloc(1); + memset(&queue->stats, 0, sizeof(queue->stats)); +
seqcount_init(&queue->stats.lock); } spin_lock_init(&v3d->mm_lock); diff --git a/drivers/g
[PATCH v2 3/4] drm/v3d: Create function to update a set of GPU stats
Given a set of GPU stats, that is, a `struct v3d_stats` related to a queue in a given context, create a function that can update this set of GPU stats. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin Reviewed-by: Jose Maria Casanova Crespo --- drivers/gpu/drm/v3d/v3d_sched.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index b6b5542c3fcf..b9614944931c 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -118,6 +118,14 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) global_stats->start_ns = now; } +static void +v3d_stats_update(struct v3d_stats *stats, u64 now) +{ + stats->enabled_ns += now - stats->start_ns; + stats->jobs_completed++; + stats->start_ns = 0; +} + void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) { @@ -127,13 +135,8 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_stats *local_stats = &file->stats[queue]; u64 now = local_clock(); - local_stats->enabled_ns += now - local_stats->start_ns; - local_stats->jobs_completed++; - local_stats->start_ns = 0; - - global_stats->enabled_ns += now - global_stats->start_ns; - global_stats->jobs_completed++; - global_stats->start_ns = 0; + v3d_stats_update(local_stats, now); + v3d_stats_update(global_stats, now); } static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job) -- 2.44.0
[PATCH v2 2/4] drm/v3d: Create a struct to store the GPU stats
This will make it easier to instantiate the GPU stats variables and it will create a structure where we can store all the variables that refer to GPU stats. Note that, when we created the struct `v3d_stats`, we renamed `jobs_sent` to `jobs_completed`. This better expresses the semantics of the variable, as we are only accounting jobs that have been completed. Signed-off-by: Maíra Canal Reviewed-by: Tvrtko Ursulin Reviewed-by: Jose Maria Casanova Crespo --- drivers/gpu/drm/v3d/v3d_drv.c | 15 +++ drivers/gpu/drm/v3d/v3d_drv.h | 18 ++ drivers/gpu/drm/v3d/v3d_gem.c | 4 +--- drivers/gpu/drm/v3d/v3d_sched.c | 20 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++ 5 files changed, 36 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 3debf37e7d9b..52e3ba9df46f 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -115,14 +115,12 @@ v3d_open(struct drm_device *dev, struct drm_file *file) v3d_priv->v3d = v3d; for (i = 0; i < V3D_MAX_QUEUES; i++) { - v3d_priv->enabled_ns[i] = 0; - v3d_priv->start_ns[i] = 0; - v3d_priv->jobs_sent[i] = 0; - sched = &v3d->queue[i].sched; drm_sched_entity_init(&v3d_priv->sched_entity[i], DRM_SCHED_PRIORITY_NORMAL, &sched, 1, NULL); + + memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i])); } v3d_perfmon_open_file(v3d_priv); @@ -151,20 +149,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) enum v3d_queue queue; for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { + struct v3d_stats *stats = &file_priv->stats[queue]; + /* Note that, in case of a GPU reset, the time spent during an * attempt of executing the job is not computed in the runtime. */ drm_printf(p, "drm-engine-%s: \t%llu ns\n", v3d_queue_to_string(queue), - file_priv->start_ns[queue] ? file_priv->enabled_ns[queue] - + timestamp - file_priv->start_ns[queue] - : file_priv->enabled_ns[queue]); + stats->start_ns ?
stats->enabled_ns + timestamp - stats->start_ns + : stats->enabled_ns); /* Note that we only count jobs that completed. Therefore, jobs * that were resubmitted due to a GPU reset are not computed. */ drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", - v3d_queue_to_string(queue), file_priv->jobs_sent[queue]); + v3d_queue_to_string(queue), stats->jobs_completed); } } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index ee3545226d7f..5a198924d568 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue) return "UNKNOWN"; } +struct v3d_stats { + u64 start_ns; + u64 enabled_ns; + u64 jobs_completed; +}; + struct v3d_queue_state { struct drm_gpu_scheduler sched; u64 fence_context; u64 emit_seqno; - u64 start_ns; - u64 enabled_ns; - u64 jobs_sent; + /* Stores the GPU stats for this queue in the global context. */ + struct v3d_stats stats; }; /* Performance monitor object. The perfmon lifetime is controlled by userspace @@ -188,11 +193,8 @@ struct v3d_file_priv { struct drm_sched_entity sched_entity[V3D_MAX_QUEUES]; - u64 start_ns[V3D_MAX_QUEUES]; - - u64 enabled_ns[V3D_MAX_QUEUES]; - - u64 jobs_sent[V3D_MAX_QUEUES]; + /* Stores the GPU stats for a specific queue for this fd. */ + struct v3d_stats stats[V3D_MAX_QUEUES]; }; struct v3d_bo { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index afc565078c78..d14589d3ae6c 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -248,9 +248,7 @@ v3d_gem_init(struct drm_device *dev) for (i = 0; i < V3D_MAX_QUEUES; i++) { v3d->queue[i].fence_context = dma_fence_context_alloc(1); - v3d->queue[i].start_ns = 0; - v3d->queue[i].enabled_ns = 0; - v3d->queue[i].jobs_sent = 0; + memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats)); } spin_lock_init(&v3d->mm_lock); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 8ca61bcd4b1
[PATCH v2 0/4] drm/v3d: Fix GPU stats inconsistencies and race-condition
The first version of this series had the intention to fix two major issues with the GPU stats: 1. We were incrementing `enabled_ns` twice by the end of each job. 2. There is a race-condition between the IRQ handler and the users The first of the issues was already addressed and the fix was applied to drm-misc-fixes. Now, what is left, addresses the second issue. Apart from addressing this issue, this series improved the GPU stats code as a whole. We reduced code repetition as a whole, creating functions to start and update the GPU stats. This will likely reduce the odds of issue #1 happen again. Best Regards, - Maíra v1 -> v2: https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/ * As the first patch was a bugfix, it was pushed to drm-misc-fixes. * [1/4]: Add Chema Casanova's R-b * [2/4]: s/jobs_sent/jobs_completed and add the reasoning in the commit message (Chema Casanova) * [2/4]: Add Chema Casanova's and Tvrtko Ursulin's R-b * [3/4]: Call `local_clock()` only once, by adding a new parameter to the `v3d_stats_update` function (Chema Casanova) * [4/4]: Move new line to the correct patch (2/4) (Tvrtko Ursulin) * [4/4]: Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin) Maíra Canal (4): drm/v3d: Create two functions to update all GPU stats variables drm/v3d: Create a struct to store the GPU stats drm/v3d: Create function to update a set of GPU stats drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler drivers/gpu/drm/v3d/v3d_drv.c | 19 +++ drivers/gpu/drm/v3d/v3d_drv.h | 40 +++--- drivers/gpu/drm/v3d/v3d_gem.c | 9 ++-- drivers/gpu/drm/v3d/v3d_irq.c | 48 ++--- drivers/gpu/drm/v3d/v3d_sched.c | 94 + drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++--- 6 files changed, 105 insertions(+), 118 deletions(-) -- 2.44.0
[PATCH v2 1/4] drm/v3d: Create two functions to update all GPU stats variables
Currently, we manually perform all operations to update the GPU stats variables. Apart from the code repetition, this is very prone to errors, as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice"). Therefore, create two functions to manage updating all GPU stats variables. Now, the jobs only need to call for `v3d_job_update_stats()` when the job is done and `v3d_job_start_stats()` when starting the job. Co-developed-by: Tvrtko Ursulin Signed-off-by: Tvrtko Ursulin Signed-off-by: Maíra Canal Reviewed-by: Jose Maria Casanova Crespo --- drivers/gpu/drm/v3d/v3d_drv.h | 1 + drivers/gpu/drm/v3d/v3d_irq.c | 48 ++-- drivers/gpu/drm/v3d/v3d_sched.c | 80 +++-- 3 files changed, 40 insertions(+), 89 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 1950c723dde1..ee3545226d7f 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo); void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ +void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c index ce6b2fb341d1..d469bda52c1a 100644 --- a/drivers/gpu/drm/v3d/v3d_irq.c +++ b/drivers/gpu/drm/v3d/v3d_irq.c @@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_FLDONE) { struct v3d_fence *fence = to_v3d_fence(v3d->bin_job->base.irq_fence); - struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_BIN]; - - file->jobs_sent[V3D_BIN]++; - v3d->queue[V3D_BIN].jobs_sent++; - - file->start_ns[V3D_BIN] = 0; - v3d->queue[V3D_BIN].start_ns = 0; - - file->enabled_ns[V3D_BIN] += runtime; - v3d->queue[V3D_BIN].enabled_ns += runtime; + v3d_job_update_stats(>bin_job->base, V3D_BIN); trace_v3d_bcl_irq(>drm, fence->seqno); 
dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_FRDONE) { struct v3d_fence *fence = to_v3d_fence(v3d->render_job->base.irq_fence); - struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_RENDER]; - - file->jobs_sent[V3D_RENDER]++; - v3d->queue[V3D_RENDER].jobs_sent++; - - file->start_ns[V3D_RENDER] = 0; - v3d->queue[V3D_RENDER].start_ns = 0; - - file->enabled_ns[V3D_RENDER] += runtime; - v3d->queue[V3D_RENDER].enabled_ns += runtime; + v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER); trace_v3d_rcl_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_CSDDONE(v3d->ver)) { struct v3d_fence *fence = to_v3d_fence(v3d->csd_job->base.irq_fence); - struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_CSD]; - - file->jobs_sent[V3D_CSD]++; - v3d->queue[V3D_CSD].jobs_sent++; - - file->start_ns[V3D_CSD] = 0; - v3d->queue[V3D_CSD].start_ns = 0; - - file->enabled_ns[V3D_CSD] += runtime; - v3d->queue[V3D_CSD].enabled_ns += runtime; + v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD); trace_v3d_csd_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg) if (intsts & V3D_HUB_INT_TFUC) { struct v3d_fence *fence = to_v3d_fence(v3d->tfu_job->base.irq_fence); - struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_TFU]; - - file->jobs_sent[V3D_TFU]++; - v3d->queue[V3D_TFU].jobs_sent++; - - file->start_ns[V3D_TFU] = 0; - v3d->queue[V3D_TFU].start_ns = 0; - - file->enabled_ns[V3D_TFU] += runtime; - v3d->queue[V3D_TFU].enabled_ns += runtime; + v3d_job_update_stats(&v3d->tfu_job->base, V3D_TFU);
Re: [PATCH v2 20/43] drm/vkms: Use fbdev-shmem
On 4/10/24 10:02, Thomas Zimmermann wrote: Implement fbdev emulation with fbdev-shmem. Avoids the overhead of fbdev-generic's additional shadow buffering. No functional changes. Signed-off-by: Thomas Zimmermann Acked-by: Maíra Canal Best Regards, - Maíra Cc: Rodrigo Siqueira Cc: Melissa Wen Cc: "Maíra Canal" Cc: Haneen Mohammed Cc: Daniel Vetter --- drivers/gpu/drm/vkms/vkms_drv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c index dd0af086e7fa9..8dc9dc13896e9 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.c +++ b/drivers/gpu/drm/vkms/vkms_drv.c @@ -17,7 +17,7 @@ #include #include #include -#include <drm/drm_fbdev_generic.h> +#include <drm/drm_fbdev_shmem.h> #include #include #include @@ -223,7 +223,7 @@ static int vkms_create(struct vkms_config *config) if (ret) goto out_devres; - drm_fbdev_generic_setup(&vkms_device->drm, 0); + drm_fbdev_shmem_setup(&vkms_device->drm, 0); return 0;
Re: [PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D
On 4/16/24 02:30, Stefan Wahren wrote: Hi Maíra, On 16.04.24 03:02, Maíra Canal wrote: On 4/15/24 13:54, Andre Przywara wrote: On Mon, 15 Apr 2024 13:00:39 -0300 Maíra Canal wrote: Hi, RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to the RPi. Currently, the downstream kernel uses an overlay to enable the GPU and use GPU hardware acceleration. When deploying a mainline kernel to the RPi 0-3, we end up without any GPU hardware acceleration (essentially, we can't use the OpenGL driver). Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel. So I think Krzysztof's initial comment still stands: What does that patch actually change? If I build those DTBs as of now, none of them has a status property in the v3d node. Which means it's enabled: https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter2-devicetree-basics.rst#status So adding an explicit 'status = "okay";' doesn't make a difference. What do I miss here? As mentioned by Stefan in the last version, in Raspberry Pi OS, there is a systemd script which is trying to check for the V3D driver (/usr/lib/systemd/scripts/gldriver_test.sh). Within the first check, "raspi-config nonint is_kms" is called, which always seems to fail. What "raspi-config" does is check if /proc/device-tree/soc/v3d@7ec00000/status is equal to "okay". As /proc/device-tree/soc/v3d@7ec00000/status doesn't exist, it returns false. Yes, but I also mentioned that the V3D driver starts without this patch. The commit message of this patch suggests this is a DT issue, which it is not. I haven't had the time to update my SD card to Bookworm yet. Does the issue still exist with this version? I'm using a 32-bit kernel and the recommended OS for 32-bit is Bullseye. But I checked the Bookworm code and indeed, Bookworm doesn't check the device tree [1]. I'm thinking about sending a patch to the Bullseye branch to fix this issue. 
[1] https://github.com/RPi-Distro/raspi-config/blob/966ed3fecc159ff3e69a774d74bfd716c04dafff/raspi-config#L128 Best Regards, - Maíra I'll see if I can improve the userspace tool by just checking if the folder /proc/device-tree/soc/v3d@7ec00000/ exists. Thanks for the explanation! Best Regards, - Maíra Cheers, Andre Signed-off-by: Maíra Canal --- v1 -> v2: https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/ * As mentioned by Krzysztof, enabling should be done in the last place of override/extend. Therefore, I'm disabling V3D in the common dtsi and enabling it in the last place of extend, i.e. the RPi DTS files. arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 + arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 ++++ 15 files changed, 57 insertions(+) diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi index 9261b67dbee1..69e34831de51 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi +++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi @@ -139,6 +139,7 @@ v3d: v3d@7ec00000 { compatible = "brcm,bcm2835-v3d"; reg = <0x7ec00000 0x1000>; interrupts = <1 10>; + status = "disabled"; }; vc4: gpu { diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts 
index 069b48272aa5..495ab1dfd2ce 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts @@ -128,3 +128,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts index 2726c00431e8..4634d88ce3af 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts @@ -121,3 +121,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broad
Re: [PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D
On 4/15/24 13:54, Andre Przywara wrote: On Mon, 15 Apr 2024 13:00:39 -0300 Maíra Canal wrote: Hi, RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to the RPi. Currently, the downstream kernel uses an overlay to enable the GPU and use GPU hardware acceleration. When deploying a mainline kernel to the RPi 0-3, we end up without any GPU hardware acceleration (essentially, we can't use the OpenGL driver). Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel. So I think Krzysztof's initial comment still stands: What does that patch actually change? If I build those DTBs as of now, none of them has a status property in the v3d node. Which means it's enabled: https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter2-devicetree-basics.rst#status So adding an explicit 'status = "okay";' doesn't make a difference. What do I miss here? As mentioned by Stefan in the last version, in Raspberry Pi OS, there is a systemd script which is trying to check for the V3D driver (/usr/lib/systemd/scripts/gldriver_test.sh). Within the first check, "raspi-config nonint is_kms" is called, which always seems to fail. What "raspi-config" does is check if /proc/device-tree/soc/v3d@7ec00000/status is equal to "okay". As /proc/device-tree/soc/v3d@7ec00000/status doesn't exist, it returns false. I'll see if I can improve the userspace tool by just checking if the folder /proc/device-tree/soc/v3d@7ec00000/ exists. Thanks for the explanation! Best Regards, - Maíra Cheers, Andre Signed-off-by: Maíra Canal --- v1 -> v2: https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/ * As mentioned by Krzysztof, enabling should be done in the last place of override/extend. Therefore, I'm disabling V3D in the common dtsi and enabling it in the last place of extend, i.e. the RPi DTS files. 
arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 + arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 ++++ 15 files changed, 57 insertions(+) diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi index 9261b67dbee1..69e34831de51 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi +++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi @@ -139,6 +139,7 @@ v3d: v3d@7ec00000 { compatible = "brcm,bcm2835-v3d"; reg = <0x7ec00000 0x1000>; interrupts = <1 10>; + status = "disabled"; }; vc4: gpu { diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts index 069b48272aa5..495ab1dfd2ce 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts @@ -128,3 +128,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts index 2726c00431e8..4634d88ce3af 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts @@ -121,3 +121,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git 
a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts index c57b999a4520..45fa0f6851fc 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts @@ -130,3 +130,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts index ae6d3a9586ab..c1dac5d704aa 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts @@ -121,3 +121,7
Re: [PATCH 1/5] drm/v3d: Don't increment `enabled_ns` twice
On 4/3/24 17:24, Maíra Canal wrote: The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs") introduced the calculation of global GPU stats. To do so, it used the already existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats"). While adding the global GPU stats calculation ability, the author forgot to delete the existing one. Currently, the value of `enabled_ns` is incremented twice at the end of the job, when it should be added just once. Therefore, delete the leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs") Reported-by: Tvrtko Ursulin Signed-off-by: Maíra Canal As this patch is an isolated bugfix and it was reviewed by two developers, I'm applying it to drm-misc/drm-misc-fixes. I'll address the feedback for the rest of the series later and send a v2. Best Regards, - Maíra --- drivers/gpu/drm/v3d/v3d_irq.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c index 2e04f6cb661e..ce6b2fb341d1 100644 --- a/drivers/gpu/drm/v3d/v3d_irq.c +++ b/drivers/gpu/drm/v3d/v3d_irq.c @@ -105,7 +105,6 @@ v3d_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_BIN]; - file->enabled_ns[V3D_BIN] += local_clock() - file->start_ns[V3D_BIN]; file->jobs_sent[V3D_BIN]++; v3d->queue[V3D_BIN].jobs_sent++; @@ -126,7 +125,6 @@ v3d_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_RENDER]; - file->enabled_ns[V3D_RENDER] += local_clock() - file->start_ns[V3D_RENDER]; file->jobs_sent[V3D_RENDER]++; v3d->queue[V3D_RENDER].jobs_sent++; @@ -147,7 +145,6 @@ v3d_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv; u64 runtime 
= local_clock() - file->start_ns[V3D_CSD]; - file->enabled_ns[V3D_CSD] += local_clock() - file->start_ns[V3D_CSD]; file->jobs_sent[V3D_CSD]++; v3d->queue[V3D_CSD].jobs_sent++; @@ -195,7 +192,6 @@ v3d_hub_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_TFU]; - file->enabled_ns[V3D_TFU] += local_clock() - file->start_ns[V3D_TFU]; file->jobs_sent[V3D_TFU]++; v3d->queue[V3D_TFU].jobs_sent++;
Re: [PATCH] dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FS
Hi Tvrtko, On 4/1/24 10:21, Tvrtko Ursulin wrote: On 01/04/2024 13:45, Christian König wrote: Am 01.04.24 um 14:39 schrieb Tvrtko Ursulin: On 29/03/2024 00:00, T.J. Mercier wrote: On Thu, Mar 28, 2024 at 7:53 AM Tvrtko Ursulin wrote: From: Tvrtko Ursulin There is no point in compiling in the list and mutex operations which are only used from the dma-buf debugfs code, if debugfs is not compiled in. Put the code in question behind some kconfig guards and so save some text and maybe even a pointer per object at runtime when not enabled. Signed-off-by: Tvrtko Ursulin Reviewed-by: T.J. Mercier Thanks! How would patches to dma-buf typically be landed? Via what tree, I mean? drm-misc-next? That should go through drm-misc-next. And feel free to add Reviewed-by: Christian König as well. Thanks! Maarten, if I got it right you are handling the next drm-misc-next pull - could you merge this one please? Applied to drm-misc/drm-misc-next! Best Regards, - Maíra Regards, Tvrtko
Re: [PATCH] drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_buffer
On 4/15/24 12:19, Jocelyn Falempe wrote: Hi, You're right, I messed up the rename, and I mostly test on x86, where I don't build the imx driver. Reviewed-by: Jocelyn Falempe Best regards, Applied to drm-misc/drm-misc-next! Best Regards, - Maíra
[PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D
RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to the RPi. Currently, the downstream kernel uses an overlay to enable the GPU and use GPU hardware acceleration. When deploying a mainline kernel to the RPi 0-3, we end up without any GPU hardware acceleration (essentially, we can't use the OpenGL driver). Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel. Signed-off-by: Maíra Canal --- v1 -> v2: https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/ * As mentioned by Krzysztof, enabling should be done in the last place of override/extend. Therefore, I'm disabling V3D in the common dtsi and enabling it in the last place of extend, i.e. the RPi DTS files. arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 + arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts | 4 ++++ arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 ++++ 15 files changed, 57 insertions(+) diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi index 9261b67dbee1..69e34831de51 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi +++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi @@ -139,6 +139,7 @@ v3d: v3d@7ec00000 { compatible = "brcm,bcm2835-v3d"; reg = <0x7ec00000 0x1000>; interrupts = <1 10>; + status = "disabled"; }; vc4: gpu { diff --git 
a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts index 069b48272aa5..495ab1dfd2ce 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts @@ -128,3 +128,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts index 2726c00431e8..4634d88ce3af 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts @@ -121,3 +121,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts index c57b999a4520..45fa0f6851fc 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts @@ -130,3 +130,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts index ae6d3a9586ab..c1dac5d704aa 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts @@ -121,3 +121,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts index 72764be75a79..72ca31f2a7d6 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts @@ -115,3 +115,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts index 3f9d198ac3ab..881a07d2f28f 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts +++ 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts @@ -95,3 +95,7 @@ &uart0 { pinctrl-0 = <&uart0_gpio14>; status = "okay"; }; + +&v3d { + status = "okay"; +}; diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts b/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts index 1f0b163e400c..1c7324067442 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts +++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts @@ -134,6
[PATCH] drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_buffer
On version 11, Thomas suggested changing the name of the function, and this request was applied on version 12, which is the version that landed. Although the name of the function changed in the C file, it didn't change in the header file, leading to a compilation error such as: drivers/gpu/drm/imx/ipuv3/ipuv3-plane.c:780:24: error: use of undeclared identifier 'drm_fb_dma_get_scanout_buffer'; did you mean 'drm_panic_gem_get_scanout_buffer'? 780 | .get_scanout_buffer = drm_fb_dma_get_scanout_buffer, | ^ | drm_panic_gem_get_scanout_buffer ./include/drm/drm_fb_dma_helper.h:23:5: note: 'drm_panic_gem_get_scanout_buffer' declared here 23 | int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane, | ^ 1 error generated. Best Regards, - Maíra Fixes: 879b3b6511fe ("drm/fb_dma: Add generic get_scanout_buffer() for drm_panic") Signed-off-by: Maíra Canal --- include/drm/drm_fb_dma_helper.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/drm/drm_fb_dma_helper.h b/include/drm/drm_fb_dma_helper.h index 61f24c2aba2f..c950732c6d36 100644 --- a/include/drm/drm_fb_dma_helper.h +++ b/include/drm/drm_fb_dma_helper.h @@ -6,6 +6,7 @@ struct drm_device; struct drm_framebuffer; +struct drm_plane; struct drm_plane_state; struct drm_scanout_buffer; @@ -20,8 +21,8 @@ void drm_fb_dma_sync_non_coherent(struct drm_device *drm, struct drm_plane_state *old_state, struct drm_plane_state *state); -int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane, -struct drm_scanout_buffer *sb); +int drm_fb_dma_get_scanout_buffer(struct drm_plane *plane, + struct drm_scanout_buffer *sb); #endif -- 2.44.0
Re: [PATCH] ARM: dts: bcm2835: Enable 3D rendering through V3D
Hi Phil, On 4/14/24 15:43, Phil Elwell wrote: Hello all, On Fri, 12 Apr 2024 at 18:17, Stefan Wahren wrote: Hi Maíra, [add Phil & Dave] On 12.04.24 15:25, Maíra Canal wrote: RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to the RPi. Currently, the downstream kernel uses an overlay to enable the GPU and use GPU hardware acceleration. When deploying a mainline kernel to the RPi 0-3, we end up without any GPU hardware acceleration (essentially, we can't use the OpenGL driver). Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel. Thanks for trying to improve the combination Raspberry Pi OS + Mainline Kernel. I think I'm able to reproduce the issue with Raspberry Pi 3 B+ on Buster. Buster? We launched Buster with 4.19 and ended on 5.10. We've moved onto Bookworm now. A lot has changed in that time... From the kernel side everything looks good: [ 11.054833] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4]) [ 11.055118] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4]) [ 11.055340] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4]) [ 11.055521] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 11.055695] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 11.055874] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 11.056020] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4]) [ 11.063277] Bluetooth: hci0: BCM4345C0 'brcm/BCM4345C0.raspberrypi,3-model-b-plus.hcd' Patch [ 11.070466] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0 [ 11.174803] Console: switching to colour frame buffer device 240x75 [ 11.205125] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device But in Raspberry Pi OS there is a systemd script which is trying to check for the V3D driver /usr/lib/systemd/scripts/gldriver_test.sh Within the first check "raspi-config nonint is_kms" is called, which always seems to fail. 
If I run strace on this command it seems to check for /proc/device-tree/soc/v3d@7ec00000/status which doesn't exist in the Mainline device tree. Maybe there is a chance to improve the userspace tool? ...such as the raspi-config tool, which now always succeeds for is_kms. I'm using Raspberry Pi OS Bullseye with the raspi-config tool at version 20231012~bullseye. I can still reproduce this issue when using an upstream kernel. I ran `sudo apt upgrade`, but a new version of the raspi-config tool didn't appear. Best Regards, - Maíra Phil Signed-off-by: Maíra Canal --- I decided to add the status property to the `bcm2835-common.dtsi`, but there are two other options: 1. To add the status property to the `bcm2835-rpi-common.dtsi` file 2. To add the status property to each individual RPi model, e.g. `bcm2837-rpi-3-b.dts`. Let me know which option is more suitable, and if `bcm2835-common.dtsi` is not the best option, I can send a v2. Best Regards, - Maíra arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi index 9261b67dbee1..851a6bce1939 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi +++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi @@ -139,6 +139,7 @@ v3d: v3d@7ec00000 { compatible = "brcm,bcm2835-v3d"; reg = <0x7ec00000 0x1000>; interrupts = <1 10>; + status = "okay"; }; vc4: gpu {
[PATCH] ARM: dts: bcm2835: Enable 3D rendering through V3D
RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to the RPi. Currently, the downstream kernel uses an overlay to enable the GPU and use GPU hardware acceleration. When deploying a mainline kernel to the RPi 0-3, we end up without any GPU hardware acceleration (essentially, we can't use the OpenGL driver). Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel. Signed-off-by: Maíra Canal --- I decided to add the status property to the `bcm2835-common.dtsi`, but there are two other options: 1. To add the status property to the `bcm2835-rpi-common.dtsi` file 2. To add the status property to each individual RPi model, e.g. `bcm2837-rpi-3-b.dts`. Let me know which option is more suitable, and if `bcm2835-common.dtsi` is not the best option, I can send a v2. Best Regards, - Maíra arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi index 9261b67dbee1..851a6bce1939 100644 --- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi +++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi @@ -139,6 +139,7 @@ v3d: v3d@7ec00000 { compatible = "brcm,bcm2835-v3d"; reg = <0x7ec00000 0x1000>; interrupts = <1 10>; + status = "okay"; }; vc4: gpu { -- 2.44.0
Re: [PATCH v6 3/8] drm/ci: uprev IGT and update testlist
On 4/9/24 12:15, Dmitry Baryshkov wrote: On Tue, Apr 09, 2024 at 07:22:38PM +0530, Vignesh Raman wrote: Hi Maíra, On 09/04/24 15:10, Maíra Canal wrote: On 4/9/24 05:13, Vignesh Raman wrote: Uprev IGT and add amd, v3d, vc4 and vgem specific tests to the testlist and skip driver-specific tests in *-skips.txt. Also add the testlist to the MAINTAINERS file and update xfails. A better approach would be to stop vendoring the testlist into the kernel and instead use the testlist from the IGT build to ensure we do not miss renamed or newly added tests. This implementation is planned for the future. How problematic would it be to just do this in this test series, instead of adding a huge testlist that we need to keep synced with IGT? Is it okay if these changes are submitted in another patch series to avoid delaying the current one? There are patches like vkms which are blocked due to the mesa uprev patch. We would also need to rerun all jobs and update xfails with the new testlist. In the next series we could uprev IGT to the latest version, use the testlist from the build, and remove the one in drm-ci. We can also test with the latest kernel. I will work on this. Please let me know your thoughts. As we have to rebase/retest anyway, I think it makes more sense to land the from-IGT-test-list first, fixing it for the devices that are currently present, and to land the rest afterwards. As for the IGT uprev, we have been waiting for that for quite a while (I think I've even sent a patch a while ago) in order to fix test failures on drm/msm. Agreed on that. Best Regards - Maíra Regards, Vignesh Best Regards, - Maíra Acked-by: Helen Koike Signed-off-by: Vignesh Raman --- v3: - New patch in series to uprev IGT and update testlist. v4: - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes. v5: - Keep single testlist and update xfails. Skip driver specific tests. v6: - Update xfails. 
--- MAINTAINERS | 8 + drivers/gpu/drm/ci/gitlab-ci.yml | 2 +- drivers/gpu/drm/ci/testlist.txt | 321 ++ .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt | 25 +- .../drm/ci/xfails/amdgpu-stoney-flakes.txt | 10 +- .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt | 23 +- drivers/gpu/drm/ci/xfails/i915-amly-fails.txt | 1 + drivers/gpu/drm/ci/xfails/i915-amly-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-apl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-cml-fails.txt | 1 + drivers/gpu/drm/ci/xfails/i915-cml-skips.txt | 7 + drivers/gpu/drm/ci/xfails/i915-glk-fails.txt | 2 +- drivers/gpu/drm/ci/xfails/i915-glk-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt | 2 + drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-whl-fails.txt | 1 + drivers/gpu/drm/ci/xfails/i915-whl-skips.txt | 9 +- .../drm/ci/xfails/mediatek-mt8173-fails.txt | 3 - .../drm/ci/xfails/mediatek-mt8173-skips.txt | 6 + .../drm/ci/xfails/mediatek-mt8183-fails.txt | 1 + .../drm/ci/xfails/mediatek-mt8183-skips.txt | 5 + .../gpu/drm/ci/xfails/meson-g12b-fails.txt | 1 + .../gpu/drm/ci/xfails/meson-g12b-skips.txt | 5 + .../gpu/drm/ci/xfails/msm-apq8016-skips.txt | 5 + .../gpu/drm/ci/xfails/msm-apq8096-skips.txt | 8 +- .../msm-sc7180-trogdor-kingoftown-skips.txt | 6 + ...sm-sc7180-trogdor-lazor-limozeen-skips.txt | 6 + .../gpu/drm/ci/xfails/msm-sdm845-skips.txt | 6 + .../drm/ci/xfails/rockchip-rk3288-fails.txt | 1 + .../drm/ci/xfails/rockchip-rk3288-skips.txt | 8 +- .../drm/ci/xfails/rockchip-rk3399-fails.txt | 1 + .../drm/ci/xfails/rockchip-rk3399-skips.txt | 6 + .../drm/ci/xfails/virtio_gpu-none-fails.txt | 15 + .../drm/ci/xfails/virtio_gpu-none-skips.txt | 9 +- 35 files changed, 532 insertions(+), 17 deletions(-) create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt 
create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt diff --git a/MAINTAINERS b/MAINTAINERS index 3bc7e122a094..f7d0040a6c21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1665,6 +1665,7 @@ L: dri-devel@lists.freedesktop.org S: Supported T: git git://anongit.freedesktop.org/drm/drm-misc F: Documentation/gpu/panfrost.rst +F: drivers/gpu/drm/ci/testlist.txt F: drivers/gpu/drm/panfrost/ F: include/uapi/drm/panfrost_drm.h @@ -6753,6 +6754,7 @@ S: Maintained B: https://gitlab.freedesktop.org/drm/msm/-/issues T: git https://gitlab.freedesktop.org/drm/msm.git F: Documentation/devicetree/bindings/display/msm/ +F: drivers/gpu/drm/ci/testlist.txt F: drivers
Re: [PATCH v6 3/8] drm/ci: uprev IGT and update testlist
On 4/9/24 05:13, Vignesh Raman wrote: Uprev IGT and add amd, v3d, vc4 and vgem specific tests to the testlist and skip driver-specific tests in *-skips.txt. Also add the testlist to the MAINTAINERS file and update xfails. A better approach would be to stop vendoring the testlist into the kernel and instead use the testlist from the IGT build to ensure we do not miss renamed or newly added tests. This implementation is planned for the future. How problematic would it be to just do this in this test series, instead of adding a huge testlist that we need to keep synced with IGT? Best Regards, - Maíra Acked-by: Helen Koike Signed-off-by: Vignesh Raman --- v3: - New patch in series to uprev IGT and update testlist. v4: - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes. v5: - Keep single testlist and update xfails. Skip driver specific tests. v6: - Update xfails. --- MAINTAINERS | 8 + drivers/gpu/drm/ci/gitlab-ci.yml | 2 +- drivers/gpu/drm/ci/testlist.txt | 321 ++ .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt | 25 +- .../drm/ci/xfails/amdgpu-stoney-flakes.txt| 10 +- .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt | 23 +- drivers/gpu/drm/ci/xfails/i915-amly-fails.txt | 1 + drivers/gpu/drm/ci/xfails/i915-amly-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-apl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-cml-fails.txt | 1 + drivers/gpu/drm/ci/xfails/i915-cml-skips.txt | 7 + drivers/gpu/drm/ci/xfails/i915-glk-fails.txt | 2 +- drivers/gpu/drm/ci/xfails/i915-glk-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt | 2 + drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-whl-fails.txt | 1 + drivers/gpu/drm/ci/xfails/i915-whl-skips.txt | 9 +- .../drm/ci/xfails/mediatek-mt8173-fails.txt | 3 - .../drm/ci/xfails/mediatek-mt8173-skips.txt | 6 + .../drm/ci/xfails/mediatek-mt8183-fails.txt | 1 + .../drm/ci/xfails/mediatek-mt8183-skips.txt | 5 + 
.../gpu/drm/ci/xfails/meson-g12b-fails.txt| 1 + .../gpu/drm/ci/xfails/meson-g12b-skips.txt| 5 + .../gpu/drm/ci/xfails/msm-apq8016-skips.txt | 5 + .../gpu/drm/ci/xfails/msm-apq8096-skips.txt | 8 +- .../msm-sc7180-trogdor-kingoftown-skips.txt | 6 + ...sm-sc7180-trogdor-lazor-limozeen-skips.txt | 6 + .../gpu/drm/ci/xfails/msm-sdm845-skips.txt| 6 + .../drm/ci/xfails/rockchip-rk3288-fails.txt | 1 + .../drm/ci/xfails/rockchip-rk3288-skips.txt | 8 +- .../drm/ci/xfails/rockchip-rk3399-fails.txt | 1 + .../drm/ci/xfails/rockchip-rk3399-skips.txt | 6 + .../drm/ci/xfails/virtio_gpu-none-fails.txt | 15 + .../drm/ci/xfails/virtio_gpu-none-skips.txt | 9 +- 35 files changed, 532 insertions(+), 17 deletions(-) create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt diff --git a/MAINTAINERS b/MAINTAINERS index 3bc7e122a094..f7d0040a6c21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1665,6 +1665,7 @@ L:dri-devel@lists.freedesktop.org S:Supported T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/gpu/panfrost.rst +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/panfrost/ F:include/uapi/drm/panfrost_drm.h @@ -6753,6 +6754,7 @@ S: Maintained B:https://gitlab.freedesktop.org/drm/msm/-/issues T:git https://gitlab.freedesktop.org/drm/msm.git F:Documentation/devicetree/bindings/display/msm/ +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/msm* F:drivers/gpu/drm/msm/ F:include/uapi/drm/msm_drm.h @@ -7047,6 +7049,7 @@ T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml F:Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml F:Documentation/gpu/meson.rst +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/meson* F:drivers/gpu/drm/meson/ @@ -7160,6 +7163,7 
@@ L: dri-devel@lists.freedesktop.org L:linux-media...@lists.infradead.org (moderated for non-subscribers) S:Supported F:Documentation/devicetree/bindings/display/mediatek/ +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/mediatek* F:drivers/gpu/drm/mediatek/ F:drivers/phy/mediatek/phy-mtk-dp.c @@ -7211,6 +7215,7 @@ L:dri-devel@lists.freedesktop.org S:Maintained T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/devicetree/bindings/display/rockchip/ +F: drivers/gpu/drm/ci/testlist.txt F:
Re: [PATCH 5/5] drm/vkms: Use drm_crtc_vblank_crtc()
On 4/8/24 16:06, Ville Syrjala wrote: From: Ville Syrjälä Replace the open coded drm_crtc_vblank_crtc() with the real thing. Cc: Rodrigo Siqueira Cc: Melissa Wen Cc: "Maíra Canal" Cc: Haneen Mohammed Cc: Daniel Vetter Signed-off-by: Ville Syrjälä Reviewed-by: Maíra Canal Best Regards, - Maíra --- drivers/gpu/drm/vkms/vkms_crtc.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index 61e500b8c9da..40b4d084e3ce 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -61,9 +61,7 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) static int vkms_enable_vblank(struct drm_crtc *crtc) { - struct drm_device *dev = crtc->dev; - unsigned int pipe = drm_crtc_index(crtc); - struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; + struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc); struct vkms_output *out = drm_crtc_to_vkms_output(crtc); drm_calc_timestamping_constants(crtc, &crtc->mode); @@ -88,10 +86,9 @@ static bool vkms_get_vblank_timestamp(struct drm_crtc *crtc, bool in_vblank_irq) { struct drm_device *dev = crtc->dev; - unsigned int pipe = crtc->index; struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev); struct vkms_output *output = &vkmsdev->output; - struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; + struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc); if (!READ_ONCE(vblank->enabled)) { *vblank_time = ktime_get();
Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob
On 4/4/24 18:30, Adrián Larumbe wrote: On 04.04.2024 11:31, Maíra Canal wrote: On 4/4/24 11:00, Adrián Larumbe wrote: This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). The point is making broader GPU occupancy numbers available through the sysfs interface, so that for every job slot, its number of processed jobs and total processing time are displayed. Shouldn't we make this sysfs interface a generic DRM interface? Something that would be standard for all drivers and that we could integrate into gputop in the future. I think the best way to generalise this sysfs knob would be to create a DRM class attribute somewhere in drivers/gpu/drm/drm_sysfs.c and then add a new function to 'struct drm_driver' that would return a structure with the relevant information (execution units and their names, number of processed jobs, etc). This is exactly what I was thinking about. What that information would exactly be is up for debate, I guess, since different drivers might be interested in showing different bits of information. I believe we can start with the requirements of V3D and Panfrost and then expand from there. Laying that down is important because the sysfs file would become part of the device class API. My PoV: it is important, but not completely tragic if we don't get it perfect. Just like fdinfo, which is evolving. I might come up with a new RFC patch series that does precisely that, at least for v3d and Panfrost, and maybe other people could pitch in with the sort of things they'd like to see for other drivers? Yeah, this would be a great idea. Please, CC me on this series. 
Best Regards, - Maíra Cheers, Adrian Best Regards, - Maíra Cc: Boris Brezillon Cc: Christopher Healy Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_device.h | 5 +++ drivers/gpu/drm/panfrost/panfrost_drv.c| 49 -- drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++- drivers/gpu/drm/panfrost/panfrost_job.h| 3 ++ 4 files changed, 68 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index cffcb0ac7c11..1d343351c634 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -169,6 +169,11 @@ struct panfrost_engine_usage { unsigned long long cycles[NUM_JOB_SLOTS]; }; +struct panfrost_slot_usage { + u64 enabled_ns; + u64 jobs_sent; +}; + struct panfrost_file_priv { struct panfrost_device *pfdev; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ef9f6c0716d5..6afcde66270f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -524,6 +525,10 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = { PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW), }; +static const char * const engine_names[] = { + "fragment", "vertex-tiler", "compute-only" +}; + static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, struct panfrost_file_priv *panfrost_priv, struct drm_printer *p) @@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, * job spent on the GPU. 
*/ - static const char * const engine_names[] = { - "fragment", "vertex-tiler", "compute-only" - }; - BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { @@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev, static DEVICE_ATTR_RW(profiling); +static ssize_t +gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct panfrost_device *pfdev = dev_get_drvdata(dev); + struct panfrost_slot_usage stats; + u64 timestamp = local_clock(); + ssize_t len = 0; + unsigned int i; + + BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); + + len += sysfs_emit(buf, "queuetimestampjobs runtime\n"); + len += sysfs_emit_at(buf, len, "-\n"); + + for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { + + stats = get_slot_stats(pfdev, i); + + /* +* Each line will display the slot name, timestamp, the number +* of jobs handled by that engine and runtime, as shown below: +* +* queuetimestampjobsruntime +* -
[PATCH v2 6/6] drm/v3d: Enable big and super pages
The V3D MMU also supports 64KB and 1MB pages, called big and super pages, respectively. In order to set a 64KB page or 1MB page in the MMU, we need to make sure that the page table entries for all 4KB pages within a big/super page are correctly configured. In order to create a big/super page, we need a contiguous memory region. That's why we use a separate mountpoint with THP enabled. In order to place the page table entries in the MMU, we iterate over the 16 4KB pages (for big pages) or 256 4KB pages (for super pages) and insert the PTE. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_bo.c| 21 +-- drivers/gpu/drm/v3d/v3d_drv.c | 8 ++ drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_gemfs.c | 6 + drivers/gpu/drm/v3d/v3d_mmu.c | 46 ++--- 5 files changed, 71 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index 79e31c5299b1..cfe82232886a 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) struct v3d_dev *v3d = to_v3d_dev(obj->dev); struct v3d_bo *bo = to_v3d_bo(obj); struct sg_table *sgt; + u64 align; int ret; /* So far we pin the BO in the MMU for its lifetime, so use @@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj) if (IS_ERR(sgt)) return PTR_ERR(sgt); + if (!v3d->super_pages) + align = SZ_4K; + else if (obj->size >= SZ_1M) + align = SZ_1M; + else if (obj->size >= SZ_64K) + align = SZ_64K; + else + align = SZ_4K; + spin_lock(&v3d->mm_lock); /* Allocate the object's space in the GPU's page tables. 
* Inserting PTEs will happen later, but the offset is for the @@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) */ ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node, obj->size >> V3D_MMU_PAGE_SHIFT, -SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0); +align >> V3D_MMU_PAGE_SHIFT, 0, 0); spin_unlock(&v3d->mm_lock); if (ret) return ret; @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv, size_t unaligned_size) { struct drm_gem_shmem_object *shmem_obj; + struct v3d_dev *v3d = to_v3d_dev(dev); struct v3d_bo *bo; int ret; - shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + /* Let the user opt out of allocating the BOs with THP */ + if (v3d->super_pages) + shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size, + v3d->gemfs); + else + shmem_obj = drm_gem_shmem_create(dev, unaligned_size); + if (IS_ERR(shmem_obj)) return ERR_CAST(shmem_obj); bo = to_v3d_bo(&shmem_obj->base); diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 3debf37e7d9b..3dbd29560be4 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -36,6 +36,12 @@ #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 +static bool super_pages = true; +module_param_named(super_pages, super_pages, bool, 0400); +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. 
Note: \ + To enable Super Pages, you need support to \ + enable THP."); + static int v3d_get_param_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { @@ -308,6 +314,8 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) return -ENOMEM; } + v3d->super_pages = super_pages; + ret = v3d_gem_init(drm); if (ret) goto dma_free; diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 17236ee23490..0a7aacf51164 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -18,6 +18,7 @@ struct platform_device; struct reset_control; #define V3D_MMU_PAGE_SHIFT 12 +#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) #define V3D_MAX_QUEUES (V3D_CPU + 1) @@ -121,6 +122,7 @@ struct v3d_dev { * tmpfs instance used for shmem backed objects */ struct vfsmount *gemfs; + bool super_pages; struct work_struct overflow_mem_work; diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c index 31cf5bd11e39..7ee55b32c36e 100644 --- a/drivers/gpu/drm/v3d/v3d_gemfs.c +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c @@ -12,6 +12,10 @@ void v3d_gemfs_
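The iteration described in the commit message above can be sketched as a small userspace model. This is an illustration only, not the actual v3d_mmu.c code: the flag bits, table size, and function name below are all made up for the example. The point it demonstrates is that a big (64 KB) or super (1 MB) page is still expressed as individual 4 KB page-table entries, each carrying its own page address plus a flag recording the larger page size.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  (4u << 10)
#define SZ_64K (64u << 10)
#define SZ_1M  (1u << 20)

#define PTE_BIG   (1u << 30)	/* hypothetical "64 KB page" flag */
#define PTE_SUPER (1u << 31)	/* hypothetical "1 MB page" flag  */

#define NUM_PTES 4096u

/* Write one PTE per 4 KB page covered by [va, va + size). */
static void fill_ptes(uint32_t *pt, uint32_t va, uint32_t pa, uint32_t size)
{
	uint32_t flag = 0;

	if (size == SZ_1M)
		flag = PTE_SUPER;	/* 256 x 4 KB entries */
	else if (size == SZ_64K)
		flag = PTE_BIG;		/* 16 x 4 KB entries  */

	/* each 4 KB entry gets its own physical page number plus the flag */
	for (uint32_t i = 0; i < size / SZ_4K; i++)
		pt[va / SZ_4K + i] = ((pa + i * SZ_4K) >> 12) | flag;
}
```

This mirrors why the series needs contiguous memory: all 16 (or 256) consecutive entries must describe consecutive physical 4 KB pages of the same big/super page.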
[PATCH v2 5/6] drm/v3d: Reduce the alignment of the node allocation
Currently, we are using an alignment of 128 kB to insert a node, which ends up wasting memory as we perform plenty of small BO allocations (<= 4 kB). We require that allocations are aligned to 128 kB, so for any allocation smaller than that, we are wasting the difference. This implies that we cannot effectively use the whole 4 GB address space available for the GPU in the RPi 4. Currently, we can allocate up to 32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1,3 GB). This can be quite limiting for applications that have a high memory requirement, such as vkoverhead [1]. By reducing the page alignment to 4 kB, we can allocate up to 1,000,000 BOs of 4 kB (~4 GB) and 10,000 BOs of 400 kB (~4 GB). Moreover, by performing benchmarks, we were able to attest that reducing the page alignment to 4 kB can provide a general performance improvement in OpenGL applications (e.g. glmark2). Therefore, this patch reduces the alignment of the node allocation to 4 kB, which will allow RPi users to explore the whole 4 GB virtual address space provided by the hardware. Also, this patch allows users to fully run vkoverhead on the RPi 4/5, solving the issue reported in [1]. 
[1] https://github.com/zmike/vkoverhead/issues/14 Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_bo.c | 2 +- drivers/gpu/drm/v3d/v3d_drv.h | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index a07ede668cc1..79e31c5299b1 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj) */ ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node, obj->size >> V3D_MMU_PAGE_SHIFT, -GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0); +SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0); spin_unlock(&v3d->mm_lock); if (ret) return ret; diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index d2ce8222771a..17236ee23490 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -17,8 +17,6 @@ struct clk; struct platform_device; struct reset_control; -#define GMP_GRANULARITY (128 * 1024) - #define V3D_MMU_PAGE_SHIFT 12 #define V3D_MAX_QUEUES (V3D_CPU + 1) -- 2.44.0
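The capacity figures in the commit message follow directly from the alignment arithmetic. The helper below is hypothetical (not driver code) and just checks the back-of-the-envelope numbers: with drm_mm nodes aligned to 128 KB, every BO consumes at least 128 KB of the 4 GB GPU virtual address space, however small the BO actually is.

```c
#include <assert.h>
#include <stdint.h>

#define KB (1ull << 10)
#define GB (1ull << 30)

/* How many BOs of a given size fit in the VA space at an alignment? */
static uint64_t max_bos(uint64_t bo_size, uint64_t align, uint64_t va_space)
{
	/* each node occupies its size rounded up to the alignment */
	uint64_t slot = (bo_size + align - 1) / align * align;

	return va_space / slot;
}
```

At 128 KB alignment a 4 KB BO wastes 124 KB of address space per allocation, so the VA space is exhausted after roughly 32 K allocations; at 4 KB alignment the same space holds about a million of them.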
[PATCH v2 4/6] drm/gem: Create shmem GEM object in a given mountpoint
Create a function `drm_gem_shmem_create_with_mnt()`, similar to `drm_gem_shmem_create()`, that has a mountpoint as an argument. This function will create a shmem GEM object in a given tmpfs mountpoint, which is useful for drivers that need a mountpoint with special mount flags enabled. Signed-off-by: Maíra Canal --- drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++ include/drm/drm_gem_shmem_helper.h | 3 +++ 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..10b7c4c769a3 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = { }; static struct drm_gem_shmem_object * -__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) +__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private, + struct vfsmount *gemfs) { struct drm_gem_shmem_object *shmem; struct drm_gem_object *obj; @@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) drm_gem_private_object_init(dev, obj, size); shmem->map_wc = false; /* dma-buf mappings use always writecombine */ } else { - ret = drm_gem_object_init(dev, obj, size); + ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs); } if (ret) { drm_gem_private_object_fini(obj); @@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private) */ struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size) { - return __drm_gem_shmem_create(dev, size, false); + return __drm_gem_shmem_create(dev, size, false, NULL); } EXPORT_SYMBOL_GPL(drm_gem_shmem_create); +/** + * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a + * given mountpoint + * @dev: DRM device + * @size: Size of the object to allocate + * @gemfs: tmpfs mount where the GEM object will be created + * + * This 
function creates a shmem GEM object in a given tmpfs mountpoint. + * + * Returns: + * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative + * error code on failure. + */ +struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev, + size_t size, + struct vfsmount *gemfs) +{ + return __drm_gem_shmem_create(dev, size, false, gemfs); +} +EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt); + /** * drm_gem_shmem_free - Free resources associated with a shmem GEM object * @shmem: shmem GEM object to free @@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev, size_t size = PAGE_ALIGN(attach->dmabuf->size); struct drm_gem_shmem_object *shmem; - shmem = __drm_gem_shmem_create(dev, size, true); + shmem = __drm_gem_shmem_create(dev, size, true, NULL); if (IS_ERR(shmem)) return ERR_CAST(shmem); diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h index efbc9f27312b..d22e3fb53631 100644 --- a/include/drm/drm_gem_shmem_helper.h +++ b/include/drm/drm_gem_shmem_helper.h @@ -97,6 +97,9 @@ struct drm_gem_shmem_object { container_of(obj, struct drm_gem_shmem_object, base) struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size); +struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev, + size_t size, + struct vfsmount *gemfs); void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem); void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem); -- 2.44.0
[PATCH v2 3/6] drm/v3d: Introduce gemfs
Create a separate "tmpfs" kernel mount for V3D. This will allow us to move away from the shmemfs `shm_mnt` and gives the flexibility to do things like set our own mount options. Here, the interest is to use "huge=", which should allow us to enable the use of THP for our shmem-backed objects. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/Makefile| 3 ++- drivers/gpu/drm/v3d/v3d_drv.h | 9 +++ drivers/gpu/drm/v3d/v3d_gem.c | 3 +++ drivers/gpu/drm/v3d/v3d_gemfs.c | 46 + 4 files changed, 60 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile index b7d673f1153b..fcf710926057 100644 --- a/drivers/gpu/drm/v3d/Makefile +++ b/drivers/gpu/drm/v3d/Makefile @@ -13,7 +13,8 @@ v3d-y := \ v3d_trace_points.o \ v3d_sched.o \ v3d_sysfs.o \ - v3d_submit.o + v3d_submit.o \ + v3d_gemfs.o v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 1950c723dde1..d2ce8222771a 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -119,6 +119,11 @@ struct v3d_dev { struct drm_mm mm; spinlock_t mm_lock; + /* +* tmpfs instance used for shmem backed objects +*/ + struct vfsmount *gemfs; + struct work_struct overflow_mem_work; struct v3d_bin_job *bin_job; @@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d); void v3d_invalidate_caches(struct v3d_dev *v3d); void v3d_clean_caches(struct v3d_dev *v3d); +/* v3d_gemfs.c */ +void v3d_gemfs_init(struct v3d_dev *v3d); +void v3d_gemfs_fini(struct v3d_dev *v3d); + /* v3d_submit.c */ void v3d_job_cleanup(struct v3d_job *job); void v3d_job_put(struct v3d_job *job); diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index 66f4b78a6b2e..faefbe497e8d 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev) v3d_init_hw_state(v3d); 
v3d_mmu_set_page_table(v3d); + v3d_gemfs_init(v3d); + ret = v3d_sched_init(v3d); if (ret) { drm_mm_takedown(&v3d->mm); @@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev) struct v3d_dev *v3d = to_v3d_dev(dev); v3d_sched_fini(v3d); + v3d_gemfs_fini(v3d); /* Waiting for jobs to finish would need to be done before * unregistering V3D. diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c new file mode 100644 index ..31cf5bd11e39 --- /dev/null +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* Copyright (C) 2024 Raspberry Pi */ + +#include +#include + +#include "v3d_drv.h" + +void v3d_gemfs_init(struct v3d_dev *v3d) +{ + char huge_opt[] = "huge=within_size"; + struct file_system_type *type; + struct vfsmount *gemfs; + + /* +* By creating our own shmemfs mountpoint, we can pass in +* mount flags that better match our usecase. However, we +* only do so on platforms which benefit from it. +*/ + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) + goto err; + + type = get_fs_type("tmpfs"); + if (!type) + goto err; + + gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt); + if (IS_ERR(gemfs)) + goto err; + + v3d->gemfs = gemfs; + drm_info(&v3d->drm, "Using Transparent Hugepages\n"); + + return; + +err: + v3d->gemfs = NULL; + drm_notice(&v3d->drm, + "Transparent Hugepage support is recommended for optimal performance on this platform!\n"); +} + +void v3d_gemfs_fini(struct v3d_dev *v3d) +{ + if (v3d->gemfs) + kern_unmount(v3d->gemfs); +} -- 2.44.0
[PATCH v2 2/6] drm/gem: Create a drm_gem_object_init_with_mnt() function
For some applications, such as applications that uses huge pages, we might want to have a different mountpoint, for which we pass mount flags that better match our usecase. Therefore, create a new function `drm_gem_object_init_with_mnt()` that allow us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fallback to `shmem_file_setup()`. Signed-off-by: Maíra Canal --- drivers/gpu/drm/drm_gem.c | 34 ++ include/drm/drm_gem.h | 3 +++ 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index d4bbc5d109c8..74ebe68e3d61 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev) } /** - * drm_gem_object_init - initialize an allocated shmem-backed GEM object + * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM + * object in a given shmfs mountpoint + * * @dev: drm_device the object should be initialized for * @obj: drm_gem_object to initialize * @size: object size + * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use + * the usual tmpfs mountpoint (`shm_mnt`). * * Initialize an already allocated GEM object of the specified size with * shmfs backing store. 
*/ -int drm_gem_object_init(struct drm_device *dev, - struct drm_gem_object *obj, size_t size) +int drm_gem_object_init_with_mnt(struct drm_device *dev, +struct drm_gem_object *obj, size_t size, +struct vfsmount *gemfs) { struct file *filp; drm_gem_private_object_init(dev, obj, size); - filp = shmem_file_setup("drm mm object", size, VM_NORESERVE); + if (gemfs) + filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size, +VM_NORESERVE); + else + filp = shmem_file_setup("drm mm object", size, VM_NORESERVE); + if (IS_ERR(filp)) return PTR_ERR(filp); @@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev, return 0; } +EXPORT_SYMBOL(drm_gem_object_init_with_mnt); + +/** + * drm_gem_object_init - initialize an allocated shmem-backed GEM object + * @dev: drm_device the object should be initialized for + * @obj: drm_gem_object to initialize + * @size: object size + * + * Initialize an already allocated GEM object of the specified size with + * shmfs backing store. + */ +int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj, + size_t size) +{ + return drm_gem_object_init_with_mnt(dev, obj, size, NULL); +} EXPORT_SYMBOL(drm_gem_object_init); /** diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bae4865b2101..2ebf6e10cc44 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj); void drm_gem_object_free(struct kref *kref); int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj, size_t size); +int drm_gem_object_init_with_mnt(struct drm_device *dev, +struct drm_gem_object *obj, size_t size, +struct vfsmount *gemfs); void drm_gem_private_object_init(struct drm_device *dev, struct drm_gem_object *obj, size_t size); void drm_gem_private_object_fini(struct drm_gem_object *obj); -- 2.44.0
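This patch (and patch 4, which repeats the same move for `drm_gem_shmem_create_with_mnt()`) follows a common API-extension pattern: the new entry point takes the extra mountpoint argument and treats NULL as "use the default", while the old entry point keeps its signature and becomes a one-line wrapper, so no existing caller has to change. A minimal userspace sketch of that pattern (names below are illustrative, not the real DRM symbols):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mnt { const char *name; };

static const struct mnt default_mnt = { "shm_mnt" };

/* New, extended entry point: NULL falls back to the default mount,
 * much like drm_gem_object_init_with_mnt() falls back to
 * shmem_file_setup() when gemfs is NULL. */
static const char *obj_init_with_mnt(const struct mnt *gemfs)
{
	if (!gemfs)
		gemfs = &default_mnt;
	return gemfs->name;
}

/* Old entry point: signature and behavior unchanged for callers. */
static const char *obj_init(void)
{
	return obj_init_with_mnt(NULL);
}
```

The design choice matters for a tree-wide helper: preserving `drm_gem_object_init()`'s signature means only drivers that opt into a custom mount (here, V3D) need touching.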
[PATCH v2 0/6] drm/v3d: Enable Big and Super Pages
performance. This indicates an enhancement in the baseline scenario, rather than any detriment caused by v2. Additionally, I've included stats from v1 in the comparisons. Upon scrutinizing the average FPS of v2 in contrast to v1, it becomes evident that v2 not only maintains the improvements but may even surpass them. This version provides a much safer way to iterate through memory and isn't subject to the same limitations as v1. For example, v1 had a hard-coded hack that only allowed a huge page to be created if the BO was bigger than 2MB. These limitations are gone now. This series also introduces changes in the GEM helpers, in order to enable V3D to have a separate mount point for shmfs GEM objects. Any feedback from the community about the changes in the GEM helpers is welcome! v1 -> v2: https://lore.kernel.org/dri-devel/20240311100959.205545-1-mca...@igalia.com/ * [1/6] Add Iago's R-b to PATCH 1/5 (Iago Toral) * [2/6] Create a new function `drm_gem_object_init_with_mnt()` to define the shmfs mountpoint. Now, we don't touch a bunch of drivers, as `drm_gem_object_init()` preserves its signature (Tvrtko Ursulin) * [3/6] Use `huge=within_size` instead of `huge=always`, in order to avoid OOM. This also allows us to move away from the 2MB hack. 
(Tvrtko Ursulin) * [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral) * [5/6] Create a separate patch to reduce the alignment of the node allocation (Iago Toral) * [6/6] Complete refactoring to the way that we iterate through the memory (Tvrtko Ursulin) * [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us misleading data (Tvrtko Ursulin) * [6/6] Use both Big Pages (64K) and Super Pages (1MB) Best Regards, - Maíra Maíra Canal (6): drm/v3d: Fix return if scheduler initialization fails drm/gem: Create a drm_gem_object_init_with_mnt() function drm/v3d: Introduce gemfs drm/gem: Create shmem GEM object in a given mountpoint drm/v3d: Reduce the alignment of the node allocation drm/v3d: Enable big and super pages drivers/gpu/drm/drm_gem.c | 34 +++-- drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +-- drivers/gpu/drm/v3d/Makefile | 3 +- drivers/gpu/drm/v3d/v3d_bo.c | 21 ++- drivers/gpu/drm/v3d/v3d_drv.c | 8 drivers/gpu/drm/v3d/v3d_drv.h | 13 ++- drivers/gpu/drm/v3d/v3d_gem.c | 6 ++- drivers/gpu/drm/v3d/v3d_gemfs.c| 52 ++ drivers/gpu/drm/v3d/v3d_mmu.c | 46 ++- include/drm/drm_gem.h | 3 ++ include/drm/drm_gem_shmem_helper.h | 3 ++ 11 files changed, 195 insertions(+), 24 deletions(-) create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c -- 2.44.0
[PATCH v2 1/6] drm/v3d: Fix return if scheduler initialization fails
If the scheduler initialization fails, GEM initialization must fail as well. Therefore, if `v3d_sched_init()` fails, free the DMA memory allocated and return the error value in `v3d_gem_init()`. Signed-off-by: Maíra Canal Reviewed-by: Iago Toral Quiroga --- drivers/gpu/drm/v3d/v3d_gem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index afc565078c78..66f4b78a6b2e 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev) ret = v3d_sched_init(v3d); if (ret) { drm_mm_takedown(&v3d->mm); - dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt, + dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt, v3d->pt_paddr); + return ret; } return 0; -- 2.44.0
Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob
On 4/4/24 11:00, Adrián Larumbe wrote: This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). The point is making broader GPU occupancy numbers available through the sysfs interface, so that for every job slot, its number of processed jobs and total processing time are displayed. Shouldn't we make this sysfs interface a generic DRM interface? Something that would be standard for all drivers and that we could integrate into gputop in the future. Best Regards, - Maíra Cc: Boris Brezillon Cc: Christopher Healy Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_device.h | 5 +++ drivers/gpu/drm/panfrost/panfrost_drv.c| 49 -- drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++- drivers/gpu/drm/panfrost/panfrost_job.h| 3 ++ 4 files changed, 68 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index cffcb0ac7c11..1d343351c634 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -169,6 +169,11 @@ struct panfrost_engine_usage { unsigned long long cycles[NUM_JOB_SLOTS]; }; +struct panfrost_slot_usage { + u64 enabled_ns; + u64 jobs_sent; +}; + struct panfrost_file_priv { struct panfrost_device *pfdev; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ef9f6c0716d5..6afcde66270f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -524,6 +525,10 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = { PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW), }; +static const char * const engine_names[] = { + "fragment", "vertex-tiler", "compute-only" +}; + static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, struct panfrost_file_priv *panfrost_priv, struct drm_printer *p) @@ 
-543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, * job spent on the GPU. */ - static const char * const engine_names[] = { - "fragment", "vertex-tiler", "compute-only" - }; - BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { @@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev, static DEVICE_ATTR_RW(profiling); +static ssize_t +gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct panfrost_device *pfdev = dev_get_drvdata(dev); + struct panfrost_slot_usage stats; + u64 timestamp = local_clock(); + ssize_t len = 0; + unsigned int i; + + BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); + + len += sysfs_emit(buf, "queuetimestampjobs runtime\n"); + len += sysfs_emit_at(buf, len, "-\n"); + + for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { + + stats = get_slot_stats(pfdev, i); + + /* +* Each line will display the slot name, timestamp, the number +* of jobs handled by that engine and runtime, as shown below: +* +* queuetimestampjobsruntime +* - +* fragment 12252943467507 638 1184747640 +* vertex-tiler 12252943467507 636 121663838 +* +*/ + len += sysfs_emit_at(buf, len, "%-13s%-17llu%-12llu%llu\n", +engine_names[i], +timestamp, +stats.jobs_sent, +stats.enabled_ns); + } + + return len; +} +static DEVICE_ATTR_RO(gpu_stats); + static struct attribute *panfrost_attrs[] = { _attr_profiling.attr, + _attr_gpu_stats.attr, NULL, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index a61ef0af9a4e..4c779e6f4cb0 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -31,6 +31,8 @@ struct panfrost_queue_state { struct drm_gpu_scheduler sched; u64 fence_context; u64 emit_seqno; + + struct panfrost_slot_usage stats; }; struct panfrost_job_slot { @@ -160,15 +162,20 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) WARN_ON(!job); if 
(job->is_profiled) { + u64 job_time =
[PATCH 5/5] drm/v3d: Fix race condition between sysfs/fdinfo and interrupt handler
In V3D, the conclusion of a job is indicated by an IRQ. When a job finishes, we update the local and the global GPU stats of that queue. But while the GPU stats are being updated, a user might be reading them from sysfs or fdinfo. For example, in `gpu_stats_show()` we could hit a scenario where `v3d->queue[queue].start_ns != 0`; then an interrupt fires and sets `v3d->queue[queue].start_ns` to 0, and when we come back to `gpu_stats_show()` to calculate `active_runtime`, we get `active_runtime = timestamp`. In this simple example, the user would see a spike in the queue usage that doesn't match reality. To address this issue properly, use rw-locks to protect the read and write sections of the code. Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats") Reported-by: Tvrtko Ursulin Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 16 drivers/gpu/drm/v3d/v3d_drv.h | 7 +++ drivers/gpu/drm/v3d/v3d_gem.c | 7 +-- drivers/gpu/drm/v3d/v3d_sched.c | 9 + drivers/gpu/drm/v3d/v3d_sysfs.c | 16 5 files changed, 41 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index cbb62be18aa5..60437718786c 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -119,7 +119,9 @@ v3d_open(struct drm_device *dev, struct drm_file *file) drm_sched_entity_init(&v3d_priv->sched_entity[i], DRM_SCHED_PRIORITY_NORMAL, &sched, 1, NULL); + memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i])); + rwlock_init(&v3d_priv->stats[i].rw_lock); } v3d_perfmon_open_file(v3d_priv); @@ -149,20 +151,26 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { struct v3d_stats *stats = &file_priv->stats[queue]; + u64 active_time, jobs_sent; + unsigned long flags; + + read_lock_irqsave(&stats->rw_lock, flags); + active_time = stats->start_ns ?
stats->enabled_ns + timestamp - stats->start_ns + : stats->enabled_ns; + jobs_sent = stats->jobs_sent; + read_unlock_irqrestore(&stats->rw_lock, flags); /* Note that, in case of a GPU reset, the time spent during an * attempt of executing the job is not computed in the runtime. */ drm_printf(p, "drm-engine-%s: \t%llu ns\n", - v3d_queue_to_string(queue), - stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns - : stats->enabled_ns); + v3d_queue_to_string(queue), active_time); /* Note that we only count jobs that completed. Therefore, jobs * that were resubmitted due to a GPU reset are not computed. */ drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", - v3d_queue_to_string(queue), stats->jobs_sent); + v3d_queue_to_string(queue), jobs_sent); } } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 0117593976ed..8fde2623f763 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -40,6 +40,13 @@ struct v3d_stats { u64 start_ns; u64 enabled_ns; u64 jobs_sent; + + /* +* This lock is used to protect the access to the GPU stats variables. +* It must be used as, while we are reading the stats, IRQs can happen +* and the stats would be updated.
+*/ + rwlock_t rw_lock; }; struct v3d_queue_state { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index d14589d3ae6c..439088724a51 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -247,8 +247,11 @@ v3d_gem_init(struct drm_device *dev) int ret, i; for (i = 0; i < V3D_MAX_QUEUES; i++) { - v3d->queue[i].fence_context = dma_fence_context_alloc(1); - memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats)); + struct v3d_queue_state *queue = &v3d->queue[i]; + + queue->fence_context = dma_fence_context_alloc(1); + memset(&queue->stats, 0, sizeof(queue->stats)); + rwlock_init(&queue->stats.rw_lock); } spin_lock_init(&v3d->mm_lock); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 754107b80f67..640de6768b15 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -
[PATCH 4/5] drm/v3d: Create function to update a set of GPU stats
Given a set of GPU stats, i.e. a `struct v3d_stats` related to a queue in a given context, create a function that updates the whole set of GPU stats. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_sched.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index ea5f5a84b55b..754107b80f67 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -118,6 +118,16 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) global_stats->start_ns = now; } +static void +v3d_stats_update(struct v3d_stats *stats) +{ + u64 now = local_clock(); + + stats->enabled_ns += now - stats->start_ns; + stats->jobs_sent++; + stats->start_ns = 0; +} + void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) { @@ -125,15 +135,9 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_file_priv *file = job->file->driver_priv; struct v3d_stats *global_stats = &v3d->queue[queue].stats; struct v3d_stats *local_stats = &file->stats[queue]; - u64 now = local_clock(); - - local_stats->enabled_ns += now - local_stats->start_ns; - local_stats->jobs_sent++; - local_stats->start_ns = 0; - global_stats->enabled_ns += now - global_stats->start_ns; - global_stats->jobs_sent++; - global_stats->start_ns = 0; + v3d_stats_update(local_stats); + v3d_stats_update(global_stats); } static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job) -- 2.44.0
[PATCH 3/5] drm/v3d: Create a struct to store the GPU stats
This makes it easier to instantiate the GPU stats variables and creates a single structure to store all the variables that refer to GPU stats. Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 14 ++ drivers/gpu/drm/v3d/v3d_drv.h | 18 ++ drivers/gpu/drm/v3d/v3d_gem.c | 4 +--- drivers/gpu/drm/v3d/v3d_sched.c | 20 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++ 5 files changed, 35 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index 3debf37e7d9b..cbb62be18aa5 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -115,14 +115,11 @@ v3d_open(struct drm_device *dev, struct drm_file *file) v3d_priv->v3d = v3d; for (i = 0; i < V3D_MAX_QUEUES; i++) { - v3d_priv->enabled_ns[i] = 0; - v3d_priv->start_ns[i] = 0; - v3d_priv->jobs_sent[i] = 0; - sched = &v3d->queue[i].sched; drm_sched_entity_init(&v3d_priv->sched_entity[i], DRM_SCHED_PRIORITY_NORMAL, &sched, 1, NULL); + memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i])); } v3d_perfmon_open_file(v3d_priv); @@ -151,20 +148,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) enum v3d_queue queue; for (queue = 0; queue < V3D_MAX_QUEUES; queue++) { + struct v3d_stats *stats = &file_priv->stats[queue]; + /* Note that, in case of a GPU reset, the time spent during an * attempt of executing the job is not computed in the runtime. */ drm_printf(p, "drm-engine-%s: \t%llu ns\n", v3d_queue_to_string(queue), - file_priv->start_ns[queue] ? file_priv->enabled_ns[queue] - + timestamp - file_priv->start_ns[queue] - : file_priv->enabled_ns[queue]); + stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns + : stats->enabled_ns); /* Note that we only count jobs that completed. Therefore, jobs * that were resubmitted due to a GPU reset are not computed.
*/ drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", - v3d_queue_to_string(queue), file_priv->jobs_sent[queue]); + v3d_queue_to_string(queue), stats->jobs_sent); } } diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index ee3545226d7f..0117593976ed 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue) return "UNKNOWN"; } +struct v3d_stats { + u64 start_ns; + u64 enabled_ns; + u64 jobs_sent; +}; + struct v3d_queue_state { struct drm_gpu_scheduler sched; u64 fence_context; u64 emit_seqno; - u64 start_ns; - u64 enabled_ns; - u64 jobs_sent; + /* Stores the GPU stats for this queue in the global context. */ + struct v3d_stats stats; }; /* Performance monitor object. The perform lifetime is controlled by userspace @@ -188,11 +193,8 @@ struct v3d_file_priv { struct drm_sched_entity sched_entity[V3D_MAX_QUEUES]; - u64 start_ns[V3D_MAX_QUEUES]; - - u64 enabled_ns[V3D_MAX_QUEUES]; - - u64 jobs_sent[V3D_MAX_QUEUES]; + /* Stores the GPU stats for a specific queue for this fd. 
*/ + struct v3d_stats stats[V3D_MAX_QUEUES]; }; struct v3d_bo { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index afc565078c78..d14589d3ae6c 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -248,9 +248,7 @@ v3d_gem_init(struct drm_device *dev) for (i = 0; i < V3D_MAX_QUEUES; i++) { v3d->queue[i].fence_context = dma_fence_context_alloc(1); - v3d->queue[i].start_ns = 0; - v3d->queue[i].enabled_ns = 0; - v3d->queue[i].jobs_sent = 0; + memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats)); } spin_lock_init(&v3d->mm_lock); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 8ca61bcd4b1c..ea5f5a84b55b 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -110,10 +110,12 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) { struct v3d_dev *v3d = job->v3d; struct v3d_file_priv *file = job->file->driver_priv; +
[PATCH 0/5] drm/v3d: Fix GPU stats inconsistencies and race condition
This series addresses two major issues in the GPU stats: 1. Currently, we increment `enabled_ns` twice by the end of each job. 2. There is a race condition between the IRQ handler and the users. Apart from addressing these issues, this series improves the GPU stats code as a whole. We reduced code repetition, creating functions to start and update the GPU stats, which will likely reduce the odds of issue #1 happening again. Note that I incrementally improved the code, creating small atomic commits to ease the reviewing process. Also, I kept the fix for issue #1 in a separate first patch, apart from the code improvements. Issue #1 is addressed in the first patch, while issue #2 is addressed in the last patch. Patches #2 to #4 are code improvements. Best Regards, - Maíra Maíra Canal (5): drm/v3d: Don't increment `enabled_ns` twice drm/v3d: Create two functions to update all GPU stats variables drm/v3d: Create a struct to store the GPU stats drm/v3d: Create function to update a set of GPU stats drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler drivers/gpu/drm/v3d/v3d_drv.c | 24 +--- drivers/gpu/drm/v3d/v3d_drv.h | 26 ++--- drivers/gpu/drm/v3d/v3d_gem.c | 9 +-- drivers/gpu/drm/v3d/v3d_irq.c | 52 ++ drivers/gpu/drm/v3d/v3d_sched.c | 97 ++--- drivers/gpu/drm/v3d/v3d_sysfs.c | 18 +++--- 6 files changed, 104 insertions(+), 122 deletions(-) -- 2.44.0
[PATCH 1/5] drm/v3d: Don't increment `enabled_ns` twice
The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs") introduced the calculation of global GPU stats. To do so, it reused the existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats"). While adding the global GPU stats calculation, the author forgot to delete the existing increment. Currently, the value of `enabled_ns` is incremented twice by the end of the job, when it should be added just once. Therefore, delete the leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs") Reported-by: Tvrtko Ursulin Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_irq.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c index 2e04f6cb661e..ce6b2fb341d1 100644 --- a/drivers/gpu/drm/v3d/v3d_irq.c +++ b/drivers/gpu/drm/v3d/v3d_irq.c @@ -105,7 +105,6 @@ v3d_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_BIN]; - file->enabled_ns[V3D_BIN] += local_clock() - file->start_ns[V3D_BIN]; file->jobs_sent[V3D_BIN]++; v3d->queue[V3D_BIN].jobs_sent++; @@ -126,7 +125,6 @@ v3d_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_RENDER]; - file->enabled_ns[V3D_RENDER] += local_clock() - file->start_ns[V3D_RENDER]; file->jobs_sent[V3D_RENDER]++; v3d->queue[V3D_RENDER].jobs_sent++; @@ -147,7 +145,6 @@ v3d_irq(int irq, void *arg) struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_CSD]; - file->enabled_ns[V3D_CSD] += local_clock() - file->start_ns[V3D_CSD]; file->jobs_sent[V3D_CSD]++; v3d->queue[V3D_CSD].jobs_sent++; @@ -195,7 +192,6 @@ v3d_hub_irq(int irq, void *arg) struct v3d_file_priv
*file = v3d->tfu_job->base.file->driver_priv; u64 runtime = local_clock() - file->start_ns[V3D_TFU]; - file->enabled_ns[V3D_TFU] += local_clock() - file->start_ns[V3D_TFU]; file->jobs_sent[V3D_TFU]++; v3d->queue[V3D_TFU].jobs_sent++; -- 2.44.0
[PATCH 2/5] drm/v3d: Create two functions to update all GPU stats variables
Currently, we manually perform all operations to update the GPU stats variables. Apart from the code repetition, this is very prone to errors, as we can see in the previous commit. Therefore, create two functions to update all GPU stats variables. Now, jobs only need to call `v3d_job_update_stats()` when the job is done and `v3d_job_start_stats()` when starting the job. Co-developed-by: Tvrtko Ursulin Signed-off-by: Tvrtko Ursulin Signed-off-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.h | 1 + drivers/gpu/drm/v3d/v3d_irq.c | 48 ++-- drivers/gpu/drm/v3d/v3d_sched.c | 80 +++-- 3 files changed, 40 insertions(+), 89 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 1950c723dde1..ee3545226d7f 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo); void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ +void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c index ce6b2fb341d1..d469bda52c1a 100644 --- a/drivers/gpu/drm/v3d/v3d_irq.c +++ b/drivers/gpu/drm/v3d/v3d_irq.c @@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_FLDONE) { struct v3d_fence *fence = to_v3d_fence(v3d->bin_job->base.irq_fence); - struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_BIN]; - - file->jobs_sent[V3D_BIN]++; - v3d->queue[V3D_BIN].jobs_sent++; - - file->start_ns[V3D_BIN] = 0; - v3d->queue[V3D_BIN].start_ns = 0; - - file->enabled_ns[V3D_BIN] += runtime; - v3d->queue[V3D_BIN].enabled_ns += runtime; + v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN); trace_v3d_bcl_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg) if
(intsts & V3D_INT_FRDONE) { struct v3d_fence *fence = to_v3d_fence(v3d->render_job->base.irq_fence); - struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_RENDER]; - - file->jobs_sent[V3D_RENDER]++; - v3d->queue[V3D_RENDER].jobs_sent++; - - file->start_ns[V3D_RENDER] = 0; - v3d->queue[V3D_RENDER].start_ns = 0; - - file->enabled_ns[V3D_RENDER] += runtime; - v3d->queue[V3D_RENDER].enabled_ns += runtime; + v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER); trace_v3d_rcl_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg) if (intsts & V3D_INT_CSDDONE(v3d->ver)) { struct v3d_fence *fence = to_v3d_fence(v3d->csd_job->base.irq_fence); - struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_CSD]; - - file->jobs_sent[V3D_CSD]++; - v3d->queue[V3D_CSD].jobs_sent++; - - file->start_ns[V3D_CSD] = 0; - v3d->queue[V3D_CSD].start_ns = 0; - - file->enabled_ns[V3D_CSD] += runtime; - v3d->queue[V3D_CSD].enabled_ns += runtime; + v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD); trace_v3d_csd_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); status = IRQ_HANDLED; @@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg) if (intsts & V3D_HUB_INT_TFUC) { struct v3d_fence *fence = to_v3d_fence(v3d->tfu_job->base.irq_fence); - struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv; - u64 runtime = local_clock() - file->start_ns[V3D_TFU]; - - file->jobs_sent[V3D_TFU]++; - v3d->queue[V3D_TFU].jobs_sent++; - - file->start_ns[V3D_TFU] = 0; - v3d->queue[V3D_TFU].start_ns = 0; - - file->enabled_ns[V3D_TFU] += runtime; - v3d->queue[V3D_TFU].enabled_ns += runtime; + v3d_job_update_stats(&v3d->tfu_job->base, V3D_TFU); trace_v3d_tfu_irq(&v3d->drm, fence->seq
Re: [PATCH v5 03/10] drm/ci: uprev IGT and update testlist
On 4/2/24 06:41, Dmitry Baryshkov wrote: On Tue, Apr 02, 2024 at 12:35:17PM +0530, Vignesh Raman wrote: Hi Maíra, On 01/04/24 22:33, Maíra Canal wrote: On 4/1/24 03:12, Vignesh Raman wrote: Uprev IGT and add amd, v3d, vc4 and vgem specific tests to testlist and skip driver-specific tests. Also add testlist to the MAINTAINERS file and update xfails. Signed-off-by: Vignesh Raman --- v3: - New patch in series to uprev IGT and update testlist. v4: - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes. v5: - Keep single testlist and update xfails. Skip driver specific tests. Looks a bit odd to me to have a single testlist with the specific tests in it. We will need to skip the specific tests on all *-skips.txt. Could you justify this choice in the commit message? The reason for choosing this option was a suggestion from Dmitry, https://www.spinics.net/lists/dri-devel/msg437901.html My suggestion was to stop vendoring the test list into the kernel and to always use a test list from IGT. Otherwise it is very easy to miss renamed or freshly added tests. This makes much more sense to me. Best Regards, - Maíra Also to keep it similar to IGT which has a single testlist. I will add this justification in the commit message. 
Regards, Vignesh Best Regards, - Maíra --- MAINTAINERS | 8 + drivers/gpu/drm/ci/gitlab-ci.yml | 2 +- drivers/gpu/drm/ci/testlist.txt | 321 ++ .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt | 25 +- .../drm/ci/xfails/amdgpu-stoney-flakes.txt | 10 +- .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt | 23 +- drivers/gpu/drm/ci/xfails/i915-amly-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-apl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-cml-skips.txt | 7 + drivers/gpu/drm/ci/xfails/i915-glk-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-whl-skips.txt | 9 +- .../drm/ci/xfails/mediatek-mt8173-skips.txt | 6 + .../drm/ci/xfails/mediatek-mt8183-skips.txt | 6 + .../gpu/drm/ci/xfails/meson-g12b-skips.txt | 6 + .../gpu/drm/ci/xfails/msm-apq8016-skips.txt | 5 + .../gpu/drm/ci/xfails/msm-apq8096-skips.txt | 8 +- .../msm-sc7180-trogdor-kingoftown-skips.txt | 6 + ...sm-sc7180-trogdor-lazor-limozeen-skips.txt | 6 + .../gpu/drm/ci/xfails/msm-sdm845-skips.txt | 6 + .../drm/ci/xfails/rockchip-rk3288-skips.txt | 9 +- .../drm/ci/xfails/rockchip-rk3399-skips.txt | 7 + .../drm/ci/xfails/virtio_gpu-none-skips.txt | 9 +- 24 files changed, 511 insertions(+), 13 deletions(-) create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt diff --git a/MAINTAINERS b/MAINTAINERS index 3bc7e122a094..f7d0040a6c21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1665,6 +1665,7 @@ L: dri-devel@lists.freedesktop.org S: Supported T: git git://anongit.freedesktop.org/drm/drm-misc F: Documentation/gpu/panfrost.rst +F: drivers/gpu/drm/ci/testlist.txt F: drivers/gpu/drm/panfrost/ F: include/uapi/drm/panfrost_drm.h @@ -6753,6 +6754,7 @@ S: Maintained B: 
https://gitlab.freedesktop.org/drm/msm/-/issues T: git https://gitlab.freedesktop.org/drm/msm.git F: Documentation/devicetree/bindings/display/msm/ +F: drivers/gpu/drm/ci/testlist.txt F: drivers/gpu/drm/ci/xfails/msm* F: drivers/gpu/drm/msm/ F: include/uapi/drm/msm_drm.h @@ -7047,6 +7049,7 @@ T: git git://anongit.freedesktop.org/drm/drm-misc F: Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml F: Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml F: Documentation/gpu/meson.rst +F: drivers/gpu/drm/ci/testlist.txt F: drivers/gpu/drm/ci/xfails/meson* F: drivers/gpu/drm/meson/ @@ -7160,6 +7163,7 @@ L: dri-devel@lists.freedesktop.org L: linux-media...@lists.infradead.org (moderated for non-subscribers) S: Supported F: Documentation/devicetree/bindings/display/mediatek/ +F: drivers/gpu/drm/ci/testlist.txt F: drivers/gpu/drm/ci/xfails/mediatek* F: drivers/gpu/drm/mediatek/ F: drivers/phy/mediatek/phy-mtk-dp.c @@ -7211,6 +7215,7 @@ L: dri-devel@lists.freedesktop.org S: Maintained T: git git://anongit.freedesktop.org/drm/drm-misc F: Documentation/devicetree/bindings/display/rockchip/ +F: drivers/gpu/drm/ci/testlist.txt F: drivers/gpu/drm/ci/xfails/rockchip* F: drivers/gpu/drm/rockchip/ @@ -10739,6 +10744,7 @@ C: irc://irc.oftc.net/intel-gfx T: git git
Re: [PATCH v5 03/10] drm/ci: uprev IGT and update testlist
On 4/1/24 03:12, Vignesh Raman wrote: Uprev IGT and add amd, v3d, vc4 and vgem specific tests to testlist and skip driver-specific tests. Also add testlist to the MAINTAINERS file and update xfails. Signed-off-by: Vignesh Raman --- v3: - New patch in series to uprev IGT and update testlist. v4: - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes. v5: - Keep single testlist and update xfails. Skip driver specific tests. Looks a bit odd to me to have a single testlist with the specific tests in it. We will need to skip the specific tests on all *-skips.txt. Could you justify this choice in the commit message? Best Regards, - Maíra --- MAINTAINERS | 8 + drivers/gpu/drm/ci/gitlab-ci.yml | 2 +- drivers/gpu/drm/ci/testlist.txt | 321 ++ .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt | 25 +- .../drm/ci/xfails/amdgpu-stoney-flakes.txt| 10 +- .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt | 23 +- drivers/gpu/drm/ci/xfails/i915-amly-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-apl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-cml-skips.txt | 7 + drivers/gpu/drm/ci/xfails/i915-glk-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt | 9 +- drivers/gpu/drm/ci/xfails/i915-whl-skips.txt | 9 +- .../drm/ci/xfails/mediatek-mt8173-skips.txt | 6 + .../drm/ci/xfails/mediatek-mt8183-skips.txt | 6 + .../gpu/drm/ci/xfails/meson-g12b-skips.txt| 6 + .../gpu/drm/ci/xfails/msm-apq8016-skips.txt | 5 + .../gpu/drm/ci/xfails/msm-apq8096-skips.txt | 8 +- .../msm-sc7180-trogdor-kingoftown-skips.txt | 6 + ...sm-sc7180-trogdor-lazor-limozeen-skips.txt | 6 + .../gpu/drm/ci/xfails/msm-sdm845-skips.txt| 6 + .../drm/ci/xfails/rockchip-rk3288-skips.txt | 9 +- .../drm/ci/xfails/rockchip-rk3399-skips.txt | 7 + .../drm/ci/xfails/virtio_gpu-none-skips.txt | 9 +- 24 files changed, 511 insertions(+), 13 deletions(-) create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt create mode 100644 
drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt diff --git a/MAINTAINERS b/MAINTAINERS index 3bc7e122a094..f7d0040a6c21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1665,6 +1665,7 @@ L:dri-devel@lists.freedesktop.org S:Supported T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/gpu/panfrost.rst +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/panfrost/ F:include/uapi/drm/panfrost_drm.h @@ -6753,6 +6754,7 @@ S: Maintained B:https://gitlab.freedesktop.org/drm/msm/-/issues T:git https://gitlab.freedesktop.org/drm/msm.git F:Documentation/devicetree/bindings/display/msm/ +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/msm* F:drivers/gpu/drm/msm/ F:include/uapi/drm/msm_drm.h @@ -7047,6 +7049,7 @@ T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml F:Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml F:Documentation/gpu/meson.rst +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/meson* F:drivers/gpu/drm/meson/ @@ -7160,6 +7163,7 @@ L: dri-devel@lists.freedesktop.org L:linux-media...@lists.infradead.org (moderated for non-subscribers) S:Supported F:Documentation/devicetree/bindings/display/mediatek/ +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/mediatek* F:drivers/gpu/drm/mediatek/ F:drivers/phy/mediatek/phy-mtk-dp.c @@ -7211,6 +7215,7 @@ L:dri-devel@lists.freedesktop.org S:Maintained T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/devicetree/bindings/display/rockchip/ +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/rockchip* F:drivers/gpu/drm/rockchip/ @@ -10739,6 +10744,7 @@ C: irc://irc.oftc.net/intel-gfx T:git git://anongit.freedesktop.org/drm-intel F:Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon F:Documentation/gpu/i915.rst +F: 
drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/i915* F:drivers/gpu/drm/i915/ F:include/drm/i915* @@ -18255,6 +18261,7 @@ C: irc://irc.oftc.net/radeon T:git https://gitlab.freedesktop.org/agd5f/linux.git F:Documentation/gpu/amdgpu/ F:drivers/gpu/drm/amd/ +F: drivers/gpu/drm/ci/testlist.txt F:drivers/gpu/drm/ci/xfails/amd* F:drivers/gpu/drm/radeon/ F:include/uapi/drm/amdgpu_drm.h @@ -23303,6 +23310,7 @@ L: dri-devel@lists.freedesktop.org L:virtualizat...@lists.linux.dev
Re: [PATCH v5 10/10] drm/ci: add tests on vkms
On 4/1/24 03:12, Vignesh Raman wrote: Add job that runs igt on top of vkms. Signed-off-by: Vignesh Raman Acked-by: Jessica Zhang Tested-by: Jessica Zhang Acked-by: Maxime Ripard Signed-off-by: Helen Koike Acked-by: Maíra Canal Best Regards, - Maíra --- v4: - New patch in the series. https://lore.kernel.org/lkml/20240201065346.801038-1-vignesh.ra...@collabora.com/ v5: - No changes. --- MAINTAINERS | 2 ++ drivers/gpu/drm/ci/build.sh | 1 - drivers/gpu/drm/ci/gitlab-ci.yml | 3 +- drivers/gpu/drm/ci/igt_runner.sh | 6 ++-- drivers/gpu/drm/ci/image-tags.yml | 2 +- drivers/gpu/drm/ci/test.yml | 24 +- drivers/gpu/drm/ci/x86_64.config | 1 + .../drm/ci/xfails/virtio_gpu-none-fails.txt | 1 - drivers/gpu/drm/ci/xfails/vkms-none-fails.txt | 33 +++ .../gpu/drm/ci/xfails/vkms-none-flakes.txt| 20 +++ drivers/gpu/drm/ci/xfails/vkms-none-skips.txt | 23 + 11 files changed, 108 insertions(+), 8 deletions(-) create mode 100644 drivers/gpu/drm/ci/xfails/vkms-none-fails.txt create mode 100644 drivers/gpu/drm/ci/xfails/vkms-none-flakes.txt create mode 100644 drivers/gpu/drm/ci/xfails/vkms-none-skips.txt diff --git a/MAINTAINERS b/MAINTAINERS index 333704ceefb6..c78c825508ce 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6962,6 +6962,8 @@ L:dri-devel@lists.freedesktop.org S:Maintained T:git git://anongit.freedesktop.org/drm/drm-misc F:Documentation/gpu/vkms.rst +F: drivers/gpu/drm/ci/testlist.txt +F: drivers/gpu/drm/ci/xfails/vkms* F:drivers/gpu/drm/vkms/ DRM DRIVER FOR VIRTUALBOX VIRTUAL GPU diff --git a/drivers/gpu/drm/ci/build.sh b/drivers/gpu/drm/ci/build.sh index 8a3baa003904..95493df9cdc2 100644 --- a/drivers/gpu/drm/ci/build.sh +++ b/drivers/gpu/drm/ci/build.sh @@ -156,7 +156,6 @@ fi mkdir -p artifacts/install/lib mv install/* artifacts/install/. 
-rm -rf artifacts/install/modules ln -s common artifacts/install/ci-common cp .config artifacts/${CI_JOB_NAME}_config diff --git a/drivers/gpu/drm/ci/gitlab-ci.yml b/drivers/gpu/drm/ci/gitlab-ci.yml index 5b5d4a324659..df762d03533f 100644 --- a/drivers/gpu/drm/ci/gitlab-ci.yml +++ b/drivers/gpu/drm/ci/gitlab-ci.yml @@ -114,6 +114,7 @@ stages: - panfrost - powervr - virtio-gpu + - software-driver # YAML anchors for rule conditions # @@ -269,4 +270,4 @@ sanity: # Jobs that need to pass before spending hardware resources on further testing .required-for-hardware-jobs: - needs: [] \ No newline at end of file + needs: [] diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh index ce6e22369d4d..c89acb974645 100755 --- a/drivers/gpu/drm/ci/igt_runner.sh +++ b/drivers/gpu/drm/ci/igt_runner.sh @@ -20,10 +20,10 @@ cat /sys/kernel/debug/dri/*/state set -e case "$DRIVER_NAME" in -amdgpu) +amdgpu|vkms) # Cannot use HWCI_KERNEL_MODULES as at that point we don't have the module in /lib -mv /install/modules/lib/modules/* /lib/modules/. -modprobe amdgpu +mv /install/modules/lib/modules/* /lib/modules/. 
|| true +modprobe --first-time $DRIVER_NAME ;; esac diff --git a/drivers/gpu/drm/ci/image-tags.yml b/drivers/gpu/drm/ci/image-tags.yml index cf07c3e09b8c..bf861ab8b9c2 100644 --- a/drivers/gpu/drm/ci/image-tags.yml +++ b/drivers/gpu/drm/ci/image-tags.yml @@ -4,7 +4,7 @@ variables: DEBIAN_BASE_TAG: "${CONTAINER_TAG}" DEBIAN_X86_64_BUILD_IMAGE_PATH: "debian/x86_64_build" - DEBIAN_BUILD_TAG: "2023-10-08-config" + DEBIAN_BUILD_TAG: "2024-01-29-vkms" KERNEL_ROOTFS_TAG: "2023-10-06-amd" PKG_REPO_REV: "67f2c46b" diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml index 8c90ae5a51e6..8fed797a26b9 100644 --- a/drivers/gpu/drm/ci/test.yml +++ b/drivers/gpu/drm/ci/test.yml @@ -411,7 +411,7 @@ panfrost:g12b: - .panfrost-gpu virtio_gpu:none: - stage: virtio-gpu + stage: software-driver variables: CROSVM_GALLIUM_DRIVER: llvmpipe DRIVER_NAME: virtio_gpu @@ -431,3 +431,25 @@ virtio_gpu:none: - debian/x86_64_test-gl - testing:x86_64 - igt:x86_64 + +vkms:none: + stage: software-driver + variables: +DRIVER_NAME: vkms +GPU_VERSION: none + extends: +- .test-gl +- .test-rules + tags: +- kvm + script: +- ln -sf $CI_PROJECT_DIR/install /install +- mv install/bzImage /lava-files/bzImage +- mkdir -p /lib/modules +- mkdir -p $CI_PROJECT_DIR/results +- ln -sf $CI_PROJECT_DIR/results /results +- ./install/crosvm-runner.sh ./install/igt_runner.sh + needs: +- debian/x86_64_test-gl +- testing:x86_64 +- igt:x86_64 diff --git a/drivers/gpu/drm/
Re: [PATCH v5 04/16] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
On 3/26/24 12:56, Louis Chauvet wrote: On 25/03/24 at 10:56, Maíra Canal wrote: On 3/13/24 14:44, Louis Chauvet wrote: Introduce two typedefs: pixel_read_t and pixel_write_t. They allow the compiler to check that the passed functions take the correct arguments. Such typedefs will help ensure consistency across the code base in case these prototypes are updated. Rename input/output variables in a consistent way between read_line and write_line. A warning has been added in get_pixel_*_function to alert when an unsupported pixel format is requested. As those formats are checked before the atomic_update callbacks, it should never happen. Document those typedefs. Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_drv.h | 23 ++- drivers/gpu/drm/vkms/vkms_formats.c | 124 +--- drivers/gpu/drm/vkms/vkms_formats.h | 4 +- drivers/gpu/drm/vkms/vkms_plane.c | 2 +- 4 files changed, 95 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 18086423a3a7..4bfc62d26f08 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -53,12 +53,31 @@ struct line_buffer { struct pixel_argb_u16 *pixels; }; +/** + * typedef pixel_write_t - These functions are used to read a pixel from a + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels + * buffer. Your brief description looks a bit big to me. Also, take a look at the cross-references docs [1]. Is this description sufficient? typedef pixel_write_t - Convert a pixel from a pixel_argb_u16 into a specific format Yeah.
Best Regards, - Maíra [1] https://docs.kernel.org/doc-guide/kernel-doc.html#highlights-and-cross-references + * + * @out_pixel: destination address to write the pixel + * @in_pixel: pixel to write + */ +typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel); + struct vkms_writeback_job { struct iosys_map data[DRM_FORMAT_MAX_PLANES]; struct vkms_frame_info wb_frame_info; - void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel); + pixel_write_t pixel_write; }; +/** + * typedef pixel_read_t - These functions are used to read a pixel in the source frame, + * convert it to `struct pixel_argb_u16` and write it to @out_pixel. Same. typedef pixel_read_t - Read a pixel and convert it to a pixel_argb_u16 + * + * @in_pixel: Pointer to the pixel to read + * @out_pixel: Pointer to write the converted pixel s/Pointer/pointer Fixed in v6. + */ +typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel); + /** * vkms_plane_state - Driver specific plane state * @base: base plane state @@ -69,7 +88,7 @@ struct vkms_writeback_job { struct vkms_plane_state { struct drm_shadow_plane_state base; struct vkms_frame_info *frame_info; - void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel); + pixel_read_t pixel_read; }; struct vkms_plane { diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 6e3dc8682ff9..55a4365d21a4 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -76,7 +76,7 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i * They are used in the `vkms_compose_row` function to handle multiple formats. */ -static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { /* * The 257 is the "conversion ratio". 
This number is obtained by the @@ -84,48 +84,48 @@ static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe * the best color value in a pixel format with more possibilities. * A similar idea applies to others RGB color conversions. */ - out_pixel->a = (u16)src_pixels[3] * 257; - out_pixel->r = (u16)src_pixels[2] * 257; - out_pixel->g = (u16)src_pixels[1] * 257; - out_pixel->b = (u16)src_pixels[0] * 257; + out_pixel->a = (u16)in_pixel[3] * 257; + out_pixel->r = (u16)in_pixel[2] * 257; + out_pixel->g = (u16)in_pixel[1] * 257; + out_pixel->b = (u16)in_pixel[0] * 257; } -static void XRGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { out_pixel->a = (u16)0x; - out_pixel->r = (u16)src_pixels[2] * 257; - out_pixel->g = (u16)src_pixels[1] * 257; - out_pixel->b = (u16)src_pixels[0] * 257; + out_pixel->r = (u16)in_pixel[2] * 257; + out_pixel->g = (u16)in_pixel[1] * 257; + out_pixel->b = (u16)in_pixel[0] *
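The "conversion ratio" of 257 in the helpers above comes from 0xffff / 0xff = 257: multiplying an 8-bit channel by 257 maps it exactly onto the full 16-bit range, and is equivalent to replicating the byte into both halves (0xab becomes 0xabab). A standalone sanity-check sketch (not VKMS code):

```python
def channel_8_to_16(c8: int) -> int:
    """Expand an 8-bit color channel to 16 bits, as the *_to_argb_u16 helpers do."""
    return c8 * 257  # 257 == 0xffff // 0xff

# Multiplying by 257 is the same as replicating the byte into both halves.
assert channel_8_to_16(0xff) == 0xffff   # full intensity stays full intensity
assert channel_8_to_16(0x80) == 0x8080   # mid-gray
assert all(channel_8_to_16(c) == (c << 8) | c for c in range(256))
```

This also explains why the reverse direction in vkms_writeback_row uses DIV_ROUND_CLOSEST by 257: the two operations are exact inverses for byte-replicated values.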
Re: [PATCH v2 05/14] drm: Suppress intentional warning backtraces in scaling unit tests
On 3/25/24 16:24, Guenter Roeck wrote: Hi, On Mon, Mar 25, 2024 at 04:05:06PM -0300, Maíra Canal wrote: Hi Guenter, On 3/25/24 14:52, Guenter Roeck wrote: The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests intentionally trigger warning backtraces by providing bad parameters to the tested functions. What is tested is the return value, not the existence of a warning backtrace. Suppress the backtraces to avoid clogging the kernel log. Tested-by: Linux Kernel Functional Testing Acked-by: Dan Carpenter Signed-off-by: Guenter Roeck --- - Rebased to v6.9-rc1 - Added Tested-by:, Acked-by:, and Reviewed-by: tags drivers/gpu/drm/tests/drm_rect_test.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_rect_test.c b/drivers/gpu/drm/tests/drm_rect_test.c index 76332cd2ead8..75614cb4deb5 100644 --- a/drivers/gpu/drm/tests/drm_rect_test.c +++ b/drivers/gpu/drm/tests/drm_rect_test.c @@ -406,22 +406,28 @@ KUNIT_ARRAY_PARAM(drm_rect_scale, drm_rect_scale_cases, drm_rect_scale_case_desc static void drm_test_rect_calc_hscale(struct kunit *test) { + DEFINE_SUPPRESSED_WARNING(drm_calc_scale); const struct drm_rect_scale_case *params = test->param_value; int scaling_factor; + START_SUPPRESSED_WARNING(drm_calc_scale); I'm not sure if it is not that obvious only to me, but it would be nice to have a comment here, remembering that we provide bad parameters in some test cases. Sure. Something like this ? /* * drm_rect_calc_hscale() generates a warning backtrace whenever bad * parameters are passed to it. This affects all unit tests with an * error code in expected_scaling_factor. */ Yeah, perfect. With that, feel free to add my Acked-by: Maíra Canal Best Regards, - Maíra Thanks, Guenter
Re: [PATCH v2 05/14] drm: Suppress intentional warning backtraces in scaling unit tests
Hi Guenter, On 3/25/24 14:52, Guenter Roeck wrote: The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests intentionally trigger warning backtraces by providing bad parameters to the tested functions. What is tested is the return value, not the existence of a warning backtrace. Suppress the backtraces to avoid clogging the kernel log. Tested-by: Linux Kernel Functional Testing Acked-by: Dan Carpenter Signed-off-by: Guenter Roeck --- - Rebased to v6.9-rc1 - Added Tested-by:, Acked-by:, and Reviewed-by: tags drivers/gpu/drm/tests/drm_rect_test.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_rect_test.c b/drivers/gpu/drm/tests/drm_rect_test.c index 76332cd2ead8..75614cb4deb5 100644 --- a/drivers/gpu/drm/tests/drm_rect_test.c +++ b/drivers/gpu/drm/tests/drm_rect_test.c @@ -406,22 +406,28 @@ KUNIT_ARRAY_PARAM(drm_rect_scale, drm_rect_scale_cases, drm_rect_scale_case_desc static void drm_test_rect_calc_hscale(struct kunit *test) { + DEFINE_SUPPRESSED_WARNING(drm_calc_scale); const struct drm_rect_scale_case *params = test->param_value; int scaling_factor; + START_SUPPRESSED_WARNING(drm_calc_scale); I'm not sure if it is not that obvious only to me, but it would be nice to have a comment here, remembering that we provide bad parameters in some test cases. Best Regards, - Maíra scaling_factor = drm_rect_calc_hscale(>src, >dst, params->min_range, params->max_range); + END_SUPPRESSED_WARNING(drm_calc_scale); KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor); } static void drm_test_rect_calc_vscale(struct kunit *test) { + DEFINE_SUPPRESSED_WARNING(drm_calc_scale); const struct drm_rect_scale_case *params = test->param_value; int scaling_factor; + START_SUPPRESSED_WARNING(drm_calc_scale); scaling_factor = drm_rect_calc_vscale(>src, >dst, params->min_range, params->max_range); + END_SUPPRESSED_WARNING(drm_calc_scale); KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor); }
Re: [PATCH v5 14/16] drm/vkms: Create KUnit tests for YUV conversions
On 3/13/24 14:45, Louis Chauvet wrote: From: Arthur Grillo Create KUnit tests to test the conversion between YUV and RGB. Test each conversion and range combination with some common colors. The code used to compute the expected result can be found in comment. Signed-off-by: Arthur Grillo [Louis Chauvet: - fix minor formating issues (whitespace, double line) - change expected alpha from 0x to 0x - adapt to the new get_conversion_matrix usage - apply the changes from Arthur - move struct pixel_yuv_u8 to the test itself] Again, a Co-developed-by tag might be more proper. Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/Kconfig | 15 ++ drivers/gpu/drm/vkms/Makefile | 1 + drivers/gpu/drm/vkms/tests/.kunitconfig | 4 + drivers/gpu/drm/vkms/tests/Makefile | 3 + drivers/gpu/drm/vkms/tests/vkms_format_test.c | 230 ++ drivers/gpu/drm/vkms/vkms_formats.c | 7 +- drivers/gpu/drm/vkms/vkms_formats.h | 4 + 7 files changed, 262 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/vkms/Kconfig b/drivers/gpu/drm/vkms/Kconfig index b9ecdebecb0b..9b0e1940c14f 100644 --- a/drivers/gpu/drm/vkms/Kconfig +++ b/drivers/gpu/drm/vkms/Kconfig @@ -13,3 +13,18 @@ config DRM_VKMS a VKMS. If M is selected the module will be called vkms. + +config DRM_VKMS_KUNIT_TESTS + tristate "Tests for VKMS" if !KUNIT_ALL_TESTS "KUnit tests for VKMS" + depends on DRM_VKMS && KUNIT + default KUNIT_ALL_TESTS + help + This builds unit tests for VKMS. This option is not useful for + distributions or general kernels, but only for kernel + developers working on VKMS. + + For more information on KUnit and unit tests in general, + please refer to the KUnit documentation in + Documentation/dev-tools/kunit/. + + If in doubt, say "N". 
\ No newline at end of file diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile index 1b28a6a32948..8d3e46dde635 100644 --- a/drivers/gpu/drm/vkms/Makefile +++ b/drivers/gpu/drm/vkms/Makefile @@ -9,3 +9,4 @@ vkms-y := \ vkms_writeback.o obj-$(CONFIG_DRM_VKMS) += vkms.o +obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += tests/ diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig b/drivers/gpu/drm/vkms/tests/.kunitconfig new file mode 100644 index ..70e378228cbd --- /dev/null +++ b/drivers/gpu/drm/vkms/tests/.kunitconfig @@ -0,0 +1,4 @@ +CONFIG_KUNIT=y +CONFIG_DRM=y +CONFIG_DRM_VKMS=y +CONFIG_DRM_VKMS_KUNIT_TESTS=y diff --git a/drivers/gpu/drm/vkms/tests/Makefile b/drivers/gpu/drm/vkms/tests/Makefile new file mode 100644 index ..2d1df668569e --- /dev/null +++ b/drivers/gpu/drm/vkms/tests/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only + +obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += vkms_format_test.o diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c new file mode 100644 index ..0954d606e44a --- /dev/null +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c @@ -0,0 +1,230 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include + +#include +#include +#include + +#include "../../drm_crtc_internal.h" + +#include "../vkms_drv.h" +#include "../vkms_formats.h" + +#define TEST_BUFF_SIZE 50 + +struct pixel_yuv_u8 { + u8 y, u, v; +}; + +struct yuv_u8_to_argb_u16_case { + enum drm_color_encoding encoding; + enum drm_color_range range; + size_t n_colors; + struct format_pair { + char *name; + struct pixel_yuv_u8 yuv; + struct pixel_argb_u16 argb; + } colors[TEST_BUFF_SIZE]; +}; + +/* + * The YUV color representation were acquired via the colour python framework. + * Below are the function calls used for generating each case. 
+ * + * for more information got to the docs: s/for/For + * https://colour.readthedocs.io/en/master/generated/colour.RGB_to_YCbCr.html + */ +static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = { + /* +* colour.RGB_to_YCbCr(, +* K=colour.WEIGHTS_YCBCR["ITU-R BT.601"], +* in_bits = 16, +* in_legal = False, +* in_int = True, +* out_bits = 8, +* out_legal = False, +* out_int = True) +*/ I feel that this Python code is kind of polluting the test cases. + { + .encoding = DRM_COLOR_YCBCR_BT601, + .range = DRM_COLOR_YCBCR_FULL_RANGE, + .n_colors = 6, + .colors = { + { "white", { 0xff, 0x80, 0x80 }, { 0xffff, 0xffff, 0xffff, 0xffff }}, + { "gray", { 0x80, 0x80, 0x80 }, { 0xffff, 0x8080, 0x8080, 0x8080 }}, + {
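For readers without the colour framework installed, the full-range BT.601 expected values in the table above can be reproduced with plain arithmetic. This is an illustrative sketch using 8-bit RGB input (the comment in the patch feeds 16-bit input via in_bits=16; dividing each channel by 257 first gives the same result), not the code actually used to generate the table:

```python
def rgb_to_ycbcr_bt601_full(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Full-range BT.601 RGB -> YCbCr, 8-bit in / 8-bit out."""
    y  = round( 0.299    * r + 0.587    * g + 0.114    * b)
    cb = round(-0.168736 * r - 0.331264 * g + 0.5      * b) + 128
    cr = round( 0.5      * r - 0.418688 * g - 0.081312 * b) + 128
    return y, cb, cr

# These match the "white" and "gray" rows of the test-case table.
assert rgb_to_ycbcr_bt601_full(0xff, 0xff, 0xff) == (0xff, 0x80, 0x80)
assert rgb_to_ycbcr_bt601_full(0x80, 0x80, 0x80) == (0x80, 0x80, 0x80)
assert rgb_to_ycbcr_bt601_full(0x00, 0x00, 0x00) == (0x00, 0x80, 0x80)
```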
Re: [PATCH v5 11/16] drm/vkms: Add YUV support
On 3/13/24 14:45, Louis Chauvet wrote: From: Arthur Grillo Add support for the YUV formats below: - NV12/NV16/NV24 - NV21/NV61/NV42 - YUV420/YUV422/YUV444 - YVU420/YVU422/YVU444 The conversion from yuv to rgb is done with fixed-point arithmetic, using 32.32 fixed-point values and the drm_fixed helpers. To do the conversion, a specific matrix must be used for each color range (DRM_COLOR_*_RANGE) and encoding (DRM_COLOR_*). This matrix is stored in the `conversion_matrix` struct, along with the specific y_offset needed. This matrix is queried only once, in `vkms_plane_atomic_update` and stored in a `vkms_plane_state`. Those conversion matrices of each encoding and range were obtained by rounding the values of the original conversion matrices multiplied by 2^32. This is done to avoid the use of floating point operations. The same reading function is used for YUV and YVU formats. As the only difference between those two categories of formats is the order of the fields, a simple swap in conversion matrix columns allows using the same function. Signed-off-by: Arthur Grillo [Louis Chauvet: - Adapted Arthur's work - Implemented the read_line_t callbacks for yuv - add struct conversion_matrix - remove struct pixel_yuv_u8 - update the commit message - Merge the modifications from Arthur] A Co-developed-by tag would be more appropriate. 
Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_drv.h | 22 ++ drivers/gpu/drm/vkms/vkms_formats.c | 431 drivers/gpu/drm/vkms/vkms_formats.h | 4 + drivers/gpu/drm/vkms/vkms_plane.c | 17 +- 4 files changed, 473 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 23e1d247468d..f3116084de5a 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -99,6 +99,27 @@ typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, int x_st int y_start, enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]); +/** + * CONVERSION_MATRIX_FLOAT_DEPTH - Number of digits after the point for conversion matrix values + */ +#define CONVERSION_MATRIX_FLOAT_DEPTH 32 + +/** + * struct conversion_matrix - Matrix to use for a specific encoding and range + * + * @matrix: Conversion matrix from yuv to rgb. The matrix is stored in a row-major manner and is + * used to compute rgb values from yuv values: + * [[r],[g],[b]] = @matrix * [[y],[u],[v]] + * OR for yvu formats: + * [[r],[g],[b]] = @matrix * [[y],[v],[u]] + * The values of the matrix are fixed floats, 32.CONVERSION_MATRIX_FLOAT_DEPTH > + * @y_offest: Offset to apply on the y value. s/y_offest/y_offset + */ +struct conversion_matrix { + s64 matrix[3][3]; + s64 y_offset; +}; + /** * vkms_plane_state - Driver specific plane state * @base: base plane state @@ -110,6 +131,7 @@ struct vkms_plane_state { struct drm_shadow_plane_state base; struct vkms_frame_info *frame_info; pixel_read_line_t pixel_read_line; + struct conversion_matrix *conversion_matrix; Add @conversion_matrix on the kernel-doc from the struct vkms_plane_state. 
}; struct vkms_plane { diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 1449a0e6c706..edbf4b321b91 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -105,6 +105,44 @@ static int get_step_next_block(struct drm_framebuffer *fb, enum pixel_read_direc return 0; } +/** + * get_subsampling() - Get the subsampling divisor value on a specific direction Where are the arguments? + */ +static int get_subsampling(const struct drm_format_info *format, + enum pixel_read_direction direction) +{ + switch (direction) { + case READ_BOTTOM_TO_TOP: + case READ_TOP_TO_BOTTOM: + return format->vsub; + case READ_RIGHT_TO_LEFT: + case READ_LEFT_TO_RIGHT: + return format->hsub; + } + WARN_ONCE(true, "Invalid direction for pixel reading: %d\n", direction); + return 1; +} + +/** + * get_subsampling_offset() - An offset for keeping the chroma siting consistent regardless of + * x_start and y_start values Same. + */ +static int get_subsampling_offset(enum pixel_read_direction direction, int x_start, int y_start) +{ + switch (direction) { + case READ_BOTTOM_TO_TOP: + return -y_start - 1; + case READ_TOP_TO_BOTTOM: + return y_start; + case READ_RIGHT_TO_LEFT: + return -x_start - 1; + case READ_LEFT_TO_RIGHT: + return x_start; + } + WARN_ONCE(true, "Invalid direction for pixel reading: %d\n", direction); + return 0; +} + /* * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to
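The 32.32 fixed-point scheme described in the commit message can be modeled in a few lines. The coefficients below are illustrative full-range BT.601 values (the actual matrices live in the driver and differ per encoding/range); the structure mirrors struct conversion_matrix, with CONVERSION_MATRIX_FLOAT_DEPTH = 32:

```python
FP_SHIFT = 32  # corresponds to CONVERSION_MATRIX_FLOAT_DEPTH

def fp(x: float) -> int:
    """Encode a float as a signed 32.32 fixed-point integer."""
    return round(x * (1 << FP_SHIFT))

# Full-range BT.601 YCbCr -> RGB coefficients (illustrative), row-major as in
# struct conversion_matrix: [[r],[g],[b]] = matrix * [[y],[u],[v]].
matrix = [
    [fp(1.0), fp(0.0),       fp(1.402)],
    [fp(1.0), fp(-0.344136), fp(-0.714136)],
    [fp(1.0), fp(1.772),     fp(0.0)],
]
y_offset = 0  # full range: no luma offset

def yuv8_to_rgb8(y: int, u: int, v: int) -> tuple[int, int, int]:
    comps = [y - y_offset, u - 128, v - 128]
    out = []
    for row in matrix:
        acc = sum(c * comp for c, comp in zip(row, comps))
        # shift back to integer with rounding, then clamp to 8 bits
        out.append(max(0, min(255, (acc + (1 << (FP_SHIFT - 1))) >> FP_SHIFT)))
    return tuple(out)

assert yuv8_to_rgb8(128, 128, 128) == (128, 128, 128)  # neutral gray round-trips
assert yuv8_to_rgb8(255, 128, 128) == (255, 255, 255)  # full-range white
```

Swapping the last two matrix columns makes the same function consume YVU-ordered data, which is the column-swap trick the commit message describes.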
Re: [PATCH v5 10/16] drm/vkms: Re-introduce line-per-line composition algorithm
On 3/13/24 14:45, Louis Chauvet wrote: Re-introduce a line-by-line composition algorithm for each pixel format. This allows more performance by not requiring an indirection per pixel read. This patch is focused on readability of the code. Line-by-line composition was introduced by [1] but rewritten back to a pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact on performance, and it was merged. This patch is almost a revert of [2], but in addition efforts have been made to increase readability and maintainability of the rotation handling. The blend function is now divided into two parts: - Transformation of coordinates from the output referential to the source referential - Line conversion and blending Most of the complexity of the rotation management is avoided by using drm_rect_* helpers. The remaining complexity is around the clipping, to avoid reading/writing outside source/destination buffers. The pixel conversion is now done line-by-line, so the read_pixel_t callback was replaced with read_pixel_line_t. This way the indirection is only required once per line and per plane, instead of once per pixel and per plane. The read_line_t callbacks are very similar for most pixel formats, but it is required to avoid performance impact. Some helpers for color conversion were introduced to avoid code repetition: - *_to_argb_u16: perform color conversion. They should be inlined by the compiler, and they are used to avoid repetition between multiple variants of the same format (argb/xrgb and maybe in the future for formats like bgr formats). This new algorithm was tested with: - kms_plane (for color conversions) - kms_rotation_crc (for rotations of planes) - kms_cursor_crc (for translations of planes) - kms_rotation (for all rotations and formats combinations) [3] The performance gain was measured with: - kms_fb_stress Could you tell us what was the performance gain? 
[1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept new formats") https://lore.kernel.org/all/20220905190811.25024-7-igormtorre...@gmail.com/ [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion functionality") https://lore.kernel.org/all/20230418130525.128733-2-mca...@igalia.com/ [3]: Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_composer.c | 167 +++-- drivers/gpu/drm/vkms/vkms_drv.h | 27 ++-- drivers/gpu/drm/vkms/vkms_formats.c | 236 ++- drivers/gpu/drm/vkms/vkms_formats.h | 2 +- drivers/gpu/drm/vkms/vkms_plane.c| 5 +- 5 files changed, 292 insertions(+), 145 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index 989bcf59f375..5d78c33dbf41 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer, struct line_buffer *output_buffer, int x_start, int pixel_count) { struct pixel_argb_u16 *out = _buffer->pixels[x_start]; - const struct pixel_argb_u16 *in = stage_buffer->pixels; + const struct pixel_argb_u16 *in = _buffer->pixels[x_start]; for (int i = 0; i < pixel_count; i++) { out[i].a = (u16)0x; @@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer, } } -static int get_y_pos(struct vkms_frame_info *frame_info, int y) -{ - if (frame_info->rotation & DRM_MODE_REFLECT_Y) - return drm_rect_height(_info->rotated) - y - 1; - - switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) { - case DRM_MODE_ROTATE_90: - return frame_info->rotated.x2 - y - 1; - case DRM_MODE_ROTATE_270: - return y + frame_info->rotated.x1; - default: - return y; - } -} - -static bool check_limit(struct vkms_frame_info *frame_info, int pos) -{ - if (drm_rotation_90_or_270(frame_info->rotation)) { - if (pos >= 0 && pos < drm_rect_width(_info->rotated)) - return true; - } else { - if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2) - 
return true; - } - - return false; -} static void fill_background(const struct pixel_argb_u16 *background_color, struct line_buffer *output_buffer) @@ -215,34 +188,146 @@ static void blend(struct vkms_writeback_job *wb, { struct vkms_plane_state **plane = crtc_state->active_planes; u32 n_active_planes = crtc_state->num_active_planes; - int y_pos, x_dst, x_limit; const struct pixel_argb_u16 background_color = { .a = 0x }; - size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay; + int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay; + int crtc_x_limit =
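The pre_mul_alpha_blend() loop shown at the top of this diff implements a premultiplied-alpha "source over" per u16 channel. A standalone model of the per-channel arithmetic (a sketch of the idea, not the exact kernel helper):

```python
def pre_mul_blend_channel(src: int, dst: int, alpha: int) -> int:
    """Premultiplied 'source over' for one u16 channel.

    src is already scaled by alpha (premultiplied), so only the
    background contribution needs weighting by (1 - alpha).
    The +0x7fff term rounds the division to nearest, like DIV_ROUND_CLOSEST.
    """
    return src + ((0xffff - alpha) * dst + 0x7fff) // 0xffff

# Fully opaque source wins; fully transparent (premultiplied => src == 0) keeps dst.
assert pre_mul_blend_channel(0x8000, 0xffff, 0xffff) == 0x8000
assert pre_mul_blend_channel(0x0000, 0x1234, 0x0000) == 0x1234
```

Because the source is premultiplied, no per-pixel multiply of src by alpha is needed in the inner loop, which is part of why the line-based path stays cheap.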
Re: [PATCH v5 09/16] drm/vkms: Introduce pixel_read_direction enum
On 3/13/24 14:45, Louis Chauvet wrote: The pixel_read_direction enum is useful to describe the reading direction in a plane. It avoids using the rotation property of DRM, which is not practical for determining the reading direction. This patch also introduces two helpers, one to compute the pixel_read_direction from the DRM rotation property, and one to compute the step, in bytes, between two successive pixels in a specific direction. Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_composer.c | 36 drivers/gpu/drm/vkms/vkms_drv.h | 11 +++ drivers/gpu/drm/vkms/vkms_formats.c | 30 ++ 3 files changed, 77 insertions(+) diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index 9254086f23ff..989bcf59f375 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -159,6 +159,42 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff } } +/** + * direction_for_rotation() - Get the correct reading direction for a given rotation + * + * This function will use the @rotation setting of a source plane to compute the reading + * direction in this plane which correspond to a "left to right writing" in the CRTC. + * For example, if the buffer is reflected on X axis, the pixel must be read from right to left + * to be written from left to right on the CRTC. + * + * @rotation: Rotation to analyze. It correspond the field @frame_info.rotation. A bit unusual to see arguments after the description. 
+ */ +static enum pixel_read_direction direction_for_rotation(unsigned int rotation) +{ + if (rotation & DRM_MODE_ROTATE_0) { + if (rotation & DRM_MODE_REFLECT_X) + return READ_RIGHT_TO_LEFT; + else + return READ_LEFT_TO_RIGHT; + } else if (rotation & DRM_MODE_ROTATE_90) { + if (rotation & DRM_MODE_REFLECT_Y) + return READ_BOTTOM_TO_TOP; + else + return READ_TOP_TO_BOTTOM; + } else if (rotation & DRM_MODE_ROTATE_180) { + if (rotation & DRM_MODE_REFLECT_X) + return READ_LEFT_TO_RIGHT; + else + return READ_RIGHT_TO_LEFT; + } else if (rotation & DRM_MODE_ROTATE_270) { + if (rotation & DRM_MODE_REFLECT_Y) + return READ_TOP_TO_BOTTOM; + else + return READ_BOTTOM_TO_TOP; + } + return READ_LEFT_TO_RIGHT; +} + /** * blend - blend the pixels from all planes and compute crc * @wb: The writeback frame buffer metadata diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 3ead8b39af4a..985e7a92b7bc 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -69,6 +69,17 @@ struct vkms_writeback_job { pixel_write_t pixel_write; }; +/** + * enum pixel_read_direction - Enum used internaly by VKMS to represent a reading direction in a + * plane. + */ +enum pixel_read_direction { + READ_BOTTOM_TO_TOP, + READ_TOP_TO_BOTTOM, + READ_RIGHT_TO_LEFT, + READ_LEFT_TO_RIGHT +}; + /** * typedef pixel_read_t - These functions are used to read a pixel in the source frame, * convert it to `struct pixel_argb_u16` and write it to @out_pixel. diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 649d75d05b1f..743b6fd06db5 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -75,6 +75,36 @@ static void packed_pixels_addr(const struct vkms_frame_info *frame_info, *addr = (u8 *)frame_info->map[0].vaddr + offset; } +/** + * get_step_next_block() - Common helper to compute the correct step value between each pixel block + * to read in a certain direction. 
+ * + * As the returned offset is the number of bytes between two consecutive blocks in a direction, + * the caller may have to read multiple pixel before using the next one (for example, to read from + * left to right in a DRM_FORMAT_R1 plane, each block contains 8 pixels, so the step must be used + * only every 8 pixels. + * + * @fb: Framebuffer to iter on + * @direction: Direction of the reading + * @plane_index: Plane to get the step from Same. Best Regards, - Maíra + */ +static int get_step_next_block(struct drm_framebuffer *fb, enum pixel_read_direction direction, + int plane_index) +{ + switch (direction) { + case READ_LEFT_TO_RIGHT: + return fb->format->char_per_block[plane_index]; + case READ_RIGHT_TO_LEFT: + return -fb->format->char_per_block[plane_index]; + case READ_TOP_TO_BOTTOM: + return (int)fb->pitches[plane_index]; + case READ_BOTTOM_TO_TOP: +
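The effect of get_step_next_block() is easier to see with concrete numbers. Below is a small model for a packed single-plane format such as XRGB8888 (one pixel per block, 4 bytes per block); the width and pitch values are illustrative assumptions, not taken from the patch:

```python
CPP = 4             # char_per_block for XRGB8888 (one pixel per block)
PITCH = 1024 * CPP  # bytes per framebuffer row, assuming a 1024-pixel-wide fb

def step_next_block(direction: str) -> int:
    """Byte offset between consecutive blocks for each reading direction."""
    return {
        "left_to_right": CPP,     # next pixel in the row
        "right_to_left": -CPP,    # previous pixel in the row
        "top_to_bottom": PITCH,   # same column, next row
        "bottom_to_top": -PITCH,  # same column, previous row
    }[direction]

# Walking one step right moves 4 bytes; one step up moves back a full pitch.
assert step_next_block("left_to_right") == 4
assert step_next_block("bottom_to_top") == -4096
```

For block-based formats such as DRM_FORMAT_R1 (8 pixels per byte), the caveat in the kernel-doc applies: the step is between blocks, so it is only taken after all pixels in the current block have been consumed.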
Re: [PATCH v5 06/16] drm/vkms: Use const for input pointers in pixel_read an pixel_write functions
On 3/13/24 14:45, Louis Chauvet wrote: As the pixel_read and pixel_write function should never modify the input buffer, mark those pointers const. Signed-off-by: Louis Chauvet Reviewed-by: Maíra Canal Best Regards, - Maíra --- drivers/gpu/drm/vkms/vkms_drv.h | 4 ++-- drivers/gpu/drm/vkms/vkms_formats.c | 24 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 4bfc62d26f08..3ead8b39af4a 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -61,7 +61,7 @@ struct line_buffer { * @out_pixel: destination address to write the pixel * @in_pixel: pixel to write */ -typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel); +typedef void (*pixel_write_t)(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel); struct vkms_writeback_job { struct iosys_map data[DRM_FORMAT_MAX_PLANES]; @@ -76,7 +76,7 @@ struct vkms_writeback_job { * @in_pixel: Pointer to the pixel to read * @out_pixel: Pointer to write the converted pixel */ -typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel); +typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel); /** * vkms_plane_state - Driver specific plane state diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index b57d85b8b935..b2f8dfc26c35 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -76,7 +76,7 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i * They are used in the `vkms_compose_row` function to handle multiple formats. */ -static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void ARGB_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { /* * The 257 is the "conversion ratio". 
This number is obtained by the @@ -90,7 +90,7 @@ static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) out_pixel->b = (u16)in_pixel[0] * 257; } -static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void XRGB_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { out_pixel->a = (u16)0x; out_pixel->r = (u16)in_pixel[2] * 257; @@ -98,7 +98,7 @@ static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) out_pixel->b = (u16)in_pixel[0] * 257; } -static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { u16 *pixel = (u16 *)in_pixel; @@ -108,7 +108,7 @@ static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pi out_pixel->b = le16_to_cpu(pixel[0]); } -static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { u16 *pixel = (u16 *)in_pixel; @@ -118,7 +118,7 @@ static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pi out_pixel->b = le16_to_cpu(pixel[0]); } -static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { u16 *pixel = (u16 *)in_pixel; @@ -143,7 +143,7 @@ static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) * It is used to avoid null pointer to be used as a function. In theory, this function should * never be called, except if you found a bug in the driver/DRM core. 
*/ -static void black_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +static void black_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { out_pixel->a = (u16)0x; out_pixel->r = 0; @@ -189,7 +189,7 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state * They are used in the `vkms_writeback_row` to convert and store a pixel from the src_buffer to * the writeback buffer. */ -static void argb_u16_to_ARGB(u8 *out_pixel, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_ARGB(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { /* * This sequence below is important because the format's byte order is @@ -207,7 +207,7 @@ static void argb_u16_to_ARGB(u8 *out_pixel, struct pixel_argb_u16 *in_pixel) out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257); } -static void argb_u16_to_XRGB(u8 *out_pixel, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_XRGB(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { out_pixel[3] = 0xff;
Re: [PATCH v5 05/16] drm/vkms: Add dummy pixel_read/pixel_write callbacks to avoid NULL pointers
On 3/13/24 14:44, Louis Chauvet wrote: Introduce two callbacks which do nothing. They are used in place of NULL and avoid a kernel oops if such a NULL pointer were called. If those callbacks are used, it means that there is a mismatch between the formats announced by atomic_check and what is really supported by atomic_update. Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_formats.c | 43 +++-- 1 file changed, 37 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 55a4365d21a4..b57d85b8b935 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -136,6 +136,21 @@ static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); } +/** + * black_to_argb_u16() - pixel_read callback which always read black + * + * This callback is used when an invalid format is requested for plane reading. + * It is used to avoid null pointer to be used as a function. In theory, this function should + * never be called, except if you found a bug in the driver/DRM core. + */ +static void black_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) +{ + out_pixel->a = (u16)0xffff; + out_pixel->r = 0; + out_pixel->g = 0; + out_pixel->b = 0; +} + /** * vkms_compose_row - compose a single row of a plane * @stage_buffer: output line with the composed pixels @@ -238,6 +253,16 @@ static void argb_u16_to_RGB565(u8 *out_pixel, struct pixel_argb_u16 *in_pixel) *pixel = cpu_to_le16(r << 11 | g << 5 | b); } +/** + * argb_u16_to_nothing() - pixel_write callback with no effect + * + * This callback is used when an invalid format is requested for writeback. + * It is used to avoid null pointer to be used as a function. 
In theory, this should never + * happen, except if there is a bug in the driver + */ +static void argb_u16_to_nothing(u8 *out_pixel, struct pixel_argb_u16 *in_pixel) +{} + /** * Generic loop for all supported writeback format. It is executed just after the blending to * write a line in the writeback buffer. @@ -261,8 +286,8 @@ void vkms_writeback_row(struct vkms_writeback_job *wb, /** * Retrieve the correct read_pixel function for a specific format. - * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the - * pointer is valid before using it in a vkms_plane_state. + * If the format is not supported by VKMS a warn is emitted and a dummy "always read black" "If the format is not supported by VKMS, a warning is emitted and a dummy "always read black"..." + * function is returned. * * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h]) */ @@ -285,18 +310,21 @@ pixel_read_t get_pixel_read_function(u32 format) * format must: * - Be listed in vkms_formats in vkms_plane.c * - Have a pixel_read callback defined here +* +* To avoid kernel crash, a dummy "always read black" function is used. It means +* that during the composition, this plane will always be black. */ WARN(true, "Pixel format %p4cc is not supported by VKMS planes. This is a kernel bug, atomic check must forbid this configuration.\n", ); - return (pixel_read_t)NULL; + return _to_argb_u16; } } /** * Retrieve the correct write_pixel function for a specific format. - * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the - * pointer is valid before using it in a vkms_writeback_job. + * If the format is not supported by VKMS a warn is emitted and a dummy "don't do anything" "If the format is not supported by VKMS, a warning is emitted and a dummy "don't do anything"..." Best Regards, - Maíra + * function is returned. 
* * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h]) */ @@ -319,10 +347,13 @@ pixel_write_t get_pixel_write_function(u32 format) * format must: * - Be listed in vkms_wb_formats in vkms_writeback.c * - Have a pixel_write callback defined here +* +* To avoid a kernel crash, a dummy "don't do anything" function is used. It means +* that the resulting writeback buffer is not composed and can contain any values. */ WARN(true, "Pixel format %p4cc is not supported by VKMS writeback. This is a kernel bug, atomic check must forbid this configuration.\n", &format); - return (pixel_write_t)NULL; + return &argb_u16_to_nothing; } }
Re: [PATCH v5 04/16] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
On 3/13/24 14:44, Louis Chauvet wrote: Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the compiler to check if the passed functions take the correct arguments. Such typedefs will help ensure consistency across the code base in case of update of these prototypes. Rename input/output variables in a consistent way between read_line and write_line. A warning has been added in get_pixel_*_function to alert when an unsupported pixel format is requested. As those formats are checked before atomic_update callbacks, it should never happen. Add documentation for those typedefs. Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_drv.h | 23 ++- drivers/gpu/drm/vkms/vkms_formats.c | 124 +--- drivers/gpu/drm/vkms/vkms_formats.h | 4 +- drivers/gpu/drm/vkms/vkms_plane.c | 2 +- 4 files changed, 95 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 18086423a3a7..4bfc62d26f08 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -53,12 +53,31 @@ struct line_buffer { struct pixel_argb_u16 *pixels; }; +/** + * typedef pixel_write_t - These functions are used to read a pixel from a + * `struct pixel_argb_u16*`, convert it to a specific format and write it to the @dst_pixels + * buffer. Your brief description looks a bit big to me. Also, take a look at the cross-references docs [1]. 
[1] https://docs.kernel.org/doc-guide/kernel-doc.html#highlights-and-cross-references + * + * @out_pixel: destination address to write the pixel + * @in_pixel: pixel to write + */ +typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel); + struct vkms_writeback_job { struct iosys_map data[DRM_FORMAT_MAX_PLANES]; struct vkms_frame_info wb_frame_info; - void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel); + pixel_write_t pixel_write; }; +/** + * typedef pixel_read_t - These functions are used to read a pixel in the source frame, + * convert it to `struct pixel_argb_u16` and write it to @out_pixel. Same. + * + * @in_pixel: Pointer to the pixel to read + * @out_pixel: Pointer to write the converted pixel s/Pointer/pointer + */ +typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel); + /** * vkms_plane_state - Driver specific plane state * @base: base plane state @@ -69,7 +88,7 @@ struct vkms_writeback_job { struct vkms_plane_state { struct drm_shadow_plane_state base; struct vkms_frame_info *frame_info; - void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel); + pixel_read_t pixel_read; }; struct vkms_plane { diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 6e3dc8682ff9..55a4365d21a4 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -76,7 +76,7 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i * They are used in the `vkms_compose_row` function to handle multiple formats. */ -static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { /* * The 257 is the "conversion ratio". This number is obtained by the @@ -84,48 +84,48 @@ static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe * the best color value in a pixel format with more possibilities. 
* A similar idea applies to other RGB color conversions. */ - out_pixel->a = (u16)src_pixels[3] * 257; - out_pixel->r = (u16)src_pixels[2] * 257; - out_pixel->g = (u16)src_pixels[1] * 257; - out_pixel->b = (u16)src_pixels[0] * 257; + out_pixel->a = (u16)in_pixel[3] * 257; + out_pixel->r = (u16)in_pixel[2] * 257; + out_pixel->g = (u16)in_pixel[1] * 257; + out_pixel->b = (u16)in_pixel[0] * 257; } -static void XRGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { out_pixel->a = (u16)0xffff; - out_pixel->r = (u16)src_pixels[2] * 257; - out_pixel->g = (u16)src_pixels[1] * 257; - out_pixel->b = (u16)src_pixels[0] * 257; + out_pixel->r = (u16)in_pixel[2] * 257; + out_pixel->g = (u16)in_pixel[1] * 257; + out_pixel->b = (u16)in_pixel[0] * 257; } -static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel) { - u16 *pixels = (u16 *)src_pixels; + u16 *pixel = (u16 *)in_pixel; - out_pixel->a = le16_to_cpu(pixels[3]); - out_pixel->r = le16_to_cpu(pixels[2]); - out_pixel->g = le16_to_cpu(pixels[1]); -
Re: [PATCH v5 03/16] drm/vkms: write/update the documentation for pixel conversion and pixel write functions
On 3/13/24 14:44, Louis Chauvet wrote: Add some documentation on pixel conversion functions. Update of outdated comments for pixel_write functions. Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_composer.c | 7 drivers/gpu/drm/vkms/vkms_drv.h | 13 drivers/gpu/drm/vkms/vkms_formats.c | 62 ++-- 3 files changed, 73 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index c6d9b4a65809..da0651a94c9b 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -189,6 +189,13 @@ static void blend(struct vkms_writeback_job *wb, size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay; + /* +* The planes are composed line-by-line to avoid heavy memory usage. It is a necessary +* complexity to avoid poor blending performance. +* +* The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging +* buffer. +*/ for (size_t y = 0; y < crtc_y_limit; y++) { fill_background(_color, output_buffer); diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index b4b357447292..18086423a3a7 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -25,6 +25,17 @@ #define VKMS_LUT_SIZE 256 +/** + * struct vkms_frame_info - structure to store the state of a frame + * + * @fb: backing drm framebuffer + * @src: source rectangle of this frame in the source framebuffer + * @dst: destination rectangle in the crtc buffer + * @map: see drm_shadow_plane_state@data + * @rotation: rotation applied to the source. + * + * @src and @dst should have the same size modulo the rotation. + */ struct vkms_frame_info { struct drm_framebuffer *fb; struct drm_rect src, dst; @@ -52,6 +63,8 @@ struct vkms_writeback_job { * vkms_plane_state - Driver specific plane state It should be "* struct vkms_plane_state - Driver specific plane state". 
* @base: base plane state * @frame_info: data required for composing computation + * @pixel_read: function to read a pixel in this plane. The creator of a vkms_plane_state must + * ensure that this pointer is valid */ struct vkms_plane_state { struct drm_shadow_plane_state base; diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 172830a3936a..6e3dc8682ff9 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -9,6 +9,18 @@ #include "vkms_formats.h" +/** + * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane + * + * @frame_info: Buffer metadata + * @x: The x coordinate of the wanted pixel in the buffer + * @y: The y coordinate of the wanted pixel in the buffer + * + * The caller must ensure that the framebuffer associated with this request uses a pixel format + * where block_h == block_w == 1. + * If this requirement is not fulfilled, the resulting offset can point to an other pixel or + * outside of the buffer. + */ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y) { struct drm_framebuffer *fb = frame_info->fb; @@ -17,18 +29,22 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int + (x * fb->format->cpp[0]); } -/* - * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates +/** + * packed_pixels_addr() - Get the pointer to the block containing the pixel at the given + * coordinates * * @frame_info: Buffer metadata - * @x: The x(width) coordinate of the 2D buffer - * @y: The y(Heigth) coordinate of the 2D buffer + * @x: The x(width) coordinate inside the plane + * @y: The y(height) coordinate inside the plane I would add a space after x and y. * * Takes the information stored in the frame_info, a pair of coordinates, and * returns the address of the first color channel. * This function assumes the channels are packed together, i.e. 
a color channel * comes immediately after another in the memory. And therefore, this function * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21). + * + * The caller must ensure that the framebuffer associated with this request uses a pixel format + * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer. */ static void *packed_pixels_addr(const struct vkms_frame_info *frame_info, int x, int y) @@ -53,6 +69,13 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i return x; } +/* + * The following functions take pixel data from the buffer and convert them to the format Double-spacing. + * ARGB16161616 in out_pixel. + * + * They are used in the `vkms_compose_row` function to
Re: [PATCH v5 02/16] drm/vkms: Use drm_frame directly
On 3/13/24 14:44, Louis Chauvet wrote: From: Arthur Grillo Remove intermediary variables and access the variables directly from drm_frame. These changes should be a no-op. Signed-off-by: Arthur Grillo Signed-off-by: Louis Chauvet --- drivers/gpu/drm/vkms/vkms_drv.h | 3 --- drivers/gpu/drm/vkms/vkms_formats.c | 12 +++- drivers/gpu/drm/vkms/vkms_plane.c | 3 --- drivers/gpu/drm/vkms/vkms_writeback.c | 5 - 4 files changed, 7 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 8f5710debb1e..b4b357447292 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -31,9 +31,6 @@ struct vkms_frame_info { struct drm_rect rotated; struct iosys_map map[DRM_FORMAT_MAX_PLANES]; unsigned int rotation; - unsigned int offset; - unsigned int pitch; - unsigned int cpp; }; struct pixel_argb_u16 { diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index 36046b12f296..172830a3936a 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -11,8 +11,10 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y) { - return frame_info->offset + (y * frame_info->pitch) - + (x * frame_info->cpp); + struct drm_framebuffer *fb = frame_info->fb; + + return fb->offsets[0] + (y * fb->pitches[0]) + + (x * fb->format->cpp[0]); Nitpicking: Could this be packed into a single line? 
Anyway, Reviewed-by: Maíra Canal Best Regards, - Maíra } /* @@ -131,12 +133,12 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state u8 *src_pixels = get_packed_src_addr(frame_info, y); int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels); - for (size_t x = 0; x < limit; x++, src_pixels += frame_info->cpp) { + for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) { int x_pos = get_x_position(frame_info, limit, x); if (drm_rotation_90_or_270(frame_info->rotation)) src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1) - + frame_info->cpp * y; + + frame_info->fb->format->cpp[0] * y; plane->pixel_read(src_pixels, &out_pixels[x_pos]); } @@ -223,7 +225,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb, struct pixel_argb_u16 *in_pixels = src_buffer->pixels; int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels); - for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->cpp) + for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->fb->format->cpp[0]) wb->pixel_write(dst_pixels, &in_pixels[x]); } diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c index 5a8d295e65f2..21b5adfb44aa 100644 --- a/drivers/gpu/drm/vkms/vkms_plane.c +++ b/drivers/gpu/drm/vkms/vkms_plane.c @@ -125,9 +125,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane, drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated), drm_rect_height(&frame_info->rotated), frame_info->rotation); - frame_info->offset = fb->offsets[0]; - frame_info->pitch = fb->pitches[0]; - frame_info->cpp = fb->format->cpp[0]; vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt); } diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c index bc724cbd5e3a..c8582df1f739 100644 --- a/drivers/gpu/drm/vkms/vkms_writeback.c +++ b/drivers/gpu/drm/vkms/vkms_writeback.c @@ -149,11 +149,6 @@ static void 
vkms_wb_atomic_commit(struct drm_connector *conn, crtc_state->active_writeback = active_wb; crtc_state->wb_pending = true; spin_unlock_irq(&out->composer_lock); - - wb_frame_info->offset = fb->offsets[0]; - wb_frame_info->pitch = fb->pitches[0]; - wb_frame_info->cpp = fb->format->cpp[0]; - drm_writeback_queue_job(wb_conn, connector_state); active_wb->pixel_write = get_pixel_write_function(wb_format); drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
Re: [PATCH v5 01/16] drm/vkms: Code formatting
On 3/13/24 14:44, Louis Chauvet wrote: A few no-op changes to remove double spaces and fix wrong alignments. Signed-off-by: Louis Chauvet Reviewed-by: Maíra Canal Best Regards, - Maíra --- drivers/gpu/drm/vkms/vkms_composer.c | 10 +- drivers/gpu/drm/vkms/vkms_crtc.c | 6 ++ drivers/gpu/drm/vkms/vkms_drv.c | 3 +-- drivers/gpu/drm/vkms/vkms_plane.c| 8 4 files changed, 12 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index e7441b227b3c..c6d9b4a65809 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -96,7 +96,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t) s64 a_fp = drm_int2fixp(a); s64 b_fp = drm_int2fixp(b); - s64 delta = drm_fixp_mul(b_fp - a_fp, t); + s64 delta = drm_fixp_mul(b_fp - a_fp, t); return drm_fixp2int(a_fp + delta); } @@ -302,8 +302,8 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb, void vkms_composer_worker(struct work_struct *work) { struct vkms_crtc_state *crtc_state = container_of(work, - struct vkms_crtc_state, - composer_work); + struct vkms_crtc_state, + composer_work); struct drm_crtc *crtc = crtc_state->base.crtc; struct vkms_writeback_job *active_wb = crtc_state->active_writeback; struct vkms_output *out = drm_crtc_to_vkms_output(crtc); @@ -328,7 +328,7 @@ void vkms_composer_worker(struct work_struct *work) crtc_state->gamma_lut.base = (struct drm_color_lut *)crtc->state->gamma_lut->data; crtc_state->gamma_lut.lut_length = crtc->state->gamma_lut->length / sizeof(struct drm_color_lut); - max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1); + max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1); crtc_state->gamma_lut.channel_value2index_ratio = drm_fixp_div(max_lut_index_fp, u16_max_fp); @@ -367,7 +367,7 @@ void vkms_composer_worker(struct work_struct *work) drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32); } -static const char * const pipe_crc_sources[] = {"auto"}; 
+static const char *const pipe_crc_sources[] = { "auto" }; const char *const *vkms_get_crc_sources(struct drm_crtc *crtc, size_t *count) diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index 61e500b8c9da..7586ae2e1dd3 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -191,8 +191,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc, return ret; drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) { - plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, - plane); + plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane); WARN_ON(!plane_state); if (!plane_state->visible) @@ -208,8 +207,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc, i = 0; drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) { - plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, - plane); + plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane); if (!plane_state->visible) continue; diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c index dd0af086e7fa..83e6c9b9ff46 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.c +++ b/drivers/gpu/drm/vkms/vkms_drv.c @@ -81,8 +81,7 @@ static void vkms_atomic_commit_tail(struct drm_atomic_state *old_state) drm_atomic_helper_wait_for_flip_done(dev, old_state); for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) { - struct vkms_crtc_state *vkms_state = - to_vkms_crtc_state(old_crtc_state); + struct vkms_crtc_state *vkms_state = to_vkms_crtc_state(old_crtc_state); flush_work(&vkms_state->composer_work); } diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c index e5c625ab8e3e..5a8d295e65f2 100644 --- a/drivers/gpu/drm/vkms/vkms_plane.c +++ b/drivers/gpu/drm/vkms/vkms_plane.c @@ -117,10 +117,10 @@ static void vkms_plane_atomic_update(struct drm_plane *plane, m
Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
Not that the CC list wasn't big enough, but I'm adding MM folks in the CC list. On 3/18/24 11:04, Christian König wrote: Am 18.03.24 um 14:28 schrieb Maíra Canal: Hi Christian, On 3/18/24 10:10, Christian König wrote: Am 18.03.24 um 13:42 schrieb Maíra Canal: Hi Christian, On 3/12/24 10:48, Christian König wrote: Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin: On 12/03/2024 10:37, Christian König wrote: Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin: On 12/03/2024 10:23, Christian König wrote: Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin: On 12/03/2024 08:59, Christian König wrote: Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin: Hi Maira, On 11/03/2024 10:05, Maíra Canal wrote: For some applications, such as using huge pages, we might want to have a different mountpoint, for which we pass in mount flags that better match our usecase. Therefore, add a new parameter to drm_gem_object_init() that allow us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fallback to shmem_file_setup(). One strategy for reducing churn, and so the number of drivers this patch touches, could be to add a lower level drm_gem_object_init() (which takes vfsmount, call it __drm_gem_object_init(), or drm__gem_object_init_mnt(), and make drm_gem_object_init() call that one with a NULL argument. I would even go a step further into the other direction. The shmem backed GEM object is just some special handling as far as I can see. So I would rather suggest to rename all drm_gem_* function which only deal with the shmem backed GEM object into drm_gem_shmem_*. That makes sense although it would be very churny. I at least would be on the fence regarding the cost vs benefit. Yeah, it should clearly not be part of this patch here. Also the explanation why a different mount point helps with something isn't very satisfying. Not satisfying as you think it is not detailed enough to say driver wants to use huge pages for performance? 
Or not satisying as you question why huge pages would help? That huge pages are beneficial is clear to me, but I'm missing the connection why a different mount point helps with using huge pages. Ah right, same as in i915, one needs to mount a tmpfs instance passing huge=within_size or huge=always option. Default is 'never', see man 5 tmpfs. Thanks for the explanation, I wasn't aware of that. Mhm, shouldn't we always use huge pages? Is there a reason for a DRM device to not use huge pages with the shmem backend? AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back then the understanding was within_size may overallocate, meaning there would be some space wastage, until the memory pressure makes the thp code split the trailing huge page. I haven't checked if that still applies. Other than that I don't know if some drivers/platforms could have problems if they have some limitations or hardcoded assumptions when they iterate the sg list. Yeah, that was the whole point behind my question. As far as I can see this isn't driver specific, but platform specific. I might be wrong here, but I think we should then probably not have that handling in each individual driver, but rather centralized in the DRM code. I don't see a point in enabling THP for all shmem drivers. A huge page is only useful if the driver is going to use it. On V3D, for example, I only need huge pages because I need the memory contiguously allocated to implement Super Pages. Otherwise, if we don't have the Super Pages support implemented in the driver, I would be creating memory pressure without any performance gain. Well that's the point I'm disagreeing with. THP doesn't seem to create much extra memory pressure for this use case. As far as I can see background for the option is that files in tmpfs usually have a varying size, so it usually isn't beneficial to allocate a huge page just to find that the shmem file is much smaller than what's needed. But GEM objects have a fixed size. 
So we know up front whether we need 4KiB or 1GiB and can therefore directly allocate huge pages if they are available and the object is large enough to be backed by them. If the memory pressure is so high that we don't have huge pages available the shmem code falls back to standard pages anyway. The matter is: how do we define the point where the memory pressure is high? Well as driver developers/maintainers we simply don't do that. This is the job of the shmem code. For example, notice that in this implementation of Super Pages for the V3D driver, I only use a Super Page if the BO is bigger than 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM available for the GPU. If I created huge pages for every BO allocation (and initially, I tried that), I would end up with hangs in some applications. Yeah, that is what I meant with the trivial optimisation to the shmem code. Essentially when you have
Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
On 3/18/24 10:28, Maíra Canal wrote: Hi Christian, On 3/18/24 10:10, Christian König wrote: Am 18.03.24 um 13:42 schrieb Maíra Canal: Hi Christian, On 3/12/24 10:48, Christian König wrote: Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin: On 12/03/2024 10:37, Christian König wrote: Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin: On 12/03/2024 10:23, Christian König wrote: Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin: On 12/03/2024 08:59, Christian König wrote: Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin: Hi Maira, On 11/03/2024 10:05, Maíra Canal wrote: For some applications, such as using huge pages, we might want to have a different mountpoint, for which we pass in mount flags that better match our usecase. Therefore, add a new parameter to drm_gem_object_init() that allow us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fallback to shmem_file_setup(). One strategy for reducing churn, and so the number of drivers this patch touches, could be to add a lower level drm_gem_object_init() (which takes vfsmount, call it __drm_gem_object_init(), or drm__gem_object_init_mnt(), and make drm_gem_object_init() call that one with a NULL argument. I would even go a step further into the other direction. The shmem backed GEM object is just some special handling as far as I can see. So I would rather suggest to rename all drm_gem_* function which only deal with the shmem backed GEM object into drm_gem_shmem_*. That makes sense although it would be very churny. I at least would be on the fence regarding the cost vs benefit. Yeah, it should clearly not be part of this patch here. Also the explanation why a different mount point helps with something isn't very satisfying. Not satisfying as you think it is not detailed enough to say driver wants to use huge pages for performance? Or not satisying as you question why huge pages would help? 
That huge pages are beneficial is clear to me, but I'm missing the connection why a different mount point helps with using huge pages. Ah right, same as in i915, one needs to mount a tmpfs instance passing huge=within_size or huge=always option. Default is 'never', see man 5 tmpfs. Thanks for the explanation, I wasn't aware of that. Mhm, shouldn't we always use huge pages? Is there a reason for a DRM device to not use huge pages with the shmem backend? AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back then the understanding was within_size may overallocate, meaning there would be some space wastage, until the memory pressure makes the thp code split the trailing huge page. I haven't checked if that still applies. Other than that I don't know if some drivers/platforms could have problems if they have some limitations or hardcoded assumptions when they iterate the sg list. Yeah, that was the whole point behind my question. As far as I can see this isn't driver specific, but platform specific. I might be wrong here, but I think we should then probably not have that handling in each individual driver, but rather centralized in the DRM code. I don't see a point in enabling THP for all shmem drivers. A huge page is only useful if the driver is going to use it. On V3D, for example, I only need huge pages because I need the memory contiguously allocated to implement Super Pages. Otherwise, if we don't have the Super Pages support implemented in the driver, I would be creating memory pressure without any performance gain. Well that's the point I'm disagreeing with. THP doesn't seem to create much extra memory pressure for this use case. As far as I can see background for the option is that files in tmpfs usually have a varying size, so it usually isn't beneficial to allocate a huge page just to find that the shmem file is much smaller than what's needed. But GEM objects have a fixed size. 
So we of hand knew if we need 4KiB or 1GiB and can therefore directly allocate huge pages if they are available and object large enough to back them with. If the memory pressure is so high that we don't have huge pages available the shmem code falls back to standard pages anyway. The matter is: how do we define the point where the memory pressure is high? For example, notice that in this implementation of Super Pages for the V3D driver, I only use a Super Page if the BO is bigger than 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM available for the GPU. If I created huge pages for every BO allocation (and initially, I tried that), I would end up with hangs in some applications. At least, for V3D, I wouldn't like to see THP being used for all the allocations. But, we have maintainers of other drivers in the CC. Okay, I'm thinking about a compromise. What if we create a gemfs mountpoint in the DRM core and everytime we init a object, we can choose if we will use huge pages or not. Therefore, drm_gem_sh
Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
Hi Christian, On 3/18/24 10:10, Christian König wrote: Am 18.03.24 um 13:42 schrieb Maíra Canal: Hi Christian, On 3/12/24 10:48, Christian König wrote: Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin: On 12/03/2024 10:37, Christian König wrote: Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin: On 12/03/2024 10:23, Christian König wrote: Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin: On 12/03/2024 08:59, Christian König wrote: Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin: Hi Maira, On 11/03/2024 10:05, Maíra Canal wrote: For some applications, such as using huge pages, we might want to have a different mountpoint, for which we pass in mount flags that better match our usecase. Therefore, add a new parameter to drm_gem_object_init() that allow us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fallback to shmem_file_setup(). One strategy for reducing churn, and so the number of drivers this patch touches, could be to add a lower level drm_gem_object_init() (which takes vfsmount, call it __drm_gem_object_init(), or drm__gem_object_init_mnt(), and make drm_gem_object_init() call that one with a NULL argument. I would even go a step further into the other direction. The shmem backed GEM object is just some special handling as far as I can see. So I would rather suggest to rename all drm_gem_* function which only deal with the shmem backed GEM object into drm_gem_shmem_*. That makes sense although it would be very churny. I at least would be on the fence regarding the cost vs benefit. Yeah, it should clearly not be part of this patch here. Also the explanation why a different mount point helps with something isn't very satisfying. Not satisfying as you think it is not detailed enough to say driver wants to use huge pages for performance? Or not satisying as you question why huge pages would help? 
That huge pages are beneficial is clear to me, but I'm missing the connection why a different mount point helps with using huge pages. Ah right, same as in i915, one needs to mount a tmpfs instance passing huge=within_size or huge=always option. Default is 'never', see man 5 tmpfs. Thanks for the explanation, I wasn't aware of that. Mhm, shouldn't we always use huge pages? Is there a reason for a DRM device to not use huge pages with the shmem backend? AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back then the understanding was within_size may overallocate, meaning there would be some space wastage, until the memory pressure makes the thp code split the trailing huge page. I haven't checked if that still applies. Other than that I don't know if some drivers/platforms could have problems if they have some limitations or hardcoded assumptions when they iterate the sg list. Yeah, that was the whole point behind my question. As far as I can see this isn't driver specific, but platform specific. I might be wrong here, but I think we should then probably not have that handling in each individual driver, but rather centralized in the DRM code. I don't see a point in enabling THP for all shmem drivers. A huge page is only useful if the driver is going to use it. On V3D, for example, I only need huge pages because I need the memory contiguously allocated to implement Super Pages. Otherwise, if we don't have the Super Pages support implemented in the driver, I would be creating memory pressure without any performance gain. Well that's the point I'm disagreeing with. THP doesn't seem to create much extra memory pressure for this use case. As far as I can see background for the option is that files in tmpfs usually have a varying size, so it usually isn't beneficial to allocate a huge page just to find that the shmem file is much smaller than what's needed. But GEM objects have a fixed size. 
So we of hand knew if we need 4KiB or 1GiB and can therefore directly allocate huge pages if they are available and object large enough to back them with. If the memory pressure is so high that we don't have huge pages available the shmem code falls back to standard pages anyway. The matter is: how do we define the point where the memory pressure is high? For example, notice that in this implementation of Super Pages for the V3D driver, I only use a Super Page if the BO is bigger than 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM available for the GPU. If I created huge pages for every BO allocation (and initially, I tried that), I would end up with hangs in some applications. At least, for V3D, I wouldn't like to see THP being used for all the allocations. But, we have maintainers of other drivers in the CC. Best Regards, - Maíra So THP is almost always beneficial for GEM even if the driver doesn't actually need it. The only potential case I can think of which might not be handled gracefully is the tail pages, e.g. h
Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
Hi Christian,

On 3/12/24 10:48, Christian König wrote:
On 12.03.24 at 14:09, Tvrtko Ursulin wrote:
On 12/03/2024 10:37, Christian König wrote:
On 12.03.24 at 11:31, Tvrtko Ursulin wrote:
On 12/03/2024 10:23, Christian König wrote:
On 12.03.24 at 10:30, Tvrtko Ursulin wrote:
On 12/03/2024 08:59, Christian König wrote:
On 12.03.24 at 09:51, Tvrtko Ursulin wrote:

Hi Maira,

On 11/03/2024 10:05, Maíra Canal wrote:

For some applications, such as using huge pages, we might want to have a different mountpoint, for which we pass in mount flags that better match our usecase. Therefore, add a new parameter to drm_gem_object_init() that allows us to define the tmpfs mountpoint where the GEM object will be created. If this parameter is NULL, then we fall back to shmem_file_setup().

One strategy for reducing churn, and so the number of drivers this patch touches, could be to add a lower-level drm_gem_object_init() which takes a vfsmount (call it __drm_gem_object_init() or drm_gem_object_init_mnt()), and make drm_gem_object_init() call that one with a NULL argument.

I would even go a step further in the other direction. The shmem-backed GEM object is just some special handling as far as I can see. So I would rather suggest renaming all drm_gem_* functions which only deal with shmem-backed GEM objects to drm_gem_shmem_*.

That makes sense, although it would be very churny. I at least would be on the fence regarding the cost vs. benefit.

Yeah, it should clearly not be part of this patch here. Also, the explanation of why a different mount point helps with something isn't very satisfying.

Not satisfying as in you think it is not detailed enough to say the driver wants to use huge pages for performance? Or not satisfying as in you question why huge pages would help?

That huge pages are beneficial is clear to me, but I'm missing the connection why a different mount point helps with using huge pages.
Ah right, same as in i915: one needs to mount a tmpfs instance passing the huge=within_size or huge=always option. The default is 'never', see man 5 tmpfs.

Thanks for the explanation, I wasn't aware of that. Mhm, shouldn't we always use huge pages? Is there a reason for a DRM device not to use huge pages with the shmem backend?

AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back then the understanding was that within_size may overallocate, meaning there would be some space wastage until memory pressure makes the THP code split the trailing huge page. I haven't checked whether that still applies. Other than that, I don't know if some drivers/platforms could have problems if they have limitations or hardcoded assumptions when they iterate the sg list.

Yeah, that was the whole point behind my question. As far as I can see this isn't driver specific, but platform specific. I might be wrong here, but I think we should then probably not have that handling in each individual driver, but rather centralize it in the DRM code.

I don't see a point in enabling THP for all shmem drivers. A huge page is only useful if the driver is going to use it. On V3D, for example, I only need huge pages because I need the memory contiguously allocated to implement Super Pages. Otherwise, if we don't have Super Pages support implemented in the driver, I would be creating memory pressure without any performance gain.

Best Regards, - Maíra

Regards, Christian.

The Cc is plenty large so perhaps someone else will have additional information. :)

Regards, Tvrtko

I mean it would make this patch here even smaller.

Regards, Christian.

Regards, Tvrtko
Re: [PATCH v2] drm: Fix drm_fixp2int_round() making it add 0.5
Hi Melissa,

On 3/17/24 14:50, Melissa Wen wrote:
On 03/16, Arthur Grillo wrote:

As well noted by Pekka[1], the rounding of drm_fixp2int_round() is wrong. To round a number, you need to add 0.5 to the number and floor that; drm_fixp2int_round() is adding about 0.0000076. Make it add 0.5.

[1]: https://lore.kernel.org/all/20240301135327.22efe0dd.pekka.paala...@collabora.com/

Fixes: 8b25320887d7 ("drm: Add fixed-point helper to get rounded integer values")
Suggested-by: Pekka Paalanen
Reviewed-by: Harry Wentland
Signed-off-by: Arthur Grillo

Great, thanks!

Reviewed-by: Melissa Wen

I'll apply to drm-misc-next.

Shouldn't this patch be applied to drm-misc-fixes?

Best Regards, - Maíra

Melissa

---
Changes in v2:
- Add Fixes tag (Melissa Wen)
- Remove DRM_FIXED_POINT_HALF (Melissa Wen)
- Link to v1: https://lore.kernel.org/all/20240306-louis-vkms-conv-v1-1-5bfe7d129...@riseup.net/
---
 include/drm/drm_fixed.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/drm/drm_fixed.h b/include/drm/drm_fixed.h
index 0c9f917a4d4b..81572d32db0c 100644
--- a/include/drm/drm_fixed.h
+++ b/include/drm/drm_fixed.h
@@ -71,7 +71,6 @@ static inline u32 dfixed_div(fixed20_12 A, fixed20_12 B)
 }

 #define DRM_FIXED_POINT		32
-#define DRM_FIXED_POINT_HALF	16
 #define DRM_FIXED_ONE		(1ULL << DRM_FIXED_POINT)
 #define DRM_FIXED_DECIMAL_MASK	(DRM_FIXED_ONE - 1)
 #define DRM_FIXED_DIGITS_MASK	(~DRM_FIXED_DECIMAL_MASK)
@@ -90,7 +89,7 @@ static inline int drm_fixp2int(s64 a)

 static inline int drm_fixp2int_round(s64 a)
 {
-	return drm_fixp2int(a + (1 << (DRM_FIXED_POINT_HALF - 1)));
+	return drm_fixp2int(a + DRM_FIXED_ONE / 2);
 }

 static inline int drm_fixp2int_ceil(s64 a)

---
base-commit: f89632a9e5fa6c4787c14458cd42a9ef42025434
change-id: 20240315-drm_fixed-c680ba078ecb

Best regards,
--
Arthur Grillo
[PATCH 5/5] drm/v3d: Enable super pages
The V3D MMU also supports 1MB pages, called super pages. In order to set a 1MB page in the MMU, we need to make sure that the page table entries for all 4KB pages within the super page are correctly configured.

Therefore, if the BO is larger than 2MB, we allocate it in a separate mountpoint that uses THP. This will allow us to create a contiguous memory region to create our super pages. In order to place the page table entries in the MMU, we iterate over the 256 4KB pages and insert the PTE.

Signed-off-by: Maíra Canal
---
 drivers/gpu/drm/v3d/v3d_bo.c    | 19 +-
 drivers/gpu/drm/v3d/v3d_drv.c   |  7 +++
 drivers/gpu/drm/v3d/v3d_drv.h   |  6 --
 drivers/gpu/drm/v3d/v3d_gemfs.c |  6 ++
 drivers/gpu/drm/v3d/v3d_mmu.c   | 24 ++--
 5 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..cb8e49a33be7 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	struct v3d_dev *v3d = to_v3d_dev(obj->dev);
 	struct v3d_bo *bo = to_v3d_bo(obj);
 	struct sg_table *sgt;
+	u64 align;
 	int ret;

 	/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,9 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	if (IS_ERR(sgt))
 		return PTR_ERR(sgt);

+	bo->huge_pages = (obj->size >= SZ_2M && v3d->super_pages);
+	align = bo->huge_pages ? SZ_1M : SZ_4K;
+
 	spin_lock(&v3d->mm_lock);
 	/* Allocate the object's space in the GPU's page tables.
	 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +114,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	 */
 	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 align >> V3D_MMU_PAGE_SHIFT, 0, 0);
 	spin_unlock(&v3d->mm_lock);
 	if (ret)
 		return ret;
@@ -130,10 +134,21 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 			     size_t unaligned_size)
 {
 	struct drm_gem_shmem_object *shmem_obj;
+	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct v3d_bo *bo;
+	size_t size;
 	int ret;

-	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+	size = PAGE_ALIGN(unaligned_size);
+
+	/* To avoid memory fragmentation, we only use THP if the BO is bigger
+	 * than two Super Pages (1MB).
+	 */
+	if (size >= SZ_2M && v3d->super_pages)
+		shmem_obj = drm_gem_shmem_create_with_mnt(dev, size, v3d->gemfs);
+	else
+		shmem_obj = drm_gem_shmem_create(dev, size);
+
 	if (IS_ERR(shmem_obj))
 		return ERR_CAST(shmem_obj);
 	bo = to_v3d_bo(&shmem_obj->base);

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..96f4d8227407 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,11 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0

+static bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. Note: \
+		      To enable Super Pages, you need support to THP.");
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file_priv)
 {
@@ -308,6 +313,8 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}

+	v3d->super_pages = super_pages;
+
 	ret = v3d_gem_init(drm);
 	if (ret)
 		goto dma_free;

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..795087663739 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,9 +17,8 @@ struct clk;
 struct platform_device;
 struct reset_control;

-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)

 #define V3D_MAX_QUEUES (V3D_CPU + 1)

@@ -123,6 +122,7 @@ struct v3d_dev {
 	 * tmpfs instance used for shmem backed objects
 	 */
 	struct vfsmount *gemfs;
+	bool super_pages;

 	struct work_struct overflow_mem_work;

@@ -211,6 +211,8 @@ struct v3d_bo {
 	struct list_head unref_head;

 	void *vaddr;
+
+	bool huge_pages;
 };

 static inline struct v3d_bo *

diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 8518b7da6f73..bcde3138f555 100644
--- a/drivers/gpu/drm/v