Re: [PATCH 0/6] drm/v3d: Improve Performance Counters handling

2024-05-09 Thread Iago Toral
Hi Maíra,

I made a couple of minor comments, with those addressed the series is:

Reviewed-by: Iago Toral Quiroga 

Thanks a lot for this!,
Iago

El mié, 08-05-2024 a las 11:30 -0300, Maíra Canal escribió:
> This series has the intention to address two issues with Performance
> Counters
> on V3D:
> 
> 1. Update the number of Performance Counters for V3D 7.1 
>   
> V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87.
> Although the
> series [1] enabled support for V3D 7.1, it didn’t replace the maximum
> number of
> performance counters. This led to errors in user space as the Vulkan
> driver
> updated the maximum number of performance counters, but the kernel
> didn’t. 
>     
> Currently, the user space can request values for performance counters
> that
> are greater than 87 and the kernel will return an error instead of
> the values.
> That’s why `dEQP-VK.query_pool.performance_query.*` currently fails
> on Mesa
> CI [2]. This series intends to fix the `dEQP-
> VK.query_pool.performance_query.*`
> fail.
>     
> 2. Make the kernel able to provide the Performance Counter
> descriptions
>     
> Although all the management of the Performance Monitors is done
> through IOCTLs,
> which means that the code is in the kernel, the performance counter
> descriptions
> are in Mesa. This means two things: (#1) only Mesa has access to the
> descriptions
> and (#2) we can have inconsistencies between the information provided
> by Mesa
> and the kernel, as seen in the first issue addressed by this series.
>   
> To minimize the risk of inconsistencies, this series proposes to use
> the kernel
> as a “single source of truth”. Therefore, if there are any changes to
> the
> performance monitors, all the changes must be done only in the
> kernel. This
> means that all information about the maximum number of performance
> counters and
> all the descriptions will now be retrieved from the kernel. 
> 
> This series is coupled with a Mesa series [3] that enabled the use of
> the new
> IOCTL. I appreciate any feedback from both the kernel and Mesa
> implementations.
> 
> [1]
> https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/
> [2]
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
> [3]
> https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt
> 
> Best Regards,
> - Maíra Canal
> 
> Maíra Canal (6):
>   drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
>   drm/v3d: Different V3D versions can have different number of
> perfcnt
>   drm/v3d: Create a new V3D parameter for the maximum number of
> perfcnt
>   drm/v3d: Create new IOCTL to expose performance counters
> information
>   drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
>   drm/v3d: Deprecate the use of the Performance Counters enum
> 
>  drivers/gpu/drm/v3d/v3d_drv.c |  11 +
>  drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
>  drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
>  .../gpu/drm/v3d/v3d_performance_counters.h    | 208
> ++
>  drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
>  include/uapi/drm/v3d_drm.h    |  44 
>  6 files changed, 312 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h
> 
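For reference, a minimal userspace sketch of fetching one counter's description through the new ioctl discussed above. The ioctl name DRM_IOCTL_V3D_PERFMON_GET_COUNTER comes from this thread; the argument struct name and its field names below are assumptions for illustration, not the final uapi:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/v3d_drm.h>	/* assumes a header with this series applied */

/* Sketch: ask the kernel for the name/category/description of one
 * performance counter. struct drm_v3d_perfmon_get_counter and its
 * fields are assumed here; check the final uapi header. */
static int v3d_print_counter(int fd, unsigned int index)
{
	struct drm_v3d_perfmon_get_counter arg;

	memset(&arg, 0, sizeof(arg));
	arg.counter = index;

	if (ioctl(fd, DRM_IOCTL_V3D_PERFMON_GET_COUNTER, &arg) != 0)
		return -1;	/* e.g. index beyond what this V3D version exposes */

	printf("%u: %s [%s]: %s\n", index, (char *)arg.name,
	       (char *)arg.category, (char *)arg.description);
	return 0;
}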



Re: [PATCH 6/6] drm/v3d: Deprecate the use of the Performance Counters enum

2024-05-09 Thread Iago Toral
El mié, 08-05-2024 a las 11:30 -0300, Maíra Canal escribió:
> The Performance Counters enum is used to identify the index of each
> performance counter and to provide the total number of performance
> counters (V3D_PERFCNT_NUM). But this enum is only valid for V3D 4.2,
> not for V3D 7.1.
> 
> As we implemented a new flexible structure to retrieve performance
> counters information, we can deprecate this enum.
> 
> Signed-off-by: Maíra Canal 
> ---
>  include/uapi/drm/v3d_drm.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index 0860ddb3d0b6..706b4dea1c45 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -603,6 +603,12 @@ struct drm_v3d_submit_cpu {
>   __u64 extensions;
>  };
>  
> +/* The performance counter indices represented by this enum are
> deprecated and
> + * must no longer be used. These counters are only valid for V3D
> 4.2.
> + *
> + * In order to check for performance counter information,
> + * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER.

We should probably also include a reference to the new
DRM_V3D_PARAM_MAX_PERF_COUNTERS param here as well to retrieve the
total number of counters available.

Iago

> + */
>  enum {
>   V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
>   V3D_PERFCNT_FEP_VALID_PRIMS,
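Following up on the suggestion above, retrieving the total number of counters would go through the existing get-param ioctl with the new parameter. A small userspace sketch (the parameter name follows this thread; treat the rest as illustrative):

#include <string.h>
#include <sys/ioctl.h>
#include <drm/v3d_drm.h>	/* assumes a header with this series applied */

/* Sketch: number of performance counters exposed by this V3D version,
 * or -1 on error. Counter indices >= this value are invalid. */
static int v3d_max_perf_counters(int fd)
{
	struct drm_v3d_get_param param;

	memset(&param, 0, sizeof(param));
	param.param = DRM_V3D_PARAM_MAX_PERF_COUNTERS;

	if (ioctl(fd, DRM_IOCTL_V3D_GET_PARAM, &param) != 0)
		return -1;

	return (int)param.value;
}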



Re: [PATCH 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM

2024-05-09 Thread Iago Toral
El mié, 08-05-2024 a las 11:30 -0300, Maíra Canal escribió:
> V3D_PERFCNT_NUM represents the maximum number of performance counters
> for V3D 4.2, but not for V3D 7.1. This means that, if we use
> V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.
> 
> Therefore, use the number of performance counters on V3D 7.1 as the
> maximum number of counters. This will allow us to create arrays on
> the
> stack with reasonable size. Note that userspace must use the value
> provided by DRM_V3D_PARAM_V3D_MAX_PERF_COUNTERS.

This should be DRM_V3D_PARAM_MAX_PERF_COUNTERS.


Iago


Re: [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-28 Thread Iago Toral
Hi Maíra, 

a question below:

El dom, 28-04-2024 a las 09:40 -0300, Maíra Canal escribió:
> Although Big/Super Pages could appear naturally, it would be quite
> hard
> to have 1MB or 64KB allocated contiguously naturally. Therefore, we
> can
> force the creation of large pages allocated contiguously by using a
> mountpoint with "huge=within_size" enabled.
> 
> Therefore, as V3D has a mountpoint with "huge=within_size" (if user
> has
> THP enabled), use this mountpoint for BO creation if available. This
> will allow us to create large pages allocated contiguously and make
> use
> of Big/Super Pages.
> 
> Signed-off-by: Maíra Canal 
> Reviewed-by: Tvrtko Ursulin 
> ---
> 

(...)

> @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device
> *dev, struct drm_file *file_priv,
>    size_t unaligned_size)
>  {
>   struct drm_gem_shmem_object *shmem_obj;
> + struct v3d_dev *v3d = to_v3d_dev(dev);
>   struct v3d_bo *bo;
>   int ret;
>  
> - shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
> + /* Let the user opt out of allocating the BOs with THP */
> + if (v3d->gemfs)
> + shmem_obj = drm_gem_shmem_create_with_mnt(dev,
> unaligned_size,
> +   v3d-
> >gemfs);
> + else
> + shmem_obj = drm_gem_shmem_create(dev,
> unaligned_size);
> +
>   if (IS_ERR(shmem_obj))
>   return ERR_CAST(shmem_obj);
>   bo = to_v3d_bo(&shmem_obj->base);


If I read this correctly, when we have THP we always allocate with that,
even objects that are smaller than 64KB. I was wondering if there is
any benefit/downside to this or if the behavior for small allocations
is the same we had without the new mount point.

Iago


Re: [PATCH v3 5/8] drm/v3d: Reduce the alignment of the node allocation

2024-04-22 Thread Iago Toral
Thanks Maíra, this patch is:

Reviewed-by: Iago Toral Quiroga 

Iago

El dom, 21-04-2024 a las 18:44 -0300, Maíra Canal escribió:
> Currently, we are using an alignment of 128 kB to insert a node,
> which ends up wasting memory as we perform plenty of small BO
> allocations (<= 4 kB). We require that allocations are aligned to
> 128 kB, so for any allocation smaller than that, we are wasting the
> difference.
> 
> This implies that we cannot effectively use the whole 4 GB address
> space
> available for the GPU in the RPi 4. Currently, we can allocate up to
> 32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1,3 GB). This
> can be
> quite limiting for applications that have a high memory requirement,
> such
> as vkoverhead [1].
> 
> By reducing the page alignment to 4 kB, we can allocate up to
> ~1,000,000 BOs of 4 kB (~4 GB) and ~10,000 BOs of 400 kB (~4 GB).
> Moreover, by performing
> benchmarks, we were able to attest that reducing the page alignment
> to
> 4 kB can provide a general performance improvement in OpenGL
> applications (e.g. glmark2).
> 
> Therefore, this patch reduces the alignment of the node allocation to
> 4
> kB, which will allow RPi users to explore the whole 4GB virtual
> address space provided by the hardware. Also, this patch allows users
> to
> fully run vkoverhead in the RPi 4/5, solving the issue reported in
> [1].
> 
> [1] https://github.com/zmike/vkoverhead/issues/14
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
>  drivers/gpu/drm/v3d/v3d_drv.h | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_bo.c
> b/drivers/gpu/drm/v3d/v3d_bo.c
> index a07ede668cc1..79e31c5299b1 100644
> --- a/drivers/gpu/drm/v3d/v3d_bo.c
> +++ b/drivers/gpu/drm/v3d/v3d_bo.c
> @@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>    */
>   ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
>    obj->size >>
> V3D_MMU_PAGE_SHIFT,
> -  GMP_GRANULARITY >>
> V3D_MMU_PAGE_SHIFT, 0, 0);
> +  SZ_4K >>
> V3D_MMU_PAGE_SHIFT, 0, 0);
>   spin_unlock(&v3d->mm_lock);
>   if (ret)
>   return ret;
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index d2ce8222771a..17236ee23490 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -17,8 +17,6 @@ struct clk;
>  struct platform_device;
>  struct reset_control;
>  
> -#define GMP_GRANULARITY (128 * 1024)
> -
>  #define V3D_MMU_PAGE_SHIFT 12
>  
>  #define V3D_MAX_QUEUES (V3D_CPU + 1)
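The figures in the commit message above follow directly from the 4 GB address-space arithmetic; a quick standalone illustration (not driver code):

#include <stdio.h>

/* Rough illustration of why a 128 kB node alignment limits how many
 * BOs fit in the 4 GB GPU address space, while 4 kB does not. */
int main(void)
{
	unsigned long long va_space = 4ULL << 30;	/* 4 GB */

	/* With 128 kB alignment every BO consumes at least 128 kB of VA,
	 * so a 4 kB BO wastes 124 kB of address space. */
	printf("128 kB align: max nodes = %llu\n", va_space / (128 << 10));

	/* With 4 kB alignment a 4 kB BO consumes exactly one MMU page. */
	printf("  4 kB align: max nodes = %llu\n", va_space / (4 << 10));

	return 0;
}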



Re: [PATCH 0/5] drm/v3d: Enable Super Pages

2024-03-12 Thread Iago Toral
Hi Maíra,

El lun, 11-03-2024 a las 07:05 -0300, Maíra Canal escribió:
> This series introduces support for super pages in V3D. The V3D MMU
> has support
> for 1MB pages, called super pages, which is currently not used.
> Therefore,
> this patchset has the intention to enable super pages in V3D. The
> advantage of
> enabling super pages is that if any entry for a page within a
> super page
> is cached in the MMU, it will be used for translation of all virtual
> addresses
> in the range of that super page without fetching any other
> entries.
> 
> Super pages essentially mean slightly better performance for users,
> especially in applications with high memory requirements (e.g.
> applications that use multiple large BOs).
> 
> Using a Raspberry Pi 4 (with a PAGE_SIZE=4KB downstream kernel), when
> running
> traces from multiple applications, we were able to see the following
> improvements:
> 
> fps_avg helped: warzone2100.70secs.1024x768.trace:                     1.81 -> 2.56 (41.82%)
> fps_avg helped: warzone2100.30secs.1024x768.trace:                     2.00 -> 2.39 (19.62%)
> fps_avg helped: quake2-gl1.4-1280x720.trace:                          35.01 -> 36.57 (4.47%)
> fps_avg helped: supertuxkart-menus_1024x768.trace:                   120.75 -> 125.50 (3.93%)
> fps_avg helped: quake2-gles3-1280x720.trace:                          62.69 -> 64.29 (2.55%)
> fps_avg helped: ue4_shooter_game_shooting_low_quality_640x480.gfxr:   26.13 -> 26.75 (2.39%)
> fps_avg helped: vkQuake_capture_frames_1_through_1200_1280x720.gfxr:  60.35 -> 61.36 (1.67%)
> fps_avg helped: ue4_sun_temple_640x480.gfxr:                          24.60 -> 24.94 (1.40%)
> fps_avg helped: ue4_shooter_game_shooting_high_quality_640x480.gfxr:  23.07 -> 23.34 (1.15%)
> fps_avg helped: serious_sam_trace02_1280x720.gfxr:                    47.44 -> 47.74 (0.63%)
> fps_avg helped: ue4_shooter_game_high_quality_640x480.gfxr:           18.91 -> 19.02 (0.59%)
> 
> Using a Raspberry Pi 5 (with a PAGE_SIZE=16KB downstream kernel),
> when running
> traces from multiple applications, we were able to see the following
> improvements:
> 
> fps_avg helped: warzone2100.30secs.1024x768.trace:                        3.60 -> 4.49 (24.72%)
> fps_avg helped: sponza_demo02_800x600.gfxr:                              46.33 -> 49.34 (6.49%)
> fps_avg helped: quake3e_capture_frames_1_through_1800_1920x1080.gfxr:   155.70 -> 165.71 (6.43%)
> fps_avg helped: gl-117-1024x768.trace:                                   31.82 -> 33.85 (6.41%)
> fps_avg helped: supertuxkart-menus_1024x768.trace:                      287.80 -> 303.80 (5.56%)
> fps_avg helped: ue4_shooter_game_shooting_low_quality_640x480.gfxr:      45.27 -> 47.30 (4.49%)
> fps_avg helped: sponza_demo01_800x600.gfxr:                              42.05 -> 43.68 (3.89%)
> fps_avg helped: supertuxkart-racing_1024x768.trace:                      19.94 -> 20.59 (3.26%)
> fps_avg helped: vkQuake_capture_frames_1_through_1200_1280x720.gfxr:    135.19 -> 139.45 (3.15%)
> fps_avg helped: quake2-gles3-1280x720.trace:                            151.71 -> 156.13 (2.92%)
> fps_avg helped: ue4_shooter_game_high_quality_640x480.gfxr:              30.28 -> 31.05 (2.54%)
> fps_avg helped: rbdoom-3-bfg_640x480.gfxr:                               31.52 -> 32.30 (2.49%)
> fps_avg helped: quake3e_capture_frames_1800_through_2400_1920x1080.gfxr: 157.29 -> 160.35 (1.94%)
> fps_avg helped: quake3e-1280x720.trace:                                 230.48 -> 234.51 (1.75%)
> fps_avg helped: ue4_shooter_game_shooting_high_quality_640x480.gfxr:     49.67 -> 50.46 (1.60%)
> fps_avg helped: ue4_sun_temple_640x480.gfxr:                             39.70 -> 40.23 (1.34%)
> 
> This series also introduces changes in the GEM helpers, in order to
> enable V3D
> to have a separate mountpoint for shmem GEM objects. Any feedback
> from the
> community about the changes in the GEM helpers is welcomed!
> 
> Best Regards,
> - Maíra
> 
> Maíra Canal (5):
>   drm/v3d: Fix return if scheduler initialization fails
>   drm/gem: Add a mountpoint parameter to drm_gem_object_init()
>   drm/v3d: Introduce gemfs
>   drm/gem: Create shmem GEM object in a given mountpoint
>   drm/v3d: Enable super pages
> 

I reviewed the 3 v3d patches in the series, gave R-B for the first two
and made a couple of comments to the last one. For the drm/gem patches
I think you want someone more qualified to review them.

Iago

>  drivers/gpu/drm/armada/armada_gem.c   |  2 +-
>  drivers/gpu/drm/drm_gem.c | 12 -
>  drivers/gpu/drm/drm_gem_dma_helper.c  |  2 +-
>  drivers/gpu/drm/drm_gem_shmem_helper.c    | 30 +--
>  drivers/gpu/drm/drm_gem_vram_helper.c |  2 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c 

Re: [PATCH 3/5] drm/v3d: Introduce gemfs

2024-03-12 Thread Iago Toral
This patch is: Reviewed-by: Iago Toral Quiroga 

Iago

El lun, 11-03-2024 a las 07:06 -0300, Maíra Canal escribió:
> Create a separate "tmpfs" kernel mount for V3D. This will allow us to
> move away from the shmemfs `shm_mnt` and gives the flexibility to do
> things like set our own mount options. Here, the interest is to use
> "huge=", which should allow us to enable the use of THP for our
> shmem-backed objects.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/Makefile    |  3 ++-
>  drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
>  drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
>  drivers/gpu/drm/v3d/v3d_gemfs.c | 46
> +
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c
> 
> diff --git a/drivers/gpu/drm/v3d/Makefile
> b/drivers/gpu/drm/v3d/Makefile
> index b7d673f1153b..fcf710926057 100644
> --- a/drivers/gpu/drm/v3d/Makefile
> +++ b/drivers/gpu/drm/v3d/Makefile
> @@ -13,7 +13,8 @@ v3d-y := \
>   v3d_trace_points.o \
>   v3d_sched.o \
>   v3d_sysfs.o \
> - v3d_submit.o
> + v3d_submit.o \
> + v3d_gemfs.o
> 
>  v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 1950c723dde1..d2ce8222771a 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -119,6 +119,11 @@ struct v3d_dev {
>   struct drm_mm mm;
>   spinlock_t mm_lock;
> 
> + /*
> +  * tmpfs instance used for shmem backed objects
> +  */
> + struct vfsmount *gemfs;
> +
>   struct work_struct overflow_mem_work;
> 
>   struct v3d_bin_job *bin_job;
> @@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
>  void v3d_invalidate_caches(struct v3d_dev *v3d);
>  void v3d_clean_caches(struct v3d_dev *v3d);
> 
> +/* v3d_gemfs.c */
> +void v3d_gemfs_init(struct v3d_dev *v3d);
> +void v3d_gemfs_fini(struct v3d_dev *v3d);
> +
>  /* v3d_submit.c */
>  void v3d_job_cleanup(struct v3d_job *job);
>  void v3d_job_put(struct v3d_job *job);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 66f4b78a6b2e..faefbe497e8d 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
>   v3d_init_hw_state(v3d);
>   v3d_mmu_set_page_table(v3d);
> 
> + v3d_gemfs_init(v3d);
> +
>   ret = v3d_sched_init(v3d);
>   if (ret) {
>   drm_mm_takedown(&v3d->mm);
> @@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
>   struct v3d_dev *v3d = to_v3d_dev(dev);
> 
>   v3d_sched_fini(v3d);
> + v3d_gemfs_fini(v3d);
> 
>   /* Waiting for jobs to finish would need to be done before
>    * unregistering V3D.
> diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c
> b/drivers/gpu/drm/v3d/v3d_gemfs.c
> new file mode 100644
> index ..8518b7da6f73
> --- /dev/null
> +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/* Copyright (C) 2024 Raspberry Pi */
> +
> +#include <linux/fs.h>
> +#include <linux/mount.h>
> +
> +#include "v3d_drv.h"
> +
> +void v3d_gemfs_init(struct v3d_dev *v3d)
> +{
> + char huge_opt[] = "huge=always";
> + struct file_system_type *type;
> + struct vfsmount *gemfs;
> +
> + /*
> +  * By creating our own shmemfs mountpoint, we can pass in
> +  * mount flags that better match our usecase. However, we
> +  * only do so on platforms which benefit from it.
> +  */
> + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> + goto err;
> +
> + type = get_fs_type("tmpfs");
> + if (!type)
> + goto err;
> +
> + gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name,
> huge_opt);
> + if (IS_ERR(gemfs))
> + goto err;
> +
> + v3d->gemfs = gemfs;
> + drm_info(&v3d->drm, "Using Transparent Hugepages\n");
> +
> + return;
> +
> +err:
> + v3d->gemfs = NULL;
> + drm_notice(&v3d->drm,
> +    "Transparent Hugepage support is recommended for
> optimal performance on this platform!\n");
> +}
> +
> +void v3d_gemfs_fini(struct v3d_dev *v3d)
> +{
> + if (v3d->gemfs)
> + kern_unmount(v3d->gemfs);
> +}
> --
> 2.43.0
> 
> 



Re: [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails

2024-03-12 Thread Iago Toral
This patch is: Reviewed-by: Iago Toral Quiroga 

Iago

El lun, 11-03-2024 a las 07:05 -0300, Maíra Canal escribió:
> If the scheduler initialization fails, GEM initialization must fail
> as
> well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
> allocated and return the error value in `v3d_gem_init()`.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index afc565078c78..66f4b78a6b2e 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev)
>   ret = v3d_sched_init(v3d);
>   if (ret) {
>   drm_mm_takedown(&v3d->mm);
> - dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void
> *)v3d->pt,
> + dma_free_coherent(v3d->drm.dev, pt_size, (void
> *)v3d->pt,
>     v3d->pt_paddr);
> + return ret;
>   }
> 
>   return 0;
> --
> 2.43.0
> 
> 



Re: [PATCH 5/5] drm/v3d: Enable super pages

2024-03-12 Thread Iago Toral
El lun, 11-03-2024 a las 07:06 -0300, Maíra Canal escribió:
> The V3D MMU also supports 1MB pages, called super pages. In order to
> set a 1MB page in the MMU, we need to make sure that the page table
> entries for all 4KB pages within a super page are correctly
> configured.
> 
> Therefore, if the BO is larger than 2MB, we allocate it in a separate
> mountpoint that uses THP. This will allow us to create a contiguous
> memory region to create our super pages. In order to place the page
> table entries in the MMU, we iterate over the 256 4KB pages and
> insert
> the PTE.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_bo.c    | 19 +--
>  drivers/gpu/drm/v3d/v3d_drv.c   |  7 +++
>  drivers/gpu/drm/v3d/v3d_drv.h   |  6 --
>  drivers/gpu/drm/v3d/v3d_gemfs.c |  6 ++
>  drivers/gpu/drm/v3d/v3d_mmu.c   | 24 ++--
>  5 files changed, 56 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_bo.c
> b/drivers/gpu/drm/v3d/v3d_bo.c
> index a07ede668cc1..cb8e49a33be7 100644
> --- a/drivers/gpu/drm/v3d/v3d_bo.c
> +++ b/drivers/gpu/drm/v3d/v3d_bo.c
> @@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>   struct v3d_dev *v3d = to_v3d_dev(obj->dev);
>   struct v3d_bo *bo = to_v3d_bo(obj);
>   struct sg_table *sgt;
> + u64 align;
>   int ret;
> 
>   /* So far we pin the BO in the MMU for its lifetime, so use
> @@ -103,6 +104,9 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>   if (IS_ERR(sgt))
>   return PTR_ERR(sgt);
> 
> + bo->huge_pages = (obj->size >= SZ_2M && v3d->super_pages);

We have this check for detecting huge pages replicated here and in
v3d_bo_create, but I think we can just do this check once in
v3d_bo_create and assign this field there as well so we don't have to
repeat the check in both functions?

> + align = bo->huge_pages ? SZ_1M : SZ_4K;
> +
>   spin_lock(&v3d->mm_lock);
>   /* Allocate the object's space in the GPU's page tables.
>    * Inserting PTEs will happen later, but the offset is for
> the
> @@ -110,7 +114,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>    */
>   ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
>    obj->size >>
> V3D_MMU_PAGE_SHIFT,
> -  GMP_GRANULARITY >>
> V3D_MMU_PAGE_SHIFT, 0, 0);
> +  align >>
> V3D_MMU_PAGE_SHIFT, 0, 0);

This is making another change to drop the page align size for the
regular case from GMP_GRANULARITY to 4KB. I think this change is
relevant enough that it would probably deserve a separate commit
explaining the rationale for it. What do you think?

>   spin_unlock(&v3d->mm_lock);
>   if (ret)
>   return ret;
> @@ -130,10 +134,21 @@ struct v3d_bo *v3d_bo_create(struct drm_device
> *dev, struct drm_file *file_priv,
>    size_t unaligned_size)
>  {
>   struct drm_gem_shmem_object *shmem_obj;
> + struct v3d_dev *v3d = to_v3d_dev(dev);
>   struct v3d_bo *bo;
> + size_t size;
>   int ret;
> 
> - shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
> + size = PAGE_ALIGN(unaligned_size);
> +
> + /* To avoid memory fragmentation, we only use THP if the BO
> is bigger
> +  * than two Super Pages (1MB).
> +  */
> + if (size >= SZ_2M && v3d->super_pages)
> + shmem_obj = drm_gem_shmem_create_with_mnt(dev, size,
> v3d->gemfs);
> + else
> + shmem_obj = drm_gem_shmem_create(dev, size);
> +
>   if (IS_ERR(shmem_obj))
>   return ERR_CAST(shmem_obj);
>   bo = to_v3d_bo(&shmem_obj->base);
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 3debf37e7d9b..96f4d8227407 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -36,6 +36,11 @@
>  #define DRIVER_MINOR 0
>  #define DRIVER_PATCHLEVEL 0
> 
> +static bool super_pages = true;
> +module_param_named(super_pages, super_pages, bool, 0400);
> +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.
> Note: \
> +    To enable Super Pages, you need
> support to THP.");

I guess you meant to say '(...) support for THP'?

> +
>  static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
>      struct drm_file *file_priv)
>  {
> @@ -308,6 +313,8 @@ static int v3d_platform_drm_probe(struct
> platform_device *pdev)
>   return -ENOMEM;
>   }
> 
> + v3d->super_pages = super_pages;
> +
>   ret = v3d_gem_init(drm);
>   if (ret)
>   goto dma_free;
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index d2ce8222771a..795087663739 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -17,9 +17,8 @@ struct clk;
>  struct platform_device;
>  struct reset_control;
> 
> -#define GMP_GRANULARITY (128 * 1024)
> 
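As a rough sketch of the page-table side of the commit message above: a 1 MB super page spans 256 consecutive 4 KB page-table slots, all of which need to be written when the BO is mapped. The PTE flag names and bit positions below are illustrative assumptions, not the exact driver code; see v3d_mmu.c for the real layout.

#include <linux/types.h>

/* Illustrative flag bits only; the real definitions live in v3d_mmu.c. */
#define SKETCH_PTE_SUPERPAGE	(1u << 27)
#define SKETCH_PTE_WRITEABLE	(1u << 29)
#define SKETCH_PTE_VALID	(1u << 28)

/* Sketch: fill the 256 4 KB PTE slots covered by one 1 MB super page.
 * Each slot still gets its own 4 KB page frame number; the super-page
 * flag tells the MMU the whole 1 MB range is physically contiguous, so
 * one cached entry can translate any address inside it. The backing
 * memory must be contiguous and 1 MB aligned, which is what the
 * THP-backed mountpoint makes likely for BOs of 2 MB and up. */
static void sketch_insert_super_page(u32 *pt, u32 virt_page, u32 phys_pfn)
{
	u32 flags = SKETCH_PTE_SUPERPAGE | SKETCH_PTE_WRITEABLE |
		    SKETCH_PTE_VALID;

	for (u32 i = 0; i < 256; i++)
		pt[virt_page + i] = flags | (phys_pfn + i);
}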

Re: [PATCH] drm/v3d: Enable V3D to use different PAGE_SIZE

2024-02-19 Thread Iago Toral
El lun, 19-02-2024 a las 10:00 -0300, Maíra Canal escribió:
> Hi Iago,
> 
> On 2/19/24 09:56, Iago Toral wrote:
> > Hi Maíra,
> > 
> > El mié, 14-02-2024 a las 16:34 -0300, Maíra Canal escribió:
> > > Currently, the V3D driver uses PAGE_SHIFT under the assumption
> > > that
> > > PAGE_SHIFT = 12, as the PAGE_SIZE = 4KB. But, the RPi 5 is using
> > > PAGE_SIZE = 16KB, so the MMU PAGE_SHIFT is different than the
> > > system's
> > > PAGE_SHIFT.
> > > 
> > > Enable V3D to be used in systems with any PAGE_SIZE by making
> > > sure
> > > that
> > > everything MMU-related uses the MMU page shift.
> > > 
> > > Signed-off-by: Maíra Canal 
> > > ---
> > >   drivers/gpu/drm/v3d/v3d_bo.c  | 12 ++--
> > >   drivers/gpu/drm/v3d/v3d_debugfs.c |  2 +-
> > >   drivers/gpu/drm/v3d/v3d_drv.h |  2 ++
> > >   drivers/gpu/drm/v3d/v3d_irq.c |  2 +-
> > >   drivers/gpu/drm/v3d/v3d_mmu.c |  2 --
> > >   5 files changed, 10 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_bo.c
> > > b/drivers/gpu/drm/v3d/v3d_bo.c
> > > index 1bdfac8beafd..a07ede668cc1 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_bo.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_bo.c
> > 
> > I think we need the same change in v3d_get_bo_vaddr, no?
> 
> No, that one uses the PAGE_SHIFT of the CPU, because the vaddr is in
> the CPU.

Ah, makes sense, thanks!

Reviewed-by: Iago Toral Quiroga 

> 
> Best Regards,
> - Maíra
> 
> > 
> > Iago
> > 
> > > @@ -40,7 +40,7 @@ void v3d_free_object(struct drm_gem_object
> > > *obj)
> > > 
> > >   mutex_lock(&v3d->bo_lock);
> > >   v3d->bo_stats.num_allocated--;
> > > - v3d->bo_stats.pages_allocated -= obj->size >>
> > > PAGE_SHIFT;
> > > + v3d->bo_stats.pages_allocated -= obj->size >>
> > > V3D_MMU_PAGE_SHIFT;
> > >   mutex_unlock(&v3d->bo_lock);
> > > 
> > >   spin_lock(&v3d->mm_lock);
> > > @@ -109,8 +109,8 @@ v3d_bo_create_finish(struct drm_gem_object
> > > *obj)
> > >    * lifetime of the BO.
> > >    */
> > >   ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
> > > -  obj->size >>
> > > PAGE_SHIFT,
> > > -  GMP_GRANULARITY >>
> > > PAGE_SHIFT, 0, 0);
> > > +  obj->size >>
> > > V3D_MMU_PAGE_SHIFT,
> > > +  GMP_GRANULARITY >>
> > > V3D_MMU_PAGE_SHIFT, 0, 0);
> > >   spin_unlock(&v3d->mm_lock);
> > >   if (ret)
> > >   return ret;
> > > @@ -118,7 +118,7 @@ v3d_bo_create_finish(struct drm_gem_object
> > > *obj)
> > >   /* Track stats for /debug/dri/n/bo_stats. */
> > >   mutex_lock(&v3d->bo_lock);
> > >   v3d->bo_stats.num_allocated++;
> > > - v3d->bo_stats.pages_allocated += obj->size >>
> > > PAGE_SHIFT;
> > > + v3d->bo_stats.pages_allocated += obj->size >>
> > > V3D_MMU_PAGE_SHIFT;
> > >   mutex_unlock(&v3d->bo_lock);
> > > 
> > >   v3d_mmu_insert_ptes(bo);
> > > @@ -201,7 +201,7 @@ int v3d_create_bo_ioctl(struct drm_device
> > > *dev,
> > > void *data,
> > >   if (IS_ERR(bo))
> > >   return PTR_ERR(bo);
> > > 
> > > - args->offset = bo->node.start << PAGE_SHIFT;
> > > + args->offset = bo->node.start << V3D_MMU_PAGE_SHIFT;
> > > 
> > >   ret = drm_gem_handle_create(file_priv, &bo->base.base,
> > > &args->handle);
> > >   drm_gem_object_put(&bo->base.base);
> > > @@ -246,7 +246,7 @@ int v3d_get_bo_offset_ioctl(struct drm_device
> > > *dev, void *data,
> > >   }
> > >   bo = to_v3d_bo(gem_obj);
> > > 
> > > - args->offset = bo->node.start << PAGE_SHIFT;
> > > + args->offset = bo->node.start << V3D_MMU_PAGE_SHIFT;
> > > 
> > >   drm_gem_object_put(gem_obj);
> > >   return 0;
> > > diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c
> > > b/drivers/gpu/drm/v3d/v3d_debugfs.c
> > > index dc3cf708d02e..19e3ee7ac897 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_debu

Re: [PATCH] drm/v3d: Enable V3D to use different PAGE_SIZE

2024-02-19 Thread Iago Toral
Hi Maíra,

El mié, 14-02-2024 a las 16:34 -0300, Maíra Canal escribió:
> Currently, the V3D driver uses PAGE_SHIFT under the assumption that
> PAGE_SHIFT = 12, as the PAGE_SIZE = 4KB. But, the RPi 5 is using
> PAGE_SIZE = 16KB, so the MMU PAGE_SHIFT is different than the
> system's
> PAGE_SHIFT.
> 
> Enable V3D to be used in systems with any PAGE_SIZE by making sure
> that
> everything MMU-related uses the MMU page shift.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_bo.c  | 12 ++--
>  drivers/gpu/drm/v3d/v3d_debugfs.c |  2 +-
>  drivers/gpu/drm/v3d/v3d_drv.h |  2 ++
>  drivers/gpu/drm/v3d/v3d_irq.c |  2 +-
>  drivers/gpu/drm/v3d/v3d_mmu.c |  2 --
>  5 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_bo.c
> b/drivers/gpu/drm/v3d/v3d_bo.c
> index 1bdfac8beafd..a07ede668cc1 100644
> --- a/drivers/gpu/drm/v3d/v3d_bo.c
> +++ b/drivers/gpu/drm/v3d/v3d_bo.c

I think we need the same change in v3d_get_bo_vaddr, no?

Iago

> @@ -40,7 +40,7 @@ void v3d_free_object(struct drm_gem_object *obj)
> 
>   mutex_lock(&v3d->bo_lock);
>   v3d->bo_stats.num_allocated--;
> - v3d->bo_stats.pages_allocated -= obj->size >> PAGE_SHIFT;
> + v3d->bo_stats.pages_allocated -= obj->size >>
> V3D_MMU_PAGE_SHIFT;
>   mutex_unlock(&v3d->bo_lock);
> 
>   spin_lock(&v3d->mm_lock);
> @@ -109,8 +109,8 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>    * lifetime of the BO.
>    */
>   ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
> -  obj->size >> PAGE_SHIFT,
> -  GMP_GRANULARITY >>
> PAGE_SHIFT, 0, 0);
> +  obj->size >>
> V3D_MMU_PAGE_SHIFT,
> +  GMP_GRANULARITY >>
> V3D_MMU_PAGE_SHIFT, 0, 0);
>   spin_unlock(&v3d->mm_lock);
>   if (ret)
>   return ret;
> @@ -118,7 +118,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>   /* Track stats for /debug/dri/n/bo_stats. */
>   mutex_lock(&v3d->bo_lock);
>   v3d->bo_stats.num_allocated++;
> - v3d->bo_stats.pages_allocated += obj->size >> PAGE_SHIFT;
> + v3d->bo_stats.pages_allocated += obj->size >>
> V3D_MMU_PAGE_SHIFT;
>   mutex_unlock(&v3d->bo_lock);
> 
>   v3d_mmu_insert_ptes(bo);
> @@ -201,7 +201,7 @@ int v3d_create_bo_ioctl(struct drm_device *dev,
> void *data,
>   if (IS_ERR(bo))
>   return PTR_ERR(bo);
> 
> - args->offset = bo->node.start << PAGE_SHIFT;
> + args->offset = bo->node.start << V3D_MMU_PAGE_SHIFT;
> 
>   ret = drm_gem_handle_create(file_priv, &bo->base.base,
> &args->handle);
>   drm_gem_object_put(&bo->base.base);
> @@ -246,7 +246,7 @@ int v3d_get_bo_offset_ioctl(struct drm_device
> *dev, void *data,
>   }
>   bo = to_v3d_bo(gem_obj);
> 
> - args->offset = bo->node.start << PAGE_SHIFT;
> + args->offset = bo->node.start << V3D_MMU_PAGE_SHIFT;
> 
>   drm_gem_object_put(gem_obj);
>   return 0;
> diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c
> b/drivers/gpu/drm/v3d/v3d_debugfs.c
> index dc3cf708d02e..19e3ee7ac897 100644
> --- a/drivers/gpu/drm/v3d/v3d_debugfs.c
> +++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
> @@ -219,7 +219,7 @@ static int v3d_debugfs_bo_stats(struct seq_file
> *m, void *unused)
>   seq_printf(m, "allocated bos:  %d\n",
>      v3d->bo_stats.num_allocated);
>   seq_printf(m, "allocated bo size (kb): %ld\n",
> -    (long)v3d->bo_stats.pages_allocated <<
> (PAGE_SHIFT - 10));
> +    (long)v3d->bo_stats.pages_allocated <<
> (V3D_MMU_PAGE_SHIFT - 10));
>   mutex_unlock(&v3d->bo_lock);
> 
>   return 0;
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 3c7d58866570..1950c723dde1 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -19,6 +19,8 @@ struct reset_control;
> 
>  #define GMP_GRANULARITY (128 * 1024)
> 
> +#define V3D_MMU_PAGE_SHIFT 12
> +
>  #define V3D_MAX_QUEUES (V3D_CPU + 1)
> 
>  static inline char *v3d_queue_to_string(enum v3d_queue queue)
> diff --git a/drivers/gpu/drm/v3d/v3d_irq.c
> b/drivers/gpu/drm/v3d/v3d_irq.c
> index afc76390a197..2e04f6cb661e 100644
> --- a/drivers/gpu/drm/v3d/v3d_irq.c
> +++ b/drivers/gpu/drm/v3d/v3d_irq.c
> @@ -70,7 +70,7 @@ v3d_overflow_mem_work(struct work_struct *work)
>   list_add_tail(&bo->unref_head, &v3d->bin_job->render-
> >unref_list);
>   spin_unlock_irqrestore(&v3d->job_lock, irqflags);
> 
> - V3D_CORE_WRITE(0, V3D_PTB_BPOA, bo->node.start <<
> PAGE_SHIFT);
> + V3D_CORE_WRITE(0, V3D_PTB_BPOA, bo->node.start <<
> V3D_MMU_PAGE_SHIFT);
>   V3D_CORE_WRITE(0, V3D_PTB_BPOS, obj->size);
> 
>  out:
> diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c
> b/drivers/gpu/drm/v3d/v3d_mmu.c
> index 5a453532901f..14f3af40d6f6 100644
> --- a/drivers/gpu/drm/v3d/v3d_mmu.c
> +++ b/drivers/gpu/drm/v3d/v3d_mmu.c
> @@ -21,8 +21,6 @@
>  #include "v3d_drv.h"
>  #include "v3d_regs.h"
> 
> 
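To make the distinction in this patch concrete: the V3D MMU always operates on 4 KB pages, so size-to-page conversions must use V3D_MMU_PAGE_SHIFT (12) rather than the CPU's PAGE_SHIFT, which is 14 on a 16 KB-page RPi 5 kernel. A small standalone illustration (numbers only, not driver code):

#include <stdio.h>

#define V3D_MMU_PAGE_SHIFT 12	/* V3D MMU pages are always 4 KB */

int main(void)
{
	unsigned long obj_size = 64 * 1024;	/* a 64 KB BO */

	/* On a 16 KB-page kernel (PAGE_SHIFT = 14) the same BO looks like
	 * only 4 "pages" to the CPU, but the GPU page tables still need
	 * one entry per 4 KB page. */
	printf("MMU pages: %lu\n", obj_size >> V3D_MMU_PAGE_SHIFT);	/* 16 */
	printf("CPU pages (16 KB kernel): %lu\n", obj_size >> 14);	/*  4 */

	return 0;
}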

Re: [PATCH] drm/v3d: Free the job and assign it to NULL if initialization fails

2024-01-10 Thread Iago Toral
Ok, thanks for checking, you can add my R-B on the original patch then.

Iago

El mié, 10-01-2024 a las 07:42 -0300, Maira Canal escribió:
> Hi Iago,
> 
> On 1/10/24 03:48, Iago Toral wrote:
> > I think this is fine, but I was wondering if it would be simpler
> > and
> > easier to just remove the sched cleanup from v3d_job_init and
> > instead
> > always rely on callers to eventually call v3d_job_cleanup for fail
> > paths, where we are already calling v3d_job_cleanup.
> 
> If we just remove `drm_sched_job_cleanup()` from `v3d_job_init()` we
> trigger a use-after-free warning by the job refcount:
> 
> [   34.367222] [ cut here ]
> [   34.367235] refcount_t: underflow; use-after-free.
> [   34.367274] WARNING: CPU: 0 PID: 1922 at lib/refcount.c:28 
> refcount_warn_saturate+0x108/0x148
> [   34.367298] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer 
> snd_seq snd_seq_device algif_hash aes_neon_bs aes_neon_blk 
> algif_skcipher af_alg bnep hid_logitech_hidpp brcmfmac_wcc brcmfmac 
> hci_uart btbcm bluetooth vc4 brcmutil cfg80211 bcm2835_v4l2(C) 
> bcm2835_mmal_vchiq(C) binfmt_misc snd_soc_hdmi_codec videobuf2_v4l2
> cec 
> ecdh_generic drm_display_helper videobuf2_vmalloc ecc
> videobuf2_memops 
> drm_dma_helper videobuf2_common drm_kms_helper dwc2 raspberrypi_hwmon
> videodev snd_soc_core i2c_brcmstb rfkill libaes hid_logitech_dj mc
> v3d 
> snd_bcm2835(C) i2c_bcm2835 pwm_bcm2835 snd_pcm_dmaengine snd_pcm 
> gpu_sched snd_timer drm_shmem_helper snd nvmem_rmem uio_pdrv_genirq
> uio 
> i2c_dev drm dm_mod fuse drm_panel_orientation_quirks backlight
> configfs 
> ip_tables x_tables ipv6
> [   34.367434] CPU: 0 PID: 1922 Comm: v3d_submit_cl Tainted:
> G C 
>  6.7.0-rc3-g632ca3c92f38-dirty #74
> [   34.367441] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
> [   34.367444] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS 
> BTYPE=--)
> [   34.367450] pc : refcount_warn_saturate+0x108/0x148
> [   34.367456] lr : refcount_warn_saturate+0x108/0x148
> [   34.367462] sp : ffc08341bb90
> [   34.367465] x29: ffc08341bb90 x28: ff8102962400 x27: 
> ffee5592de88
> [   34.367473] x26: ff8116503e00 x25: ff81213e8800 x24: 
> 0048
> [   34.367481] x23: ff8101088000 x22: ffc08341bcf0 x21: 
> ffea
> [   34.367489] x20: ff8102962400 x19: ff8102962600 x18: 
> ffee5beb3468
> [   34.367497] x17: 0001 x16:  x15: 
> 0004
> [   34.367504] x14: ffee5c163738 x13: 0fff x12: 
> 0003
> [   34.367512] x11:  x10: 0027 x9 : 
> ada342fc9d5acc00
> [   34.367519] x8 : ada342fc9d5acc00 x7 : 65646e75203a745f x6 : 
> 746e756f63666572
> [   34.367526] x5 : ffee5c3129da x4 : ffee5c2fc59e x3 : 
> 
> [   34.367533] x2 :  x1 : ffc08341b930 x0 : 
> 0026
> [   34.367541] Call trace:
> [   34.367544]  refcount_warn_saturate+0x108/0x148
> [   34.367550]  v3d_submit_cl_ioctl+0x4cc/0x5e8 [v3d]
> [   34.367588]  drm_ioctl_kernel+0xe0/0x120 [drm]
> [   34.367767]  drm_ioctl+0x264/0x408 [drm]
> [   34.367866]  __arm64_sys_ioctl+0x9c/0xe0
> [   34.367877]  invoke_syscall+0x4c/0x118
> [   34.367887]  el0_svc_common+0xb8/0xf0
> [   34.367892]  do_el0_svc+0x28/0x40
> [   34.367898]  el0_svc+0x38/0x88
> [   34.367906]  el0t_64_sync_handler+0x84/0x100
> [   34.367913]  el0t_64_sync+0x190/0x198
> [   34.367917] ---[ end trace  ]---
> 
> Best Regards,
> - Maíra
> 
> > 
> > Iago
> > 
> > El mar, 09-01-2024 a las 11:28 -0300, Maíra Canal escribió:
> > > Currently, if `v3d_job_init()` fails (e.g. in the IGT test "bad-
> > > in-
> > > sync",
> > > where we submit an invalid in-sync to the IOCTL), then we end up
> > > with
> > > the following NULL pointer dereference:
> > > 
> > > [   34.146279] Unable to handle kernel NULL pointer dereference
> > > at
> > > virtual address 0078
> > > [   34.146301] Mem abort info:
> > > [   34.146306]   ESR = 0x9605
> > > [   34.146315]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [   34.146322]   SET = 0, FnV = 0
> > > [   34.146328]   EA = 0, S1PTW = 0
> > > [   34.146334]   FSC = 0x05: level 1 translation fault
> > > [   34.146340] Data abort info:
> > > [   34.146345]   ISV = 0, ISS = 0x0005, ISS2 = 0x
> > > [   34.146351]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > [   34.146357]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >

Re: [PATCH] drm/v3d: Free the job and assign it to NULL if initialization fails

2024-01-09 Thread Iago Toral
I think this is fine, but I was wondering if it would be simpler and
easier to just remove the sched cleanup from v3d_job_init and instead
always rely on callers to eventually call v3d_job_cleanup for fail
paths, where we are already calling v3d_job_cleanup.

Iago

El mar, 09-01-2024 a las 11:28 -0300, Maíra Canal escribió:
> Currently, if `v3d_job_init()` fails (e.g. in the IGT test "bad-in-
> sync",
> where we submit an invalid in-sync to the IOCTL), then we end up with
> the following NULL pointer dereference:
> 
> [   34.146279] Unable to handle kernel NULL pointer dereference at
> virtual address 0078
> [   34.146301] Mem abort info:
> [   34.146306]   ESR = 0x9605
> [   34.146315]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   34.146322]   SET = 0, FnV = 0
> [   34.146328]   EA = 0, S1PTW = 0
> [   34.146334]   FSC = 0x05: level 1 translation fault
> [   34.146340] Data abort info:
> [   34.146345]   ISV = 0, ISS = 0x0005, ISS2 = 0x
> [   34.146351]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [   34.146357]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [   34.146366] user pgtable: 4k pages, 39-bit VAs,
> pgdp=0001232e6000
> [   34.146375] [0078] pgd=,
> p4d=, pud=
> [   34.146399] Internal error: Oops: 9605 [#1] PREEMPT
> SMP
> [   34.146406] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer
> snd_seq snd_seq_device algif_hash aes_neon_bs aes_neon_blk
> algif_skcipher af_alg bnep hid_logitech_hidpp brcmfmac_wcc brcmfmac
> brcmutil hci_uart vc4 btbcm cfg80211 bluetooth bcm2835_v4l2(C)
> snd_soc_hdmi_codec binfmt_misc cec drm_display_helper hid_logitech_dj
> bcm2835_mmal_vchiq(C) drm_dma_helper drm_kms_helper videobuf2_v4l2
> raspberrypi_hwmon ecdh_generic videobuf2_vmalloc videobuf2_memops ecc
> videobuf2_common rfkill videodev libaes snd_soc_core dwc2 i2c_brcmstb
> snd_pcm_dmaengine snd_bcm2835(C) i2c_bcm2835 pwm_bcm2835 snd_pcm mc
> v3d snd_timer snd gpu_sched drm_shmem_helper nvmem_rmem
> uio_pdrv_genirq uio i2c_dev drm fuse dm_mod
> drm_panel_orientation_quirks backlight configfs ip_tables x_tables
> ipv6
> [   34.146556] CPU: 1 PID: 1890 Comm: v3d_submit_csd Tainted:
> G C 6.7.0-rc3-g49ddab089611 #68
> [   34.146563] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
> [   34.146569] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [   34.146575] pc : drm_sched_job_cleanup+0x3c/0x190 [gpu_sched]
> [   34.146611] lr : v3d_submit_csd_ioctl+0x1b4/0x460 [v3d]
> [   34.146653] sp : ffc083cbbb80
> [   34.146658] x29: ffc083cbbb90 x28: ff81035afc00 x27:
> ffe77a641168
> [   34.146668] x26: ff81056a8000 x25: 0058 x24:
> 
> [   34.146677] x23: ff81065e2000 x22: ff81035afe00 x21:
> ffc083cbbcf0
> [   34.146686] x20: ff81035afe00 x19: ffea x18:
> 
> [   34.146694] x17:  x16: ffe7989e34b0 x15:
> 
> [   34.146703] x14: 0404 x13: ff81035afe80 x12:
> ffc083cb8000
> [   34.146711] x11: cc57e05dfbe5ef00 x10: cc57e05dfbe5ef00 x9 :
> ffe77a64131c
> [   34.146719] x8 :  x7 :  x6 :
> 003f
> [   34.146727] x5 : 0040 x4 : ff81fefb03f0 x3 :
> ffc083cbba40
> [   34.146736] x2 : ff81056a8000 x1 : ffe7989e35e8 x0 :
> 
> [   34.146745] Call trace:
> [   34.146748]  drm_sched_job_cleanup+0x3c/0x190 [gpu_sched]
> [   34.146768]  v3d_submit_csd_ioctl+0x1b4/0x460 [v3d]
> [   34.146791]  drm_ioctl_kernel+0xe0/0x120 [drm]
> [   34.147029]  drm_ioctl+0x264/0x408 [drm]
> [   34.147135]  __arm64_sys_ioctl+0x9c/0xe0
> [   34.147152]  invoke_syscall+0x4c/0x118
> [   34.147162]  el0_svc_common+0xb8/0xf0
> [   34.147168]  do_el0_svc+0x28/0x40
> [   34.147174]  el0_svc+0x38/0x88
> [   34.147184]  el0t_64_sync_handler+0x84/0x100
> [   34.147191]  el0t_64_sync+0x190/0x198
> [   34.147201] Code: aa0003f4 f90007e8 f9401008 aa0803e0 (b8478c09)
> [   34.147210] ---[ end trace  ]---
> 
> This happens because we are calling `drm_sched_job_cleanup()` twice:
> once at `v3d_job_init()` and again when we call `v3d_job_cleanup()`.
> 
> To mitigate this issue, we can return to the same approach that we
> used
> to use before 464c61e76de8: deallocate the job after `v3d_job_init()`
> fails and assign it to NULL. Then, when we call `v3d_job_cleanup()`,
> job
> is NULL and the function returns.
> 
> Fixes: 464c61e76de8 ("drm/v3d: Decouple job allocation from job
> initiation")
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_submit.c | 35 +-
> --
>  1 file changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index fcff41dd2315..88f63d526b22 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ 
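A simplified sketch of the pattern the commit message describes: when initialization fails, the allocation is freed and the caller's pointer cleared, so the later v3d_job_cleanup() call sees NULL and returns instead of running drm_sched_job_cleanup() a second time. This is illustrative, not the exact driver code:

/* Sketch: wrap v3d_job_init() so a failed init frees the job and
 * clears the caller's pointer, making later cleanup paths no-ops. */
static int example_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
			    struct v3d_job **job, u32 in_sync,
			    struct v3d_submit_ext *se, enum v3d_queue queue)
{
	int ret;

	ret = v3d_job_init(v3d, file_priv, *job, v3d_job_free,
			   in_sync, se, queue);
	if (ret) {
		kfree(*job);
		*job = NULL;	/* a later v3d_job_cleanup() then just returns */
	}

	return ret;
}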

Re: [PATCH] drm/v3d: Fix support for register debugging on the RPi 4

2024-01-09 Thread Iago Toral
Thanks Maíra!

Reviewed-by: Iago Toral Quiroga 

El mar, 09-01-2024 a las 08:30 -0300, Maíra Canal escribió:
> RPi 4 uses V3D 4.2, which is currently not supported by the register
> definition stated at `v3d_core_reg_defs`. We should be able to
> support
> V3D 4.2, therefore, change the maximum version of the register
> definition to 42, not 41.
> 
> Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D
> 7.x")
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_debugfs.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c
> b/drivers/gpu/drm/v3d/v3d_debugfs.c
> index f843a50d5dce..94eafcecc65b 100644
> --- a/drivers/gpu/drm/v3d/v3d_debugfs.c
> +++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
> @@ -62,9 +62,9 @@ static const struct v3d_reg_def v3d_core_reg_defs[]
> = {
>   REGDEF(33, 71, V3D_PTB_BPCA),
>   REGDEF(33, 71, V3D_PTB_BPCS),
> 
> - REGDEF(33, 41, V3D_GMP_STATUS(33)),
> - REGDEF(33, 41, V3D_GMP_CFG(33)),
> - REGDEF(33, 41, V3D_GMP_VIO_ADDR(33)),
> + REGDEF(33, 42, V3D_GMP_STATUS(33)),
> + REGDEF(33, 42, V3D_GMP_CFG(33)),
> + REGDEF(33, 42, V3D_GMP_VIO_ADDR(33)),
> 
>   REGDEF(33, 71, V3D_ERR_FDBGO),
>   REGDEF(33, 71, V3D_ERR_FDBGB),
> @@ -74,13 +74,13 @@ static const struct v3d_reg_def
> v3d_core_reg_defs[] = {
> 
>  static const struct v3d_reg_def v3d_csd_reg_defs[] = {
>   REGDEF(41, 71, V3D_CSD_STATUS),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG0(41)),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG1(41)),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG2(41)),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG3(41)),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG4(41)),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG5(41)),
> - REGDEF(41, 41, V3D_CSD_CURRENT_CFG6(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG0(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG1(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG2(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG3(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG4(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG5(41)),
> + REGDEF(41, 42, V3D_CSD_CURRENT_CFG6(41)),
>   REGDEF(71, 71, V3D_CSD_CURRENT_CFG0(71)),
>   REGDEF(71, 71, V3D_CSD_CURRENT_CFG1(71)),
>   REGDEF(71, 71, V3D_CSD_CURRENT_CFG2(71)),
> --
> 2.43.0
> 
> 



Re: [PATCH v3 10/17] drm/v3d: Detach the CSD job BO setup

2023-11-28 Thread Iago Toral
El mar, 28-11-2023 a las 07:47 -0300, Maira Canal escribió:
> Hi Iago,
> 
> On 11/28/23 05:42, Iago Toral wrote:
> > El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió:
> > > From: Melissa Wen 
> > > 
> > > Detach CSD job setup from CSD submission ioctl to reuse it in CPU
> > > submission ioctl for indirect CSD job.
> > > 
> > > Signed-off-by: Melissa Wen 
> > > Co-developed-by: Maíra Canal 
> > > Signed-off-by: Maíra Canal 
> > > ---
> > >   drivers/gpu/drm/v3d/v3d_submit.c | 68 -
> > > -
> > > --
> > >   1 file changed, 42 insertions(+), 26 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> > > b/drivers/gpu/drm/v3d/v3d_submit.c
> > > index c134b113b181..eb26fe1e27e3 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_submit.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> > > @@ -256,6 +256,45 @@
> > > v3d_attach_fences_and_unlock_reservation(struct
> > > drm_file *file_priv,
> > >  }
> > >   }
> > >   
> > > +static int
> > > +v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
> > > +  struct v3d_dev *v3d,
> > > +  struct drm_v3d_submit_csd *args,
> > > +  struct v3d_csd_job **job,
> > > +  struct v3d_job **clean_job,
> > > +  struct v3d_submit_ext *se,
> > > +  struct ww_acquire_ctx *acquire_ctx)
> > > +{
> > > +   int ret;
> > > +
> > > +   ret = v3d_job_allocate((void *)job, sizeof(**job));
> > > +   if (ret)
> > > +   return ret;
> > > +
> > > +   ret = v3d_job_init(v3d, file_priv, &(*job)->base,
> > > +  v3d_job_free, args->in_sync, se,
> > > V3D_CSD);
> > > +   if (ret)
> > 
> > 
> > We should free the job here.
> > 
> > > +   return ret;
> > > +
> > > +   ret = v3d_job_allocate((void *)clean_job,
> > > sizeof(**clean_job));
> > > +   if (ret)
> > > +   return ret;
> > > +
> > > +   ret = v3d_job_init(v3d, file_priv, *clean_job,
> > > +  v3d_job_free, 0, NULL,
> > > V3D_CACHE_CLEAN);
> > > +   if (ret)
> > 
> > We should free job and clean_job here.
> > 
> > > +   return ret;
> > > +
> > > +   (*job)->args = *args;
> > > +
> > > +   ret = v3d_lookup_bos(&v3d->drm, file_priv, *clean_job,
> > > +    args->bo_handles, args-
> > > > bo_handle_count);
> > > +   if (ret)
> > 
> > Same here.
> > 
> > I think we probably want to have a fail label where we do this and
> > just
> > jump there from all the paths I mentioned above.
> 
> Actually, we are freeing the job in `v3d_submit_csd_ioctl`. Take a
> look
> here:
> 
>    48 ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
>    47  &job, &clean_job, &se,
>    46  &acquire_ctx);
>    45 if (ret)
>    44 goto fail;
> 
> If `v3d_setup_csd_jobs_and_bos` fails, we go to fail.
> 
>    43
>    42 if (args->perfmon_id) {
>    41 job->base.perfmon = v3d_perfmon_find(v3d_priv,
>    40 
> args->perfmon_id);
>    39 if (!job->base.perfmon) {
>    38 ret = -ENOENT;
>    37 goto fail_perfmon;
>    36 }
>    35 }
>    34
>    33 mutex_lock(&v3d->sched_lock);
>    32 v3d_push_job(&job->base);
>    31
>    30 ret = drm_sched_job_add_dependency(&clean_job->base,
>    29 
> dma_fence_get(job->base.done_fence));
>    28 if (ret)
>    27 goto fail_unreserve;
>    26
>    25 v3d_push_job(clean_job);
>    24 mutex_unlock(&v3d->sched_lock);
>    23
>    22 v3d_attach_fences_and_unlock_reservation(file_priv,
>    21  clean_job,
>    20  &acquire_ctx,
>    19  args-
> >out_sync,
>    18  &se,
>    17 
> clean_job->done_fence);
>    16
>    15 

Re: [PATCH v3 00/17] drm/v3d: Introduce CPU jobs

2023-11-28 Thread Iago Toral
zation.*`,
>  * `dEQP-VK.query_pool.*`
>  * and `dEQP-VK.multiview.*`.
> 
> [1]
> https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/v5/cpu-job
> 
> Changelog
> =
> 
> v1 -> v2:
> https://lore.kernel.org/dri-devel/20230904175019.1172713-1-mca...@igalia.com/
> 
> * Rebase on top of drm-misc-next.
> * Add GPU stats to the CPU queue.
> 
> v2 -> v3:
> https://lore.kernel.org/dri-devel/20231124012548.772095-1-mca...@igalia.com/
> 
> * Don't cast struct v3d_*_job to void *, use &job->base (Iago Toral)
> * Completely decouple job allocation from initialization (Iago Toral
> & Melissa Wen)
> * s/externsion/extension (Iago Toral)
> * Declare all CPU job functions as static const outside of the
> function (Iago Toral)
> * Document how many BOs are expected for each CPU job extension (Iago
> Toral)
> * Check if the number of BOs in the IOCTL matches the expectation
> (Iago Toral)
> 
> Best Regards,
> - Maíra
> 
> Maíra Canal (11):
>   drm/v3d: Don't allow two multisync extensions in the same job
>   drm/v3d: Decouple job allocation from job initiation
>   drm/v3d: Use v3d_get_extensions() to parse CPU job data
>   drm/v3d: Create tracepoints to track the CPU job
>   drm/v3d: Enable BO mapping
>   drm/v3d: Create a CPU job extension for a indirect CSD job
>   drm/v3d: Create a CPU job extension for the timestamp query job
>   drm/v3d: Create a CPU job extension for the reset timestamp job
>   drm/v3d: Create a CPU job extension to copy timestamp query to a
> buffer
>   drm/v3d: Create a CPU job extension for the reset performance query
> job
>   drm/v3d: Create a CPU job extension for the copy performance query
> job
> 
> Melissa Wen (6):
>   drm/v3d: Remove unused function header
>   drm/v3d: Move wait BO ioctl to the v3d_bo file
>   drm/v3d: Detach job submissions IOCTLs to a new specific file
>   drm/v3d: Simplify job refcount handling
>   drm/v3d: Add a CPU job submission
>   drm/v3d: Detach the CSD job BO setup
> 
>  drivers/gpu/drm/v3d/Makefile |    3 +-
>  drivers/gpu/drm/v3d/v3d_bo.c |   51 ++
>  drivers/gpu/drm/v3d/v3d_drv.c    |    4 +
>  drivers/gpu/drm/v3d/v3d_drv.h    |  134 ++-
>  drivers/gpu/drm/v3d/v3d_gem.c    |  768 -
>  drivers/gpu/drm/v3d/v3d_sched.c  |  315 +++
>  drivers/gpu/drm/v3d/v3d_submit.c | 1318
> ++
>  drivers/gpu/drm/v3d/v3d_trace.h  |   57 ++
>  include/uapi/drm/v3d_drm.h   |  240 +-
>  9 files changed, 2110 insertions(+), 780 deletions(-)
>  create mode 100644 drivers/gpu/drm/v3d/v3d_submit.c
> 
> --
> 2.42.0

I shared a few minor nits but otherwise I think this looks good. With
those nits fixed the series is:

Reviewed-by: Iago Toral Quiroga 

Thanks!
Iago


Re: [PATCH v3 17/17] drm/v3d: Create a CPU job extension for the copy performance query job

2023-11-28 Thread Iago Toral
El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió:
> A CPU job is a type of job that performs operations that require CPU
> intervention. A copy performance query job is a job that copies the
> complete or partial result of a query to a buffer. In order to copy
> the result of a performance query to a buffer, we need to get the
> values from the performance monitors.
> 
> So, create a user extension for the CPU job that enables the creation
> of a copy performance query job. This user extension will allow the
> creation of a CPU job that copies the results of a performance query
> to a BO, with the possibility to indicate the availability with an
> availability bit.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.h    |  1 +
>  drivers/gpu/drm/v3d/v3d_sched.c  | 66 +
>  drivers/gpu/drm/v3d/v3d_submit.c | 82
> 
>  include/uapi/drm/v3d_drm.h   | 50 +++
>  4 files changed, 199 insertions(+)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 0f7f80ad8d88..3c7d58866570 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -322,6 +322,7 @@ enum v3d_cpu_job_type {
> V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY,
> V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY,
> V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY,
> +   V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY,
>  };
>  
>  struct v3d_timestamp_query {
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
> b/drivers/gpu/drm/v3d/v3d_sched.c
> index 452c4a1db52e..203c32ed99d4 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -450,12 +450,78 @@ v3d_reset_performance_queries(struct
> v3d_cpu_job *job)
> }
>  }
>  
> +static void
> +v3d_write_performance_query_result(struct v3d_cpu_job *job, void
> *data, u32 query)
> +{
> +   struct v3d_performance_query_info *performance_query = 
> &job->performance_query;
> +   struct v3d_copy_query_results_info *copy = &job->copy;
> +   struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
> +   struct v3d_dev *v3d = job->base.v3d;
> +   struct v3d_perfmon *perfmon;
> +   u64 counter_values[V3D_PERFCNT_NUM];
> +
> +   for (int i = 0; i < performance_query->nperfmons; i++) {
> +   perfmon = v3d_perfmon_find(v3d_priv,
> +  performance_query-
> >queries[query].kperfmon_ids[i]);
> +   if (!perfmon) {
> +   DRM_DEBUG("Failed to find perfmon.");
> +   continue;
> +   }
> +
> +   v3d_perfmon_stop(v3d, perfmon, true);
> +
> +   memcpy(&counter_values[i *
> DRM_V3D_MAX_PERF_COUNTERS], perfmon->values,
> +  perfmon->ncounters * sizeof(u64));
> +
> +   v3d_perfmon_put(perfmon);
> +   }
> +
> +   for (int i = 0; i < performance_query->ncounters; i++)
> +   write_to_buffer(data, i, copy->do_64bit,
> counter_values[i]);
> +}
> +
> +
> +static void
> +v3d_copy_performance_query(struct v3d_cpu_job *job)
> +{
> +   struct v3d_performance_query_info *performance_query = 
> &job->performance_query;
> +   struct v3d_copy_query_results_info *copy = &job->copy;
> +   struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
> +   struct dma_fence *fence;
> +   bool available, write_result;
> +   u8 *data;
> +
> +   v3d_get_bo_vaddr(bo);
> +
> +   data = ((u8 *) bo->vaddr) + copy->offset;
> +
> +   for (int i = 0; i < performance_query->count; i++) {
> +   fence = drm_syncobj_fence_get(performance_query-
> >queries[i].syncobj);
> +   available = fence ? dma_fence_is_signaled(fence) :
> false;
> +
> +   write_result = available || copy->do_partial;
> +   if (write_result)
> +   v3d_write_performance_query_result(job, data,
> i);
> +
> +   if (copy->availability_bit)
> +   write_to_buffer(data, performance_query-
> >ncounters,
> +   copy->do_64bit, available ?
> 1u : 0u);
> +
> +   data += copy->stride;
> +
> +   dma_fence_put(fence);
> +   }
> +
> +   v3d_put_bo_vaddr(bo);
> +}
> +
>  static const v3d_cpu_job_fn cpu_job_function[] = {
> [V3D_CPU_JOB_TYPE_INDIRECT_CSD] =
> v3d_rewrite_csd_job_wg_counts_from_indirect,
> [V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] = v3d_timestamp_query,
> [V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY] =
> v3d_reset_timestamp_queries,
> [V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY] =
> v3d_copy_query_results,
> [V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY] =
> v3d_reset_performance_queries,
> +   [V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY] =
> v3d_copy_performance_query,
>  };
>  
>  static struct dma_fence *
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> 

Re: [PATCH v3 16/17] drm/v3d: Create a CPU job extension for the reset performance query job

2023-11-28 Thread Iago Toral
El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió:
(...)
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index a3ae1f220291..76a02d2c01e6 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -76,6 +76,7 @@ struct drm_v3d_extension {
>  #define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY 0x03
>  #define DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY   0x04
>  #define DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY0x05
> +#define DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY 0x06
> __u32 flags; /* mbz */
>  };
>  
> @@ -492,6 +493,32 @@ struct drm_v3d_copy_timestamp_query {
> __u64 syncs;
>  };
>  
> +/**
> + * struct drm_v3d_reset_performance_query - ioctl extension for the
> CPU job to
> + * reset performance queries
> + *
> + * When an extension DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY is
> defined, it
> + * points to this extension to define a reset performance
> submission. This CPU
> + * job will reset the performance queries by resetting the values of
> the
> + * performance monitors. Moreover, it will reset the syncobj to
> reset query
> + * availability.
> + */
> +struct drm_v3d_reset_performance_query {
> +   struct drm_v3d_extension base;
> +
> +   /* Array of performance queries's syncobjs to indicate its
> availability */
> +   __u64 syncs;
> +
> +   /* Number of queries */
> +   __u32 count;
> +
> +   /* Number of performance monitors */
> +   __u32 nperfmons;
> +
> +   /* Array of u64 user-pointers that point to an array of
> kperfmon_ids */
> +   __u64 kperfmon_ids;
> +};
> +
>  struct drm_v3d_submit_cpu {
> /* Pointer to a u32 array of the BOs that are referenced by
> the job.
>  *
> @@ -507,6 +534,9 @@ struct drm_v3d_submit_cpu {
>  * For DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY, it must
> contain two
>  * BOs. The first is the BO for which the timestamp queries
> results
>  * will be written to. The second is the BO that contains the
> timestamp.
> +    *
> +    * For DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY, it must
> contain no
> +    * BOs.
>  */
(...) The first is the BO where the timestamp queries will be written.
(...)

Iago

> __u64 bo_handles;
>  



Re: [PATCH v3 15/17] drm/v3d: Create a CPU job extension to copy timestamp query to a buffer

2023-11-28 Thread Iago Toral
El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió:
(...)
>  /**
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index 930f8d07f088..a3ae1f220291 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -75,6 +75,7 @@ struct drm_v3d_extension {
>  #define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD0x02
>  #define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY 0x03
>  #define DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY   0x04
> +#define DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY0x05
> __u32 flags; /* mbz */
>  };
>  
> @@ -451,6 +452,46 @@ struct drm_v3d_reset_timestamp_query {
> __u32 count;
>  };
>  
> +/**
> + * struct drm_v3d_copy_timestamp_query - ioctl extension for the CPU
> job to copy
> + * query results to a buffer
> + *
> + * When an extension DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY is
> defined, it
> + * points to this extension to define a copy timestamp query
> submission. This
> + * CPU job will copy the timestamp queries results to a BO with the
> offset
> + * and stride defined in the extension.
> + */
> +struct drm_v3d_copy_timestamp_query {
> +   struct drm_v3d_extension base;
> +
> +   /* Define if should write to buffer using 64 or 32 bits */
> +   __u8 do_64bit;
> +
> +   /* Define if it can write to buffer even if the query is not
> available */
> +   __u8 do_partial;
> +
> +   /* Define if it should write availability bit to buffer */
> +   __u8 availability_bit;
> +
> +   /* mbz */
> +   __u8 pad;
> +
> +   /* Offset of the buffer in the BO */
> +   __u32 offset;
> +
> +   /* Stride of the buffer in the BO */
> +   __u32 stride;
> +
> +   /* Number of queries */
> +   __u32 count;
> +
> +   /* Array of queries' offsets within the timestamp BO for
> their value */
> +   __u64 offsets;
> +
> +   /* Array of timestamp's syncobjs to indicate its availability
> */
> +   __u64 syncs;
> +};
> +
>  struct drm_v3d_submit_cpu {
> /* Pointer to a u32 array of the BOs that are referenced by
> the job.
>  *
> @@ -462,6 +503,10 @@ struct drm_v3d_submit_cpu {
>  *
>  * For DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY, it must
> contain only
>  * one BO, that contains the timestamp.
> +    *
> +    * For DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY, it must
> contain two
> +    * BOs. The first is the BO for which the timestamp queries
> results
> +    * will be written to. The second is the BO that contains the
> timestamp.
>  */

(...) The first is the BO where the timestamp queries will be written
to. (...)

Iago

> __u64 bo_handles;
>  



Re: [PATCH v3 12/17] drm/v3d: Create a CPU job extension for a indirect CSD job

2023-11-28 Thread Iago Toral
El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió:
(...)
> @@ -860,19 +913,66 @@ v3d_submit_cpu_ioctl(struct drm_device *dev,
> void *data,
>  
> mutex_lock(>sched_lock);
> v3d_push_job(_job->base);
> +
> +   switch (cpu_job->job_type) {
> +   case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
> +   ret = drm_sched_job_add_dependency(_job-
> >base.base,
> + 
> dma_fence_get(cpu_job->base.done_fence));
> +   if (ret)
> +   goto fail_unreserve;
> +
> +   v3d_push_job(_job->base);
> +
> +   ret = drm_sched_job_add_dependency(_job->base,
> + 
> dma_fence_get(csd_job->base.done_fence));
> +   if (ret)
> +   goto fail_unreserve;
> +
> +   v3d_push_job(clean_job);
> +
> +   break;
> +   default:
> +   break;
> +   }
> mutex_unlock(>sched_lock);
>  
> +   out_se = (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD)
> ? NULL : 
> +
> v3d_attach_fences_and_unlock_reservation(file_priv,
>  _job->base,
>  _ctx, 0,
> -    NULL, cpu_job-
> >base.done_fence);
> +    out_se, cpu_job-
> >base.done_fence);
> +
> +   switch (cpu_job->job_type) {
> +   case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
> +   v3d_attach_fences_and_unlock_reservation(file_priv,
> +    clean_job,
> +    _job-
> >indirect_csd.acquire_ctx,
> +    0, ,
> clean_job->done_fence);
> +   break;
> +   default:
> +   break;
> +   }
>  
> v3d_job_put(_job->base);
> +   v3d_job_put(_job->base);
> +   v3d_job_put(clean_job);
>  
> return 0;
>  
> +fail_unreserve:
> +   mutex_unlock(>sched_lock);
> +
> +   drm_gem_unlock_reservations(cpu_job->base.bo, cpu_job-
> >base.bo_count,
> +   _ctx);
> +
> +   drm_gem_unlock_reservations(clean_job->bo, clean_job-
> >bo_count,
> +   _job-
> >indirect_csd.acquire_ctx);
> +
>  fail:
> v3d_job_cleanup((void *)cpu_job);
> +   v3d_job_cleanup((void *)csd_job);
> +   v3d_job_cleanup(clean_job);

This should be:

v3d_job_put(&cpu_job->base);
v3d_job_put(&csd_job->base);
v3d_job_cleanup(clean_job);

Iago



Re: [PATCH v3 10/17] drm/v3d: Detach the CSD job BO setup

2023-11-28 Thread Iago Toral
El lun, 27-11-2023 a las 15:48 -0300, Maíra Canal escribió:
> From: Melissa Wen 
> 
> Detach CSD job setup from CSD submission ioctl to reuse it in CPU
> submission ioctl for indirect CSD job.
> 
> Signed-off-by: Melissa Wen 
> Co-developed-by: Maíra Canal 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_submit.c | 68 --
> --
>  1 file changed, 42 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index c134b113b181..eb26fe1e27e3 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -256,6 +256,45 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
> }
>  }
>  
> +static int
> +v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
> +  struct v3d_dev *v3d,
> +  struct drm_v3d_submit_csd *args,
> +  struct v3d_csd_job **job,
> +  struct v3d_job **clean_job,
> +  struct v3d_submit_ext *se,
> +  struct ww_acquire_ctx *acquire_ctx)
> +{
> +   int ret;
> +
> +   ret = v3d_job_allocate((void *)job, sizeof(**job));
> +   if (ret)
> +   return ret;
> +
> +   ret = v3d_job_init(v3d, file_priv, &(*job)->base,
> +  v3d_job_free, args->in_sync, se, V3D_CSD);
> +   if (ret)


We should free the job here.

> +   return ret;
> +
> +   ret = v3d_job_allocate((void *)clean_job,
> sizeof(**clean_job));
> +   if (ret)
> +   return ret;
> +
> +   ret = v3d_job_init(v3d, file_priv, *clean_job,
> +  v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
> +   if (ret)

We should free job and clean_job here.

> +   return ret;
> +
> +   (*job)->args = *args;
> +
> +   ret = v3d_lookup_bos(&v3d->drm, file_priv, *clean_job,
> +    args->bo_handles, args-
> >bo_handle_count);
> +   if (ret)

Same here.

I think we probably want to have a fail label where we do this and just
jump there from all the paths I mentioned above.
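
For illustration, here is a rough sketch of the error-unwind shape being
suggested: a single fail label that every early-exit path jumps to. The
helper names are taken from the quoted patch; the unwind body itself is a
placeholder, since the real code has to distinguish a job that was only
allocated from one that was fully initialized:

static int
v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv, struct v3d_dev *v3d,
			   struct drm_v3d_submit_csd *args,
			   struct v3d_csd_job **job, struct v3d_job **clean_job,
			   struct v3d_submit_ext *se,
			   struct ww_acquire_ctx *acquire_ctx)
{
	int ret;

	ret = v3d_job_allocate((void *)job, sizeof(**job));
	if (ret)
		return ret;

	ret = v3d_job_init(v3d, file_priv, &(*job)->base,
			   v3d_job_free, args->in_sync, se, V3D_CSD);
	if (ret)
		goto fail;

	ret = v3d_job_allocate((void *)clean_job, sizeof(**clean_job));
	if (ret)
		goto fail;

	ret = v3d_job_init(v3d, file_priv, *clean_job,
			   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
	if (ret)
		goto fail;

	(*job)->args = *args;

	ret = v3d_lookup_bos(&v3d->drm, file_priv, *clean_job,
			     args->bo_handles, args->bo_handle_count);
	if (ret)
		goto fail;

	return v3d_lock_bo_reservations(*clean_job, acquire_ctx);

fail:
	/* Hypothetical unwind: free whatever was set up above (plain kfree
	 * for a job that was merely allocated, the proper cleanup helper for
	 * one that was already initialized). */
	return ret;
}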

> +   return ret;
> +
> +   return v3d_lock_bo_reservations(*clean_job, acquire_ctx);
> +}
> +
>  static void
>  v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
>  {
> @@ -700,32 +739,9 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
> }
> }
>  
> -   ret = v3d_job_allocate((void *), sizeof(*job));
> -   if (ret)
> -   return ret;
> -
> -   ret = v3d_job_init(v3d, file_priv, >base,
> -  v3d_job_free, args->in_sync, ,
> V3D_CSD);
> -   if (ret)
> -   goto fail;
> -
> -   ret = v3d_job_allocate((void *)_job,
> sizeof(*clean_job));
> -   if (ret)
> -   goto fail;
> -
> -   ret = v3d_job_init(v3d, file_priv, clean_job,
> -  v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
> -   if (ret)
> -   goto fail;
> -
> -   job->args = *args;
> -
> -   ret = v3d_lookup_bos(dev, file_priv, clean_job,
> -    args->bo_handles, args-
> >bo_handle_count);
> -   if (ret)
> -   goto fail;
> -
> -   ret = v3d_lock_bo_reservations(clean_job, &acquire_ctx);
> +   ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
> +    &job, &clean_job, &se,
> +    &acquire_ctx);
> if (ret)
> goto fail;
>  



Re: [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job

2023-11-27 Thread Iago Toral
El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:

(...)
> +static void
> +v3d_timestamp_query(struct v3d_cpu_job *job)
> +{
> +   struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
> +   struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);

I presume there is an implicit expectation here that a timestamp query
job only has one BO where we are writing query results, right? Maybe we
should document this more explicitly in the uAPI and also check that we
have at least that one BO before trying to map it and write to it?

Also, is there a reason why we take the job reference from
job->base.bo[0] instead of job->bo[0]?

Iago
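
As a minimal sketch of the guard suggested above, assuming the CPU job
keeps its BO array in the embedded base job as bo/bo_count like the other
job types (this is an illustration, not the actual patch):

static void
v3d_timestamp_query(struct v3d_cpu_job *job)
{
	struct v3d_bo *bo;

	/* A timestamp query job is expected to carry exactly one BO: the
	 * buffer the query values are written to. Bail out if it is missing. */
	if (!job->base.bo || job->base.bo_count < 1)
		return;

	bo = to_v3d_bo(job->base.bo[0]);
	v3d_get_bo_vaddr(bo);
	/* ... write the query values as in the patch ... */
	v3d_put_bo_vaddr(bo);
}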

> +   u8 *value_addr;
> +
> +   v3d_get_bo_vaddr(bo);
> +
> +   for (int i = 0; i < timestamp_query->count; i++) {
> +   value_addr = ((u8 *) bo->vaddr) + timestamp_query-
> >queries[i].offset;
> +   *((u64 *) value_addr) = i == 0 ? ktime_get_ns() :
> 0ull;
> +
> +   drm_syncobj_replace_fence(timestamp_query-
> >queries[i].syncobj,
> + job->base.done_fence);
> +   }
> +
> +   v3d_put_bo_vaddr(bo);
> +}
> +
>  static struct dma_fence *
>  v3d_cpu_job_run(struct drm_sched_job *sched_job)
>  {
> @@ -315,6 +352,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
>  
> void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = {
> [V3D_CPU_JOB_TYPE_INDIRECT_CSD] =
> v3d_rewrite_csd_job_wg_counts_from_indirect,
> +   [V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] =
> v3d_timestamp_query,
> };
>  
> v3d->cpu_job = job;
> @@ -504,7 +542,7 @@ static const struct drm_sched_backend_ops
> v3d_cache_clean_sched_ops = {
>  static const struct drm_sched_backend_ops v3d_cpu_sched_ops = {
> .run_job = v3d_cpu_job_run,
> .timedout_job = v3d_generic_job_timedout,
> -   .free_job = v3d_sched_job_free
> +   .free_job = v3d_cpu_job_free
>  };
>  
>  int
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index b6707ef42706..2f03c8bca593 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -438,6 +438,64 @@ v3d_get_cpu_indirect_csd_params(struct drm_file
> *file_priv,
>   NULL, >acquire_ctx);
>  }
>  
> +/* Get data for the query timestamp job submission. */
> +static int
> +v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
> +  struct drm_v3d_extension __user
> *ext,
> +  struct v3d_cpu_job *job)
> +{
> +   u32 __user *offsets, *syncs;
> +   struct drm_v3d_timestamp_query timestamp;
> +
> +   if (!job) {
> +   DRM_DEBUG("CPU job extension was attached to a GPU
> job.\n");
> +   return -EINVAL;
> +   }
> +
> +   if (job->job_type) {
> +   DRM_DEBUG("Two CPU job extensions were added to the
> same CPU job.\n");
> +   return -EINVAL;
> +   }
> +
> +   if (copy_from_user(, ext, sizeof(timestamp)))
> +   return -EFAULT;
> +
> +   if (timestamp.pad)
> +   return -EINVAL;
> +
> +   job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
> +
> +   job->timestamp_query.queries =
> kvmalloc_array(timestamp.count,
> + sizeof(struct
> v3d_timestamp_query),
> + GFP_KERNEL);
> +   if (!job->timestamp_query.queries)
> +   return -ENOMEM;
> +
> +   offsets = u64_to_user_ptr(timestamp.offsets);
> +   syncs = u64_to_user_ptr(timestamp.syncs);
> +
> +   for (int i = 0; i < timestamp.count; i++) {
> +   u32 offset, sync;
> +
> +   if (copy_from_user(, offsets++,
> sizeof(offset))) {
> +   kvfree(job->timestamp_query.queries);
> +   return -EFAULT;
> +   }
> +
> +   job->timestamp_query.queries[i].offset = offset;
> +
> +   if (copy_from_user(, syncs++, sizeof(sync))) {
> +   kvfree(job->timestamp_query.queries);
> +   return -EFAULT;
> +   }
> +
> +   job->timestamp_query.queries[i].syncobj =
> drm_syncobj_find(file_priv, sync);
> +   }
> +   job->timestamp_query.count = timestamp.count;
> +
> +   return 0;
> +}
> +
>  /* Whenever userspace sets ioctl extensions, v3d_get_extensions
> parses data
>   * according to the extension id (name).
>   */
> @@ -466,6 +524,9 @@ v3d_get_extensions(struct drm_file *file_priv,
> case DRM_V3D_EXT_ID_CPU_INDIRECT_CSD:
> ret =
> v3d_get_cpu_indirect_csd_params(file_priv, user_ext, job);
> break;
> +   case DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY:
> +   ret =
> v3d_get_cpu_timestamp_query_params(file_priv, user_ext, job);
> + 

Re: [PATCH v2 07/17] drm/v3d: Add a CPU job submission

2023-11-27 Thread Iago Toral
El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> From: Melissa Wen 
> 
> Create a new type of job, a CPU job. A CPU job is a type of job that
> performs operations that requires CPU intervention. The overall idea
> is
> to use user extensions to enable different types of CPU job, allowing
> the
> CPU job to perform different operations according to the type of user
> externsion. The user extension ID identify the type of CPU job that
> must
> be dealt.
> 
> Having a CPU job is interesting for synchronization purposes as a CPU
> job has a queue like any other V3D job and can be synchoronized by
> the
> multisync extension.
> 
> Signed-off-by: Melissa Wen 
> Co-developed-by: Maíra Canal 
> Signed-off-by: Maíra Canal 
> ---
> 
(...)

> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
> b/drivers/gpu/drm/v3d/v3d_sched.c
> index fccbea2a5f2e..a32c91b94898 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -55,6 +55,12 @@ to_csd_job(struct drm_sched_job *sched_job)
> return container_of(sched_job, struct v3d_csd_job,
> base.base);
>  }
>  
> +static struct v3d_cpu_job *
> +to_cpu_job(struct drm_sched_job *sched_job)
> +{
> +   return container_of(sched_job, struct v3d_cpu_job,
> base.base);
> +}
> +
>  static void
>  v3d_sched_job_free(struct drm_sched_job *sched_job)
>  {
> @@ -262,6 +268,42 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
> return fence;
>  }
>  
> +static struct dma_fence *
> +v3d_cpu_job_run(struct drm_sched_job *sched_job)
> +{
> +   struct v3d_cpu_job *job = to_cpu_job(sched_job);
> +   struct v3d_dev *v3d = job->base.v3d;
> +   struct v3d_file_priv *file = job->base.file->driver_priv;
> +   u64 runtime;
> +
> +   void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = { };

Shouldn't this be a static const? Also, maybe we want to declare it
outside the function itself?

Iago
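
For illustration, a sketch of the file-scope variant being suggested here.
The typedef and the table entry follow the quoted patches; treat this as a
sketch rather than the final code:

typedef void (*v3d_cpu_job_fn)(struct v3d_cpu_job *job);

static const v3d_cpu_job_fn cpu_job_function[] = {
	[V3D_CPU_JOB_TYPE_INDIRECT_CSD] = v3d_rewrite_csd_job_wg_counts_from_indirect,
	/* further entries are added as new CPU job types appear */
};

static struct dma_fence *
v3d_cpu_job_run(struct drm_sched_job *sched_job)
{
	struct v3d_cpu_job *job = to_cpu_job(sched_job);

	if (job->job_type >= ARRAY_SIZE(cpu_job_function) ||
	    !cpu_job_function[job->job_type]) {
		DRM_DEBUG_DRIVER("Unknown CPU job: %d\n", job->job_type);
		return NULL;
	}

	cpu_job_function[job->job_type](job);
	return NULL;
}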

> +
> +   v3d->cpu_job = job;
> +
> +   if (job->job_type >= ARRAY_SIZE(v3d_cpu_job_fn)) {
> +   DRM_DEBUG_DRIVER("Unknown CPU job: %d\n", job-
> >job_type);
> +   return NULL;
> +   }
> +
> +   file->start_ns[V3D_CPU] = local_clock();
> +   v3d->queue[V3D_CPU].start_ns = file->start_ns[V3D_CPU];
> +
> +   v3d_cpu_job_fn[job->job_type](job);
> +
> +   runtime = local_clock() - file->start_ns[V3D_CPU];
> +
> +   file->enabled_ns[V3D_CPU] += runtime;
> +   v3d->queue[V3D_CPU].enabled_ns += runtime;
> +
> +   file->jobs_sent[V3D_CPU]++;
> +   v3d->queue[V3D_CPU].jobs_sent++;
> +
> +   file->start_ns[V3D_CPU] = 0;
> +   v3d->queue[V3D_CPU].start_ns = 0;
> +
> +   return NULL;
> +}


Re: [PATCH v2 07/17] drm/v3d: Add a CPU job submission

2023-11-27 Thread Iago Toral
El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> From: Melissa Wen 
> 
> Create a new type of job, a CPU job. A CPU job is a type of job that
> performs operations that requires CPU intervention. The overall idea
> is
> to use user extensions to enable different types of CPU job, allowing
> the
> CPU job to perform different operations according to the type of user
> externsion. The user extension ID identify the type of CPU job that

s/externsion/extension

Iago

> must
> be dealt.
> 
> Having a CPU job is interesting for synchronization purposes as a CPU
> job has a queue like any other V3D job and can be synchoronized by
> the
> multisync extension.
> 
> Signed-off-by: Melissa Wen 
> Co-developed-by: Maíra Canal 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c    |  4 ++
>  drivers/gpu/drm/v3d/v3d_drv.h    | 14 +-
>  drivers/gpu/drm/v3d/v3d_sched.c  | 57 +++
>  drivers/gpu/drm/v3d/v3d_submit.c | 79
> 
>  include/uapi/drm/v3d_drm.h   | 17 +++
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 44a1ca57d6a4..3debf37e7d9b 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -91,6 +91,9 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
> case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
> args->value = 1;
> return 0;
> +   case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
> +   args->value = 1;
> +   return 0;
> default:
> DRM_DEBUG("Unknown parameter %d\n", args->param);
> return -EINVAL;
> @@ -189,6 +192,7 @@ static const struct drm_ioctl_desc
> v3d_drm_ioctls[] = {
> DRM_IOCTL_DEF_DRV(V3D_PERFMON_CREATE,
> v3d_perfmon_create_ioctl, DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY,
> v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES,
> v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW),
> +   DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl,
> DRM_RENDER_ALLOW | DRM_AUTH),
>  };
>  
>  static const struct drm_driver v3d_drm_driver = {
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 4db9ace66024..010b146140ad 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -19,7 +19,7 @@ struct reset_control;
>  
>  #define GMP_GRANULARITY (128 * 1024)
>  
> -#define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
> +#define V3D_MAX_QUEUES (V3D_CPU + 1)
>  
>  static inline char *v3d_queue_to_string(enum v3d_queue queue)
>  {
> @@ -29,6 +29,7 @@ static inline char *v3d_queue_to_string(enum
> v3d_queue queue)
> case V3D_TFU: return "tfu";
> case V3D_CSD: return "csd";
> case V3D_CACHE_CLEAN: return "cache_clean";
> +   case V3D_CPU: return "cpu";
> }
> return "UNKNOWN";
>  }
> @@ -122,6 +123,7 @@ struct v3d_dev {
> struct v3d_render_job *render_job;
> struct v3d_tfu_job *tfu_job;
> struct v3d_csd_job *csd_job;
> +   struct v3d_cpu_job *cpu_job;
>  
> struct v3d_queue_state queue[V3D_MAX_QUEUES];
>  
> @@ -312,6 +314,14 @@ struct v3d_csd_job {
> struct drm_v3d_submit_csd args;
>  };
>  
> +enum v3d_cpu_job_type {};
> +
> +struct v3d_cpu_job {
> +   struct v3d_job base;
> +
> +   enum v3d_cpu_job_type job_type;
> +};
> +
>  struct v3d_submit_outsync {
> struct drm_syncobj *syncobj;
>  };
> @@ -414,6 +424,8 @@ int v3d_submit_tfu_ioctl(struct drm_device *dev,
> void *data,
>  struct drm_file *file_priv);
>  int v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
>  struct drm_file *file_priv);
> +int v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> +    struct drm_file *file_priv);
>  
>  /* v3d_irq.c */
>  int v3d_irq_init(struct v3d_dev *v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
> b/drivers/gpu/drm/v3d/v3d_sched.c
> index fccbea2a5f2e..a32c91b94898 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -55,6 +55,12 @@ to_csd_job(struct drm_sched_job *sched_job)
> return container_of(sched_job, struct v3d_csd_job,
> base.base);
>  }
>  
> +static struct v3d_cpu_job *
> +to_cpu_job(struct drm_sched_job *sched_job)
> +{
> +   return container_of(sched_job, struct v3d_cpu_job,
> base.base);
> +}
> +
>  static void
>  v3d_sched_job_free(struct drm_sched_job *sched_job)
>  {
> @@ -262,6 +268,42 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
> return fence;
>  }
>  
> +static struct dma_fence *
> +v3d_cpu_job_run(struct drm_sched_job *sched_job)
> +{
> +   struct v3d_cpu_job *job = to_cpu_job(sched_job);
> +   struct v3d_dev *v3d = job->base.v3d;
> +   struct v3d_file_priv *file = 

Re: [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation

2023-11-27 Thread Iago Toral
El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> We want to allow the IOCTLs to allocate the job without initiating
> it.
> This will be useful for the CPU job submission IOCTL, as the CPU job
> has
> the need to use information from the user extensions. Currently, the
> user extensions are parsed before the job allocation, making it
> impossible to fill the CPU job when parsing the user extensions.
> Therefore, decouple the job allocation from the job initiation.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_submit.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index fe46dd316ca0..ed1a310bbd2f 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -135,6 +135,21 @@ void v3d_job_put(struct v3d_job *job)
> 	kref_put(&job->refcount, job->free);
>  }
>  
> +static int
> +v3d_job_allocate(void **container, size_t size)
> +{
> +   if (*container)
> +   return 0;

Mmm... is this really what we want? At least right now we expect
v3d_job_allocate to always allocate memory; is there any scenario in
which we would expect to call this with an already allocated container?

Iago
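
For comparison, the unconditional version being hinted at would simply
drop the early return — a sketch:

static int
v3d_job_allocate(void **container, size_t size)
{
	*container = kcalloc(1, size, GFP_KERNEL);
	if (!*container) {
		DRM_ERROR("Cannot allocate memory for V3D job.\n");
		return -ENOMEM;
	}

	return 0;
}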

> +
> +   *container = kcalloc(1, size, GFP_KERNEL);
> +   if (!*container) {
> +   DRM_ERROR("Cannot allocate memory for V3D job.\n");
> +   return -ENOMEM;
> +   }
> +
> +   return 0;
> +}
> +
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
>  void **container, size_t size, void (*free)(struct kref
> *ref),
> @@ -145,11 +160,9 @@ v3d_job_init(struct v3d_dev *v3d, struct
> drm_file *file_priv,
> bool has_multisync = se && (se->flags &
> DRM_V3D_EXT_ID_MULTI_SYNC);
> int ret, i;
>  
> -   *container = kcalloc(1, size, GFP_KERNEL);
> -   if (!*container) {
> -   DRM_ERROR("Cannot allocate memory for v3d job.");
> -   return -ENOMEM;
> -   }
> +   ret = v3d_job_allocate(container, size);
> +   if (ret)
> +   return ret;
>  
> job = *container;
> job->v3d = v3d;



Re: [PATCH v2 04/17] drm/v3d: Simplify job refcount handling

2023-11-26 Thread Iago Toral
El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> From: Melissa Wen 
> 
> Instead of checking if the job is NULL every time we call the
> function,
> check it inside the function.
> 
> Signed-off-by: Melissa Wen 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_submit.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index f36214002f37..e18e7c963884 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -129,6 +129,9 @@ void v3d_job_cleanup(struct v3d_job *job)
>  
>  void v3d_job_put(struct v3d_job *job)
>  {
> +   if (!job)
> +   return;
> +
> 	kref_put(&job->refcount, job->free);
>  }
>  
> @@ -517,11 +520,9 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  ,
>  last_job-
> >done_fence);
>  
> -   if (bin)
> -   v3d_job_put(&bin->base);
> -   v3d_job_put(&render->base);
> -   if (clean_job)
> -   v3d_job_put(clean_job);
> +   v3d_job_put((void *)bin);
> +   v3d_job_put((void *)render);
> +   v3d_job_put((void *)clean_job);

Personally, I am not a big fan of casting to void* instead of using
&job->base for all the v3d_job_put calls in this patch.

Iago

>  
> return 0;
>  
> @@ -621,7 +622,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>  ,
>  job-
> >base.done_fence);
>  
> -   v3d_job_put(&job->base);
> +   v3d_job_put((void *)job);
>  
> return 0;
>  
> @@ -725,7 +726,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> *data,
>  ,
>  clean_job-
> >done_fence);
>  
> -   v3d_job_put(&job->base);
> +   v3d_job_put((void *)job);
> v3d_job_put(clean_job);
>  
> return 0;



Re: [RFC PATCH 2/2] vc4: introduce DMA-BUF heap

2023-11-09 Thread Iago Toral
Hi Simon,

thanks for looking into this! I few comments below inline...

El jue, 09-11-2023 a las 07:45 +, Simon Ser escribió:
> User-space sometimes needs to allocate scanout-capable memory for
> GPU rendering purposes. On a vc4/v3d split render/display SoC, this
> is achieved via DRM dumb buffers: the v3d user-space driver opens
> the primary vc4 node, allocates a DRM dumb buffer there, exports it
> as a DMA-BUF, imports it into the v3d render node, and renders to it.
> 
> However, DRM dumb buffers are only meant for CPU rendering, they are
> not intended to be used for GPU rendering. Primary nodes should only
> be used for mode-setting purposes, other programs should not attempt
> to open it. Moreover, opening the primary node is already broken on
> some setups: systemd grants permission to open primary nodes to
> physically logged in users, but this breaks when the user is not
> physically logged in (e.g. headless setup) and when the distribution
> is using a different init (e.g. Alpine Linux uses openrc).
> 
> We need an alternate way for v3d to allocate scanout-capable memory.
> Leverage DMA heaps for this purpose: expose a CMA heap to user-space.
> Preliminary testing has been done with wlroots [1].
> 
> This is still an RFC. Open questions:

I think Maxime is the right person to answer about vc4 but I'll answer
a few of these questions from my perspective:

> 
> - Does this approach make sense to y'all in general?

Yeah, this sounds sensible to me.

> - What would be a good name for the heap? "vc4" is maybe a bit naive
> and
>   not precise enough. Something with "cma"? Do we need to plan a
> naming
>   scheme to accomodate for multiple vc4 devices?
> - Right now root privileges are necessary to open the heap. Should we
>   allow everybody to open the heap by default (after all, user-space
> can
>   already allocate arbitrary amounts of GPU memory)? Should we leave
> it
>   up to user-space to solve this issue (via logind/seatd or a Wayland
>   protocol or something else)?

I think so, yes. After all, the main user of this will be the Mesa
driver, so root-only access would not be great.

> 
> TODO:
> 
> - Need to add !vc5 support.

I believe that would be models before Raspberry Pi 4, which don't have
v3d, so maybe we don't need this there?

Iago

> - Need to the extend DMA heaps API to allow vc4 to unregister the
> heap
>   on unload.
> 
> [1]:
> https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4414
> 
> Signed-off-by: Simon Ser 
> Cc: Maxime Ripard 
> Cc: Daniel Vetter 
> Cc: Daniel Stone 
> Cc: Erico Nunes 
> Cc: Iago Toral Quiroga 
> Cc: Maíra Canal 
> Cc: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/vc4/Makefile   |  2 ++
>  drivers/gpu/drm/vc4/vc4_dma_heap.c | 51
> ++
>  drivers/gpu/drm/vc4/vc4_drv.c  |  6 
>  drivers/gpu/drm/vc4/vc4_drv.h  |  5 +++
>  4 files changed, 64 insertions(+)
>  create mode 100644 drivers/gpu/drm/vc4/vc4_dma_heap.c
> 
> diff --git a/drivers/gpu/drm/vc4/Makefile
> b/drivers/gpu/drm/vc4/Makefile
> index c41f89a15a55..e4048870cec7 100644
> --- a/drivers/gpu/drm/vc4/Makefile
> +++ b/drivers/gpu/drm/vc4/Makefile
> @@ -34,4 +34,6 @@ vc4-$(CONFIG_DRM_VC4_KUNIT_TEST) += \
>  
>  vc4-$(CONFIG_DEBUG_FS) += vc4_debugfs.o
>  
> +vc4-$(CONFIG_DMABUF_HEAPS) += vc4_dma_heap.o
> +
>  obj-$(CONFIG_DRM_VC4)  += vc4.o
> diff --git a/drivers/gpu/drm/vc4/vc4_dma_heap.c
> b/drivers/gpu/drm/vc4/vc4_dma_heap.c
> new file mode 100644
> index ..010d0a88f3fa
> --- /dev/null
> +++ b/drivers/gpu/drm/vc4/vc4_dma_heap.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + *  Copyright © 2023 Simon Ser
> + */
> +
> +#include 
> +#include 
> +
> +#include "vc4_drv.h"
> +
> +static struct dma_buf *vc4_dma_heap_allocate(struct dma_heap *heap,
> +    unsigned long size,
> +    unsigned long fd_flags,
> +    unsigned long
> heap_flags)
> +{
> +   struct vc4_dev *vc4 = dma_heap_get_drvdata(heap);
> +   //DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +   struct drm_gem_dma_object *dma_obj;
> +   struct dma_buf *dmabuf;
> +
> +   if (WARN_ON_ONCE(!vc4->is_vc5))
> +   return ERR_PTR(-ENODEV);
> +
> +   dma_obj = drm_gem_dma_create(>base, size);
> +   if (IS_ERR(dma_obj))
> +   return ERR_CAST(dma_obj);
> +
> +   dmabuf = drm_gem_prime_export(_obj->base, fd_flags);
> +   drm_gem_object_put(_obj->base);
> +   return dmabuf;
> +}
> +
> +static const str
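
To make the intended user-space flow concrete, here is a hedged sketch of
how a client such as the Mesa v3dv driver could allocate scanout-capable
memory through the standard DMA heap ioctl. The heap node name "vc4" is a
placeholder, since naming is one of the open questions above:

#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/* Allocate `size` bytes from the (hypothetically named) vc4 CMA heap and
 * return a DMA-BUF fd that can be imported into the v3d render node, or -1
 * on error. */
static int alloc_scanout_dmabuf(size_t size)
{
	struct dma_heap_allocation_data alloc = {
		.len = size,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap_fd, ret;

	heap_fd = open("/dev/dma_heap/vc4", O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
	close(heap_fd);

	return ret < 0 ? -1 : (int)alloc.fd;
}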

[PATCH v3 2/4] drm/v3d: fix up register addresses for V3D 7.x

2023-10-31 Thread Iago Toral Quiroga
This patch updates a number of register addresses that have
been changed in Raspberry Pi 5 (V3D 7.1) and updates the
code to use the corresponding registers and addresses based
on the actual V3D version.

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_debugfs.c | 178 +-
 drivers/gpu/drm/v3d/v3d_gem.c |   4 +-
 drivers/gpu/drm/v3d/v3d_irq.c |  46 
 drivers/gpu/drm/v3d/v3d_regs.h|  94 +---
 drivers/gpu/drm/v3d/v3d_sched.c   |  38 ---
 5 files changed, 204 insertions(+), 156 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c 
b/drivers/gpu/drm/v3d/v3d_debugfs.c
index 330669f51fa7..f843a50d5dce 100644
--- a/drivers/gpu/drm/v3d/v3d_debugfs.c
+++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
@@ -12,69 +12,83 @@
 #include "v3d_drv.h"
 #include "v3d_regs.h"
 
-#define REGDEF(reg) { reg, #reg }
+#define REGDEF(min_ver, max_ver, reg) { min_ver, max_ver, reg, #reg }
 struct v3d_reg_def {
+   u32 min_ver;
+   u32 max_ver;
u32 reg;
const char *name;
 };
 
 static const struct v3d_reg_def v3d_hub_reg_defs[] = {
-   REGDEF(V3D_HUB_AXICFG),
-   REGDEF(V3D_HUB_UIFCFG),
-   REGDEF(V3D_HUB_IDENT0),
-   REGDEF(V3D_HUB_IDENT1),
-   REGDEF(V3D_HUB_IDENT2),
-   REGDEF(V3D_HUB_IDENT3),
-   REGDEF(V3D_HUB_INT_STS),
-   REGDEF(V3D_HUB_INT_MSK_STS),
-
-   REGDEF(V3D_MMU_CTL),
-   REGDEF(V3D_MMU_VIO_ADDR),
-   REGDEF(V3D_MMU_VIO_ID),
-   REGDEF(V3D_MMU_DEBUG_INFO),
+   REGDEF(33, 42, V3D_HUB_AXICFG),
+   REGDEF(33, 71, V3D_HUB_UIFCFG),
+   REGDEF(33, 71, V3D_HUB_IDENT0),
+   REGDEF(33, 71, V3D_HUB_IDENT1),
+   REGDEF(33, 71, V3D_HUB_IDENT2),
+   REGDEF(33, 71, V3D_HUB_IDENT3),
+   REGDEF(33, 71, V3D_HUB_INT_STS),
+   REGDEF(33, 71, V3D_HUB_INT_MSK_STS),
+
+   REGDEF(33, 71, V3D_MMU_CTL),
+   REGDEF(33, 71, V3D_MMU_VIO_ADDR),
+   REGDEF(33, 71, V3D_MMU_VIO_ID),
+   REGDEF(33, 71, V3D_MMU_DEBUG_INFO),
+
+   REGDEF(71, 71, V3D_GMP_STATUS(71)),
+   REGDEF(71, 71, V3D_GMP_CFG(71)),
+   REGDEF(71, 71, V3D_GMP_VIO_ADDR(71)),
 };
 
 static const struct v3d_reg_def v3d_gca_reg_defs[] = {
-   REGDEF(V3D_GCA_SAFE_SHUTDOWN),
-   REGDEF(V3D_GCA_SAFE_SHUTDOWN_ACK),
+   REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN),
+   REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN_ACK),
 };
 
 static const struct v3d_reg_def v3d_core_reg_defs[] = {
-   REGDEF(V3D_CTL_IDENT0),
-   REGDEF(V3D_CTL_IDENT1),
-   REGDEF(V3D_CTL_IDENT2),
-   REGDEF(V3D_CTL_MISCCFG),
-   REGDEF(V3D_CTL_INT_STS),
-   REGDEF(V3D_CTL_INT_MSK_STS),
-   REGDEF(V3D_CLE_CT0CS),
-   REGDEF(V3D_CLE_CT0CA),
-   REGDEF(V3D_CLE_CT0EA),
-   REGDEF(V3D_CLE_CT1CS),
-   REGDEF(V3D_CLE_CT1CA),
-   REGDEF(V3D_CLE_CT1EA),
-
-   REGDEF(V3D_PTB_BPCA),
-   REGDEF(V3D_PTB_BPCS),
-
-   REGDEF(V3D_GMP_STATUS),
-   REGDEF(V3D_GMP_CFG),
-   REGDEF(V3D_GMP_VIO_ADDR),
-
-   REGDEF(V3D_ERR_FDBGO),
-   REGDEF(V3D_ERR_FDBGB),
-   REGDEF(V3D_ERR_FDBGS),
-   REGDEF(V3D_ERR_STAT),
+   REGDEF(33, 71, V3D_CTL_IDENT0),
+   REGDEF(33, 71, V3D_CTL_IDENT1),
+   REGDEF(33, 71, V3D_CTL_IDENT2),
+   REGDEF(33, 71, V3D_CTL_MISCCFG),
+   REGDEF(33, 71, V3D_CTL_INT_STS),
+   REGDEF(33, 71, V3D_CTL_INT_MSK_STS),
+   REGDEF(33, 71, V3D_CLE_CT0CS),
+   REGDEF(33, 71, V3D_CLE_CT0CA),
+   REGDEF(33, 71, V3D_CLE_CT0EA),
+   REGDEF(33, 71, V3D_CLE_CT1CS),
+   REGDEF(33, 71, V3D_CLE_CT1CA),
+   REGDEF(33, 71, V3D_CLE_CT1EA),
+
+   REGDEF(33, 71, V3D_PTB_BPCA),
+   REGDEF(33, 71, V3D_PTB_BPCS),
+
+   REGDEF(33, 41, V3D_GMP_STATUS(33)),
+   REGDEF(33, 41, V3D_GMP_CFG(33)),
+   REGDEF(33, 41, V3D_GMP_VIO_ADDR(33)),
+
+   REGDEF(33, 71, V3D_ERR_FDBGO),
+   REGDEF(33, 71, V3D_ERR_FDBGB),
+   REGDEF(33, 71, V3D_ERR_FDBGS),
+   REGDEF(33, 71, V3D_ERR_STAT),
 };
 
 static const struct v3d_reg_def v3d_csd_reg_defs[] = {
-   REGDEF(V3D_CSD_STATUS),
-   REGDEF(V3D_CSD_CURRENT_CFG0),
-   REGDEF(V3D_CSD_CURRENT_CFG1),
-   REGDEF(V3D_CSD_CURRENT_CFG2),
-   REGDEF(V3D_CSD_CURRENT_CFG3),
-   REGDEF(V3D_CSD_CURRENT_CFG4),
-   REGDEF(V3D_CSD_CURRENT_CFG5),
-   REGDEF(V3D_CSD_CURRENT_CFG6),
+   REGDEF(41, 71, V3D_CSD_STATUS),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG0(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG1(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG2(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG3(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG4(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG5(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG6(41)),
+   REGDEF(71, 71, V3D_CSD_CURRENT_CFG0(71)),
+   REGDEF(71, 71, V3D_CSD_CURRENT_CFG1(71)),
+   REGDEF(71, 71, V3D_CSD_CURRENT_CFG2(71)),
+   REGDEF(71, 71, V3D_CSD_CURRENT_CFG3(71)),
+   REGDEF(71, 71, V3D
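
For context, a sketch of how the min_ver/max_ver fields introduced above
are meant to be consumed: each table entry is filtered against the version
of the core being dumped. The helper name, the v3d->ver field and the print
format are assumptions for illustration only:

static bool v3d_reg_def_matches(const struct v3d_reg_def *def, u32 ver)
{
	return ver >= def->min_ver && ver <= def->max_ver;
}

static void v3d_debugfs_dump_hub_regs(struct seq_file *m, struct v3d_dev *v3d)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(v3d_hub_reg_defs); i++) {
		const struct v3d_reg_def *def = &v3d_hub_reg_defs[i];

		/* Skip registers that do not exist on this V3D version. */
		if (!v3d_reg_def_matches(def, v3d->ver))
			continue;

		seq_printf(m, "%s (0x%04x): 0x%08x\n",
			   def->name, def->reg, V3D_READ(def->reg));
	}
}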

[PATCH v3 0/4] V3D module changes for Pi5

2023-10-31 Thread Iago Toral Quiroga
This series includes patches to update the V3D kernel module
that drives the VideoCore VI GPU in Raspberry Pi 4 to also support
the Video Core VII iteration present in Raspberry Pi 5.

The first patch in the series adds a small uAPI update required for
TFU jobs, the second patch addresses the bulk of the work and
involves mostly updates to register addresses, the third and fourth
patches match the 'brcm,2712-v3d' device string from Pi5 with the
V3D driver.

The changes for the user-space driver can be found in the
corresponding Mesa MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450

Changes in v2:
  - Added s-o-b to patches (Maíra Canal)
  - patch 2: fixed style warnings (Maíra Canal)
  - patch 2: Use macro with version param to get reg addresses (Maíra Canal)
  - new patch: Update the device tree binding (Stefan Wahren)

Changes in v3:
  - Moved changelog entries in patches to cover letter (Stefan Wahren)
  - Added DT maintainers (Stefan Wahren, Krzysztof Kozlowski)

Iago Toral Quiroga (4):
  drm/v3d: update UAPI to match user-space for V3D 7.x
  drm/v3d: fix up register addresses for V3D 7.x
  dt-bindings: gpu: v3d: Add BCM2712's compatible
  drm/v3d: add brcm,2712-v3d as a compatible V3D device

 .../devicetree/bindings/gpu/brcm,bcm-v3d.yaml |   1 +
 drivers/gpu/drm/v3d/v3d_debugfs.c | 178 ++
 drivers/gpu/drm/v3d/v3d_drv.c |   1 +
 drivers/gpu/drm/v3d/v3d_gem.c |   4 +-
 drivers/gpu/drm/v3d/v3d_irq.c |  46 +++--
 drivers/gpu/drm/v3d/v3d_regs.h|  94 +
 drivers/gpu/drm/v3d/v3d_sched.c   |  38 ++--
 include/uapi/drm/v3d_drm.h|   5 +
 8 files changed, 211 insertions(+), 156 deletions(-)

-- 
2.39.2



[PATCH v3 1/4] drm/v3d: update UAPI to match user-space for V3D 7.x

2023-10-31 Thread Iago Toral Quiroga
V3D 7.x takes a new parameter to configure TFU jobs that needs
to be provided by user space.

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Maíra Canal 
---
v2: added s-o-b, fixed typo in commit message (Maíra Canal)

 include/uapi/drm/v3d_drm.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 3dfc0af8756a..1a7d7a689de3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -319,6 +319,11 @@ struct drm_v3d_submit_tfu {
 
/* Pointer to an array of ioctl extensions*/
__u64 extensions;
+
+   struct {
+   __u32 ioc;
+   __u32 pad;
+   } v71;
 };
 
 /* Submits a compute shader for dispatch.  This job will block on any
-- 
2.39.2
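
A hedged illustration of the user-space side: when targeting a 7.x core the
submitter fills the new field, otherwise it stays zero. The version check
and the compute_tfu_ioc() helper are hypothetical stand-ins for what the
Mesa MR linked in the cover letter does:

#include <stdint.h>
#include <xf86drm.h>
#include "v3d_drm.h"   /* the v3d uAPI header; include path varies */

/* Hypothetical helper that derives the new output-conversion value. */
extern uint32_t compute_tfu_ioc(void);

static void submit_tfu_job(int fd, int devinfo_ver)
{
	struct drm_v3d_submit_tfu tfu = {0};

	/* icfg, iia, ioa, ... are filled exactly as on V3D 4.x. */

	if (devinfo_ver >= 71)
		tfu.v71.ioc = compute_tfu_ioc();   /* new on V3D 7.x */

	drmIoctl(fd, DRM_IOCTL_V3D_SUBMIT_TFU, &tfu);
}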



[PATCH v3 3/4] dt-bindings: gpu: v3d: Add BCM2712's compatible

2023-10-31 Thread Iago Toral Quiroga
BCM2712, Raspberry Pi 5's SoC, contains a V3D core. So add its specific
compatible to the bindings.

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Maíra Canal 
---
 Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml 
b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
index dae55b8a267b..dc078ceeca9a 100644
--- a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
+++ b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
@@ -17,6 +17,7 @@ properties:
   compatible:
 enum:
   - brcm,2711-v3d
+  - brcm,2712-v3d
   - brcm,7268-v3d
   - brcm,7278-v3d
 
-- 
2.39.2



[PATCH v3 4/4] drm/v3d: add brcm,2712-v3d as a compatible V3D device

2023-10-31 Thread Iago Toral Quiroga
This is required to get the V3D module to load with Raspberry Pi 5.

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Stefan Wahren 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index ffbbe9d527d3..1ab46bdf8ad7 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -187,6 +187,7 @@ static const struct drm_driver v3d_drm_driver = {
 
 static const struct of_device_id v3d_of_match[] = {
{ .compatible = "brcm,2711-v3d" },
+   { .compatible = "brcm,2712-v3d" },
{ .compatible = "brcm,7268-v3d" },
{ .compatible = "brcm,7278-v3d" },
{},
-- 
2.39.2



Re: [PATCH 1/2] drm/v3d: wait for all jobs to finish before unregistering

2023-10-30 Thread Iago Toral
El mar, 24-10-2023 a las 07:05 -0300, Maira Canal escribió:
> Hi Iago,
> 
> On 10/24/23 02:57, Iago Toral wrote:
> > El lun, 23-10-2023 a las 07:58 -0300, Maíra Canal escribió:
> > > Currently, we are only warning the user if the BIN or RENDER jobs
> > > don't
> > > finish before we unregister V3D. We must wait for all jobs to
> > > finish
> > > before unregistering. Therefore, warn the user if TFU or CSD jobs
> > > are not done by the time the driver is unregistered.
> > > 
> > > Signed-off-by: Maíra Canal 
> > > ---
> > >   drivers/gpu/drm/v3d/v3d_gem.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > > b/drivers/gpu/drm/v3d/v3d_gem.c
> > > index 2e94ce788c71..afa7d170d1ff 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > > @@ -1072,6 +1072,8 @@ v3d_gem_destroy(struct drm_device *dev)
> > >   */
> > >  WARN_ON(v3d->bin_job);
> > >  WARN_ON(v3d->render_job);
> > > +   WARN_ON(v3d->tfu_job);
> > > +   WARN_ON(v3d->csd_job);
> > 
> > I guess we should do this for cache clean jobs too, right?
> 
> As the cache clean jobs are synchronous, we don't keep track of the
> current cache clean job. When I say that the cache clean jobs are
> synchronous, it means that the end of the job is not determined by
> an interruption. Therefore, there is no need to make sure that the
> cache clean jobs are still running.

I see, thanks Maíra.

Reviewed-by: Iago Toral Quiroga 

> 
> Best Regards,
> - Maíra
> 
> > 
> > Iago
> > 
> > >   
> > >  drm_mm_takedown(&v3d->mm);
> > >   
> > 
> 



Re: [PATCH v2 2/4] drm/v3d: fix up register addresses for V3D 7.x

2023-10-30 Thread Iago Toral
El lun, 30-10-2023 a las 11:28 +0100, Stefan Wahren escribió:
> Hi Iago,
> 
> Am 30.10.23 um 11:14 schrieb Iago Toral:
> > Hi Stefan,
> > 
> > El lun, 30-10-2023 a las 10:58 +0100, Stefan Wahren escribió:
> > > Hi Iago,
> > > 
> > > Am 30.10.23 um 09:28 schrieb Iago Toral Quiroga:
> > > > This patch updates a number of register addresses that have
> > > > been changed in Raspberry Pi 5 (V3D 7.1) and updates the
> > > > code to use the corresponding registers and addresses based
> > > > on the actual V3D version.
> > > > 
> > > > v2:
> > > >    - added s-o-b and commit message. (Maíra Canal)
> > > >    - Used macro that takes version as argument and returns
> > > >  appropriate values instead of two different definitions
> > > >      for post-v71 and pre-v71 hardware when possible. (Maíra
> > > > Canal)
> > > >    - fixed style warnings from checkpatch.pl. (Maíra Canal)
> > > > 
> > > > Signed-off-by: Iago Toral Quiroga 
> > > > ---
> > > >    drivers/gpu/drm/v3d/v3d_debugfs.c | 178 +---
> > > > -
> > > > -
> > > >    drivers/gpu/drm/v3d/v3d_gem.c |   4 +-
> > > >    drivers/gpu/drm/v3d/v3d_irq.c |  46 
> > > >    drivers/gpu/drm/v3d/v3d_regs.h    |  94 +---
> > > >    drivers/gpu/drm/v3d/v3d_sched.c   |  38 ---
> > > >    5 files changed, 204 insertions(+), 156 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c
> > > > b/drivers/gpu/drm/v3d/v3d_debugfs.c
> > > > index 330669f51fa7..f843a50d5dce 100644
> > > > --- a/drivers/gpu/drm/v3d/v3d_debugfs.c
> > > > +++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
> > > > @@ -12,69 +12,83 @@
> > > >    #include "v3d_drv.h"
> > > >    #include "v3d_regs.h"
> > > > 
> > > > -#define REGDEF(reg) { reg, #reg }
> > > > +#define REGDEF(min_ver, max_ver, reg) { min_ver, max_ver, reg,
> > > > #reg }
> > > >    struct v3d_reg_def {
> > > > +   u32 min_ver;
> > > > +   u32 max_ver;
> > > Is this documented some where which SoC has which V3D version?
> > > 
> > Not that I am aware of.
> > 
> > There are really only two Raspberry Pi SoCs supported by v3d:
> > bcm2711
> > is Raspberry Pi 4 which is V3D 4.2 (compatible with 4.1), and
> > bcm2712
> > is Raspberry Pi 5 which is V3D 7.1.
> okay, in this case the best source would be Emma's wiki [1]. Maybe
> these
> should be added as well to the wiki.

Sure, will do that, thanks!

Iago

> 
> [1] -
> https://github.com/anholt/linux/wiki/Devices-with-Videocore-graphics
> > 
> > I don't know what SoCs are supported by versions of V3D before 4.1,
> > I
> > think those were targetting set-top-box hardware that Emma used
> > while
> > setting up the driver before the SoC for Raspberry Pi 4 was
> > available.
> > 
> > Iago
> 
> 



Re: [PATCH v2 3/4] dt-bindings: gpu: v3d: Add BCM2712's compatible

2023-10-30 Thread Iago Toral
El lun, 30-10-2023 a las 10:57 +0100, Stefan Wahren escribió:
> Hi Iago,
> 
> Am 30.10.23 um 09:28 schrieb Iago Toral Quiroga:
> > BCM2712, Raspberry Pi 5's SoC, contains a V3D core. So add its
> > specific
> > compatible to the bindings.
> > 
> > v2: new, requested by Stefan Wahren.
> Thanks for sending this but the line above belongs below --- since
> it's
> not relevant after the patch has been applied.
> > 
> > Signed-off-by: Iago Toral Quiroga 
> Unfortunately this patch will be ignored because the relevant
> devicetree
> people are missing to give their Ack.
> 
> Please use scripts/get_maintainers.pl
> 

Sorry about that, my bad. Should I resend the patch (as part of a v3),
or is it enough to add the relevant maintainers in a reply to the
original v2 patch?

Iago

> Best regards
> > ---
> >   Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/gpu/brcm,bcm-
> > v3d.yaml b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
> > index dae55b8a267b..dc078ceeca9a 100644
> > --- a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
> > +++ b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
> > @@ -17,6 +17,7 @@ properties:
> >     compatible:
> >   enum:
> >     - brcm,2711-v3d
> > +  - brcm,2712-v3d
> >     - brcm,7268-v3d
> >     - brcm,7278-v3d
> > 
> 
> 



Re: [PATCH v2 2/4] drm/v3d: fix up register addresses for V3D 7.x

2023-10-30 Thread Iago Toral
Hi Stefan,

El lun, 30-10-2023 a las 10:58 +0100, Stefan Wahren escribió:
> Hi Iago,
> 
> Am 30.10.23 um 09:28 schrieb Iago Toral Quiroga:
> > This patch updates a number of register addresses that have
> > been changed in Raspberry Pi 5 (V3D 7.1) and updates the
> > code to use the corresponding registers and addresses based
> > on the actual V3D version.
> > 
> > v2:
> >   - added s-o-b and commit message. (Maíra Canal)
> >   - Used macro that takes version as argument and returns
> >     appropriate values instead of two different definitions
> >     for post-v71 and pre-v71 hardware when possible. (Maíra Canal)
> >   - fixed style warnings from checkpatch.pl. (Maíra Canal)
> > 
> > Signed-off-by: Iago Toral Quiroga 
> > ---
> >   drivers/gpu/drm/v3d/v3d_debugfs.c | 178 +
> > -
> >   drivers/gpu/drm/v3d/v3d_gem.c |   4 +-
> >   drivers/gpu/drm/v3d/v3d_irq.c |  46 
> >   drivers/gpu/drm/v3d/v3d_regs.h    |  94 +---
> >   drivers/gpu/drm/v3d/v3d_sched.c   |  38 ---
> >   5 files changed, 204 insertions(+), 156 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c
> > b/drivers/gpu/drm/v3d/v3d_debugfs.c
> > index 330669f51fa7..f843a50d5dce 100644
> > --- a/drivers/gpu/drm/v3d/v3d_debugfs.c
> > +++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
> > @@ -12,69 +12,83 @@
> >   #include "v3d_drv.h"
> >   #include "v3d_regs.h"
> > 
> > -#define REGDEF(reg) { reg, #reg }
> > +#define REGDEF(min_ver, max_ver, reg) { min_ver, max_ver, reg,
> > #reg }
> >   struct v3d_reg_def {
> > +   u32 min_ver;
> > +   u32 max_ver;
> Is this documented some where which SoC has which V3D version?
> 

Not that I am aware of.

There are really only two Raspberry Pi SoCs supported by v3d: bcm2711
is Raspberry Pi 4 which is V3D 4.2 (compatible with 4.1), and bcm2712
is Raspberry Pi 5 which is V3D 7.1.

I don't know which SoCs are supported by V3D versions before 4.1; I
think those were targeting set-top-box hardware that Emma used while
setting up the driver before the SoC for Raspberry Pi 4 was available.

Iago


[PATCH v2 4/4] drm/v3d: add brcm,2712-v3d as a compatible V3D device

2023-10-30 Thread Iago Toral Quiroga
This is required to get the V3D module to load with Raspberry Pi 5.

v2:
 - added s-o-b and commit message. (Maíra)
 - keep order of compatible strings. (Stefan Wahren)

Signed-off-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index ffbbe9d527d3..1ab46bdf8ad7 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -187,6 +187,7 @@ static const struct drm_driver v3d_drm_driver = {
 
 static const struct of_device_id v3d_of_match[] = {
{ .compatible = "brcm,2711-v3d" },
+   { .compatible = "brcm,2712-v3d" },
{ .compatible = "brcm,7268-v3d" },
{ .compatible = "brcm,7278-v3d" },
{},
-- 
2.39.2



[PATCH v2 2/4] drm/v3d: fix up register addresses for V3D 7.x

2023-10-30 Thread Iago Toral Quiroga
This patch updates a number of register addresses that have
been changed in Raspberry Pi 5 (V3D 7.1) and updates the
code to use the corresponding registers and addresses based
on the actual V3D version.

v2:
 - added s-o-b and commit message. (Maíra Canal)
 - Used macro that takes version as argument and returns
   appropriate values instead of two different definitions
   for post-v71 and pre-v71 hardware when possible. (Maíra Canal)
 - fixed style warnings from checkpatch.pl. (Maíra Canal)

Signed-off-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_debugfs.c | 178 +-
 drivers/gpu/drm/v3d/v3d_gem.c |   4 +-
 drivers/gpu/drm/v3d/v3d_irq.c |  46 
 drivers/gpu/drm/v3d/v3d_regs.h|  94 +---
 drivers/gpu/drm/v3d/v3d_sched.c   |  38 ---
 5 files changed, 204 insertions(+), 156 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c 
b/drivers/gpu/drm/v3d/v3d_debugfs.c
index 330669f51fa7..f843a50d5dce 100644
--- a/drivers/gpu/drm/v3d/v3d_debugfs.c
+++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
@@ -12,69 +12,83 @@
 #include "v3d_drv.h"
 #include "v3d_regs.h"
 
-#define REGDEF(reg) { reg, #reg }
+#define REGDEF(min_ver, max_ver, reg) { min_ver, max_ver, reg, #reg }
 struct v3d_reg_def {
+   u32 min_ver;
+   u32 max_ver;
u32 reg;
const char *name;
 };
 
 static const struct v3d_reg_def v3d_hub_reg_defs[] = {
-   REGDEF(V3D_HUB_AXICFG),
-   REGDEF(V3D_HUB_UIFCFG),
-   REGDEF(V3D_HUB_IDENT0),
-   REGDEF(V3D_HUB_IDENT1),
-   REGDEF(V3D_HUB_IDENT2),
-   REGDEF(V3D_HUB_IDENT3),
-   REGDEF(V3D_HUB_INT_STS),
-   REGDEF(V3D_HUB_INT_MSK_STS),
-
-   REGDEF(V3D_MMU_CTL),
-   REGDEF(V3D_MMU_VIO_ADDR),
-   REGDEF(V3D_MMU_VIO_ID),
-   REGDEF(V3D_MMU_DEBUG_INFO),
+   REGDEF(33, 42, V3D_HUB_AXICFG),
+   REGDEF(33, 71, V3D_HUB_UIFCFG),
+   REGDEF(33, 71, V3D_HUB_IDENT0),
+   REGDEF(33, 71, V3D_HUB_IDENT1),
+   REGDEF(33, 71, V3D_HUB_IDENT2),
+   REGDEF(33, 71, V3D_HUB_IDENT3),
+   REGDEF(33, 71, V3D_HUB_INT_STS),
+   REGDEF(33, 71, V3D_HUB_INT_MSK_STS),
+
+   REGDEF(33, 71, V3D_MMU_CTL),
+   REGDEF(33, 71, V3D_MMU_VIO_ADDR),
+   REGDEF(33, 71, V3D_MMU_VIO_ID),
+   REGDEF(33, 71, V3D_MMU_DEBUG_INFO),
+
+   REGDEF(71, 71, V3D_GMP_STATUS(71)),
+   REGDEF(71, 71, V3D_GMP_CFG(71)),
+   REGDEF(71, 71, V3D_GMP_VIO_ADDR(71)),
 };
 
 static const struct v3d_reg_def v3d_gca_reg_defs[] = {
-   REGDEF(V3D_GCA_SAFE_SHUTDOWN),
-   REGDEF(V3D_GCA_SAFE_SHUTDOWN_ACK),
+   REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN),
+   REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN_ACK),
 };
 
 static const struct v3d_reg_def v3d_core_reg_defs[] = {
-   REGDEF(V3D_CTL_IDENT0),
-   REGDEF(V3D_CTL_IDENT1),
-   REGDEF(V3D_CTL_IDENT2),
-   REGDEF(V3D_CTL_MISCCFG),
-   REGDEF(V3D_CTL_INT_STS),
-   REGDEF(V3D_CTL_INT_MSK_STS),
-   REGDEF(V3D_CLE_CT0CS),
-   REGDEF(V3D_CLE_CT0CA),
-   REGDEF(V3D_CLE_CT0EA),
-   REGDEF(V3D_CLE_CT1CS),
-   REGDEF(V3D_CLE_CT1CA),
-   REGDEF(V3D_CLE_CT1EA),
-
-   REGDEF(V3D_PTB_BPCA),
-   REGDEF(V3D_PTB_BPCS),
-
-   REGDEF(V3D_GMP_STATUS),
-   REGDEF(V3D_GMP_CFG),
-   REGDEF(V3D_GMP_VIO_ADDR),
-
-   REGDEF(V3D_ERR_FDBGO),
-   REGDEF(V3D_ERR_FDBGB),
-   REGDEF(V3D_ERR_FDBGS),
-   REGDEF(V3D_ERR_STAT),
+   REGDEF(33, 71, V3D_CTL_IDENT0),
+   REGDEF(33, 71, V3D_CTL_IDENT1),
+   REGDEF(33, 71, V3D_CTL_IDENT2),
+   REGDEF(33, 71, V3D_CTL_MISCCFG),
+   REGDEF(33, 71, V3D_CTL_INT_STS),
+   REGDEF(33, 71, V3D_CTL_INT_MSK_STS),
+   REGDEF(33, 71, V3D_CLE_CT0CS),
+   REGDEF(33, 71, V3D_CLE_CT0CA),
+   REGDEF(33, 71, V3D_CLE_CT0EA),
+   REGDEF(33, 71, V3D_CLE_CT1CS),
+   REGDEF(33, 71, V3D_CLE_CT1CA),
+   REGDEF(33, 71, V3D_CLE_CT1EA),
+
+   REGDEF(33, 71, V3D_PTB_BPCA),
+   REGDEF(33, 71, V3D_PTB_BPCS),
+
+   REGDEF(33, 41, V3D_GMP_STATUS(33)),
+   REGDEF(33, 41, V3D_GMP_CFG(33)),
+   REGDEF(33, 41, V3D_GMP_VIO_ADDR(33)),
+
+   REGDEF(33, 71, V3D_ERR_FDBGO),
+   REGDEF(33, 71, V3D_ERR_FDBGB),
+   REGDEF(33, 71, V3D_ERR_FDBGS),
+   REGDEF(33, 71, V3D_ERR_STAT),
 };
 
 static const struct v3d_reg_def v3d_csd_reg_defs[] = {
-   REGDEF(V3D_CSD_STATUS),
-   REGDEF(V3D_CSD_CURRENT_CFG0),
-   REGDEF(V3D_CSD_CURRENT_CFG1),
-   REGDEF(V3D_CSD_CURRENT_CFG2),
-   REGDEF(V3D_CSD_CURRENT_CFG3),
-   REGDEF(V3D_CSD_CURRENT_CFG4),
-   REGDEF(V3D_CSD_CURRENT_CFG5),
-   REGDEF(V3D_CSD_CURRENT_CFG6),
+   REGDEF(41, 71, V3D_CSD_STATUS),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG0(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG1(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG2(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG3(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG4(41)),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG5(41)),
+   REGDEF(41, 41, 

[PATCH v2 3/4] dt-bindings: gpu: v3d: Add BCM2712's compatible

2023-10-30 Thread Iago Toral Quiroga
BCM2712, Raspberry Pi 5's SoC, contains a V3D core. So add its specific
compatible to the bindings.

v2: new, requested by Stefan Wahren.

Signed-off-by: Iago Toral Quiroga 
---
 Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml 
b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
index dae55b8a267b..dc078ceeca9a 100644
--- a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
+++ b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
@@ -17,6 +17,7 @@ properties:
   compatible:
 enum:
   - brcm,2711-v3d
+  - brcm,2712-v3d
   - brcm,7268-v3d
   - brcm,7278-v3d
 
-- 
2.39.2



[PATCH v2 1/4] drm/v3d: update UAPI to match user-space for V3D 7.x

2023-10-30 Thread Iago Toral Quiroga
V3D 7.x takes a new parameter to configure TFU jobs that needs
to be provided by user space.

v2: added s-o-b, fixed typo in commit message (Maíra Canal)

Signed-off-by: Iago Toral Quiroga 
---
 include/uapi/drm/v3d_drm.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 3dfc0af8756a..1a7d7a689de3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -319,6 +319,11 @@ struct drm_v3d_submit_tfu {
 
/* Pointer to an array of ioctl extensions*/
__u64 extensions;
+
+   struct {
+   __u32 ioc;
+   __u32 pad;
+   } v71;
 };
 
 /* Submits a compute shader for dispatch.  This job will block on any
-- 
2.39.2



[PATCH v2 0/4] V3D module changes for Pi5

2023-10-30 Thread Iago Toral Quiroga
This series includes patches to update the V3D kernel module
that drives the VideoCore VI GPU in Raspberry Pi 4 to also support
the Video Core VII iteration present in Raspberry Pi 5.

The first patch in the series adds a small uAPI update required for
TFU jobs, the second patch addresses the bulk of the work and
involves mostly updates to register addresses, the third and fourth
patches match the 'brcm,2712-v3d' device string from Pi5 with the
V3D driver.

The changes for the user-space driver can be found in the
corresponding Mesa MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450

Iago Toral Quiroga (4):
  drm/v3d: update UAPI to match user-space for V3D 7.x
  drm/v3d: fix up register addresses for V3D 7.x
  dt-bindings: gpu: v3d: Add BCM2712's compatible
  drm/v3d: add brcm,2712-v3d as a compatible V3D device

 .../devicetree/bindings/gpu/brcm,bcm-v3d.yaml |   1 +
 drivers/gpu/drm/v3d/v3d_debugfs.c | 178 ++
 drivers/gpu/drm/v3d/v3d_drv.c |   1 +
 drivers/gpu/drm/v3d/v3d_gem.c |   4 +-
 drivers/gpu/drm/v3d/v3d_irq.c |  46 +++--
 drivers/gpu/drm/v3d/v3d_regs.h|  94 +
 drivers/gpu/drm/v3d/v3d_sched.c   |  38 ++--
 include/uapi/drm/v3d_drm.h|   5 +
 8 files changed, 211 insertions(+), 156 deletions(-)

-- 
2.39.2



Re: [PATCH 1/2] drm/v3d: wait for all jobs to finish before unregistering

2023-10-23 Thread Iago Toral
El lun, 23-10-2023 a las 07:58 -0300, Maíra Canal escribió:
> Currently, we are only warning the user if the BIN or RENDER jobs
> don't
> finish before we unregister V3D. We must wait for all jobs to finish
> before unregistering. Therefore, warn the user if TFU or CSD jobs
> are not done by the time the driver is unregistered.
> 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 2e94ce788c71..afa7d170d1ff 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -1072,6 +1072,8 @@ v3d_gem_destroy(struct drm_device *dev)
>  */
> WARN_ON(v3d->bin_job);
> WARN_ON(v3d->render_job);
> +   WARN_ON(v3d->tfu_job);
> +   WARN_ON(v3d->csd_job);

I guess we should do this for cache clean jobs too, right?

Iago

>  
> 	drm_mm_takedown(&v3d->mm);
>  



[PATCH 2/3] drm/v3d: update UAPI to match user-space for V3D 7.x

2023-09-28 Thread Iago Toral Quiroga
V3D 7.x takes a new parameter to configure TFU jobs that needs
to be provided by user space.
---
 include/uapi/drm/v3d_drm.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 3dfc0af8756a..1a7d7a689de3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -319,6 +319,11 @@ struct drm_v3d_submit_tfu {
 
/* Pointer to an array of ioctl extensions*/
__u64 extensions;
+
+   struct {
+   __u32 ioc;
+   __u32 pad;
+   } v71;
 };
 
 /* Submits a compute shader for dispatch.  This job will block on any
-- 
2.39.2



[PATCH 1/3] drm/v3d: fix up register addresses for V3D 7.x

2023-09-28 Thread Iago Toral Quiroga
---
 drivers/gpu/drm/v3d/v3d_debugfs.c | 173 +-
 drivers/gpu/drm/v3d/v3d_gem.c |   3 +
 drivers/gpu/drm/v3d/v3d_irq.c |  47 
 drivers/gpu/drm/v3d/v3d_regs.h|  51 -
 drivers/gpu/drm/v3d/v3d_sched.c   |  41 ---
 5 files changed, 200 insertions(+), 115 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c 
b/drivers/gpu/drm/v3d/v3d_debugfs.c
index 330669f51fa7..90b2b5b2710c 100644
--- a/drivers/gpu/drm/v3d/v3d_debugfs.c
+++ b/drivers/gpu/drm/v3d/v3d_debugfs.c
@@ -12,69 +12,83 @@
 #include "v3d_drv.h"
 #include "v3d_regs.h"
 
-#define REGDEF(reg) { reg, #reg }
+#define REGDEF(min_ver, max_ver, reg) { min_ver, max_ver, reg, #reg }
 struct v3d_reg_def {
+   u32 min_ver;
+   u32 max_ver;
u32 reg;
const char *name;
 };
 
 static const struct v3d_reg_def v3d_hub_reg_defs[] = {
-   REGDEF(V3D_HUB_AXICFG),
-   REGDEF(V3D_HUB_UIFCFG),
-   REGDEF(V3D_HUB_IDENT0),
-   REGDEF(V3D_HUB_IDENT1),
-   REGDEF(V3D_HUB_IDENT2),
-   REGDEF(V3D_HUB_IDENT3),
-   REGDEF(V3D_HUB_INT_STS),
-   REGDEF(V3D_HUB_INT_MSK_STS),
-
-   REGDEF(V3D_MMU_CTL),
-   REGDEF(V3D_MMU_VIO_ADDR),
-   REGDEF(V3D_MMU_VIO_ID),
-   REGDEF(V3D_MMU_DEBUG_INFO),
+   REGDEF(33, 42, V3D_HUB_AXICFG),
+   REGDEF(33, 71, V3D_HUB_UIFCFG),
+   REGDEF(33, 71, V3D_HUB_IDENT0),
+   REGDEF(33, 71, V3D_HUB_IDENT1),
+   REGDEF(33, 71, V3D_HUB_IDENT2),
+   REGDEF(33, 71, V3D_HUB_IDENT3),
+   REGDEF(33, 71, V3D_HUB_INT_STS),
+   REGDEF(33, 71, V3D_HUB_INT_MSK_STS),
+
+   REGDEF(33, 71, V3D_MMU_CTL),
+   REGDEF(33, 71, V3D_MMU_VIO_ADDR),
+   REGDEF(33, 71, V3D_MMU_VIO_ID),
+   REGDEF(33, 71, V3D_MMU_DEBUG_INFO),
+
+   REGDEF(71, 71, V3D_V7_GMP_STATUS),
+   REGDEF(71, 71, V3D_V7_GMP_CFG),
+   REGDEF(71, 71, V3D_V7_GMP_VIO_ADDR),
 };
 
 static const struct v3d_reg_def v3d_gca_reg_defs[] = {
-   REGDEF(V3D_GCA_SAFE_SHUTDOWN),
-   REGDEF(V3D_GCA_SAFE_SHUTDOWN_ACK),
+   REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN),
+   REGDEF(33, 33, V3D_GCA_SAFE_SHUTDOWN_ACK),
 };
 
 static const struct v3d_reg_def v3d_core_reg_defs[] = {
-   REGDEF(V3D_CTL_IDENT0),
-   REGDEF(V3D_CTL_IDENT1),
-   REGDEF(V3D_CTL_IDENT2),
-   REGDEF(V3D_CTL_MISCCFG),
-   REGDEF(V3D_CTL_INT_STS),
-   REGDEF(V3D_CTL_INT_MSK_STS),
-   REGDEF(V3D_CLE_CT0CS),
-   REGDEF(V3D_CLE_CT0CA),
-   REGDEF(V3D_CLE_CT0EA),
-   REGDEF(V3D_CLE_CT1CS),
-   REGDEF(V3D_CLE_CT1CA),
-   REGDEF(V3D_CLE_CT1EA),
-
-   REGDEF(V3D_PTB_BPCA),
-   REGDEF(V3D_PTB_BPCS),
-
-   REGDEF(V3D_GMP_STATUS),
-   REGDEF(V3D_GMP_CFG),
-   REGDEF(V3D_GMP_VIO_ADDR),
-
-   REGDEF(V3D_ERR_FDBGO),
-   REGDEF(V3D_ERR_FDBGB),
-   REGDEF(V3D_ERR_FDBGS),
-   REGDEF(V3D_ERR_STAT),
+   REGDEF(33, 71, V3D_CTL_IDENT0),
+   REGDEF(33, 71, V3D_CTL_IDENT1),
+   REGDEF(33, 71, V3D_CTL_IDENT2),
+   REGDEF(33, 71, V3D_CTL_MISCCFG),
+   REGDEF(33, 71, V3D_CTL_INT_STS),
+   REGDEF(33, 71, V3D_CTL_INT_MSK_STS),
+   REGDEF(33, 71, V3D_CLE_CT0CS),
+   REGDEF(33, 71, V3D_CLE_CT0CA),
+   REGDEF(33, 71, V3D_CLE_CT0EA),
+   REGDEF(33, 71, V3D_CLE_CT1CS),
+   REGDEF(33, 71, V3D_CLE_CT1CA),
+   REGDEF(33, 71, V3D_CLE_CT1EA),
+
+   REGDEF(33, 71, V3D_PTB_BPCA),
+   REGDEF(33, 71, V3D_PTB_BPCS),
+
+   REGDEF(33, 41, V3D_GMP_STATUS),
+   REGDEF(33, 41, V3D_GMP_CFG),
+   REGDEF(33, 41, V3D_GMP_VIO_ADDR),
+
+   REGDEF(33, 71, V3D_ERR_FDBGO),
+   REGDEF(33, 71, V3D_ERR_FDBGB),
+   REGDEF(33, 71, V3D_ERR_FDBGS),
+   REGDEF(33, 71, V3D_ERR_STAT),
 };
 
 static const struct v3d_reg_def v3d_csd_reg_defs[] = {
-   REGDEF(V3D_CSD_STATUS),
-   REGDEF(V3D_CSD_CURRENT_CFG0),
-   REGDEF(V3D_CSD_CURRENT_CFG1),
-   REGDEF(V3D_CSD_CURRENT_CFG2),
-   REGDEF(V3D_CSD_CURRENT_CFG3),
-   REGDEF(V3D_CSD_CURRENT_CFG4),
-   REGDEF(V3D_CSD_CURRENT_CFG5),
-   REGDEF(V3D_CSD_CURRENT_CFG6),
+   REGDEF(41, 71, V3D_CSD_STATUS),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG0),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG1),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG2),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG3),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG4),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG5),
+   REGDEF(41, 41, V3D_CSD_CURRENT_CFG6),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG0),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG1),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG2),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG3),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG4),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG5),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG6),
+   REGDEF(71, 71, V3D_V7_CSD_CURRENT_CFG7),
 };
 
 static int v3d_v3d_debugfs_regs(struct seq_file *m, void *unused)
@@ -85,38 +99,37 @@ static int v3d_v3d_debugfs_regs(struct seq_file *m, void 
*unused)
int i, core;
 
 

[PATCH 0/3] V3D module changes for Pi5

2023-09-28 Thread Iago Toral Quiroga
This series includes patches to update the V3D kernel module
that drives the VideoCore VI GPU in Raspberry Pi 4 to also support
the VideoCore VII iteration present in Raspberry Pi 5.

The first patch in the series addresses the bulk of the work and
involves mostly updates to register addresses. The second patch
adds a small uAPI update required for TFU jobs and the third and
final patch matches the 'brcm,2712-v3d' device string from Pi5
with the V3D driver.

The changes for the user-space driver can be found in the
corresponding Mesa MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450
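
As a side note for readers skimming the register-table changes in patch 1:
below is a minimal sketch of how version-gated tables built from
REGDEF(min_ver, max_ver, reg) are typically consumed when dumping registers.
This is illustrative only (the real hunk in v3d_debugfs.c is truncated above);
it assumes the driver's usual v3d_drv.h/v3d_regs.h context and that v3d->ver
encodes the version as major*10 + minor (33, 41, 42, 71), as the table values
suggest. The helper name and print format are made up for the example; the
min_ver/max_ver comparison is the point.

/* Sketch: dump only the registers that exist on the detected V3D version. */
static void v3d_debugfs_dump_defs(struct seq_file *m, struct v3d_dev *v3d,
				  const struct v3d_reg_def *defs,
				  unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (v3d->ver >= defs[i].min_ver && v3d->ver <= defs[i].max_ver)
			seq_printf(m, "%s (0x%04x): 0x%08x\n",
				   defs[i].name, defs[i].reg,
				   V3D_READ(defs[i].reg));
	}
}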

Iago Toral Quiroga (3):
  drm/v3d: fix up register addresses for V3D 7.x
  drm/v3d: update UAPI to match user-space for V3D 7.x
  drm/v3d: add brcm,2712-v3d as a compatible V3D device

 drivers/gpu/drm/v3d/v3d_debugfs.c | 173 +-
 drivers/gpu/drm/v3d/v3d_drv.c |   1 +
 drivers/gpu/drm/v3d/v3d_gem.c |   3 +
 drivers/gpu/drm/v3d/v3d_irq.c |  47 
 drivers/gpu/drm/v3d/v3d_regs.h|  51 -
 drivers/gpu/drm/v3d/v3d_sched.c   |  41 ---
 include/uapi/drm/v3d_drm.h|   5 +
 7 files changed, 206 insertions(+), 115 deletions(-)

-- 
2.39.2



[PATCH 3/3] drm/v3d: add brcm,2712-v3d as a compatible V3D device

2023-09-28 Thread Iago Toral Quiroga
---
 drivers/gpu/drm/v3d/v3d_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index ffbbe9d527d3..0ed2e7ba8b33 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -186,6 +186,7 @@ static const struct drm_driver v3d_drm_driver = {
 };
 
 static const struct of_device_id v3d_of_match[] = {
+   { .compatible = "brcm,2712-v3d" },
{ .compatible = "brcm,2711-v3d" },
{ .compatible = "brcm,7268-v3d" },
{ .compatible = "brcm,7278-v3d" },
-- 
2.39.2



Re: [PATCH] MAINTAINERS: add Melissa to V3D maintainers

2022-05-01 Thread Iago Toral
Acked-by: Iago Toral Quiroga 

On Fri, 2022-04-29 at 18:33 -0100, Melissa Wen wrote:
> I've been contributing to v3d through improvements, reviews, testing,
> debugging, etc. So, I'm adding myself as a co-maintainer of the V3D
> driver.
> 
> Signed-off-by: Melissa Wen 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9b2b0dc44506..0a854b7f2491 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6714,6 +6714,7 @@ F:  drivers/gpu/drm/omapdrm/
>  
>  DRM DRIVERS FOR V3D
>  M:   Emma Anholt 
> +M:   Melissa Wen 
>  S:   Supported
>  T:   git git://anongit.freedesktop.org/drm/drm-misc
>  F:   Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml



Re: [PATCH v3 4/4] drm/v3d: add multiple syncobjs support

2021-10-01 Thread Iago Toral
On Fri, 2021-10-01 at 09:37 +0100, Melissa Wen wrote:
> On 10/01, Iago Toral wrote:
> > On Thu, 2021-09-30 at 17:19 +0100, Melissa Wen wrote:
> > > Using the generic extension from the previous patch, a specific
> > > multisync
> > > extension enables more than one in/out binary syncobj per job
> > > submission.
> > > Arrays of syncobjs are set in struct drm_v3d_multisync, that also
> > > cares
> > > of determining the stage for sync (wait deps) according to the
> > > job
> > > queue.
> > > 
> > > v2:
> > > - subclass the generic extension struct (Daniel)
> > > - simplify adding dependency conditions to make understandable
> > > (Iago)
> > > 
> > > v3:
> > > - fix conditions to consider single or multiples in/out_syncs
> > > (Iago)
> > > - remove irrelevant comment (Iago)
> > > 
> > > Signed-off-by: Melissa Wen 
> > > ---
> > >  drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
> > >  drivers/gpu/drm/v3d/v3d_drv.h |  24 +++--
> > >  drivers/gpu/drm/v3d/v3d_gem.c | 185
> > > ++
> > > 
> > >  include/uapi/drm/v3d_drm.h|  49 -
> > >  4 files changed, 232 insertions(+), 32 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> > > b/drivers/gpu/drm/v3d/v3d_drv.c
> > > index 3d6b9bcce2f7..bd46396a1ae0 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_drv.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> > > @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct
> > > drm_device 
> > 
> > (...)
> > 
> > > @@ -516,9 +536,11 @@
> > > v3d_attach_fences_and_unlock_reservation(struct
> > > drm_file *file_priv,
> > >struct v3d_job *job,
> > >struct ww_acquire_ctx
> > > *acquire_ctx,
> > >u32 out_sync,
> > > +  struct v3d_submit_ext *se,
> > >struct dma_fence *done_fence)
> > >  {
> > >   struct drm_syncobj *sync_out;
> > > + bool has_multisync = se && (se->flags & 
> > 
> > We always pass the 'se' pointer from a local variable allocated in
> > the
> > stack of the caller so it is never going to be NULL.
> > 
> > I am happy to keep the NULL check if you want to protect against
> > future
> > changes just in case, but if we do that then...
> > 
> > > DRM_V3D_EXT_ID_MULTI_SYNC);
> > >   int i;
> > >  
> > >   for (i = 0; i < job->bo_count; i++) {
> > 
> > (...)
> > 
> > > +static void
> > > +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> > > +{
> > > + unsigned int i;
> > > +
> > > + if (!se->out_sync_count)
> > 
> > ...we should also check for NULL here for consistency.
> yes, consistency makes sense here.
> > Also, I think there is another problem in the code: we always call
> > v3d_job_cleanup for failed jobs, but only call v3d_job_put for
> > successful jobs. However, reading the docs for drm_sched_job_init:
> > 
> > "Drivers must make sure drm_sched_job_cleanup() if this function
> > returns successfully, even when @job is aborted before
> > drm_sched_job_arm() is called."
> > 
> > So my understanding is that we should call v3d_job_cleanup instead
> > of
> > v3d_job_put for successful jobs or we would be leaking sched
> > resources
> > on every successful job, no?
> 
> When job_init is successful, v3d_job_cleanup is called by scheduler
> when
> job is completed. drm_sched_job_cleanup describes how it works after
> drm_sched_job_arm:
> 
> " After that point of no return @job is committed to be executed by
> the
> scheduler, and this function should be called from the
> _sched_backend_ops.free_job callback."
> 
> On v3d_sched.c, .free_job points to v3d_sched_job_free(), which in
> turn
> calls v3d_job_cleanup() (and then drm_sched_job_cleanup). So, it
> looks
> ok.
> 
> Also, we can say that the very last v3d_job_put() is in charge to
> decrement refcount initialized (set 1) in v3d_job_init(); while the
> v3d_job_cleanup() from v3d_sched_job_free() callback decrements
> refcount
> that was incremented when v3d_job_push() pushed the job to the
> scheduler.
> 
> So, refcount pairs seem consistent, I mean, get and put. And about
> drm_sched_job_cleanup, it is explicitly called when job_init fails or
> after that by the scheduler.
> 

Ah ok, thanks for explaining this!

With the consistency issue discussed above fixed, for the series:

Reviewed-by: Iago Toral Quiroga 

Iago
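
As an aside for user-space readers: below is roughly how the multisync
extension is expected to be chained into a CL submission once this lands.
This is a sketch only; the syncobj handles are placeholders, it assumes
<stdint.h>, libdrm's xf86drm.h and the updated drm/v3d_drm.h, and the exact
field layout of drm_v3d_multi_sync / drm_v3d_sem should be taken from the
final header rather than from this example.

/* Sketch: submit a CL that waits on one syncobj before the RCL and signals
 * another when the job completes. rcl_wait/done are placeholder handles. */
static void submit_with_multisync(int fd, uint32_t rcl_wait, uint32_t done)
{
	struct drm_v3d_sem in_syncs[1]  = { { .handle = rcl_wait } };
	struct drm_v3d_sem out_syncs[1] = { { .handle = done } };

	struct drm_v3d_multi_sync ms = {
		.base.id        = DRM_V3D_EXT_ID_MULTI_SYNC,
		.base.next      = 0,            /* last extension in the chain */
		.in_syncs       = (uintptr_t)in_syncs,
		.out_syncs      = (uintptr_t)out_syncs,
		.in_sync_count  = 1,
		.out_sync_count = 1,
		.wait_stage     = V3D_RENDER,   /* wait deps before the RCL */
	};

	struct drm_v3d_submit_cl submit = {
		/* ...bcl/rcl addresses, BO handles, perfmon, etc... */
		.flags      = DRM_V3D_SUBMIT_EXTENSION,
		.extensions = (uintptr_t)&ms,
	};

	drmIoctl(fd, DRM_IOCTL_V3D_SUBMIT_CL, &submit);
}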



Re: [PATCH v3 4/4] drm/v3d: add multiple syncobjs support

2021-10-01 Thread Iago Toral
On Thu, 2021-09-30 at 17:19 +0100, Melissa Wen wrote:
> Using the generic extension from the previous patch, a specific
> multisync
> extension enables more than one in/out binary syncobj per job
> submission.
> Arrays of syncobjs are set in struct drm_v3d_multisync, that also
> cares
> of determining the stage for sync (wait deps) according to the job
> queue.
> 
> v2:
> - subclass the generic extension struct (Daniel)
> - simplify adding dependency conditions to make understandable (Iago)
> 
> v3:
> - fix conditions to consider single or multiples in/out_syncs (Iago)
> - remove irrelevant comment (Iago)
> 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
>  drivers/gpu/drm/v3d/v3d_drv.h |  24 +++--
>  drivers/gpu/drm/v3d/v3d_gem.c | 185 ++
> 
>  include/uapi/drm/v3d_drm.h|  49 -
>  4 files changed, 232 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 3d6b9bcce2f7..bd46396a1ae0 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device 

(...)

> 
> @@ -516,9 +536,11 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>struct v3d_job *job,
>struct ww_acquire_ctx
> *acquire_ctx,
>u32 out_sync,
> +  struct v3d_submit_ext *se,
>struct dma_fence *done_fence)
>  {
>   struct drm_syncobj *sync_out;
> + bool has_multisync = se && (se->flags & 

We always pass the 'se' pointer from a local variable allocated in the
stack of the caller so it is never going to be NULL.

I am happy to keep the NULL check if you want to protect against future
changes just in case, but if we do that then...

> DRM_V3D_EXT_ID_MULTI_SYNC);
>   int i;
>  
>   for (i = 0; i < job->bo_count; i++) {

(...)

> +static void
> +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> +{
> + unsigned int i;
> +
> + if (!se->out_sync_count)

...we should also check for NULL here for consistency.

Also, I think there is another problem in the code: we always call
v3d_job_cleanup for failed jobs, but only call v3d_job_put for
successful jobs. However, reading the docs for drm_sched_job_init:

"Drivers must make sure drm_sched_job_cleanup() if this function
returns successfully, even when @job is aborted before
drm_sched_job_arm() is called."

So my understanding is that we should call v3d_job_cleanup instead of
v3d_job_put for successful jobs or we would be leaking sched resources
on every successful job, no?

Iago

> + return;
> +
> + for (i = 0; i < se->out_sync_count; i++)
> + drm_syncobj_put(se->out_syncs[i].syncobj);
> + kvfree(se->out_syncs);
> +}
> +
> +static int
> +v3d_get_multisync_post_deps(struct drm_file *file_priv,
> + struct v3d_submit_ext *se,
> + u32 count, u64 handles)
> +{
> + struct drm_v3d_sem __user *post_deps;
> + int i, ret;
> +
> + if (!count)
> + return 0;
> +
> + se->out_syncs = (struct v3d_submit_outsync *)
> + kvmalloc_array(count,
> +sizeof(struct
> v3d_submit_outsync),
> +GFP_KERNEL);
> + if (!se->out_syncs)
> + return -ENOMEM;
> +
> + post_deps = u64_to_user_ptr(handles);
> +
> + for (i = 0; i < count; i++) {
> + struct drm_v3d_sem out;
> +
> + ret = copy_from_user(&out, post_deps++, sizeof(out));
> + if (ret) {
> + DRM_DEBUG("Failed to copy post dep handles\n");
> + goto fail;
> + }
> +
> + se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
> + out.handle);
> + if (!se->out_syncs[i].syncobj) {
> + ret = -EINVAL;
> + goto fail;
> + }
>   }
> + se->out_sync_count = count;
> +
> + return 0;
> +
> +fail:
> + for (i--; i >= 0; i--)
> + drm_syncobj_put(se->out_syncs[i].syncobj);
> + kvfree(se->out_syncs);
> +
> + return ret;
> +}
> +
> +/* Get data for multiple binary semaphores synchronization. Parse
> syncobj
> + * to be signaled when job completes (out_sync).
> + */
> +static int
> +v3d_get_multisync_submit_deps(struct drm_file *file_priv,
> +   struct drm_v3d_extension __user *ext,
> +   void *data)
> +{
> + struct drm_v3d_multi_sync multisync;
> + struct v3d_submit_ext *se = data;
> + int ret;
> +
> + ret = copy_from_user(&multisync, ext, sizeof(multisync));
> + if (ret)
> + return ret;
> +
> + if 

Re: [PATCH v2 3/4] drm/v3d: add generic ioctl extension

2021-09-30 Thread Iago Toral
On Thu, 2021-09-30 at 10:22 +0100, Melissa Wen wrote:
> > > 
> O 09/30, Iago Toral wrote:
> > On Wed, 2021-09-29 at 10:44 +0100, Melissa Wen wrote:
(...) 
> > >  /**
> > >   * struct drm_v3d_submit_cl - ioctl argument for submitting
> > > commands
> > > to the 3D
> > > @@ -135,12 +149,16 @@ struct drm_v3d_submit_cl {
> > >   /* Number of BO handles passed in (size is that times 4). */
> > >   __u32 bo_handle_count;
> > >  
> > > + /* DRM_V3D_SUBMIT_* properties */
> > >   __u32 flags;
> > >  
> > >   /* ID of the perfmon to attach to this job. 0 means no perfmon.
> > > */
> > >   __u32 perfmon_id;
> > >  
> > >   __u32 pad;
> > > +
> > > + /* Pointer to an array of ioctl extensions*/
> > > + __u64 extensions;
> > >  };
> > >  
> > >  /**
> > > @@ -248,6 +266,12 @@ struct drm_v3d_submit_tfu {
> > >   __u32 in_sync;
> > >   /* Sync object to signal when the TFU job is done. */
> > >   __u32 out_sync;
> > > +
> > > + __u32 flags;
> > > +
> > > + /* Pointer to an array of ioctl extensions*/
> > > + __u64 extensions;
> > 
> > We want __u64 fields aligned to 64-bit so we should swap the
> > positions
> > of flags and extensions.
> 
> hmm.. not sure. before two arrays of 4 x _u32 elements, we have seven
> _u32 elements... this is why I counted a odd number of _u32 and put
> _u32
> flags before _u64 extensions... or is it working different for array
> types?
> 

Ah yes, I was confused by the patch format, but you're right.

> For the same reason, I think there is an unalignment issue on
> submit_csd that would need to change the current interface to solve
> (afaiu)... 
> 

Yes, that one is not aligned, but it is too late to fix now without
braking the interface. We have not seen any issues caused by that on
32-bit Raspbian though.

Iago
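
For the record, this kind of layout question is easy to check at build time.
A small, self-contained illustration follows; the struct is hypothetical and
only mirrors the "seven 32-bit fields plus flags before a 64-bit member"
situation discussed above, it is not the real drm_v3d_submit_tfu.

#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical layout: an odd number of 32-bit fields plus one 32-bit flags
 * word keeps the following 64-bit member naturally aligned, so 32-bit and
 * 64-bit userspace agree on the layout. */
struct example_submit {
	uint32_t a, b, c, d, e, f, g;	/* 7 x 32-bit fields (28 bytes) */
	uint32_t flags;			/* evens things out: offset 28 */
	uint64_t extensions;		/* offset 32, 8-byte aligned */
};

static_assert(offsetof(struct example_submit, extensions) % 8 == 0,
	      "64-bit field must be naturally aligned");
static_assert(sizeof(struct example_submit) % 8 == 0,
	      "struct size should be a multiple of 8 on all ABIs");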

> > > +
> > >  };
> > >  
> > >  /* Submits a compute shader for dispatch.  This job will block
> > > on
> > > any
> > > @@ -276,6 +300,13 @@ struct drm_v3d_submit_csd {
> > >  
> > >   /* ID of the perfmon to attach to this job. 0 means no perfmon.
> > > */
> > >   __u32 perfmon_id;
> > > +
> > > + /* Pointer to an array of ioctl extensions*/
> > > + __u64 extensions;
> > > +
> > > + __u32 flags;
> > > +
> > > + __u32 pad;
> > >  };
> > >  
> > >  enum {



Re: [PATCH v2 4/4] drm/v3d: add multiple syncobjs support

2021-09-30 Thread Iago Toral
On Wed, 2021-09-29 at 10:45 +0100, Melissa Wen wrote:
> Using the generic extension from the previous patch, a specific
> multisync
> extension enables more than one in/out binary syncobj per job
> submission.
> Arrays of syncobjs are set in struct drm_v3d_multisync, that also
> cares
> of determining the stage for sync (wait deps) according to the job
> queue.
> 
> v2:
> - subclass the generic extension struct (Daniel)
> - simplify adding dependency conditions to make understandable (Iago)
> 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
>  drivers/gpu/drm/v3d/v3d_drv.h |  23 +++--
>  drivers/gpu/drm/v3d/v3d_gem.c | 176 ++
> 
>  include/uapi/drm/v3d_drm.h|  49 +-
>  4 files changed, 224 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 3d6b9bcce2f7..bd46396a1ae0 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>   case DRM_V3D_PARAM_SUPPORTS_PERFMON:
>   args->value = (v3d->ver >= 40);
>   return 0;
> + case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
> + args->value = 1;
> + return 0;
>   default:
>   DRM_DEBUG("Unknown parameter %d\n", args->param);
>   return -EINVAL;
> @@ -135,9 +138,8 @@ v3d_postclose(struct drm_device *dev, struct
> drm_file *file)
>   struct v3d_file_priv *v3d_priv = file->driver_priv;
>   enum v3d_queue q;
>  
> - for (q = 0; q < V3D_MAX_QUEUES; q++) {
> + for (q = 0; q < V3D_MAX_QUEUES; q++)
>   drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
> - }
>  
>   v3d_perfmon_close_file(v3d_priv);
>   kfree(v3d_priv);
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index b900a050d5e2..b14ff1e96f49 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -19,15 +19,6 @@ struct reset_control;
>  
>  #define GMP_GRANULARITY (128 * 1024)
>  
> -/* Enum for each of the V3D queues. */
> -enum v3d_queue {
> - V3D_BIN,
> - V3D_RENDER,
> - V3D_TFU,
> - V3D_CSD,
> - V3D_CACHE_CLEAN,
> -};
> -
>  #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
>  
>  struct v3d_queue_state {
> @@ -294,6 +285,20 @@ struct v3d_csd_job {
>   struct drm_v3d_submit_csd args;
>  };
>  
> +struct v3d_submit_outsync {
> + struct drm_syncobj *syncobj;
> +};
> +
> +struct v3d_submit_ext {
> + u32 wait_stage;
> +
> + u32 in_sync_count;
> + u64 in_syncs;
> +
> + u32 out_sync_count;
> + struct v3d_submit_outsync *out_syncs;
> +};
> +
>  /**
>   * __wait_for - magic wait macro
>   *
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index b912419027f7..e92998d39eaa 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -454,11 +454,12 @@ v3d_job_add_deps(struct drm_file *file_priv,
> struct v3d_job *job,
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
>void **container, size_t size, void (*free)(struct kref
> *ref),
> -  u32 in_sync, enum v3d_queue queue)
> +  u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue
> queue)
>  {
>   struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
>   struct v3d_job *job;
> - int ret;
> + bool has_multisync = (se && se->in_sync_count);

I think this is not correct... we could be using the multisync
interface and pass 0 in_syncs and/or 0 out_syncs... what should
determine if we take the multisync path is the presence of the
extension alone.

> + int ret, i;
>  
>   *container = kcalloc(1, size, GFP_KERNEL);
>   if (!*container) {
> @@ -479,9 +480,28 @@ v3d_job_init(struct v3d_dev *v3d, struct
> drm_file *file_priv,
>   if (ret)
>   goto fail;
>  
> - ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> - if (ret)
> - goto fail_job;
> + if (has_multisync) {
> + if (se->wait_stage == queue) {
> + struct drm_v3d_sem __user *handle =
> u64_to_user_ptr(se->in_syncs);
> +
> + for (i = 0; i < se->in_sync_count; i++) {
> + struct drm_v3d_sem in;
> +
> + ret = copy_from_user(&in, handle++, sizeof(in));
> + if (ret) {
> + DRM_DEBUG("Failed to copy wait
> dep handle.\n");
> + goto fail_job;
> + }
> + ret = v3d_job_add_deps(file_priv, job,
> in.handle, 0);
> + if (ret)
> + goto fail_job;
> + }
> + }
> + } else {
> + ret = v3d_job_add_deps(file_priv, job, 

Re: [PATCH v2 3/4] drm/v3d: add generic ioctl extension

2021-09-30 Thread Iago Toral
On Wed, 2021-09-29 at 10:44 +0100, Melissa Wen wrote:
> Add support to attach generic extensions on job submission. This
> patch
> is third prep work to enable multiple syncobjs on job submission.
> With
> this work, when the job submission interface needs to be extended to
> accomodate a new feature, we will use a generic extension struct
> where
> an id determines the data type to be pointed. The first application
> is
> to enable multiples in/out syncobj (next patch), but the base is
> already done for future features. Therefore, to attach a new feature,
> a specific extension struct should subclass drm_v3d_extension and
> update the list of extensions in a job submission.
> 
> v2:
> - remove redundant elements to subclass struct (Daniel)
> 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |  4 +-
>  drivers/gpu/drm/v3d/v3d_gem.c | 71
> +--
>  include/uapi/drm/v3d_drm.h| 31 +++
>  3 files changed, 100 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index c1deab2cf38d..3d6b9bcce2f7 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -83,7 +83,6 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>   return 0;
>   }
>  
> -
>   switch (args->param) {
>   case DRM_V3D_PARAM_SUPPORTS_TFU:
>   args->value = 1;
> @@ -147,7 +146,7 @@ v3d_postclose(struct drm_device *dev, struct
> drm_file *file)
>  DEFINE_DRM_GEM_FOPS(v3d_drm_fops);
>  
>  /* DRM_AUTH is required on SUBMIT_CL for now, while we don't have
> GMP
> - * protection between clients.  Note that render nodes would be be
> + * protection between clients.  Note that render nodes would be
>   * able to submit CLs that could access BOs from clients
> authenticated
>   * with the master node.  The TFU doesn't use the GMP, so it would
>   * need to stay DRM_AUTH until we do buffer size/offset validation.
> @@ -219,7 +218,6 @@ static int v3d_platform_drm_probe(struct
> platform_device *pdev)
>   u32 mmu_debug;
>   u32 ident1;
>  
> -
>   v3d = devm_drm_dev_alloc(dev, &v3d_drm_driver, struct v3d_dev,
> drm);
>   if (IS_ERR(v3d))
>   return PTR_ERR(v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 9cfa6f8d4357..b912419027f7 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -535,6 +535,33 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>   }
>  }
>  
> +static int
> +v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
> +{
> + struct drm_v3d_extension __user *user_ext;
> +
> + user_ext = u64_to_user_ptr(ext_handles);
> + while(user_ext) {
> + struct drm_v3d_extension ext;
> +
> + if (copy_from_user(&ext, user_ext, sizeof(ext))) {
> + DRM_DEBUG("Failed to copy submit extension\n");
> + return -EFAULT;
> + }
> +
> + switch (ext.id) {
> + case 0:
> + default:
> + DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> ext.id);
> + return -EINVAL;
> + }
> +
> + user_ext = u64_to_user_ptr(ext.next);
> + }
> +
> + return 0;
> +}
> +
>  /**
>   * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
>   * @dev: DRM device
> @@ -563,15 +590,24 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>  
>   trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args-
> >rcl_end);
>  
> - if (args->pad != 0)
> + if (args->pad)
>   return -EINVAL;
>  
> - if (args->flags != 0 &&
> - args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> + if (args->flags &&
> + args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
> + DRM_V3D_SUBMIT_EXTENSION)) {
>   DRM_INFO("invalid flags: %d\n", args->flags);
>   return -EINVAL;
>   }
>  
> + if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> + ret = v3d_get_extensions(file_priv, args->extensions);
> + if (ret) {
> + DRM_DEBUG("Failed to get extensions.\n");
> + return ret;
> + }
> + }
> +
>   ret = v3d_job_init(v3d, file_priv, (void *)&render,
> sizeof(*render),
>  v3d_render_job_free, args->in_sync_bcl,
> V3D_RENDER);
>   if (ret)
> @@ -700,6 +736,19 @@ v3d_submit_tfu_ioctl(struct drm_device *dev,
> void *data,
>  
>   trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
>  
> + if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> + DRM_DEBUG("invalid flags: %d\n", args->flags);
> + return -EINVAL;
> + }
> +
> + if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> + ret = v3d_get_extensions(file_priv, args->extensions);
> + if 

Re: [PATCH v2 2/4] drm/v3d: alloc and init job in one shot

2021-09-30 Thread Iago Toral
On Wed, 2021-09-29 at 10:43 +0100, Melissa Wen wrote:
> Move job memory allocation to v3d_job_init function. This aim to
> facilitate
> error handling in job initialization, since cleanup steps are similar
> for all
> (struct v3d_job)-based types of job involved in a command submission.
> To
> generalize v3d_job_init(), this change takes into account that all
> job structs
> have the first element a struct v3d_job (bin, render, tfu, csd) or it
> is a
> v3d_job itself (clean_job) for pointer casting.
> 
> Suggested-by: Iago Toral 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 115 +---
> --
>  1 file changed, 42 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index e60fbc28ef29..9cfa6f8d4357 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -392,6 +392,9 @@ v3d_render_job_free(struct kref *ref)
>  
>  void v3d_job_cleanup(struct v3d_job *job)
>  {
> + if (!job)
> + return;
> +
>   drm_sched_job_cleanup(&job->base);
>   v3d_job_put(job);
>  }
> @@ -450,12 +453,20 @@ v3d_job_add_deps(struct drm_file *file_priv,
> struct v3d_job *job,
>  
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> -  struct v3d_job *job, void (*free)(struct kref *ref),
> +  void **container, size_t size, void (*free)(struct kref
> *ref),
>u32 in_sync, enum v3d_queue queue)
>  {
>   struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> + struct v3d_job *job;
>   int ret;
>  
> + *container = kcalloc(1, size, GFP_KERNEL);
> + if (!*container) {
> + DRM_ERROR("Cannot allocate memory for v3d job.");
> + return -ENOMEM;
> + }
> +
> + job = *container;
>   job->v3d = v3d;
>   job->free = free;
>  

Right below this hunk we have an early return that doesn't jump to
fail:

ret = pm_runtime_get_sync(v3d->drm.dev);
if (ret < 0)
return ret;


so we should kfree(*container) and set it to NULL there, right?

> @@ -479,6 +490,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file
> *file_priv,
>   drm_sched_job_cleanup(&job->base);
>  fail:
>   pm_runtime_put_autosuspend(v3d->drm.dev);
> + kfree(*container);
> + *container = NULL;
> +
>   return ret;
>  }
> 

(...)

> @@ -806,29 +789,15 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>   return -EINVAL;
>   }
>  
> - job = kcalloc(1, sizeof(*job), GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - ret = v3d_job_init(v3d, file_priv, &job->base,
> + ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
>  v3d_job_free, args->in_sync, V3D_CSD);
> - if (ret) {
> - kfree(job);
> - return ret;
> - }
> -
> - clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
> - if (!clean_job) {
> - v3d_job_cleanup(&job->base);
> - return -ENOMEM;
> - }
> + if (ret)
> + goto fail;
> 

For this to work we need to explicitly initialize clean_job to NULL. 
Notice that the same applies to the bin and clean jobs in the CL ioctl,
but in that case we're already initializing them to NULL.

> - ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0,
> V3D_CACHE_CLEAN);
> - if (ret) {
> - v3d_job_cleanup(>base);
> - kfree(clean_job);
> - return ret;
> - }
> + ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> sizeof(*clean_job),
> +v3d_job_free, 0, V3D_CACHE_CLEAN);
> + if (ret)
> + goto fail;
>  
>   job->args = *args;
>  
> @@ -877,7 +846,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> *data,
>   drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
>   &acquire_ctx);
>  fail:
> - v3d_job_cleanup(&job->base);
> + v3d_job_cleanup((void *)job);
>   v3d_job_cleanup(clean_job);
>  
>   return ret;



Re: [PATCH] drm/v3d: fix sched job resources cleanup when a job is aborted

2021-09-17 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

With that said, I don't like how we are doing error handling here, I
think we want to simplify this and try to make it so we centralize
error handling in one place instead of having multiple error exits
paths, each one trying to do the right thing at that point. This is
error prone, as this patch is showing.

Here is a proposal to make this better:

Make job memory allocation part of v3d_job_init. v3d_job init already
handles error conditions nicely. If we do this, we no longer need to
handle allocation errors in ioctls one by one and for any job we only
have two scenarios: v3d_job_init was successul or it failed (in which
case we know it already cleaned up after itself and we should have a
NULL job as a result). If v3d_job_init failed, then we *always* jump to
the fail tag and there we call v3d_job_cleanup for all jobs that can be
created in that ioctl. If a job is NULL then v3d_job_cleanup returns
early, otherwise, we know it is a fully initialized job, so it does 
drm_sched_job_cleanup + v3d_job_put.

I think that should make error handling in these functions a lot
easier.

Iago
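
To make the proposal above concrete, here is a rough sketch of the intended
shape of a submit ioctl once allocation moves into v3d_job_init(). The job
type, queue and struct names (v3d_xyz_job, V3D_XYZ, drm_v3d_submit_xyz) are
placeholders for the example; the real conversion is what the follow-up
series implements.

/* Sketch: every failure funnels into one label; v3d_job_cleanup() is safe
 * to call with NULL or with a fully initialized job. */
static int v3d_submit_xyz_ioctl(struct drm_device *dev, void *data,
				struct drm_file *file_priv)
{
	struct v3d_dev *v3d = to_v3d_dev(dev);
	struct drm_v3d_submit_xyz *args = data;	/* placeholder uAPI struct */
	struct v3d_xyz_job *job = NULL;		/* placeholder job type */
	struct v3d_job *clean_job = NULL;
	int ret;

	/* v3d_job_init() allocates and initializes; on failure it frees what
	 * it did and leaves the pointer NULL. */
	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
			   v3d_job_free, args->in_sync, V3D_XYZ);
	if (ret)
		goto fail;

	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
			   sizeof(*clean_job), v3d_job_free, 0,
			   V3D_CACHE_CLEAN);
	if (ret)
		goto fail;

	/* ...lookup BOs, lock reservations, push jobs, attach fences... */

	return 0;

fail:
	v3d_job_cleanup((void *)job);	/* no-op when job is NULL */
	v3d_job_cleanup(clean_job);
	return ret;
}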


On Thu, 2021-09-16 at 22:27 +0100, Melissa Wen wrote:
> In a cl submission, when bin job initialization fails, sched job
> resources
> were already allocated for the render job. At this point,
> drm_sched_job_init(render) was done in v3d_job_init but the render
> job is
> aborted before drm_sched_job_arm (in v3d_job_push) happens;
> therefore, not
> only v3d_job_put but also drm_sched_job_cleanup should be called (by
> v3d_job_cleanup). A similar issue is addressed for csd and tfu
> submissions.
> 
> The issue was noticed from a review by Iago Toral in a patch that
> touches
> the same part of the code.
> 
> Fixes: 916044fac8623 ("drm/v3d: Move drm_sched_job_init to
> v3d_job_init")
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 1953706bdaeb..ead0be8d48a7 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -567,14 +567,14 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>   if (args->bcl_start != args->bcl_end) {
>   bin = kcalloc(1, sizeof(*bin), GFP_KERNEL);
>   if (!bin) {
> - v3d_job_put(&render->base);
> + v3d_job_cleanup(&render->base);
> 
> > return -ENOMEM;
>   }
>  
>   ret = v3d_job_init(v3d, file_priv, &bin->base,
>  v3d_job_free, args->in_sync_bcl,
> V3D_BIN);
>   if (ret) {
> - v3d_job_put(&render->base);
> + v3d_job_cleanup(&render->base);
>   kfree(bin);
>   return ret;
>   }
> @@ -716,7 +716,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>   job->base.bo = kcalloc(ARRAY_SIZE(args->bo_handles),
>  sizeof(*job->base.bo), GFP_KERNEL);
>   if (!job->base.bo) {
> - v3d_job_put(&job->base);
> + v3d_job_cleanup(&job->base);
>   return -ENOMEM;
>   }
>  
> @@ -810,14 +810,13 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  
>   clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
>   if (!clean_job) {
> - v3d_job_put(&job->base);
> - kfree(job);
> + v3d_job_cleanup(&job->base);
>   return -ENOMEM;
>   }
>  
>   ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0,
> V3D_CACHE_CLEAN);
>   if (ret) {
> - v3d_job_put(&job->base);
> + v3d_job_cleanup(&job->base);
>   kfree(clean_job);
>   return ret;
>   }



Re: [PATCH 3/3] drm/v3d: add multiple syncobjs support

2021-09-16 Thread Iago Toral
I think this looks good overall, I have a few questions/commments
below:

On Wed, 2021-08-18 at 18:57 +0100, Melissa Wen wrote:
> Using the generic extension support set in the previous patch, this
> patch enables more than one in/out binary syncobj per job submission.
> Arrays of syncobjs are set in a specific extension type (multisync)
> that also cares of determining the stage for sync (bin/render)
> through a flag - when this is the case.
> 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |   3 +
>  drivers/gpu/drm/v3d/v3d_drv.h |  14 +++
>  drivers/gpu/drm/v3d/v3d_gem.c | 209 +++-
> --
>  include/uapi/drm/v3d_drm.h|  38 +++
>  4 files changed, 226 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 6a0516160bb2..939ca8c833f5 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>   case DRM_V3D_PARAM_SUPPORTS_PERFMON:
>   args->value = (v3d->ver >= 40);
>   return 0;
> + case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
> + args->value = 1;
> + return 0;
>   default:
>   DRM_DEBUG("Unknown parameter %d\n", args->param);
>   return -EINVAL;
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index b900a050d5e2..544c60404a0f 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -294,6 +294,20 @@ struct v3d_csd_job {
>   struct drm_v3d_submit_csd args;
>  };
>  
> +struct v3d_submit_outsync {
> + struct drm_syncobj *syncobj;
> +};
> +
> +struct v3d_submit_ext {
> + u32 flags;
> +
> + u32 in_sync_count;
> + u64 in_syncs;
> +
> + u32 out_sync_count;
> + struct v3d_submit_outsync *out_syncs;
> +};
> +
>  /**
>   * __wait_for - magic wait macro
>   *
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index e254919b6c5e..e7aabe1a0e11 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -392,6 +392,9 @@ v3d_render_job_free(struct kref *ref)
>  
>  void v3d_job_cleanup(struct v3d_job *job)
>  {
> + if (!job)
> + return;
> +
>   drm_sched_job_cleanup(&job->base);
>   v3d_job_put(job);
>  }
> @@ -451,10 +454,11 @@ v3d_job_add_deps(struct drm_file *file_priv,
> struct v3d_job *job,
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
>struct v3d_job *job, void (*free)(struct kref *ref),
> -  u32 in_sync, enum v3d_queue queue)
> +  u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue
> queue)
>  {
>   struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> - int ret;
> + bool has_multinsync = (se && se->in_sync_count);
> + int ret, i;
>  
>   job->v3d = v3d;
>   job->free = free;
> @@ -463,14 +467,30 @@ v3d_job_init(struct v3d_dev *v3d, struct
> drm_file *file_priv,
>   if (ret < 0)
>   return ret;
>  
> - ret = drm_sched_job_init(&job->base, &v3d_priv-
> >sched_entity[queue],
> -  v3d_priv);
> + ret = drm_sched_job_init(&job->base, &v3d_priv-
> >sched_entity[queue], v3d_priv);
>   if (ret)
>   goto fail;
>  
> - ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> - if (ret)
> - goto fail_job;
> + if (has_multinsync && (se->flags == (queue == V3D_RENDER))) {

I think this is unnecessarily difficult to understand, I'd suggest to
code the condition more explicitly:

if (has_multisync &&
(see->flags & DRM_V3D_IN_SYNC_RCL) &&
queue ==  V3D_RENDER)

unless I am missing the point here :)

> + struct drm_v3d_sem __user *handle = u64_to_user_ptr(se-
> >in_syncs);
> +
> + for (i = 0; i < se->in_sync_count; i++) {
> + struct drm_v3d_sem in;
> +
> + ret = copy_from_user(&in, handle++, sizeof(in));
> + if (ret) {
> + DRM_DEBUG("Failed to copy wait dep
> handle.\n");
> + goto fail_job;
> + }
> + ret = v3d_job_add_deps(file_priv, job,
> in.handle, 0);
> + if (ret)
> + goto fail_job;
> + }
> + } else if (!has_multinsync) {
> + ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> + if (ret)
> + goto fail_job;
> + }
>  
>   kref_init(>refcount);
>  
> @@ -500,6 +520,7 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>struct v3d_job *job,
>struct ww_acquire_ctx
> *acquire_ctx,
>u32 out_sync,
> +  struct 

Re: [PATCH 2/3] drm/v3d: add generic ioctl extension

2021-09-15 Thread Iago Toral
On Wed, 2021-09-15 at 17:28 +0100, Melissa Wen wrote:
> On 09/15, Iago Toral wrote:
> > On Wed, 2021-08-18 at 18:56 +0100, Melissa Wen wrote:
(...)
> > > 
> > >  /**
> > > @@ -248,6 +266,15 @@ struct drm_v3d_submit_tfu {
> > >   __u32 in_sync;
> > >   /* Sync object to signal when the TFU job is done. */
> > >   __u32 out_sync;
> > > +
> > > + /* Number of extensions*/
> > > + __u32 extension_count;
> > > +
> > > + /* Pointer to an array of ioctl extensions*/
> > > + __u64 extensions;
> > > +
> > > + /* DRM_V3D_SUBMIT_* properties */
> > > + __u32 flags;
> > 
> > A silly nit: maybe put flags before the extension fields above for
> > consistency with the CSD and CL submission commands.
> 
> hmm.. I arranged it that way for alignment reasons (afaiu), but I can
> (or should) include a _u32 pad right after out_sync to set these in
> the
> same sequence.

Ah, that's fine, my suggestion was just for style, let's keep it as is.

Iago




Re: [PATCH 2/3] drm/v3d: add generic ioctl extension

2021-09-15 Thread Iago Toral
On Wed, 2021-08-18 at 18:56 +0100, Melissa Wen wrote:
> Add support to attach generic extensions on job submission.
> This patch is a second prep work to enable multiple syncobjs on job
> submission. With this work, when the job submission interface needs
> to be extended to accommodate a new feature, we will use a generic
> extension struct where an id determines the data type to be pointed.
> The first application is to enable multiples in/out syncobj (next
> patch), but the base is already done for future features.
> 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |  4 +-
>  drivers/gpu/drm/v3d/v3d_gem.c | 80 -
> --
>  include/uapi/drm/v3d_drm.h| 38 -
>  3 files changed, 113 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 9403c3b36aca..6a0516160bb2 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -83,7 +83,6 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>   return 0;
>   }
>  
> -
>   switch (args->param) {
>   case DRM_V3D_PARAM_SUPPORTS_TFU:
>   args->value = 1;
> @@ -147,7 +146,7 @@ v3d_postclose(struct drm_device *dev, struct
> drm_file *file)
>  DEFINE_DRM_GEM_FOPS(v3d_drm_fops);
>  
>  /* DRM_AUTH is required on SUBMIT_CL for now, while we don't have
> GMP
> - * protection between clients.  Note that render nodes would be be
> + * protection between clients.  Note that render nodes would be
>   * able to submit CLs that could access BOs from clients
> authenticated
>   * with the master node.  The TFU doesn't use the GMP, so it would
>   * need to stay DRM_AUTH until we do buffer size/offset validation.
> @@ -222,7 +221,6 @@ static int v3d_platform_drm_probe(struct
> platform_device *pdev)
>   u32 mmu_debug;
>   u32 ident1;
>  
> -
>   v3d = devm_drm_dev_alloc(dev, &v3d_drm_driver, struct v3d_dev,
> drm);
>   if (IS_ERR(v3d))
>   return PTR_ERR(v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 593ed2206d74..e254919b6c5e 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -521,6 +521,38 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>   }
>  }
>  
> +static int
> +v3d_get_extensions(struct drm_file *file_priv,
> +u32 ext_count, u64 ext_handles)
> +{
> + int i;
> + struct drm_v3d_extension __user *handles;
> +
> + if (!ext_count)
> + return 0;
> +
> + handles = u64_to_user_ptr(ext_handles);
> + for (i = 0; i < ext_count; i++) {
> + struct drm_v3d_extension ext;
> +
> + if (copy_from_user(&ext, handles, sizeof(ext))) {
> + DRM_DEBUG("Failed to copy submit extension\n");
> + return -EFAULT;
> + }
> +
> + switch (ext.id) {
> + case 0:
> + default:
> + DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> ext.id);
> + return -EINVAL;
> + }
> +
> + handles = u64_to_user_ptr(ext.next);
> + }
> +
> + return 0;
> +}
> +
>  /**
>   * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
>   * @dev: DRM device
> @@ -549,15 +581,23 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>  
>   trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args-
> >rcl_end);
>  
> - if (args->pad != 0)
> - return -EINVAL;
> -
> - if (args->flags != 0 &&
> - args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> + if (args->flags &&
> + args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
> + DRM_V3D_SUBMIT_EXTENSION)) {
>   DRM_INFO("invalid flags: %d\n", args->flags);
>   return -EINVAL;
>   }
>  
> + if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> + ret = v3d_get_extensions(file_priv,
> +  args->extension_count,
> +  args->extensions);
> + if (ret) {
> + DRM_DEBUG("Failed to get extensions.\n");
> + return ret;
> + }
> + }
> +
>   render = kcalloc(1, sizeof(*render), GFP_KERNEL);
>   if (!render)
>   return -ENOMEM;
> @@ -711,6 +751,21 @@ v3d_submit_tfu_ioctl(struct drm_device *dev,
> void *data,
>  
>   trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
>  
> + if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> + DRM_DEBUG("invalid flags: %d\n", args->flags);
> + return -EINVAL;
> + }
> +
> + if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> + ret = v3d_get_extensions(file_priv,
> +  args->extension_count,
> +  args->extensions);
> +  

[PATCH v2] drm/v3d: fix wait for TMU write combiner flush

2021-09-15 Thread Iago Toral Quiroga
The hardware sets the TMUWCF bit back to 0 when the TMU write
combiner flush completes so we should be checking for that instead
of the L2TFLS bit.

v2 (Melissa Wen):
  - Add Signed-off-by and Fixes tags.
  - Change the error message for the timeout to be more clear.

Fixes spurious Vulkan CTS failures in:
dEQP-VK.binding_model.descriptorset_random.*

Fixes: d223f98f02099 ("drm/v3d: Add support for compute shader dispatch")
Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Melissa Wen 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index a3529809d547..1953706bdaeb 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -197,8 +197,8 @@ v3d_clean_caches(struct v3d_dev *v3d)
 
V3D_CORE_WRITE(core, V3D_CTL_L2TCACTL, V3D_L2TCACTL_TMUWCF);
if (wait_for(!(V3D_CORE_READ(core, V3D_CTL_L2TCACTL) &
-  V3D_L2TCACTL_L2TFLS), 100)) {
-   DRM_ERROR("Timeout waiting for L1T write combiner flush\n");
+  V3D_L2TCACTL_TMUWCF), 100)) {
+   DRM_ERROR("Timeout waiting for TMU write combiner flush\n");
}
 
	mutex_lock(&v3d->cache_clean_lock);
-- 
2.25.1



Re: [PATCH] drm/v3d: fix wait for TMU write combiner flush

2021-09-15 Thread Iago Toral
On Wed, 2021-09-15 at 09:57 +0100, Melissa Wen wrote:
> On 09/14, Iago Toral Quiroga wrote:
> > The hardware sets the TMUWCF bit back to 0 when the TMU write
> > combiner flush completes so we should be checking for that instead
> > of the L2TFLS bit.
> > 
> > Fixes spurious Vulkan CTS failures in:
> > dEQP-VK.binding_model.descriptorset_random.*
> Hi Iago,
> 
> makes sense to me.
> 
> can you add the fix tag?
> Fixes: d223f98f02099 ("drm/v3d: Add support for compute shader
> dispatch") 
> 
> also, you forgot to add your Signed-off-by tag.

Will include both, thanks.

> > ---
> >  drivers/gpu/drm/v3d/v3d_gem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > b/drivers/gpu/drm/v3d/v3d_gem.c
> > index a3529809d547..5159f544bc16 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -197,7 +197,7 @@ v3d_clean_caches(struct v3d_dev *v3d)
> >  
> > V3D_CORE_WRITE(core, V3D_CTL_L2TCACTL, V3D_L2TCACTL_TMUWCF);
> > if (wait_for(!(V3D_CORE_READ(core, V3D_CTL_L2TCACTL) &
> > -  V3D_L2TCACTL_L2TFLS), 100)) {
> > +  V3D_L2TCACTL_TMUWCF), 100)) {
> > DRM_ERROR("Timeout waiting for L1T write combiner
> > flush\n");
> hm.. would it be clearer to say "TMU write combiner" here?

Yes, I guess it does. I'll add this too.

> 
> in the next version, you can already include:
> Reviewed-by: Melissa Wen 
> 
Thanks!

Iago

> Thanks,
> 
> Melissa
> > }
> >  
> > -- 
> > 2.25.1
> > 



[PATCH] drm/v3d: fix wait for TMU write combiner flush

2021-09-13 Thread Iago Toral Quiroga
The hardware sets the TMUWCF bit back to 0 when the TMU write
combiner flush completes so we should be checking for that instead
of the L2TFLS bit.

Fixes spurious Vulkan CTS failures in:
dEQP-VK.binding_model.descriptorset_random.*
---
 drivers/gpu/drm/v3d/v3d_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index a3529809d547..5159f544bc16 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -197,7 +197,7 @@ v3d_clean_caches(struct v3d_dev *v3d)
 
V3D_CORE_WRITE(core, V3D_CTL_L2TCACTL, V3D_L2TCACTL_TMUWCF);
if (wait_for(!(V3D_CORE_READ(core, V3D_CTL_L2TCACTL) &
-  V3D_L2TCACTL_L2TFLS), 100)) {
+  V3D_L2TCACTL_TMUWCF), 100)) {
DRM_ERROR("Timeout waiting for L1T write combiner flush\n");
}
 
-- 
2.25.1



Re: [PATCH AUTOSEL 5.4 003/205] drm/v3d: don't leak bin job if v3d_job_init fails.

2020-01-23 Thread Iago Toral
On Thu, 2020-01-23 at 09:17 -0500, Sasha Levin wrote:
> On Fri, Jan 17, 2020 at 08:25:30AM +0100, Iago Toral wrote:
> > Hi Sasha,
> > 
> > 
> > please notice that there were two separate patches that addressed
> > the
> > same issue and applying both simultaneously leads to a double free
> > (which is what I see is happening with this patch: see the second
> > call
> > to kfree(bin) right below the one added here). This issue was
> > raised
> > previously here:
> > 
> > https://lists.freedesktop.org/archives/dri-devel/2019-October/241425.html
> 
> I'll drop this patch for now. Any idea why upstream didn't pick up
> the
> fix yet? I see the problem still exists there.

+Daniel

I am not sure, when this issue was found Daniel added a few people on
CC as heads-up, but maybe nobody actually got to fix the merge conflict
in the end?

Iago



Re: [PATCH AUTOSEL 5.4 003/205] drm/v3d: don't leak bin job if v3d_job_init fails.

2020-01-16 Thread Iago Toral
Hi Sasha,


please notice that there were two separate patches that addressed the
same issue and applying both simultaneously leads to a double free
(which is what I see is happening with this patch: see the second call
to kfree(bin) right below the one added here). This issue was raised
previously here:

https://lists.freedesktop.org/archives/dri-devel/2019-October/241425.html

Iago

On Thu, 2020-01-16 at 11:39 -0500, Sasha Levin wrote:
> From: Iago Toral Quiroga 
> 
> [ Upstream commit 0d352a3a8a1f26168d09f7073e61bb4b328e3bb9 ]
> 
> If the initialization of the job fails we need to kfree() it
> before returning.
> 
> Signed-off-by: Iago Toral Quiroga 
> Signed-off-by: Eric Anholt 
> Link: 
> https://patchwork.freedesktop.org/patch/msgid/20190916071125.5255-1-ito...@igalia.com
> Fixes: a783a09ee76d ("drm/v3d: Refactor job management.")
> Reviewed-by: Eric Anholt 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 19c092d75266..6316bf3646af 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -565,6 +565,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>   ret = v3d_job_init(v3d, file_priv, &bin->base,
>  v3d_job_free, args->in_sync_bcl);
>   if (ret) {
> + kfree(bin);
>   v3d_job_put(&bin->base);
>   kfree(bin);
>   return ret;



[PATCH v3] drm/v3d: clean caches at the end of render jobs on request from user space

2019-09-19 Thread Iago Toral Quiroga
Extends the user space ioctl for CL submissions so it can include a request
to flush the cache once the CL execution has completed. Fixes memory
write violation messages reported by the kernel in workloads involving
shader memory writes (SSBOs, shader images, scratch, etc) which sometimes
also lead to GPU resets during Piglit and CTS workloads.

v2: if v3d_job_init() fails we need to kfree() the job instead of
v3d_job_put() it (Eric Anholt).

v3 (Eric Anholt):
  - Drop _FLAG suffix from the new flag name.
  - Add a new param so userspace can tell whether cache flushing is
implemented in the kernel.

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Eric Anholt 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20190912083516.13797-1-ito...@igalia.com
---
 drivers/gpu/drm/v3d/v3d_drv.c |  3 ++
 drivers/gpu/drm/v3d/v3d_gem.c | 54 +--
 include/uapi/drm/v3d_drm.h|  8 --
 3 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3506ae2723ae..e94bf75368be 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -126,6 +126,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
case DRM_V3D_PARAM_SUPPORTS_CSD:
args->value = v3d_has_csd(v3d);
return 0;
+   case DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH:
+   args->value = 1;
+   return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index fb32cda18ffe..4c4b59ae2c81 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -530,13 +530,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
struct drm_v3d_submit_cl *args = data;
struct v3d_bin_job *bin = NULL;
struct v3d_render_job *render;
+   struct v3d_job *clean_job = NULL;
+   struct v3d_job *last_job;
struct ww_acquire_ctx acquire_ctx;
int ret = 0;
 
	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
 
-   if (args->pad != 0) {
-   DRM_INFO("pad must be zero: %d\n", args->pad);
+   if (args->flags != 0 &&
+   args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
+   DRM_INFO("invalid flags: %d\n", args->flags);
return -EINVAL;
}
 
@@ -576,12 +579,31 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
bin->render = render;
}
 
-   ret = v3d_lookup_bos(dev, file_priv, &render->base,
+   if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
+   clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
+   if (!clean_job) {
+   ret = -ENOMEM;
+   goto fail;
+   }
+
+   ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0);
+   if (ret) {
+   kfree(clean_job);
+   clean_job = NULL;
+   goto fail;
+   }
+
+   last_job = clean_job;
+   } else {
+   last_job = >base;
+   }
+
+   ret = v3d_lookup_bos(dev, file_priv, last_job,
 args->bo_handles, args->bo_handle_count);
if (ret)
goto fail;
 
-   ret = v3d_lock_bo_reservations(&render->base, &acquire_ctx);
+   ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
if (ret)
goto fail;
 
@@ -600,28 +622,44 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
	ret = v3d_push_job(v3d_priv, &render->base, V3D_RENDER);
if (ret)
goto fail_unreserve;
+
+   if (clean_job) {
+   struct dma_fence *render_fence =
+   dma_fence_get(render->base.done_fence);
+   ret = drm_gem_fence_array_add(&clean_job->deps, render_fence);
+   if (ret)
+   goto fail_unreserve;
+   ret = v3d_push_job(v3d_priv, clean_job, V3D_CACHE_CLEAN);
+   if (ret)
+   goto fail_unreserve;
+   }
+
	mutex_unlock(&v3d->sched_lock);
 
v3d_attach_fences_and_unlock_reservation(file_priv,
-&render->base,
+last_job,
 &acquire_ctx,
 args->out_sync,
-render->base.done_fence);
+last_job->done_fence);
 
if (bin)
	v3d_job_put(&bin->base);
	v3d_job_put(&render->base);
+   if (clean_job)
+   v3d_job_put(clean_job);
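
From the user-space side, the new flag is meant to be used roughly as below.
This is a sketch only: submission setup and error handling are omitted,
libdrm's drmIoctl() and the updated drm/v3d_drm.h are assumed, and the param
name matches the one added in the v3d_drv.c hunk above.

/* Sketch: probe for the cache-flush flag, then request a cache clean at the
 * end of the render job. */
struct drm_v3d_get_param gp = {
	.param = DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
};
bool has_cache_flush =
	drmIoctl(fd, DRM_IOCTL_V3D_GET_PARAM, &gp) == 0 && gp.value != 0;

struct drm_v3d_submit_cl submit = {
	/* ...bcl/rcl addresses, BO handles, in/out syncobjs... */
	.flags = has_cache_flush ? DRM_V3D_SUBMIT_CL_FLUSH_CACHE : 0,
};

drmIoctl(fd, DRM_IOCTL_V3D_SUBMIT_CL, &submit);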

Re: [PATCH v2] drm/v3d: clean caches at the end of render jobs on request from user space

2019-09-18 Thread Iago Toral
Eric, could you push this for me? I don't have push rights to DRM yet.

Iago

On Wed, 2019-09-18 at 11:15 +0200, Iago Toral Quiroga wrote:
> Extends the user space ioctl for CL submissions so it can include a
> request
> to flush the cache once the CL execution has completed. Fixes memory
> write violation messages reported by the kernel in workloads
> involving
> shader memory writes (SSBOs, shader images, scratch, etc) which
> sometimes
> also lead to GPU resets during Piglit and CTS workloads.
> 
> v2: if v3d_job_init() fails we need to kfree() the job instead of
> v3d_job_put() it (Eric Anholt).
> 
> Signed-off-by: Iago Toral Quiroga 
> Reviewed-by: Eric Anholt 
> Link: 
> https://patchwork.freedesktop.org/patch/msgid/20190912083516.13797-1-ito...@igalia.com
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 54 +--
> 
>  include/uapi/drm/v3d_drm.h|  7 +++--
>  2 files changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 5d80507b539b..d46d91346d09 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -530,13 +530,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>   struct drm_v3d_submit_cl *args = data;
>   struct v3d_bin_job *bin = NULL;
>   struct v3d_render_job *render;
> + struct v3d_job *clean_job = NULL;
> + struct v3d_job *last_job;
>   struct ww_acquire_ctx acquire_ctx;
>   int ret = 0;
>  
>   trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args-
> >rcl_end);
>  
> - if (args->pad != 0) {
> - DRM_INFO("pad must be zero: %d\n", args->pad);
> + if (args->flags != 0 &&
> + args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
> + DRM_INFO("invalid flags: %d\n", args->flags);
>   return -EINVAL;
>   }
>  
> @@ -575,12 +578,31 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>   bin->render = render;
>   }
>  
> - ret = v3d_lookup_bos(dev, file_priv, &render->base,
> + if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
> + clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
> + if (!clean_job) {
> + ret = -ENOMEM;
> + goto fail;
> + }
> +
> + ret = v3d_job_init(v3d, file_priv, clean_job,
> v3d_job_free, 0);
> + if (ret) {
> + kfree(clean_job);
> + clean_job = NULL;
> + goto fail;
> + }
> +
> + last_job = clean_job;
> + } else {
> + last_job = >base;
> + }
> +
> + ret = v3d_lookup_bos(dev, file_priv, last_job,
>args->bo_handles, args->bo_handle_count);
>   if (ret)
>   goto fail;
>  
> - ret = v3d_lock_bo_reservations(&render->base, &acquire_ctx);
> + ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
>   if (ret)
>   goto fail;
>  
> @@ -599,28 +621,44 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>   ret = v3d_push_job(v3d_priv, &render->base, V3D_RENDER);
>   if (ret)
>   goto fail_unreserve;
> +
> + if (clean_job) {
> + struct dma_fence *render_fence =
> + dma_fence_get(render->base.done_fence);
> + ret = drm_gem_fence_array_add(&clean_job->deps,
> render_fence);
> + if (ret)
> + goto fail_unreserve;
> + ret = v3d_push_job(v3d_priv, clean_job,
> V3D_CACHE_CLEAN);
> + if (ret)
> + goto fail_unreserve;
> + }
> +
>   mutex_unlock(&v3d->sched_lock);
>  
>   v3d_attach_fences_and_unlock_reservation(file_priv,
> -  &render->base,
> +  last_job,
>   &acquire_ctx,
>args->out_sync,
> -  render-
> >base.done_fence);
> +  last_job->done_fence);
>  
>   if (bin)
>   v3d_job_put(&bin->base);
>   v3d_job_put(&render->base);
> + if (clean_job)
> + v3d_job_put(clean_job);
>  
>   return 0;
>  
>  fail_unreserve:
>   mutex_unlock(>sched_lock);
> - drm_gem_unlock_reservations(render->base.bo,
> - render->base.bo_count,
> _ctx

[PATCH v2] drm/v3d: clean caches at the end of render jobs on request from user space

2019-09-18 Thread Iago Toral Quiroga
Extends the user space ioctl for CL submissions so it can include a request
to flush the cache once the CL execution has completed. Fixes memory
write violation messages reported by the kernel in workloads involving
shader memory writes (SSBOs, shader images, scratch, etc) which sometimes
also lead to GPU resets during Piglit and CTS workloads.

v2: if v3d_job_init() fails we need to kfree() the job instead of
v3d_job_put() it (Eric Anholt).

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Eric Anholt 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20190912083516.13797-1-ito...@igalia.com
---
 drivers/gpu/drm/v3d/v3d_gem.c | 54 +--
 include/uapi/drm/v3d_drm.h|  7 +++--
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 5d80507b539b..d46d91346d09 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -530,13 +530,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
struct drm_v3d_submit_cl *args = data;
struct v3d_bin_job *bin = NULL;
struct v3d_render_job *render;
+   struct v3d_job *clean_job = NULL;
+   struct v3d_job *last_job;
struct ww_acquire_ctx acquire_ctx;
int ret = 0;
 
	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
 
-   if (args->pad != 0) {
-   DRM_INFO("pad must be zero: %d\n", args->pad);
+   if (args->flags != 0 &&
+   args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
+   DRM_INFO("invalid flags: %d\n", args->flags);
return -EINVAL;
}
 
@@ -575,12 +578,31 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
bin->render = render;
}
 
-   ret = v3d_lookup_bos(dev, file_priv, &render->base,
+   if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
+   clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
+   if (!clean_job) {
+   ret = -ENOMEM;
+   goto fail;
+   }
+
+   ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0);
+   if (ret) {
+   kfree(clean_job);
+   clean_job = NULL;
+   goto fail;
+   }
+
+   last_job = clean_job;
+   } else {
+   last_job = &render->base;
+   }
+
+   ret = v3d_lookup_bos(dev, file_priv, last_job,
 args->bo_handles, args->bo_handle_count);
if (ret)
goto fail;
 
-   ret = v3d_lock_bo_reservations(&render->base, &acquire_ctx);
+   ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
if (ret)
goto fail;
 
@@ -599,28 +621,44 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
ret = v3d_push_job(v3d_priv, &render->base, V3D_RENDER);
if (ret)
goto fail_unreserve;
+
+   if (clean_job) {
+   struct dma_fence *render_fence =
+   dma_fence_get(render->base.done_fence);
+   ret = drm_gem_fence_array_add(&clean_job->deps, render_fence);
+   if (ret)
+   goto fail_unreserve;
+   ret = v3d_push_job(v3d_priv, clean_job, V3D_CACHE_CLEAN);
+   if (ret)
+   goto fail_unreserve;
+   }
+
mutex_unlock(&v3d->sched_lock);
 
v3d_attach_fences_and_unlock_reservation(file_priv,
-&render->base,
+last_job,
 &acquire_ctx,
 args->out_sync,
-render->base.done_fence);
+last_job->done_fence);
 
if (bin)
v3d_job_put(&bin->base);
v3d_job_put(&render->base);
+   if (clean_job)
+   v3d_job_put(clean_job);
 
return 0;
 
 fail_unreserve:
mutex_unlock(&v3d->sched_lock);
-   drm_gem_unlock_reservations(render->base.bo,
-   render->base.bo_count, &acquire_ctx);
+   drm_gem_unlock_reservations(last_job->bo,
+   last_job->bo_count, &acquire_ctx);
 fail:
if (bin)
v3d_job_put(&bin->base);
v3d_job_put(&render->base);
+   if (clean_job)
+   v3d_job_put(clean_job);
 
return ret;
 }
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 58fbe48c91e9..58d2040ea48c 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -48,6 +48,8 @@ extern "C" {
 #define DRM_IOCTL_V3D_SUBMIT_TFU  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_TFU, struct drm_v3d_su

[PATCH] drm/v3d: don't leak bin job if v3d_job_init fails.

2019-09-16 Thread Iago Toral Quiroga
If the initialization of the job fails we need to kfree() it
before returning.

Signed-off-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index d46d91346d09..ed68731404a7 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -566,6 +566,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
ret = v3d_job_init(v3d, file_priv, &bin->base,
   v3d_job_free, args->in_sync_bcl);
if (ret) {
+   kfree(bin);
v3d_job_put(&render->base);
return ret;
}
-- 
2.17.1


Re: [PATCH] drm/v3d: clean caches at the end of render jobs on request from user space

2019-09-13 Thread Iago Toral
On Thu, 2019-09-12 at 10:25 -0700, Eric Anholt wrote:
> Iago Toral Quiroga  writes:
> 
> > Extends the user space ioctl for CL submissions so it can include a
> > request
> > to flush the cache once the CL execution has completed. Fixes
> > memory
> > write violation messages reported by the kernel in workloads
> > involving
> > shader memory writes (SSBOs, shader images, scratch, etc) which
> > sometimes
> > also lead to GPU resets during Piglit and CTS workloads.
> 
> Some context for any other reviewers: This patch is the interface
> change
> necessary to expose GLES 3.1 on V3D.  It turns out the HW packets for
> flushing the caches were broken in multiple ways.
> 
> > Signed-off-by: Iago Toral Quiroga 
> > ---
> >  drivers/gpu/drm/v3d/v3d_gem.c | 51 +
> > --
> >  include/uapi/drm/v3d_drm.h|  7 ++---
> >  2 files changed, 47 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > b/drivers/gpu/drm/v3d/v3d_gem.c
> > index 5d80507b539b..530fe9d9d5bd 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -530,13 +530,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> > void *data,
> > struct drm_v3d_submit_cl *args = data;
> > struct v3d_bin_job *bin = NULL;
> > struct v3d_render_job *render;
> > +   struct v3d_job *clean_job = NULL;
> > +   struct v3d_job *last_job;
> > struct ww_acquire_ctx acquire_ctx;
> > int ret = 0;
> >  
> > trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args-
> > >rcl_end);
> >  
> > -   if (args->pad != 0) {
> > -   DRM_INFO("pad must be zero: %d\n", args->pad);
> > +   if (args->flags != 0 &&
> > +   args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
> > +   DRM_INFO("invalid flags: %d\n", args->flags);
> > return -EINVAL;
> > }
> >  
> > @@ -575,12 +578,28 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> > void *data,
> > bin->render = render;
> > }
> >  
> > -   ret = v3d_lookup_bos(dev, file_priv, &render->base,
> > +   if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
> > +   clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
> > +   if (!clean_job) {
> > +   ret = -ENOMEM;
> > +   goto fail;
> > +   }
> > +
> > +   ret = v3d_job_init(v3d, file_priv, clean_job,
> > v3d_job_free, 0);
> > +   if (ret)
> > +   goto fail;
> 
> Only issue I see: If v3d_job_init() fails, we need to not
> v3d_job_put()
> it.  I'm fine with either kfree() it and NULL the ptr before jumping
> to
> fail, or open code the bin/render puts.

It seems we also call v3d_job_put() for the bin job when v3d_job_init()
fails, which also returns immediately in that case instead of jumping
to fail to v3d_job_put the render job, so I guess we need the same
treatment there. Shall I fix that in this patch too or would you rather
see a different patch sent separately for that?
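
For reference, a rough sketch of what that same treatment looks like on the
bin job path (this is essentially what the separate "don't leak bin job"
patch above does; shown here only to illustrate the point):

  	ret = v3d_job_init(v3d, file_priv, &bin->base,
  			   v3d_job_free, args->in_sync_bcl);
  	if (ret) {
  		/* bin was never initialized, so free it directly rather
  		 * than v3d_job_put()ing it, then drop the render job.
  		 */
  		kfree(bin);
  		v3d_job_put(&render->base);
  		return ret;
  	}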

> With that,
> 
> Reviewed-by: Eric Anholt 
> 
> > +
> > +   last_job = clean_job;
> > +   } else {
> > +   last_job = &render->base;
> > +   }
> > +
> > +   ret = v3d_lookup_bos(dev, file_priv, last_job,
> >  args->bo_handles, args->bo_handle_count);
> > if (ret)
> > goto fail;
> >  
> > -   ret = v3d_lock_bo_reservations(&render->base, &acquire_ctx);
> > +   ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
> > if (ret)
> > goto fail;
> >  
> > @@ -599,28 +618,44 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> > void *data,
> > ret = v3d_push_job(v3d_priv, &render->base, V3D_RENDER);
> > if (ret)
> > goto fail_unreserve;
> > +
> > +   if (clean_job) {
> > +   struct dma_fence *render_fence =
> > +   dma_fence_get(render->base.done_fence);
> > +   ret = drm_gem_fence_array_add(&clean_job->deps,
> > render_fence);
> > +   if (ret)
> > +   goto fail_unreserve;
> > +   ret = v3d_push_job(v3d_priv, clean_job,
> > V3D_CACHE_CLEAN);
> > +   if (ret)
> > +   goto fail_unreserve;
> > +   }
> > +
> > mutex_unlock(&v3d->sched_lock);
> >  
> > v3d_attach_fences_and_unlock_reservation(file_priv,
> > -   

[PATCH] drm/v3d: clean caches at the end of render jobs on request from user space

2019-09-12 Thread Iago Toral Quiroga
Extends the user space ioctl for CL submissions so it can include a request
to flush the cache once the CL execution has completed. Fixes memory
write violation messages reported by the kernel in workloads involving
shader memory writes (SSBOs, shader images, scratch, etc) which sometimes
also lead to GPU resets during Piglit and CTS workloads.

Signed-off-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 51 +--
 include/uapi/drm/v3d_drm.h|  7 ++---
 2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 5d80507b539b..530fe9d9d5bd 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -530,13 +530,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
struct drm_v3d_submit_cl *args = data;
struct v3d_bin_job *bin = NULL;
struct v3d_render_job *render;
+   struct v3d_job *clean_job = NULL;
+   struct v3d_job *last_job;
struct ww_acquire_ctx acquire_ctx;
int ret = 0;
 
trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
 
-   if (args->pad != 0) {
-   DRM_INFO("pad must be zero: %d\n", args->pad);
+   if (args->flags != 0 &&
+   args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
+   DRM_INFO("invalid flags: %d\n", args->flags);
return -EINVAL;
}
 
@@ -575,12 +578,28 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
bin->render = render;
}
 
-   ret = v3d_lookup_bos(dev, file_priv, &render->base,
+   if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG) {
+   clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
+   if (!clean_job) {
+   ret = -ENOMEM;
+   goto fail;
+   }
+
+   ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0);
+   if (ret)
+   goto fail;
+
+   last_job = clean_job;
+   } else {
+   last_job = &render->base;
+   }
+
+   ret = v3d_lookup_bos(dev, file_priv, last_job,
 args->bo_handles, args->bo_handle_count);
if (ret)
goto fail;
 
-   ret = v3d_lock_bo_reservations(&render->base, &acquire_ctx);
+   ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
if (ret)
goto fail;
 
@@ -599,28 +618,44 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
ret = v3d_push_job(v3d_priv, &render->base, V3D_RENDER);
if (ret)
goto fail_unreserve;
+
+   if (clean_job) {
+   struct dma_fence *render_fence =
+   dma_fence_get(render->base.done_fence);
+   ret = drm_gem_fence_array_add(&clean_job->deps, render_fence);
+   if (ret)
+   goto fail_unreserve;
+   ret = v3d_push_job(v3d_priv, clean_job, V3D_CACHE_CLEAN);
+   if (ret)
+   goto fail_unreserve;
+   }
+
mutex_unlock(&v3d->sched_lock);
 
v3d_attach_fences_and_unlock_reservation(file_priv,
-&render->base,
+last_job,
 &acquire_ctx,
 args->out_sync,
-render->base.done_fence);
+last_job->done_fence);
 
if (bin)
v3d_job_put(&bin->base);
v3d_job_put(&render->base);
+   if (clean_job)
+   v3d_job_put(clean_job);
 
return 0;
 
 fail_unreserve:
mutex_unlock(&v3d->sched_lock);
-   drm_gem_unlock_reservations(render->base.bo,
-   render->base.bo_count, &acquire_ctx);
+   drm_gem_unlock_reservations(last_job->bo,
+   last_job->bo_count, &acquire_ctx);
 fail:
if (bin)
v3d_job_put(&bin->base);
v3d_job_put(&render->base);
+   if (clean_job)
+   v3d_job_put(clean_job);
 
return ret;
 }
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 58fbe48c91e9..58d2040ea48c 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -48,6 +48,8 @@ extern "C" {
 #define DRM_IOCTL_V3D_SUBMIT_TFU  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_TFU, struct drm_v3d_submit_tfu)
 #define DRM_IOCTL_V3D_SUBMIT_CSD  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CSD, struct drm_v3d_submit_csd)
 
+#define DRM_V3D_SUBMIT_CL_FLUSH_CACHE_FLAG0x01
+
 /**
  * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
  * engine.
@@ -61,7 +63,7 @@ extern "C" {
  * flushed