Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs
On Thu, Mar 7, 2019 at 8:15 AM Dave Airlie wrote:
> > [...]
> >
> > +/**
> > + * submit a task to GPU
> > + */
> > +struct drm_lima_gem_submit {
> > +	__u32 ctx;        /* in, context handle task is submitted to */
> > +	__u32 pipe;       /* in, which pipe to use, GP/PP */
> > +	__u32 nr_bos;     /* in, array length of bos field */
> > +	__u32 frame_size; /* in, size of frame field */
> > +	__u64 bos;        /* in, array of drm_lima_gem_submit_bo */
> > +	__u64 frame;      /* in, GP/PP frame */
> > +	__u32 flags;      /* in, submit flags */
> > +	__u32 out_sync;   /* in, drm_syncobj handle used to wait for task
> > +	                   * finish after submission */
> > +	__u32 in_sync[2]; /* in, drm_syncobj handles used to wait before
> > +	                   * starting this task */
> > +};
>
> This seems a bit limited. Is there a reason it's two? At least in Vulkan
> drivers we'd want more than two, I suspect (Vulkan may not work on this
> hw anyway), but it might be required in the future to make this
> extensible.
The Mali 4xx GPU does not support Vulkan. The reason I picked two is: one slot
for a sync_file fd imported into a drm_syncobj, and one so that the GP task's
out_sync can be passed directly as the PP task's in_sync when doing explicit
fencing, without a drm_syncobj -> sync_file -> merge sync_file -> drm_syncobj
round trip.

> At least a comment stating why 2 was picked is sufficient for current use
> cases.

OK, will add it.

Regards,
Qiang
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs
> +#endif
> diff --git a/include/uapi/drm/lima_drm.h b/include/uapi/drm/lima_drm.h
> new file mode 100644
> index ..05f8c910d7fb
> --- /dev/null
> +++ b/include/uapi/drm/lima_drm.h
> @@ -0,0 +1,164 @@
> +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
> +/* Copyright 2017-2018 Qiang Yu */
> +
> +#ifndef __LIMA_DRM_H__
> +#define __LIMA_DRM_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +enum drm_lima_param_gpu_id {
> +	DRM_LIMA_PARAM_GPU_ID_UNKNOWN,
> +	DRM_LIMA_PARAM_GPU_ID_MALI400,
> +	DRM_LIMA_PARAM_GPU_ID_MALI450,
> +};
> +
> +enum drm_lima_param {
> +	DRM_LIMA_PARAM_GPU_ID,
> +	DRM_LIMA_PARAM_NUM_PP,
> +	DRM_LIMA_PARAM_GP_VERSION,
> +	DRM_LIMA_PARAM_PP_VERSION,
> +};
> +
> +/**
> + * get various information about the GPU
> + */
> +struct drm_lima_get_param {
> +	__u32 param;  /* in, value in enum drm_lima_param */
> +	__u32 pad;    /* pad, must be zero */
> +	__u64 value;  /* out, parameter value */
> +};
> +
> +/**
> + * create a buffer to be used by the GPU
> + */
> +struct drm_lima_gem_create {
> +	__u32 size;   /* in, buffer size */
> +	__u32 flags;  /* in, currently no flags, must be zero */
> +	__u32 handle; /* out, GEM buffer handle */
> +	__u32 pad;    /* pad, must be zero */
> +};
> +
> +/**
> + * get information about a buffer
> + */
> +struct drm_lima_gem_info {
> +	__u32 handle; /* in, GEM buffer handle */
> +	__u32 va;     /* out, virtual address mapped into GPU MMU */
> +	__u64 offset; /* out, used to mmap this buffer to CPU */
> +};
> +
> +#define LIMA_SUBMIT_BO_READ  0x01
> +#define LIMA_SUBMIT_BO_WRITE 0x02
> +
> +/* buffer information used by one task */
> +struct drm_lima_gem_submit_bo {
> +	__u32 handle; /* in, GEM buffer handle */
> +	__u32 flags;  /* in, buffer read/write by GPU */
> +};
> +
> +#define LIMA_GP_FRAME_REG_NUM 6
> +
> +/* frame used to setup GP for each task */
> +struct drm_lima_gp_frame {
> +	__u32 frame[LIMA_GP_FRAME_REG_NUM];
> +};
> +
> +#define LIMA_PP_FRAME_REG_NUM 23
> +#define LIMA_PP_WB_REG_NUM 12
> +
> +/* frame used to setup mali400 GPU PP for each task */
> +struct drm_lima_m400_pp_frame {
> +	__u32 frame[LIMA_PP_FRAME_REG_NUM];
> +	__u32 num_pp;
> +	__u32 wb[3 * LIMA_PP_WB_REG_NUM];
> +	__u32 plbu_array_address[4];
> +	__u32 fragment_stack_address[4];
> +};
> +
> +/* frame used to setup mali450 GPU PP for each task */
> +struct drm_lima_m450_pp_frame {
> +	__u32 frame[LIMA_PP_FRAME_REG_NUM];
> +	__u32 num_pp;
> +	__u32 wb[3 * LIMA_PP_WB_REG_NUM];
> +	__u32 use_dlbu;
> +	__u32 _pad;
> +	union {
> +		__u32 plbu_array_address[8];
> +		__u32 dlbu_regs[4];
> +	};
> +	__u32 fragment_stack_address[8];
> +};
> +
> +#define LIMA_PIPE_GP 0x00
> +#define LIMA_PIPE_PP 0x01
> +
> +#define LIMA_SUBMIT_FLAG_EXPLICIT_FENCE (1 << 0)
> +
> +/**
> + * submit a task to GPU
> + */
> +struct drm_lima_gem_submit {
> +	__u32 ctx;        /* in, context handle task is submitted to */
> +	__u32 pipe;       /* in, which pipe to use, GP/PP */
> +	__u32 nr_bos;     /* in, array length of bos field */
> +	__u32 frame_size; /* in, size of frame field */
> +	__u64 bos;        /* in, array of drm_lima_gem_submit_bo */
> +	__u64 frame;      /* in, GP/PP frame */
> +	__u32 flags;      /* in, submit flags */
> +	__u32 out_sync;   /* in, drm_syncobj handle used to wait for task
> +	                   * finish after submission */
> +	__u32 in_sync[2]; /* in, drm_syncobj handles used to wait before
> +	                   * starting this task */
> +};

This seems a bit limited. Is there a reason it's two? At least in Vulkan
drivers we'd want more than two, I suspect (Vulkan may not work on this hw
anyway), but it might be required in the future to make this extensible.

At least a comment stating why 2 was picked is sufficient for current use
cases.

Thanks,
Dave
Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs
On Wed, Mar 6, 2019 at 9:24 AM Qiang Yu wrote:
>
> - Mali 4xx GPUs have two kinds of processors: GP and PP. GP is for
>   OpenGL vertex shader processing and PP is for fragment shader
>   processing. Each processor has its own MMU, so processors work in
>   virtual address space.
> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>   together to handle a single fragment shader task divided by
>   FB output tiled pixels. The Mali 400 user space driver is
>   responsible for assigning target tiled pixels to each PP, but Mali
>   450 has a HW module called DLBU to dynamically balance each
>   PP's load.
> - The user space driver allocates buffer objects and maps them into the
>   GPU virtual address space, uploads the command stream and draw data
>   through a CPU mmap of the buffer object, then submits the task to
>   GP/PP with a register frame indicating where the command stream is
>   and misc settings.
> - There's no command stream validation/relocation because each user
>   process has its own GPU virtual address space. The GP/PP MMUs switch
>   virtual address space before running two tasks from different user
>   processes. Erroneous or evil user space code just gets an MMU fault
>   or GP/PP error IRQ, and then the HW/SW is recovered.
> - Use GEM+shmem for MM. Currently memory is just allocated and pinned
>   at gem object creation. The GPU vm map of the buffer is also done in
>   the alloc stage in kernel space. We may delay the memory allocation
>   and real GPU vm map to the command submission stage in the future as
>   an improvement.
> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>   have a lima context object in the kernel to distinguish tasks from
>   different users. drm_sched gets tasks from each lima context in a
>   fair way.
>
> mesa driver can be found here before upstreamed:
> https://gitlab.freedesktop.org/lima/mesa
>
> v7:
> - remove lima_fence_ops with default value
> - move fence slab create to device probe
> - check pad ioctl args to be zero
> - add comments for user/kernel interface
>
> v6:
> - fix comments by checkpatch.pl
>
> v5:
> - export gp/pp version to userspace
> - rebase on drm-misc-next
>
> v4:
> - use get param interface to get info
> - separate context create/free ioctl
> - remove unused max sched task param
> - update copyright time
> - use xarray instead of idr
> - stop using drmP.h
>
> v3:
> - fix comments from kbuild robot
> - restrict supported arch to tested ones
>
> v2:
> - fix syscall argument check
> - fix job finish fence leak since kernel 5.0
> - use drm syncobj to replace native fence
> - move buffer object GPU va map into kernel
> - reserve syscall argument space for future info
> - remove kernel gem modifier
> - switch TTM back to GEM+shmem MM
> - use time based io poll
> - use whole register name
> - adopt gem reservation obj integration
> - use drm_timeout_abs_to_jiffies
>
> Cc: Eric Anholt
> Cc: Rob Herring
> Cc: Christian König
> Cc: Daniel Vetter
> Cc: Alex Deucher
> Cc: Sam Ravnborg
> Cc: Rob Clark
> Signed-off-by: Andreas Baierl
> Signed-off-by: Erico Nunes
> Signed-off-by: Heiko Stuebner
> Signed-off-by: Marek Vasut
> Signed-off-by: Neil Armstrong
> Signed-off-by: Simon Shields
> Signed-off-by: Vasily Khoruzhick
> Signed-off-by: Qiang Yu
> ---
> drivers/gpu/drm/Kconfig | 2 +
> drivers/gpu/drm/Makefile | 1 +
> drivers/gpu/drm/lima/Kconfig | 10 +
> drivers/gpu/drm/lima/Makefile | 21 ++
> drivers/gpu/drm/lima/lima_bcast.c | 47 +++
> drivers/gpu/drm/lima/lima_bcast.h | 14 +
> drivers/gpu/drm/lima/lima_ctx.c | 97 ++
> drivers/gpu/drm/lima/lima_ctx.h | 30 ++
> drivers/gpu/drm/lima/lima_device.c | 385 +++
> drivers/gpu/drm/lima/lima_device.h | 131
> drivers/gpu/drm/lima/lima_dlbu.c | 58
> drivers/gpu/drm/lima/lima_dlbu.h | 18 ++
> drivers/gpu/drm/lima/lima_drv.c | 376 +++
> drivers/gpu/drm/lima/lima_drv.h | 45 +++
> drivers/gpu/drm/lima/lima_gem.c | 381 +++
> drivers/gpu/drm/lima/lima_gem.h | 25 ++
> drivers/gpu/drm/lima/lima_gem_prime.c | 47 +++
> drivers/gpu/drm/lima/lima_gem_prime.h | 13 +
> drivers/gpu/drm/lima/lima_gp.c | 283 +
> drivers/gpu/drm/lima/lima_gp.h | 16 +
> drivers/gpu/drm/lima/lima_l2_cache.c | 80 +
> drivers/gpu/drm/lima/lima_l2_cache.h | 14 +
> drivers/gpu/drm/lima/lima_mmu.c | 142 +
> drivers/gpu/drm/lima/lima_mmu.h | 16 +
> drivers/gpu/drm/lima/lima_object.c | 122
> drivers/gpu/drm/lima/lima_object.h | 36 +++
> drivers/gpu/drm/lima/lima_pmu.c | 60
> drivers/gpu/drm/lima/lima_pmu.h | 12 +
> drivers/gpu/drm/lima/lima_pp.c | 427 ++
> drivers/gpu/drm/lima/lima_pp.h | 19 ++
> drivers/gpu/drm/lima/lima_regs.h | 298 ++
Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs
Qiang Yu writes:

> [...]
>
> mesa driver can be found here before upstreamed:
> https://gitlab.freedesktop.org/lima/mesa
>
> v7:
> - remove lima_fence_ops with default value
> - move fence slab create to device probe
> - check pad ioctl args to be zero
> - add comments for user/kernel interface

Thanks for adding the comments! That helps a lot. I feel pretty good
about the ABI at this point.

Reviewed-by: Eric Anholt