Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Qiang Yu
On Thu, Mar 7, 2019 at 9:11 AM Eric Anholt  wrote:
>
> Rob Herring  writes:
>
> > On Wed, Mar 6, 2019 at 9:24 AM Qiang Yu  wrote:
> >>
> >> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
> >>   OpenGL vertex shader processing and PP is for fragment shader
> >>   processing. Each processor has its own MMU, so the processors work
> >>   in a virtual address space.
> >> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
> >>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
> >>   together to handle a single fragment shader task divided by
> >>   FB output tiled pixels. The Mali 400 user space driver is
> >>   responsible for assigning target tiled pixels to each PP, but Mali
> >>   450 has a HW module called DLBU to dynamically balance each
> >>   PP's load.
> >> - The user space driver allocates a buffer object and maps it into
> >>   the GPU virtual address space, uploads the command stream and draw
> >>   data through a CPU mmap of the buffer object, then submits a task
> >>   to GP/PP with a register frame indicating where the command stream
> >>   is, plus misc settings.
> >> - There's no command stream validation/relocation because each user
> >>   process has its own GPU virtual address space. The GP/PP MMU
> >>   switches virtual address space before running a task from a
> >>   different user process. Buggy or malicious user space code just
> >>   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
> >> - Use GEM+shmem for MM. Currently memory is just allocated and pinned
> >>   at GEM object creation. The GPU VM map of the buffer is also done
> >>   in the alloc stage in kernel space. We may delay the memory
> >>   allocation and real GPU VM map to the command submission stage in
> >>   the future as an improvement.
> >> - Use drm_sched for GPU task scheduling. Each OpenGL context should
> >>   have a lima context object in the kernel to distinguish tasks
> >>   from different users. drm_sched gets tasks from each lima context
> >>   in a fair way.
> >>
> >> The mesa driver can be found here until it is upstreamed:
> >> https://gitlab.freedesktop.org/lima/mesa
> >>
> >> v7:
> >> - remove lima_fence_ops with default value
> >> - move fence slab create to device probe
> >> - check pad ioctl args to be zero
> >> - add comments for user/kernel interface
> >>
> >> v6:
> >> - fix comments by checkpatch.pl
> >>
> >> v5:
> >> - export gp/pp version to userspace
> >> - rebase on drm-misc-next
> >>
> >> v4:
> >> - use get param interface to get info
> >> - separate context create/free ioctl
> >> - remove unused max sched task param
> >> - update copyright time
> >> - use xarray instead of idr
> >> - stop using drmP.h
> >>
> >> v3:
> >> - fix comments from kbuild robot
> >> - restrict supported arch to tested ones
> >>
> >> v2:
> >> - fix syscall argument check
> >> - fix job finish fence leak since kernel 5.0
> >> - use drm syncobj to replace native fence
> >> - move buffer object GPU va map into kernel
> >> - reserve syscall argument space for future info
> >> - remove kernel gem modifier
> >> - switch TTM back to GEM+shmem MM
> >> - use time based io poll
> >> - use whole register name
> >> - adopt gem reservation obj integration
> >> - use drm_timeout_abs_to_jiffies
> >>
> >> Cc: Eric Anholt 
> >> Cc: Rob Herring 
> >> Cc: Christian König 
> >> Cc: Daniel Vetter 
> >> Cc: Alex Deucher 
> >> Cc: Sam Ravnborg 
> >> Cc: Rob Clark 
> >> Signed-off-by: Andreas Baierl 
> >> Signed-off-by: Erico Nunes 
> >> Signed-off-by: Heiko Stuebner 
> >> Signed-off-by: Marek Vasut 
> >> Signed-off-by: Neil Armstrong 
> >> Signed-off-by: Simon Shields 
> >> Signed-off-by: Vasily Khoruzhick 
> >> Signed-off-by: Qiang Yu 
> >> ---
> >>  drivers/gpu/drm/Kconfig   |   2 +
> >>  drivers/gpu/drm/Makefile  |   1 +
> >>  drivers/gpu/drm/lima/Kconfig  |  10 +
> >>  drivers/gpu/drm/lima/Makefile |  21 ++
> >>  drivers/gpu/drm/lima/lima_bcast.c |  47 +++
> >>  drivers/gpu/drm/lima/lima_bcast.h |  14 +
> >>  drivers/gpu/drm/lima/lima_ctx.c   |  97 ++
> >>  drivers/gpu/drm/lima/lima_ctx.h   |  30 ++
> >>  drivers/gpu/drm/lima/lima_device.c| 385 +++
> >>  drivers/gpu/drm/lima/lima_device.h| 131 
> >>  drivers/gpu/drm/lima/lima_dlbu.c  |  58 
> >>  drivers/gpu/drm/lima/lima_dlbu.h  |  18 ++
> >>  drivers/gpu/drm/lima/lima_drv.c   | 376 +++
> >>  drivers/gpu/drm/lima/lima_drv.h   |  45 +++
> >>  drivers/gpu/drm/lima/lima_gem.c   | 381 +++
> >>  drivers/gpu/drm/lima/lima_gem.h   |  25 ++
> >>  drivers/gpu/drm/lima/lima_gem_prime.c |  47 +++
> >>  drivers/gpu/drm/lima/lima_gem_prime.h |  13 +
> >>  drivers/gpu/drm/lima/lima_gp.c| 283 +
> >>  drivers/gpu/drm/lima/lima_gp.h|  16 +
> >>  drivers/gpu/drm/lima/lima_l2_cache.c  |  80 +
> >>  drivers/gpu/drm/lima/lima_l2_cache.h  |  14 +
> >>  drivers/gpu/drm/lima/lima_mmu.c   | 142 +
> >>  

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Qiang Yu
On Thu, Mar 7, 2019 at 8:15 AM Dave Airlie  wrote:
>
> > +#endif
> > diff --git a/include/uapi/drm/lima_drm.h b/include/uapi/drm/lima_drm.h
> > new file mode 100644
> > index ..05f8c910d7fb
> > --- /dev/null
> > +++ b/include/uapi/drm/lima_drm.h
> > @@ -0,0 +1,164 @@
> > +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
> > +/* Copyright 2017-2018 Qiang Yu  */
> > +
> > +#ifndef __LIMA_DRM_H__
> > +#define __LIMA_DRM_H__
> > +
> > +#include "drm.h"
> > +
> > +#if defined(__cplusplus)
> > +extern "C" {
> > +#endif
> > +
> > +enum drm_lima_param_gpu_id {
> > +   DRM_LIMA_PARAM_GPU_ID_UNKNOWN,
> > +   DRM_LIMA_PARAM_GPU_ID_MALI400,
> > +   DRM_LIMA_PARAM_GPU_ID_MALI450,
> > +};
> > +
> > +enum drm_lima_param {
> > +   DRM_LIMA_PARAM_GPU_ID,
> > +   DRM_LIMA_PARAM_NUM_PP,
> > +   DRM_LIMA_PARAM_GP_VERSION,
> > +   DRM_LIMA_PARAM_PP_VERSION,
> > +};
> > +
> > +/**
> > + * get various information of the GPU
> > + */
> > +struct drm_lima_get_param {
> > +   __u32 param; /* in, value in enum drm_lima_param */
> > +   __u32 pad;   /* pad, must be zero */
> > +   __u64 value; /* out, parameter value */
> > +};
> > +
> > +/**
> > + * create a buffer for use by the GPU
> > + */
> > +struct drm_lima_gem_create {
> > +   __u32 size;/* in, buffer size */
> > +   __u32 flags;   /* in, currently no flags, must be zero */
> > +   __u32 handle;  /* out, GEM buffer handle */
> > +   __u32 pad; /* pad, must be zero */
> > +};
> > +
> > +/**
> > + * get information of a buffer
> > + */
> > +struct drm_lima_gem_info {
> > +   __u32 handle;  /* in, GEM buffer handle */
> > +   __u32 va;  /* out, virtual address mapped into GPU MMU */
> > +   __u64 offset;  /* out, used to mmap this buffer to CPU */
> > +};
> > +
> > +#define LIMA_SUBMIT_BO_READ   0x01
> > +#define LIMA_SUBMIT_BO_WRITE  0x02
> > +
> > +/* buffer information used by one task */
> > +struct drm_lima_gem_submit_bo {
> > +   __u32 handle;  /* in, GEM buffer handle */
> > +   __u32 flags;   /* in, buffer read/write by GPU */
> > +};
> > +
> > +#define LIMA_GP_FRAME_REG_NUM 6
> > +
> > +/* frame used to setup GP for each task */
> > +struct drm_lima_gp_frame {
> > +   __u32 frame[LIMA_GP_FRAME_REG_NUM];
> > +};
> > +
> > +#define LIMA_PP_FRAME_REG_NUM 23
> > +#define LIMA_PP_WB_REG_NUM 12
> > +
> > +/* frame used to setup mali400 GPU PP for each task */
> > +struct drm_lima_m400_pp_frame {
> > +   __u32 frame[LIMA_PP_FRAME_REG_NUM];
> > +   __u32 num_pp;
> > +   __u32 wb[3 * LIMA_PP_WB_REG_NUM];
> > +   __u32 plbu_array_address[4];
> > +   __u32 fragment_stack_address[4];
> > +};
> > +
> > +/* frame used to setup mali450 GPU PP for each task */
> > +struct drm_lima_m450_pp_frame {
> > +   __u32 frame[LIMA_PP_FRAME_REG_NUM];
> > +   __u32 num_pp;
> > +   __u32 wb[3 * LIMA_PP_WB_REG_NUM];
> > +   __u32 use_dlbu;
> > +   __u32 _pad;
> > +   union {
> > +   __u32 plbu_array_address[8];
> > +   __u32 dlbu_regs[4];
> > +   };
> > +   __u32 fragment_stack_address[8];
> > +};
> > +
> > +#define LIMA_PIPE_GP  0x00
> > +#define LIMA_PIPE_PP  0x01
> > +
> > +#define LIMA_SUBMIT_FLAG_EXPLICIT_FENCE (1 << 0)
> > +
> > +/**
> > + * submit a task to GPU
> > + */
> > +struct drm_lima_gem_submit {
> > +   __u32 ctx; /* in, context handle task is submitted to */
> > +   __u32 pipe;/* in, which pipe to use, GP/PP */
> > +   __u32 nr_bos;  /* in, array length of bos field */
> > +   __u32 frame_size;  /* in, size of frame field */
> > +   __u64 bos; /* in, array of drm_lima_gem_submit_bo */
> > +   __u64 frame;   /* in, GP/PP frame */
> > +   __u32 flags;   /* in, submit flags */
> > +   __u32 out_sync;/* in, drm_syncobj handle used to wait task finish after submission */
> > +   __u32 in_sync[2];  /* in, drm_syncobj handle used to wait before start this task */
> > +};
>
> This seems a bit limited. Is there a reason it's two? At least in
> Vulkan drivers we'd want more than two, I suspect (Vulkan may not work
> on this hw anyway), and it might be required in the future to make
> this extensible.
Mali4xx GPUs do not support Vulkan. I picked two for these reasons: one
slot is for a drm_syncobj imported from a sync_file fd, and the other
lets a GP out_sync be passed directly to a PP in_sync when explicit
fencing is used, without going through a
drm_syncobj -> sync_file -> merge sync_file -> drm_syncobj
pass.
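
The chaining described above can be sketched roughly as follows. This is
an illustration only, not code from the patch: the struct is a local
mirror of the drm_lima_gem_submit layout quoted earlier (a real program
would include lima_drm.h instead), the SKETCH_* macros stand in for the
LIMA_* ones, and the helper names fill_gp_submit and fill_pp_submit are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the quoted uapi struct, for illustration only. */
struct lima_submit_sketch {
	uint32_t ctx;
	uint32_t pipe;
	uint32_t nr_bos;
	uint32_t frame_size;
	uint64_t bos;
	uint64_t frame;
	uint32_t flags;
	uint32_t out_sync;
	uint32_t in_sync[2];
};

/* The mirror should have no hidden padding: 4*4 + 2*8 + 4 + 4 + 2*4. */
static_assert(sizeof(struct lima_submit_sketch) == 48, "unexpected layout");

#define SKETCH_PIPE_GP 0x00
#define SKETCH_PIPE_PP 0x01
#define SKETCH_FLAG_EXPLICIT_FENCE (1u << 0)

/* GP task: gp_done is the syncobj handle signalled when it finishes. */
static void fill_gp_submit(struct lima_submit_sketch *s, uint32_t ctx,
			   uint32_t gp_done)
{
	memset(s, 0, sizeof(*s));
	s->ctx = ctx;
	s->pipe = SKETCH_PIPE_GP;
	s->flags = SKETCH_FLAG_EXPLICIT_FENCE;
	s->out_sync = gp_done;
}

/*
 * PP task: slot 0 carries a syncobj imported from a sync_file fd,
 * slot 1 carries the GP out_sync handle directly, so no
 * syncobj -> sync_file -> merge -> syncobj round trip is needed.
 */
static void fill_pp_submit(struct lima_submit_sketch *s, uint32_t ctx,
			   uint32_t imported, uint32_t gp_done,
			   uint32_t pp_done)
{
	memset(s, 0, sizeof(*s));
	s->ctx = ctx;
	s->pipe = SKETCH_PIPE_PP;
	s->flags = SKETCH_FLAG_EXPLICIT_FENCE;
	s->in_sync[0] = imported;
	s->in_sync[1] = gp_done;
	s->out_sync = pp_done;
}
```

The buffer list and frame fields are left zeroed here; an actual
submission would also fill bos, nr_bos, frame and frame_size.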

>
> At least a comment stating why 2 was picked is sufficient for current use cases.
>
OK, will add it.

Regards,
Qiang
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Qiang Yu
On Thu, Mar 7, 2019 at 8:08 AM Dave Airlie  wrote:
>
> On Thu, 7 Mar 2019 at 09:46, Rob Herring  wrote:
> >
> > On Wed, Mar 6, 2019 at 9:24 AM Qiang Yu  wrote:
> > >
> > > > - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
> > > >   OpenGL vertex shader processing and PP is for fragment shader
> > > >   processing. Each processor has its own MMU, so the processors work
> > > >   in a virtual address space.
> > > > - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
> > > >   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
> > > >   together to handle a single fragment shader task divided by
> > > >   FB output tiled pixels. The Mali 400 user space driver is
> > > >   responsible for assigning target tiled pixels to each PP, but Mali
> > > >   450 has a HW module called DLBU to dynamically balance each
> > > >   PP's load.
> > > > - The user space driver allocates a buffer object and maps it into
> > > >   the GPU virtual address space, uploads the command stream and draw
> > > >   data through a CPU mmap of the buffer object, then submits a task
> > > >   to GP/PP with a register frame indicating where the command stream
> > > >   is, plus misc settings.
> > > > - There's no command stream validation/relocation because each user
> > > >   process has its own GPU virtual address space. The GP/PP MMU
> > > >   switches virtual address space before running a task from a
> > > >   different user process. Buggy or malicious user space code just
> > > >   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
> > > > - Use GEM+shmem for MM. Currently memory is just allocated and pinned
> > > >   at GEM object creation. The GPU VM map of the buffer is also done
> > > >   in the alloc stage in kernel space. We may delay the memory
> > > >   allocation and real GPU VM map to the command submission stage in
> > > >   the future as an improvement.
> > > > - Use drm_sched for GPU task scheduling. Each OpenGL context should
> > > >   have a lima context object in the kernel to distinguish tasks
> > > >   from different users. drm_sched gets tasks from each lima context
> > > >   in a fair way.
> > > >
> > > > The mesa driver can be found here until it is upstreamed:
> > > https://gitlab.freedesktop.org/lima/mesa
> > >
> > > v7:
> > > - remove lima_fence_ops with default value
> > > - move fence slab create to device probe
> > > - check pad ioctl args to be zero
> > > - add comments for user/kernel interface
> > >
> > > v6:
> > > - fix comments by checkpatch.pl
> > >
> > > v5:
> > > - export gp/pp version to userspace
> > > - rebase on drm-misc-next
> > >
> > > v4:
> > > - use get param interface to get info
> > > - separate context create/free ioctl
> > > - remove unused max sched task param
> > > - update copyright time
> > > - use xarray instead of idr
> > > - stop using drmP.h
> > >
> > > v3:
> > > - fix comments from kbuild robot
> > > - restrict supported arch to tested ones
> > >
> > > v2:
> > > - fix syscall argument check
> > > - fix job finish fence leak since kernel 5.0
> > > - use drm syncobj to replace native fence
> > > - move buffer object GPU va map into kernel
> > > - reserve syscall argument space for future info
> > > - remove kernel gem modifier
> > > - switch TTM back to GEM+shmem MM
> > > - use time based io poll
> > > - use whole register name
> > > - adopt gem reservation obj integration
> > > - use drm_timeout_abs_to_jiffies
> > >
> > > Cc: Eric Anholt 
> > > Cc: Rob Herring 
> > > Cc: Christian König 
> > > Cc: Daniel Vetter 
> > > Cc: Alex Deucher 
> > > Cc: Sam Ravnborg 
> > > Cc: Rob Clark 
> > > Signed-off-by: Andreas Baierl 
> > > Signed-off-by: Erico Nunes 
> > > Signed-off-by: Heiko Stuebner 
> > > Signed-off-by: Marek Vasut 
> > > Signed-off-by: Neil Armstrong 
> > > Signed-off-by: Simon Shields 
> > > Signed-off-by: Vasily Khoruzhick 
> > > Signed-off-by: Qiang Yu 
> > > ---
> > >  drivers/gpu/drm/Kconfig   |   2 +
> > >  drivers/gpu/drm/Makefile  |   1 +
> > >  drivers/gpu/drm/lima/Kconfig  |  10 +
> > >  drivers/gpu/drm/lima/Makefile |  21 ++
> > >  drivers/gpu/drm/lima/lima_bcast.c |  47 +++
> > >  drivers/gpu/drm/lima/lima_bcast.h |  14 +
> > >  drivers/gpu/drm/lima/lima_ctx.c   |  97 ++
> > >  drivers/gpu/drm/lima/lima_ctx.h   |  30 ++
> > >  drivers/gpu/drm/lima/lima_device.c| 385 +++
> > >  drivers/gpu/drm/lima/lima_device.h| 131 
> > >  drivers/gpu/drm/lima/lima_dlbu.c  |  58 
> > >  drivers/gpu/drm/lima/lima_dlbu.h  |  18 ++
> > >  drivers/gpu/drm/lima/lima_drv.c   | 376 +++
> > >  drivers/gpu/drm/lima/lima_drv.h   |  45 +++
> > >  drivers/gpu/drm/lima/lima_gem.c   | 381 +++
> > >  drivers/gpu/drm/lima/lima_gem.h   |  25 ++
> > >  drivers/gpu/drm/lima/lima_gem_prime.c |  47 +++
> > >  drivers/gpu/drm/lima/lima_gem_prime.h |  13 +
> > >  drivers/gpu/drm/lima/lima_gp.c| 283 +
> > >  drivers/gpu/drm/lima/lima_gp.h|  16 +
> > >  drivers/gpu/drm/lima/lima_l2_cache.c  |  80 +
> 

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Eric Anholt
Rob Herring  writes:

> On Wed, Mar 6, 2019 at 9:24 AM Qiang Yu  wrote:
>>
>> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>>   OpenGL vertex shader processing and PP is for fragment shader
>>   processing. Each processor has its own MMU, so the processors work
>>   in a virtual address space.
>> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>>   together to handle a single fragment shader task divided by
>>   FB output tiled pixels. The Mali 400 user space driver is
>>   responsible for assigning target tiled pixels to each PP, but Mali
>>   450 has a HW module called DLBU to dynamically balance each
>>   PP's load.
>> - The user space driver allocates a buffer object and maps it into
>>   the GPU virtual address space, uploads the command stream and draw
>>   data through a CPU mmap of the buffer object, then submits a task
>>   to GP/PP with a register frame indicating where the command stream
>>   is, plus misc settings.
>> - There's no command stream validation/relocation because each user
>>   process has its own GPU virtual address space. The GP/PP MMU
>>   switches virtual address space before running a task from a
>>   different user process. Buggy or malicious user space code just
>>   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
>> - Use GEM+shmem for MM. Currently memory is just allocated and pinned
>>   at GEM object creation. The GPU VM map of the buffer is also done
>>   in the alloc stage in kernel space. We may delay the memory
>>   allocation and real GPU VM map to the command submission stage in
>>   the future as an improvement.
>> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>>   have a lima context object in the kernel to distinguish tasks
>>   from different users. drm_sched gets tasks from each lima context
>>   in a fair way.
>>
>> The mesa driver can be found here until it is upstreamed:
>> https://gitlab.freedesktop.org/lima/mesa
>>
>> v7:
>> - remove lima_fence_ops with default value
>> - move fence slab create to device probe
>> - check pad ioctl args to be zero
>> - add comments for user/kernel interface
>>
>> v6:
>> - fix comments by checkpatch.pl
>>
>> v5:
>> - export gp/pp version to userspace
>> - rebase on drm-misc-next
>>
>> v4:
>> - use get param interface to get info
>> - separate context create/free ioctl
>> - remove unused max sched task param
>> - update copyright time
>> - use xarray instead of idr
>> - stop using drmP.h
>>
>> v3:
>> - fix comments from kbuild robot
>> - restrict supported arch to tested ones
>>
>> v2:
>> - fix syscall argument check
>> - fix job finish fence leak since kernel 5.0
>> - use drm syncobj to replace native fence
>> - move buffer object GPU va map into kernel
>> - reserve syscall argument space for future info
>> - remove kernel gem modifier
>> - switch TTM back to GEM+shmem MM
>> - use time based io poll
>> - use whole register name
>> - adopt gem reservation obj integration
>> - use drm_timeout_abs_to_jiffies
>>
>> Cc: Eric Anholt 
>> Cc: Rob Herring 
>> Cc: Christian König 
>> Cc: Daniel Vetter 
>> Cc: Alex Deucher 
>> Cc: Sam Ravnborg 
>> Cc: Rob Clark 
>> Signed-off-by: Andreas Baierl 
>> Signed-off-by: Erico Nunes 
>> Signed-off-by: Heiko Stuebner 
>> Signed-off-by: Marek Vasut 
>> Signed-off-by: Neil Armstrong 
>> Signed-off-by: Simon Shields 
>> Signed-off-by: Vasily Khoruzhick 
>> Signed-off-by: Qiang Yu 
>> ---
>>  drivers/gpu/drm/Kconfig   |   2 +
>>  drivers/gpu/drm/Makefile  |   1 +
>>  drivers/gpu/drm/lima/Kconfig  |  10 +
>>  drivers/gpu/drm/lima/Makefile |  21 ++
>>  drivers/gpu/drm/lima/lima_bcast.c |  47 +++
>>  drivers/gpu/drm/lima/lima_bcast.h |  14 +
>>  drivers/gpu/drm/lima/lima_ctx.c   |  97 ++
>>  drivers/gpu/drm/lima/lima_ctx.h   |  30 ++
>>  drivers/gpu/drm/lima/lima_device.c| 385 +++
>>  drivers/gpu/drm/lima/lima_device.h| 131 
>>  drivers/gpu/drm/lima/lima_dlbu.c  |  58 
>>  drivers/gpu/drm/lima/lima_dlbu.h  |  18 ++
>>  drivers/gpu/drm/lima/lima_drv.c   | 376 +++
>>  drivers/gpu/drm/lima/lima_drv.h   |  45 +++
>>  drivers/gpu/drm/lima/lima_gem.c   | 381 +++
>>  drivers/gpu/drm/lima/lima_gem.h   |  25 ++
>>  drivers/gpu/drm/lima/lima_gem_prime.c |  47 +++
>>  drivers/gpu/drm/lima/lima_gem_prime.h |  13 +
>>  drivers/gpu/drm/lima/lima_gp.c| 283 +
>>  drivers/gpu/drm/lima/lima_gp.h|  16 +
>>  drivers/gpu/drm/lima/lima_l2_cache.c  |  80 +
>>  drivers/gpu/drm/lima/lima_l2_cache.h  |  14 +
>>  drivers/gpu/drm/lima/lima_mmu.c   | 142 +
>>  drivers/gpu/drm/lima/lima_mmu.h   |  16 +
>>  drivers/gpu/drm/lima/lima_object.c| 122 
>>  drivers/gpu/drm/lima/lima_object.h|  36 +++
>>  drivers/gpu/drm/lima/lima_pmu.c   |  60 
>>  drivers/gpu/drm/lima/lima_pmu.h   |  12 +
>>  drivers/gpu/drm/lima/lima_pp.c| 427 

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Dave Airlie
> +#endif
> diff --git a/include/uapi/drm/lima_drm.h b/include/uapi/drm/lima_drm.h
> new file mode 100644
> index ..05f8c910d7fb
> --- /dev/null
> +++ b/include/uapi/drm/lima_drm.h
> @@ -0,0 +1,164 @@
> +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
> +/* Copyright 2017-2018 Qiang Yu  */
> +
> +#ifndef __LIMA_DRM_H__
> +#define __LIMA_DRM_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +enum drm_lima_param_gpu_id {
> +   DRM_LIMA_PARAM_GPU_ID_UNKNOWN,
> +   DRM_LIMA_PARAM_GPU_ID_MALI400,
> +   DRM_LIMA_PARAM_GPU_ID_MALI450,
> +};
> +
> +enum drm_lima_param {
> +   DRM_LIMA_PARAM_GPU_ID,
> +   DRM_LIMA_PARAM_NUM_PP,
> +   DRM_LIMA_PARAM_GP_VERSION,
> +   DRM_LIMA_PARAM_PP_VERSION,
> +};
> +
> +/**
> + * get various information of the GPU
> + */
> +struct drm_lima_get_param {
> +   __u32 param; /* in, value in enum drm_lima_param */
> +   __u32 pad;   /* pad, must be zero */
> +   __u64 value; /* out, parameter value */
> +};
> +
> +/**
> + * create a buffer for use by the GPU
> + */
> +struct drm_lima_gem_create {
> +   __u32 size;/* in, buffer size */
> +   __u32 flags;   /* in, currently no flags, must be zero */
> +   __u32 handle;  /* out, GEM buffer handle */
> +   __u32 pad; /* pad, must be zero */
> +};
> +
> +/**
> + * get information of a buffer
> + */
> +struct drm_lima_gem_info {
> +   __u32 handle;  /* in, GEM buffer handle */
> +   __u32 va;  /* out, virtual address mapped into GPU MMU */
> +   __u64 offset;  /* out, used to mmap this buffer to CPU */
> +};
> +
> +#define LIMA_SUBMIT_BO_READ   0x01
> +#define LIMA_SUBMIT_BO_WRITE  0x02
> +
> +/* buffer information used by one task */
> +struct drm_lima_gem_submit_bo {
> +   __u32 handle;  /* in, GEM buffer handle */
> +   __u32 flags;   /* in, buffer read/write by GPU */
> +};
> +
> +#define LIMA_GP_FRAME_REG_NUM 6
> +
> +/* frame used to setup GP for each task */
> +struct drm_lima_gp_frame {
> +   __u32 frame[LIMA_GP_FRAME_REG_NUM];
> +};
> +
> +#define LIMA_PP_FRAME_REG_NUM 23
> +#define LIMA_PP_WB_REG_NUM 12
> +
> +/* frame used to setup mali400 GPU PP for each task */
> +struct drm_lima_m400_pp_frame {
> +   __u32 frame[LIMA_PP_FRAME_REG_NUM];
> +   __u32 num_pp;
> +   __u32 wb[3 * LIMA_PP_WB_REG_NUM];
> +   __u32 plbu_array_address[4];
> +   __u32 fragment_stack_address[4];
> +};
> +
> +/* frame used to setup mali450 GPU PP for each task */
> +struct drm_lima_m450_pp_frame {
> +   __u32 frame[LIMA_PP_FRAME_REG_NUM];
> +   __u32 num_pp;
> +   __u32 wb[3 * LIMA_PP_WB_REG_NUM];
> +   __u32 use_dlbu;
> +   __u32 _pad;
> +   union {
> +   __u32 plbu_array_address[8];
> +   __u32 dlbu_regs[4];
> +   };
> +   __u32 fragment_stack_address[8];
> +};
> +
> +#define LIMA_PIPE_GP  0x00
> +#define LIMA_PIPE_PP  0x01
> +
> +#define LIMA_SUBMIT_FLAG_EXPLICIT_FENCE (1 << 0)
> +
> +/**
> + * submit a task to GPU
> + */
> +struct drm_lima_gem_submit {
> +   __u32 ctx; /* in, context handle task is submitted to */
> +   __u32 pipe;/* in, which pipe to use, GP/PP */
> +   __u32 nr_bos;  /* in, array length of bos field */
> +   __u32 frame_size;  /* in, size of frame field */
> +   __u64 bos; /* in, array of drm_lima_gem_submit_bo */
> +   __u64 frame;   /* in, GP/PP frame */
> +   __u32 flags;   /* in, submit flags */
> +   __u32 out_sync;/* in, drm_syncobj handle used to wait task finish after submission */
> +   __u32 in_sync[2];  /* in, drm_syncobj handle used to wait before start this task */
> +};

This seems a bit limited. Is there a reason it's two? At least in
Vulkan drivers we'd want more than two, I suspect (Vulkan may not work
on this hw anyway), and it might be required in the future to make
this extensible.

At least a comment stating why 2 was picked is sufficient for current use cases.
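
One common way to make such a field extensible is a count plus a user
pointer in place of the fixed array. The sketch below is purely
hypothetical and is not what the posted patch does: the struct name,
its fields, and the helper set_in_syncs are all invented here to
illustrate the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, not part of the posted patch. */
struct submit_sync_ext_sketch {
	uint32_t nr_in_syncs;  /* in, number of handles in the array */
	uint32_t pad;          /* pad, must be zero */
	uint64_t in_syncs;     /* in, user pointer to uint32_t handles */
};

/* Explicit pad keeps the 64-bit field aligned with no hidden padding. */
static_assert(sizeof(struct submit_sync_ext_sketch) == 16,
	      "unexpected layout");

/* Store the array address as a 64-bit value, as uapi structs do. */
static void set_in_syncs(struct submit_sync_ext_sketch *s,
			 const uint32_t *handles, uint32_t count)
{
	s->nr_in_syncs = count;
	s->pad = 0;
	s->in_syncs = (uint64_t)(uintptr_t)handles;
}
```

The trade-off is an extra copy from user space per submission, which is
one reason a small fixed array can be reasonable when the use cases are
known.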

Thanks,
Dave

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Dave Airlie
On Thu, 7 Mar 2019 at 09:46, Rob Herring  wrote:
>
> On Wed, Mar 6, 2019 at 9:24 AM Qiang Yu  wrote:
> >
> > > - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
> > >   OpenGL vertex shader processing and PP is for fragment shader
> > >   processing. Each processor has its own MMU, so the processors work
> > >   in a virtual address space.
> > > - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
> > >   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
> > >   together to handle a single fragment shader task divided by
> > >   FB output tiled pixels. The Mali 400 user space driver is
> > >   responsible for assigning target tiled pixels to each PP, but Mali
> > >   450 has a HW module called DLBU to dynamically balance each
> > >   PP's load.
> > > - The user space driver allocates a buffer object and maps it into
> > >   the GPU virtual address space, uploads the command stream and draw
> > >   data through a CPU mmap of the buffer object, then submits a task
> > >   to GP/PP with a register frame indicating where the command stream
> > >   is, plus misc settings.
> > > - There's no command stream validation/relocation because each user
> > >   process has its own GPU virtual address space. The GP/PP MMU
> > >   switches virtual address space before running a task from a
> > >   different user process. Buggy or malicious user space code just
> > >   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
> > > - Use GEM+shmem for MM. Currently memory is just allocated and pinned
> > >   at GEM object creation. The GPU VM map of the buffer is also done
> > >   in the alloc stage in kernel space. We may delay the memory
> > >   allocation and real GPU VM map to the command submission stage in
> > >   the future as an improvement.
> > > - Use drm_sched for GPU task scheduling. Each OpenGL context should
> > >   have a lima context object in the kernel to distinguish tasks
> > >   from different users. drm_sched gets tasks from each lima context
> > >   in a fair way.
> > >
> > > The mesa driver can be found here until it is upstreamed:
> > https://gitlab.freedesktop.org/lima/mesa
> >
> > v7:
> > - remove lima_fence_ops with default value
> > - move fence slab create to device probe
> > - check pad ioctl args to be zero
> > - add comments for user/kernel interface
> >
> > v6:
> > - fix comments by checkpatch.pl
> >
> > v5:
> > - export gp/pp version to userspace
> > - rebase on drm-misc-next
> >
> > v4:
> > - use get param interface to get info
> > - separate context create/free ioctl
> > - remove unused max sched task param
> > - update copyright time
> > - use xarray instead of idr
> > - stop using drmP.h
> >
> > v3:
> > - fix comments from kbuild robot
> > - restrict supported arch to tested ones
> >
> > v2:
> > - fix syscall argument check
> > - fix job finish fence leak since kernel 5.0
> > - use drm syncobj to replace native fence
> > - move buffer object GPU va map into kernel
> > - reserve syscall argument space for future info
> > - remove kernel gem modifier
> > - switch TTM back to GEM+shmem MM
> > - use time based io poll
> > - use whole register name
> > - adopt gem reservation obj integration
> > - use drm_timeout_abs_to_jiffies
> >
> > Cc: Eric Anholt 
> > Cc: Rob Herring 
> > Cc: Christian König 
> > Cc: Daniel Vetter 
> > Cc: Alex Deucher 
> > Cc: Sam Ravnborg 
> > Cc: Rob Clark 
> > Signed-off-by: Andreas Baierl 
> > Signed-off-by: Erico Nunes 
> > Signed-off-by: Heiko Stuebner 
> > Signed-off-by: Marek Vasut 
> > Signed-off-by: Neil Armstrong 
> > Signed-off-by: Simon Shields 
> > Signed-off-by: Vasily Khoruzhick 
> > Signed-off-by: Qiang Yu 
> > ---
> >  drivers/gpu/drm/Kconfig   |   2 +
> >  drivers/gpu/drm/Makefile  |   1 +
> >  drivers/gpu/drm/lima/Kconfig  |  10 +
> >  drivers/gpu/drm/lima/Makefile |  21 ++
> >  drivers/gpu/drm/lima/lima_bcast.c |  47 +++
> >  drivers/gpu/drm/lima/lima_bcast.h |  14 +
> >  drivers/gpu/drm/lima/lima_ctx.c   |  97 ++
> >  drivers/gpu/drm/lima/lima_ctx.h   |  30 ++
> >  drivers/gpu/drm/lima/lima_device.c| 385 +++
> >  drivers/gpu/drm/lima/lima_device.h| 131 
> >  drivers/gpu/drm/lima/lima_dlbu.c  |  58 
> >  drivers/gpu/drm/lima/lima_dlbu.h  |  18 ++
> >  drivers/gpu/drm/lima/lima_drv.c   | 376 +++
> >  drivers/gpu/drm/lima/lima_drv.h   |  45 +++
> >  drivers/gpu/drm/lima/lima_gem.c   | 381 +++
> >  drivers/gpu/drm/lima/lima_gem.h   |  25 ++
> >  drivers/gpu/drm/lima/lima_gem_prime.c |  47 +++
> >  drivers/gpu/drm/lima/lima_gem_prime.h |  13 +
> >  drivers/gpu/drm/lima/lima_gp.c| 283 +
> >  drivers/gpu/drm/lima/lima_gp.h|  16 +
> >  drivers/gpu/drm/lima/lima_l2_cache.c  |  80 +
> >  drivers/gpu/drm/lima/lima_l2_cache.h  |  14 +
> >  drivers/gpu/drm/lima/lima_mmu.c   | 142 +
> >  drivers/gpu/drm/lima/lima_mmu.h   |  16 +
> >  drivers/gpu/drm/lima/lima_object.c| 122 
> >  drivers/gpu/drm/lima/lima_object.h|  36 +++
> >  

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Rob Herring
On Wed, Mar 6, 2019 at 9:24 AM Qiang Yu  wrote:
>
> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>   OpenGL vertex shader processing and PP is for fragment shader
>   processing. Each processor has its own MMU, so the processors work
>   in a virtual address space.
> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>   together to handle a single fragment shader task divided by
>   FB output tiled pixels. The Mali 400 user space driver is
>   responsible for assigning target tiled pixels to each PP, but Mali
>   450 has a HW module called DLBU to dynamically balance each
>   PP's load.
> - The user space driver allocates a buffer object and maps it into
>   the GPU virtual address space, uploads the command stream and draw
>   data through a CPU mmap of the buffer object, then submits a task
>   to GP/PP with a register frame indicating where the command stream
>   is, plus misc settings.
> - There's no command stream validation/relocation because each user
>   process has its own GPU virtual address space. The GP/PP MMU
>   switches virtual address space before running a task from a
>   different user process. Buggy or malicious user space code just
>   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
> - Use GEM+shmem for MM. Currently memory is just allocated and pinned
>   at GEM object creation. The GPU VM map of the buffer is also done
>   in the alloc stage in kernel space. We may delay the memory
>   allocation and real GPU VM map to the command submission stage in
>   the future as an improvement.
> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>   have a lima context object in the kernel to distinguish tasks
>   from different users. drm_sched gets tasks from each lima context
>   in a fair way.
>
> The mesa driver can be found here until it is upstreamed:
> https://gitlab.freedesktop.org/lima/mesa
>
> v7:
> - remove lima_fence_ops with default value
> - move fence slab create to device probe
> - check pad ioctl args to be zero
> - add comments for user/kernel interface
>
> v6:
> - fix comments by checkpatch.pl
>
> v5:
> - export gp/pp version to userspace
> - rebase on drm-misc-next
>
> v4:
> - use get param interface to get info
> - separate context create/free ioctl
> - remove unused max sched task param
> - update copyright time
> - use xarray instead of idr
> - stop using drmP.h
>
> v3:
> - fix comments from kbuild robot
> - restrict supported arch to tested ones
>
> v2:
> - fix syscall argument check
> - fix job finish fence leak since kernel 5.0
> - use drm syncobj to replace native fence
> - move buffer object GPU va map into kernel
> - reserve syscall argument space for future info
> - remove kernel gem modifier
> - switch TTM back to GEM+shmem MM
> - use time based io poll
> - use whole register name
> - adopt gem reservation obj integration
> - use drm_timeout_abs_to_jiffies
>
> Cc: Eric Anholt 
> Cc: Rob Herring 
> Cc: Christian König 
> Cc: Daniel Vetter 
> Cc: Alex Deucher 
> Cc: Sam Ravnborg 
> Cc: Rob Clark 
> Signed-off-by: Andreas Baierl 
> Signed-off-by: Erico Nunes 
> Signed-off-by: Heiko Stuebner 
> Signed-off-by: Marek Vasut 
> Signed-off-by: Neil Armstrong 
> Signed-off-by: Simon Shields 
> Signed-off-by: Vasily Khoruzhick 
> Signed-off-by: Qiang Yu 
> ---
>  drivers/gpu/drm/Kconfig   |   2 +
>  drivers/gpu/drm/Makefile  |   1 +
>  drivers/gpu/drm/lima/Kconfig  |  10 +
>  drivers/gpu/drm/lima/Makefile |  21 ++
>  drivers/gpu/drm/lima/lima_bcast.c |  47 +++
>  drivers/gpu/drm/lima/lima_bcast.h |  14 +
>  drivers/gpu/drm/lima/lima_ctx.c   |  97 ++
>  drivers/gpu/drm/lima/lima_ctx.h   |  30 ++
>  drivers/gpu/drm/lima/lima_device.c| 385 +++
>  drivers/gpu/drm/lima/lima_device.h| 131 
>  drivers/gpu/drm/lima/lima_dlbu.c  |  58 
>  drivers/gpu/drm/lima/lima_dlbu.h  |  18 ++
>  drivers/gpu/drm/lima/lima_drv.c   | 376 +++
>  drivers/gpu/drm/lima/lima_drv.h   |  45 +++
>  drivers/gpu/drm/lima/lima_gem.c   | 381 +++
>  drivers/gpu/drm/lima/lima_gem.h   |  25 ++
>  drivers/gpu/drm/lima/lima_gem_prime.c |  47 +++
>  drivers/gpu/drm/lima/lima_gem_prime.h |  13 +
>  drivers/gpu/drm/lima/lima_gp.c| 283 +
>  drivers/gpu/drm/lima/lima_gp.h|  16 +
>  drivers/gpu/drm/lima/lima_l2_cache.c  |  80 +
>  drivers/gpu/drm/lima/lima_l2_cache.h  |  14 +
>  drivers/gpu/drm/lima/lima_mmu.c   | 142 +
>  drivers/gpu/drm/lima/lima_mmu.h   |  16 +
>  drivers/gpu/drm/lima/lima_object.c| 122 
>  drivers/gpu/drm/lima/lima_object.h|  36 +++
>  drivers/gpu/drm/lima/lima_pmu.c   |  60 
>  drivers/gpu/drm/lima/lima_pmu.h   |  12 +
>  drivers/gpu/drm/lima/lima_pp.c| 427 ++
>  drivers/gpu/drm/lima/lima_pp.h|  19 ++
>  drivers/gpu/drm/lima/lima_regs.h  | 298 ++
> 

Re: [PATCH v7 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-03-06 Thread Eric Anholt
Qiang Yu  writes:

> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>   OpenGL vertex shader processing and PP is for fragment shader
>   processing. Each processor has its own MMU, so the processors work
>   in a virtual address space.
> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>   together to handle a single fragment shader task divided by
>   FB output tiled pixels. The Mali 400 user space driver is
>   responsible for assigning target tiled pixels to each PP, but Mali
>   450 has a HW module called DLBU to dynamically balance each
>   PP's load.
> - The user space driver allocates a buffer object and maps it into
>   the GPU virtual address space, uploads the command stream and draw
>   data through a CPU mmap of the buffer object, then submits a task
>   to GP/PP with a register frame indicating where the command stream
>   is, plus misc settings.
> - There's no command stream validation/relocation because each user
>   process has its own GPU virtual address space. The GP/PP MMU
>   switches virtual address space before running a task from a
>   different user process. Buggy or malicious user space code just
>   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
> - Use GEM+shmem for MM. Currently memory is just allocated and pinned
>   at GEM object creation. The GPU VM map of the buffer is also done
>   in the alloc stage in kernel space. We may delay the memory
>   allocation and real GPU VM map to the command submission stage in
>   the future as an improvement.
> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>   have a lima context object in the kernel to distinguish tasks
>   from different users. drm_sched gets tasks from each lima context
>   in a fair way.
>
> The mesa driver can be found here until it is upstreamed:
> https://gitlab.freedesktop.org/lima/mesa
>
> v7:
> - remove lima_fence_ops with default value
> - move fence slab create to device probe
> - check pad ioctl args to be zero
> - add comments for user/kernel interface

Thanks for adding the comments!  That helps a lot.  I feel pretty good
about the ABI at this point.

Reviewed-by: Eric Anholt 
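
The v7 item "check pad ioctl args to be zero" quoted above follows a
common uapi pattern: rejecting non-zero padding now lets the field be
given a meaning later without breaking old callers. A minimal sketch of
such a check, written as ordinary user-space C. The struct is a local
mirror of drm_lima_get_param from the patch, and the function name
check_get_param_args is hypothetical, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirror of drm_lima_get_param, for illustration only. */
struct get_param_sketch {
	uint32_t param;  /* in, which parameter to query */
	uint32_t pad;    /* pad, must be zero */
	uint64_t value;  /* out, parameter value */
};

/* The explicit pad keeps value 8-byte aligned with no hidden padding. */
static_assert(sizeof(struct get_param_sketch) == 16, "unexpected layout");

/* Return 0 if the arguments are acceptable, -EINVAL otherwise. */
static int check_get_param_args(const struct get_param_sketch *args)
{
	if (args->pad != 0)
		return -EINVAL;
	return 0;
}
```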

