Re: [PATCH 5/8] drm/imx: Introduce i.MX8qxp DPU DRM

2020-11-23 Thread Laurentiu Palcu
Hi Liu Ying,

On Mon, Nov 23, 2020 at 10:45:38AM +0800, Liu Ying wrote:
> Hi Laurentiu,
> 
> On Fri, 2020-11-20 at 16:38 +0200, Laurentiu Palcu wrote:
> > Hi Liu Ying,
> > 
> > I gave this a first look but, since this is a huge piece of code and I'm not
> > very familiar with DPU, I'll probably give it another pass next week.
> > 
> > Anyway, some comments/questions inline.
> > 
> > On Thu, Nov 19, 2020 at 11:22:22AM +0200, Liu Ying wrote:
> > > This patch introduces i.MX8qxp Display Processing Unit(DPU) DRM support.
> > > 
> > > DPU is comprised of two main components that include a blit engine for
> > > 2D graphics accelerations(with composition support) and a display 
> > > controller
> > > for display output processing, as well as a command sequencer.  Outside of
> > > DPU, optional prefetch engines, a.k.a, Prefetch Resolve Gasket(PRG) and
> > > Display Prefetch Resolve(DPR), can fetch data from memory prior to some 
> > > DPU
> > > fetchunits of blit engine and display controller.  The prefetch engines
> > > support reading linear formats and resolving Vivante GPU tile formats.
> > > 
> > > This patch adds kernel modesetting support for the display controller 
> > > part.
> > > The driver supports two CRTCs per display controller, planes backed by
> > > four fetchunits(decode0/1, fetchlayer, fetchwarp), fetchunit allocation
> > > logic for the two CRTCs, prefetch engines(with tile resolving supported),
> > > plane upscaling/deinterlacing/yuv2rgb CSC/alpha blending and CRTC gamma
> > > correction.  The registers of the controller is accessed without command
> > > sequencer involved, instead just by using CPU.
> > 
> > Will you also add support for command sequencer in the future? The command
> > sequencer seems to have the same purpose as the DCSS context loader on 
> > iMX8MQ,
> > which is very useful for configuring the various DC blocks right in the
> > blanking period without having to worry about synchronization. Why not use 
> > it
> > from the very beginning?
> 
> The single command sequencer(cmdseq) per DPU supports
> NOP/CALL/RET/WRITE/COPY/CCOPY/SYNC instructions.
> 
> It is designed to autonomously process command lists.
> 
> Two reasons for not using cmdseq for display controller(kms):
> 
> 1) performance consideration:
> 
> The SYNC instruction stops processing the subsequent command list until
> one specific hardware event of the DPU is triggered, like shadow loaded
> event, ComCtrl_SW0(cmdseq sequence complete event), etc. To use the
> autonomous command list processing, we would queue command blocks to a
> chain. The command blocks are usually comprised of some initial WRITE
> instructions and a final SYNC instruction(to make sure the WRITE
> instructions take effect).  The command blocks could be for blit engine
> or display controller.  As blit engine and display controller
> operations are naturally async, those SYNC instructions would impact
> the blit and display performance.  Even the two display pipelines(two
> CRTCs) of DPU themselves could impact each other's performance, because
> there could be parallel atomic commits for the two pipelines. 
> 
> 2) no cmdseq read instructions: 
> 
> The KMS driver for the display controller needs to read DPU registers
> to report CRTC scannout position, vlbank count and CRC data(DPU
> Signature units). Also, it needs to read DPU registers to sync some
> events(like FrameGen counter moving). Inserting CPU read operations
> into command sequence would pollute the pure cmdseq programming model.
> 
> 
> In general, cmdseq is probably better to be reserved for blit engine,
> since I assume blit engine, as a 2D engine, usually doesn't need
> reading registers.  Also, blit performance would be good.

Thanks for clarifying this.

> 
> > 
> > > Reference manual can be found at:
> > > https://www.nxp.com/webapp/Download?colCode=IMX8DQXPRM
> > > 
> > > Signed-off-by: Liu Ying 
> > > ---
> > 
> > [...]
> > 
> > > diff --git a/drivers/gpu/drm/imx/dpu/dpu-core.c 
> > > b/drivers/gpu/drm/imx/dpu/dpu-core.c
> > > new file mode 100644
> > > index ..1583c7a
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/imx/dpu/dpu-core.c
> > > @@ -0,0 +1,880 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +
> > > +/*
> > > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > > + * Copyright 2017-2020 NXP
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "dpu.h"
> > > +#include "dpu-prv.h"
> > > +
> > > +static inline u32 dpu_comctrl_read(struct dpu_soc *dpu, unsigned int 
> > > offset)
> > > +{
> > > + return readl(dpu->comctrl_reg + offset);
> > > +}
> > > +
> > > +static inline void dpu_comctrl_write(struct dpu_soc *dpu,
> > > +  unsigned int offset, u32 value)
> > > +{
> > > + writel(value, dpu->comctrl_reg + offset);
> > > +}
> > > +
> > > +/* 

Re: [PATCH 5/8] drm/imx: Introduce i.MX8qxp DPU DRM

2020-11-22 Thread Liu Ying
Hi Laurentiu,

On Fri, 2020-11-20 at 16:38 +0200, Laurentiu Palcu wrote:
> Hi Liu Ying,
> 
> I gave this a first look but, since this is a huge piece of code and I'm not
> very familiar with DPU, I'll probably give it another pass next week.
> 
> Anyway, some comments/questions inline.
> 
> On Thu, Nov 19, 2020 at 11:22:22AM +0200, Liu Ying wrote:
> > This patch introduces i.MX8qxp Display Processing Unit(DPU) DRM support.
> > 
> > DPU is comprised of two main components that include a blit engine for
> > 2D graphics accelerations(with composition support) and a display controller
> > for display output processing, as well as a command sequencer.  Outside of
> > DPU, optional prefetch engines, a.k.a, Prefetch Resolve Gasket(PRG) and
> > Display Prefetch Resolve(DPR), can fetch data from memory prior to some DPU
> > fetchunits of blit engine and display controller.  The prefetch engines
> > support reading linear formats and resolving Vivante GPU tile formats.
> > 
> > This patch adds kernel modesetting support for the display controller part.
> > The driver supports two CRTCs per display controller, planes backed by
> > four fetchunits(decode0/1, fetchlayer, fetchwarp), fetchunit allocation
> > logic for the two CRTCs, prefetch engines(with tile resolving supported),
> > plane upscaling/deinterlacing/yuv2rgb CSC/alpha blending and CRTC gamma
> > correction.  The registers of the controller is accessed without command
> > sequencer involved, instead just by using CPU.
> 
> Will you also add support for command sequencer in the future? The command
> sequencer seems to have the same purpose as the DCSS context loader on iMX8MQ,
> which is very useful for configuring the various DC blocks right in the
> blanking period without having to worry about synchronization. Why not use it
> from the very beginning?

The single command sequencer(cmdseq) per DPU supports
NOP/CALL/RET/WRITE/COPY/CCOPY/SYNC instructions.

It is designed to autonomously process command lists.

Two reasons for not using cmdseq for display controller(kms):

1) performance consideration:

The SYNC instruction stops processing the subsequent command list until
one specific hardware event of the DPU is triggered, like shadow loaded
event, ComCtrl_SW0(cmdseq sequence complete event), etc. To use the
autonomous command list processing, we would queue command blocks to a
chain. The command blocks are usually comprised of some initial WRITE
instructions and a final SYNC instruction(to make sure the WRITE
instructions take effect).  The command blocks could be for blit engine
or display controller.  As blit engine and display controller
operations are naturally async, those SYNC instructions would impact
the blit and display performance.  Even the two display pipelines(two
CRTCs) of DPU themselves could impact each other's performance, because
there could be parallel atomic commits for the two pipelines. 

2) no cmdseq read instructions: 

The KMS driver for the display controller needs to read DPU registers
to report CRTC scannout position, vlbank count and CRC data(DPU
Signature units). Also, it needs to read DPU registers to sync some
events(like FrameGen counter moving). Inserting CPU read operations
into command sequence would pollute the pure cmdseq programming model.


In general, cmdseq is probably better to be reserved for blit engine,
since I assume blit engine, as a 2D engine, usually doesn't need
reading registers.  Also, blit performance would be good.

> 
> > Reference manual can be found at:
> > https://www.nxp.com/webapp/Download?colCode=IMX8DQXPRM
> > 
> > Signed-off-by: Liu Ying 
> > ---
> 
> [...]
> 
> > diff --git a/drivers/gpu/drm/imx/dpu/dpu-core.c 
> > b/drivers/gpu/drm/imx/dpu/dpu-core.c
> > new file mode 100644
> > index ..1583c7a
> > --- /dev/null
> > +++ b/drivers/gpu/drm/imx/dpu/dpu-core.c
> > @@ -0,0 +1,880 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +
> > +/*
> > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > + * Copyright 2017-2020 NXP
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "dpu.h"
> > +#include "dpu-prv.h"
> > +
> > +static inline u32 dpu_comctrl_read(struct dpu_soc *dpu, unsigned int 
> > offset)
> > +{
> > +   return readl(dpu->comctrl_reg + offset);
> > +}
> > +
> > +static inline void dpu_comctrl_write(struct dpu_soc *dpu,
> > +unsigned int offset, u32 value)
> > +{
> > +   writel(value, dpu->comctrl_reg + offset);
> > +}
> > +
> > +/* Constant Frame */
> > +static const unsigned int cf_ids[] = {0, 1, 4, 5};
> > +static const enum dpu_unit_type cf_types[] = {DPU_DISP, DPU_DISP,
> > + DPU_DISP, DPU_DISP};
> > +static const unsigned long cf_ofss[] = {0x4400, 0x5400, 0x4c00, 0x5c00};
> > +static const unsigned long cf_pec_ofss[] = {0x960, 0x9e0, 0x9a0, 0xa20};
> > +

Re: [PATCH 5/8] drm/imx: Introduce i.MX8qxp DPU DRM

2020-11-20 Thread Laurentiu Palcu
Hi Liu Ying,

I gave this a first look but, since this is a huge piece of code and I'm not
very familiar with DPU, I'll probably give it another pass next week.

Anyway, some comments/questions inline.

On Thu, Nov 19, 2020 at 11:22:22AM +0200, Liu Ying wrote:
> This patch introduces i.MX8qxp Display Processing Unit(DPU) DRM support.
> 
> DPU is comprised of two main components that include a blit engine for
> 2D graphics accelerations(with composition support) and a display controller
> for display output processing, as well as a command sequencer.  Outside of
> DPU, optional prefetch engines, a.k.a, Prefetch Resolve Gasket(PRG) and
> Display Prefetch Resolve(DPR), can fetch data from memory prior to some DPU
> fetchunits of blit engine and display controller.  The prefetch engines
> support reading linear formats and resolving Vivante GPU tile formats.
> 
> This patch adds kernel modesetting support for the display controller part.
> The driver supports two CRTCs per display controller, planes backed by
> four fetchunits(decode0/1, fetchlayer, fetchwarp), fetchunit allocation
> logic for the two CRTCs, prefetch engines(with tile resolving supported),
> plane upscaling/deinterlacing/yuv2rgb CSC/alpha blending and CRTC gamma
> correction.  The registers of the controller is accessed without command
> sequencer involved, instead just by using CPU.

Will you also add support for command sequencer in the future? The command
sequencer seems to have the same purpose as the DCSS context loader on iMX8MQ,
which is very useful for configuring the various DC blocks right in the
blanking period without having to worry about synchronization. Why not use it
from the very beginning?

> 
> Reference manual can be found at:
> https://www.nxp.com/webapp/Download?colCode=IMX8DQXPRM
> 
> Signed-off-by: Liu Ying 
> ---

[...]

> diff --git a/drivers/gpu/drm/imx/dpu/dpu-core.c 
> b/drivers/gpu/drm/imx/dpu/dpu-core.c
> new file mode 100644
> index ..1583c7a
> --- /dev/null
> +++ b/drivers/gpu/drm/imx/dpu/dpu-core.c
> @@ -0,0 +1,880 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +/*
> + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> + * Copyright 2017-2020 NXP
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dpu.h"
> +#include "dpu-prv.h"
> +
> +static inline u32 dpu_comctrl_read(struct dpu_soc *dpu, unsigned int offset)
> +{
> + return readl(dpu->comctrl_reg + offset);
> +}
> +
> +static inline void dpu_comctrl_write(struct dpu_soc *dpu,
> +  unsigned int offset, u32 value)
> +{
> + writel(value, dpu->comctrl_reg + offset);
> +}
> +
> +/* Constant Frame */
> +static const unsigned int cf_ids[] = {0, 1, 4, 5};
> +static const enum dpu_unit_type cf_types[] = {DPU_DISP, DPU_DISP,
> +   DPU_DISP, DPU_DISP};
> +static const unsigned long cf_ofss[] = {0x4400, 0x5400, 0x4c00, 0x5c00};
> +static const unsigned long cf_pec_ofss[] = {0x960, 0x9e0, 0x9a0, 0xa20};
> +
> +/* Display Engine Configuration */
> +static const unsigned int dec_ids[] = {0, 1};
> +static const enum dpu_unit_type dec_types[] = {DPU_DISP, DPU_DISP};
> +static const unsigned long dec_ofss[] = {0xb400, 0xb420};
> +
> +/* External Destination */
> +static const unsigned int ed_ids[] = {0, 1, 4, 5};
> +static const enum dpu_unit_type ed_types[] = {DPU_DISP, DPU_DISP,
> +   DPU_DISP, DPU_DISP};
> +static const unsigned long ed_ofss[] = {0x4800, 0x5800, 0x5000, 0x6000};
> +static const unsigned long ed_pec_ofss[] = {0x980, 0xa00, 0x9c0, 0xa40};
> +
> +/* Fetch Decode */
> +static const unsigned int fd_ids[] = {0, 1, 9};
> +static const enum dpu_unit_type fd_types[] = {DPU_DISP, DPU_DISP, DPU_BLIT};
> +static const unsigned long fd_ofss[] = {0x6c00, 0x7800, 0x1000};
> +static const unsigned long fd_pec_ofss[] = {0xa80, 0xaa0, 0x820};
> +
> +/* Fetch ECO */
> +static const unsigned int fe_ids[] = {0, 1, 2, 9};
> +static const enum dpu_unit_type fe_types[] = {DPU_DISP, DPU_DISP,
> +   DPU_DISP, DPU_BLIT};
> +static const unsigned long fe_ofss[] = {0x7400, 0x8000, 0x6800, 0x1c00};
> +static const unsigned long fe_pec_ofss[] = {0xa90, 0xab0, 0xa70, 0x850};
> +
> +/* Frame Generator */
> +static const unsigned int fg_ids[] = {0, 1};
> +static const enum dpu_unit_type fg_types[] = {DPU_DISP, DPU_DISP};
> +static const unsigned long fg_ofss[] = {0xb800, 0xd400};
> +
> +/* Fetch Layer */
> +static const unsigned int fl_ids[] = {0};
> +static const enum dpu_unit_type fl_types[] = {DPU_DISP};
> +static const unsigned long fl_ofss[] = {0x8400};
> +static const unsigned long fl_pec_ofss[] = {0xac0};
> +
> +/* Fetch Warp */
> +static const unsigned int fw_ids[] = {2, 9};
> +static const enum dpu_unit_type fw_types[] = {DPU_DISP, DPU_BLIT};
> +static const unsigned long fw_ofss[] = {0x6400,