Re: [PATCH v3 1/4] drm/i915/display: add support for DMC wakelocks

2024-03-28 Thread Luca Coelho
On Thu, 2024-03-21 at 07:43 +, Shankar, Uma wrote:
> 
> > -Original Message-
> > From: Coelho, Luciano 
> > Sent: Monday, March 18, 2024 7:08 PM
> > To: intel-gfx@lists.freedesktop.org
> > Cc: intel...@lists.freedesktop.org; Shankar, Uma ;
> > ville.syrj...@linux.intel.com; Nikula, Jani 
> > Subject: [PATCH v3 1/4] drm/i915/display: add support for DMC wakelocks
> > 
> > In order to reduce the DC5->DC2 restore time, wakelocks have been
> > introduced in DMC so the driver can tell it when registers and other
> > memory areas are going to be accessed and keep their respective blocks
> > awake.
> > 
> > Implement this in the driver by adding the concept of DMC wakelocks.
> > When the driver needs to access memory which lies inside pre-defined
> > ranges, it will tell DMC to set the wakelock, access the memory, then
> > wait for a while and clear the wakelock.
> > 
> > The wakelock state is protected in the driver with spinlocks to prevent
> > concurrency issues.
> > 
> > BSpec: 71583
> 
> These are internal links; I'm not sure how useful they will be for upstream
> discussions. I believe it's better to give the relevant details here instead
> of an internal link.

As we discussed offline, this seems to be common practice, and Gustavo
actually asked me to add it in a previous review, so we decided to keep
it.


> > Signed-off-by: Luca Coelho 
> > ---
> >  drivers/gpu/drm/i915/Makefile |   1 +
> >  drivers/gpu/drm/i915/display/intel_de.h   |  97 +++-
> >  .../gpu/drm/i915/display/intel_display_core.h |   2 +
> >  .../drm/i915/display/intel_display_driver.c   |   1 +
> >  drivers/gpu/drm/i915/display/intel_dmc_regs.h |   6 +
> >  drivers/gpu/drm/i915/display/intel_dmc_wl.c   | 226 ++
> >  drivers/gpu/drm/i915/display/intel_dmc_wl.h   |  30 +++
> >  drivers/gpu/drm/xe/Makefile   |   1 +
> >  8 files changed, 356 insertions(+), 8 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/display/intel_dmc_wl.c
> >  create mode 100644 drivers/gpu/drm/i915/display/intel_dmc_wl.h
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 3ef6ed41e62b..af83ea94c771 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -270,6 +270,7 @@ i915-y += \
> > display/intel_display_rps.o \
> > display/intel_display_wa.o \
> > display/intel_dmc.o \
> > +   display/intel_dmc_wl.o \
> > display/intel_dpio_phy.o \
> > display/intel_dpll.o \
> > display/intel_dpll_mgr.o \
> > diff --git a/drivers/gpu/drm/i915/display/intel_de.h
> > b/drivers/gpu/drm/i915/display/intel_de.h
> > index 42552d8c151e..6728a0077793 100644
> > --- a/drivers/gpu/drm/i915/display/intel_de.h
> > +++ b/drivers/gpu/drm/i915/display/intel_de.h
> > @@ -13,52 +13,125 @@
> >  static inline u32
> >  intel_de_read(struct drm_i915_private *i915, i915_reg_t reg)
> >  {
> > -   return intel_uncore_read(&i915->uncore, reg);
> > +   u32 val;
> > +
> > +   intel_dmc_wl_get(i915, reg);
> 
> I think one fundamental issue is that by taking the lock at the granularity
> of every MMIO, we risk adding latency here. We should profile the time it
> adds.

This is a good point! Again, as we concluded in our offline discussion,
the problem here is that we call __intel_de_rmw_nowl() whenever the
refcount is zero, which causes at least one extra register read. We
should only do this when the wakelock is no longer held, i.e. when the
work has already run and released it.

The simplest way to solve this now is to add a flag that tells us
whether the wakelock is held or not, regardless of the refcount, so we
don't try to take it again for every MMIO.

In the future, if all the checks we do (without accessing hardware
registers) are still too time-consuming, we can create more helper
functions that bypass the wakelock, like the existing _nowl() versions.
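For illustration, here is a tiny userspace model of that flag idea (all names are made up for the sketch and are not the driver's API): the expensive hardware handshake runs only when the wakelock is not already held, so get/put cycles that arrive before the delayed release fires cost no extra register access.

```c
#include <stdbool.h>

/*
 * Toy model of the proposed flag: "taken" tracks whether the wakelock is
 * actually held in hardware, independently of the refcount.  hw_writes
 * counts the expensive accesses (the __intel_de_rmw_nowl() calls in the
 * real driver).  All names here are illustrative.
 */
struct dmc_wl {
	int refcount;
	bool taken;     /* wakelock currently held in hardware */
	int hw_writes;  /* expensive MMIO handshakes performed */
};

static void wl_get(struct dmc_wl *wl)
{
	wl->refcount++;
	/* Only touch the hardware if the lock isn't already held. */
	if (!wl->taken) {
		wl->hw_writes++;
		wl->taken = true;
	}
}

static void wl_put(struct dmc_wl *wl)
{
	wl->refcount--;
	/*
	 * The real driver releases the lock from deferred work, so "taken"
	 * can stay true after the refcount drops to zero; a get arriving
	 * before that work runs then costs no register access.
	 */
}
```

With the flag, two back-to-back get/put cycles perform a single hardware handshake instead of two, which is the saving described above.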


> For a modeset, for example, we will end up programming multiple MMIOs from
> one place, but every MMIO call will take the wakelock. This is overkill;
> instead, we can just take the lock once, do our stuff and then release it.
> 
> > +
> > +   val = intel_uncore_read(&i915->uncore, reg);
> > +
> > +   intel_dmc_wl_put(i915, reg);
> > +
> > +   return val;
> >  }
> > 
> >  static inline u8
> >  intel_de_read8(struct drm_i915_private *i915, i915_reg_t reg)
> >  {
> > -   return intel_uncore_read8(&i915->uncore, reg);
> > +   u8 val;
> > +
> > +   intel_dmc_wl_get(i915, reg);
> > +
> > +   val = intel_uncore_read8(&i915->uncore, reg);
> 
> Same here and all similar functions.
> 
> > +   intel_dmc_wl_put(i915, reg);
> > +
> > +   return val;
> >  }
> > 
> >  static inline u64
> >  intel_de_read64_2x32(struct drm_i915_private *i915,
> >  i915_reg_t lower_reg, i915_reg_t upper_reg)
> >  {
> > -   return intel_uncore_read64_2x32(&i915->uncore, lower_reg,
> > upper_reg);
> > +   u64 val;
> > +
> > +   intel_dmc_wl_get(i915, lower_reg);
> > +   intel_dmc_wl_get(i915, upper_reg);

RE: [PATCH v3 1/4] drm/i915/display: add support for DMC wakelocks

2024-03-21 Thread Shankar, Uma



> -Original Message-
> From: Coelho, Luciano 
> Sent: Monday, March 18, 2024 7:08 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: intel...@lists.freedesktop.org; Shankar, Uma ;
> ville.syrj...@linux.intel.com; Nikula, Jani 
> Subject: [PATCH v3 1/4] drm/i915/display: add support for DMC wakelocks
> 
> In order to reduce the DC5->DC2 restore time, wakelocks have been
> introduced in DMC so the driver can tell it when registers and other
> memory areas are going to be accessed and keep their respective blocks
> awake.
> 
> Implement this in the driver by adding the concept of DMC wakelocks.
> When the driver needs to access memory which lies inside pre-defined
> ranges, it will tell DMC to set the wakelock, access the memory, then
> wait for a while and clear the wakelock.
> 
> The wakelock state is protected in the driver with spinlocks to prevent
> concurrency issues.
> 
> BSpec: 71583

These are internal links; I'm not sure how useful they will be for upstream
discussions. I believe it's better to give the relevant details here instead
of an internal link.

> Signed-off-by: Luca Coelho 
> ---
>  drivers/gpu/drm/i915/Makefile |   1 +
>  drivers/gpu/drm/i915/display/intel_de.h   |  97 +++-
>  .../gpu/drm/i915/display/intel_display_core.h |   2 +
>  .../drm/i915/display/intel_display_driver.c   |   1 +
>  drivers/gpu/drm/i915/display/intel_dmc_regs.h |   6 +
>  drivers/gpu/drm/i915/display/intel_dmc_wl.c   | 226 ++
>  drivers/gpu/drm/i915/display/intel_dmc_wl.h   |  30 +++
>  drivers/gpu/drm/xe/Makefile   |   1 +
>  8 files changed, 356 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/display/intel_dmc_wl.c
>  create mode 100644 drivers/gpu/drm/i915/display/intel_dmc_wl.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 3ef6ed41e62b..af83ea94c771 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -270,6 +270,7 @@ i915-y += \
>   display/intel_display_rps.o \
>   display/intel_display_wa.o \
>   display/intel_dmc.o \
> + display/intel_dmc_wl.o \
>   display/intel_dpio_phy.o \
>   display/intel_dpll.o \
>   display/intel_dpll_mgr.o \
> diff --git a/drivers/gpu/drm/i915/display/intel_de.h
> b/drivers/gpu/drm/i915/display/intel_de.h
> index 42552d8c151e..6728a0077793 100644
> --- a/drivers/gpu/drm/i915/display/intel_de.h
> +++ b/drivers/gpu/drm/i915/display/intel_de.h
> @@ -13,52 +13,125 @@
>  static inline u32
>  intel_de_read(struct drm_i915_private *i915, i915_reg_t reg)
>  {
> - return intel_uncore_read(&i915->uncore, reg);
> + u32 val;
> +
> + intel_dmc_wl_get(i915, reg);

I think one fundamental issue is that by taking the lock at the granularity of
every MMIO, we risk adding latency here. We should profile the time it adds.

For a modeset, for example, we will end up programming multiple MMIOs from one
place, but every MMIO call will take the wakelock. This is overkill; instead,
we can just take the lock once, do our stuff and then release it.
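The cost difference can be sketched with a small userspace toy model (nothing here is the driver's actual API; each take/release is modeled as one expensive handshake):

```c
/*
 * Toy comparison of the two strategies discussed above: taking the
 * wakelock around every MMIO versus bracketing a whole sequence once.
 */
struct wl_stats {
	int handshakes; /* expensive take/release operations performed */
};

static void wl_take(struct wl_stats *s)    { s->handshakes++; }
static void wl_release(struct wl_stats *s) { s->handshakes++; }
static void mmio_write(void)               { /* cheap register write */ }

/* get/put around every single access */
static int per_access_cost(int n)
{
	struct wl_stats s = { 0 };

	for (int i = 0; i < n; i++) {
		wl_take(&s);
		mmio_write();
		wl_release(&s);
	}
	return s.handshakes;
}

/* one get before the whole sequence, one put after */
static int bracketed_cost(int n)
{
	struct wl_stats s = { 0 };

	wl_take(&s);
	for (int i = 0; i < n; i++)
		mmio_write();
	wl_release(&s);
	return s.handshakes;
}
```

For a sequence of N writes the per-access strategy pays 2*N handshakes while bracketing pays 2, which is the overhead being pointed out.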

> +
> + val = intel_uncore_read(&i915->uncore, reg);
> +
> + intel_dmc_wl_put(i915, reg);
> +
> + return val;
>  }
> 
>  static inline u8
>  intel_de_read8(struct drm_i915_private *i915, i915_reg_t reg)
>  {
> - return intel_uncore_read8(&i915->uncore, reg);
> + u8 val;
> +
> + intel_dmc_wl_get(i915, reg);
> +
> + val = intel_uncore_read8(&i915->uncore, reg);

Same here and all similar functions.

> + intel_dmc_wl_put(i915, reg);
> +
> + return val;
>  }
> 
>  static inline u64
>  intel_de_read64_2x32(struct drm_i915_private *i915,
>    i915_reg_t lower_reg, i915_reg_t upper_reg)
>  {
> - return intel_uncore_read64_2x32(&i915->uncore, lower_reg,
> upper_reg);
> + u64 val;
> +
> + intel_dmc_wl_get(i915, lower_reg);
> + intel_dmc_wl_get(i915, upper_reg);

I guess if both registers fall in the same range, taking just one wakelock
should be enough.
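As a sketch of that suggestion (the range offsets and names below are invented for the example; the real protected ranges would live in intel_dmc_wl.c):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative range check: when both halves of a 64-bit register pair
 * are covered by a DMC-protected range, a single wakelock reference is
 * enough.  The range below is made up for the example.
 */
struct wl_range {
	uint32_t start;
	uint32_t end;
};

static const struct wl_range ranges[] = {
	{ 0x60000, 0x7ffff },   /* hypothetical protected block */
};

static bool in_protected_range(uint32_t offset)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (offset >= ranges[i].start && offset <= ranges[i].end)
			return true;
	return false;
}

/* How many wakelock references a lower/upper register pair needs. */
static int wakelock_refs_needed(uint32_t lower, uint32_t upper)
{
	bool lo = in_protected_range(lower);
	bool hi = in_protected_range(upper);

	if (lo && hi)
		return 1;       /* one reference covers both registers */
	return (lo ? 1 : 0) + (hi ? 1 : 0);
}
```

A read64_2x32 helper built this way would take the wakelock once for a pair inside the protected block instead of twice.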

> +
> + val = intel_uncore_read64_2x32(&i915->uncore, lower_reg, upper_reg);
> +
> + intel_dmc_wl_put(i915, upper_reg);
> + intel_dmc_wl_put(i915, lower_reg);
> +
> + return val;
>  }
> 
>  static inline void
>  intel_de_posting_read(struct drm_i915_private *i915, i915_reg_t reg)
>  {
> + intel_dmc_wl_get(i915, reg);
> +
>   intel_uncore_posting_read(&i915->uncore, reg);
> +
> + intel_dmc_wl_put(i915, reg);
>  }
> 
>  static inline void
>  intel_de_write(struct drm_i915_private *i915, i915_reg_t reg, u32 val)
>  {
> + intel_dmc_wl_get(i915, reg);
> +
>   intel_uncore_write(&i915->uncore, reg, val);
> +
> + intel_dmc_wl_put(i915, reg);
>  }
> 
>  static inline u32
> -intel_de_rmw(struct drm_i915_private *i915, i915_reg_t reg, u32 clear, u32 set)
> +__intel_de_rmw_nowl(struct drm_i915_private *i915, i915_reg_t reg,
> + u32 clear, u32 set)
>  {
>   return intel_uncore_rmw(&i915->uncore, reg, clear, set);
>  }
> 
> +static inl