[pull] drm/msm: drm-msm-next-2024-04-11 for v6.9-rc4

2024-04-11 Thread Rob Clark
Hi Dave,

Fixes for v6.9, description below

The following changes since commit 4be445f5b6b6810baf397b2d159bd07c3573fd75:

  drm/msm/dpu: capture snapshot on the first commit_done timeout
(2024-03-04 11:44:03 +0200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-next-2024-04-11

for you to fetch changes up to 9dc23cba0927d09cb481da064c8413eb9df42e2b:

  drm/msm/adreno: Set highest_bank_bit for A619 (2024-04-05 11:24:53 -0700)


Fixes for v6.9

Display:
- Fixes for a PM refcount leak when DP goes to the disconnected state
  and also when link training fails. This is also one of the issues
  found with the pm runtime series
- Add missing newlines to prints in msm_fb and msm_kms
- Change permissions of some dpu debugfs entries which write to const
  catalog data to read-only, to avoid protection faults
- Fix the interface table in the X1E80100 catalog. This is an
  important fix to bring up DP on X1E80100.
- Logging fix to print the callback symbol in the invalid IRQ message
  case, rather than printing it when it's known to be NULL.
- Bindings fix to add the DP node as a child of the sm8150 mdss node
- Minor typo fix in DP driver API which handles port status change

GPU:
- fix CRASHDUMP_READ()
- fix HBB (highest bank bit) for A619 to fix UBWC corruption


Abhinav Kumar (1):
  drm/msm/dp: fix typo in dp_display_handle_port_status_changed()

Dmitry Baryshkov (3):
  drm/msm/dpu: don't allow overriding data from catalog
  drm/msm/dpu: make error messages at
dpu_core_irq_register_callback() more sensible
  dt-bindings: display/msm: sm8150-mdss: add DP node

Johan Hovold (2):
  drm/msm/dp: fix runtime PM leak on disconnect
  drm/msm/dp: fix runtime PM leak on connect failure

Kuogee Hsieh (1):
  drm/msm/dp: assign correct DP controller ID to x1e80100 interface table

Luca Weiss (1):
  drm/msm/adreno: Set highest_bank_bit for A619

Miguel Ojeda (1):
  drm/msm: fix the `CRASHDUMP_READ` target of `a6xx_get_shader_block()`

Stephen Boyd (1):
  drm/msm: Add newlines to some debug prints

 .../bindings/display/msm/qcom,sm8150-mdss.yaml |  9 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  4 +++
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c|  2 +-
 .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h   | 34 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c  | 10 +++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c  |  8 ++---
 drivers/gpu/drm/msm/dp/dp_display.c|  6 ++--
 drivers/gpu/drm/msm/msm_fb.c   |  6 ++--
 drivers/gpu/drm/msm/msm_kms.c  |  4 +--
 9 files changed, 63 insertions(+), 20 deletions(-)


Re: [PATCH 3/6] drm/msm/adreno: Allow specifying default speedbin value

2024-04-09 Thread Rob Clark
On Tue, Apr 9, 2024 at 8:23 AM Dmitry Baryshkov
 wrote:
>
> On Tue, Apr 09, 2024 at 05:12:46PM +0200, Konrad Dybcio wrote:
> >
> >
> > On 4/6/24 04:56, Dmitry Baryshkov wrote:
> > > On Fri, Apr 05, 2024 at 10:41:31AM +0200, Konrad Dybcio wrote:
> > > > From: Neil Armstrong 
> > > >
> > > > Usually, speedbin 0 is the "super SKU", a.k.a the one which can clock
> > > > the highest. Falling back to it when things go wrong is largely
> > > > suboptimal, as more often than not, the top frequencies are not
> > > > supposed to work on other bins.
> > >
> > > Isn't it better to just return an error here instead of trying to guess
> > > which speedbin to use?
> >
> > Not sure. I'd rather have better compatibility for e.g. booting up a new
> > laptop with just dt.
>
> New speedbin can have lower max speed, so by attempting to run it at
> higher freq you might be breaking it.

Usually there are some OPPs in common to all speedbins, so picking a
freq from that set would seem like the safe thing to do
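For illustration, something along these lines could compute that fallback
(a hypothetical sketch, not actual adreno code: opp_allowed_for_all_bins()
is an assumed helper, only the OPP library calls are real):

#include <linux/pm_opp.h>

static unsigned long safe_fallback_freq(struct device *dev, int nr_bins)
{
	unsigned long freq = 0, best = 0;
	struct dev_pm_opp *opp;

	/* walk the OPP table from lowest to highest frequency */
	while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) {
		/* keep the OPP only if no speedbin excludes it */
		if (opp_allowed_for_all_bins(opp, nr_bins))  /* assumed helper */
			best = freq;
		dev_pm_opp_put(opp);
		freq++;
	}

	return best;  /* zero means no common OPP, caller should bail */
}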

BR,
-R

>
> >
> > >
> > > If that's not the case, I think the commit should be expanded with
> > > actually setting default_speedbin for the existing GPUs.
> >
> > I think that should be addressed, although separately.
>
> I'd prefer to have it as a part of this patch, but I'd not NAK it just
> for this reason.
>
> --
> With best wishes
> Dmitry


Re: [PATCH] drm/sysfs: Add drm class-wide attribute to get active device clients

2024-04-05 Thread Rob Clark
On Wed, Apr 3, 2024 at 11:37 AM Adrián Larumbe
 wrote:
>
> Up to this day, all fdinfo-based GPU profilers must traverse the entire
> /proc directory structure to find open DRM clients with fdinfo file
> descriptors. This is inefficient and time-consuming.
>
> This patch adds a new device class attribute that will install a sysfs file
> per DRM device, which can be queried by profilers to get a list of PIDs for
> their open clients. This file isn't human-readable, and it's meant to be
> queried only by GPU profilers like gputop and nvtop.
>
> Cc: Boris Brezillon 
> Cc: Tvrtko Ursulin 
> Cc: Christopher Healy 
> Signed-off-by: Adrián Larumbe 

It does seem like a good idea.. idk if there is some precedent to
prefer binary vs ascii in sysfs, but having a way to avoid walking
_all_ processes is a win.
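
For illustration, a minimal userspace reader for the proposed attribute
(the renderD128 path is an assumption; a real profiler would enumerate
/sys/class/drm/renderD*):

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	pid_t pids[4096 / sizeof(pid_t)];
	size_t i, n;
	FILE *f;

	f = fopen("/sys/class/drm/renderD128/clients", "rb");
	if (!f)
		return 1;

	/* per the patch, the attribute is a native-endian pid_t array,
	 * zero-terminated when shorter than a page */
	n = fread(pids, sizeof(pid_t), sizeof(pids) / sizeof(pids[0]), f);
	for (i = 0; i < n && pids[i] != 0; i++)
		printf("drm client pid: %d\n", (int)pids[i]);

	fclose(f);
	return 0;
}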

BR,
-R

> ---
>  drivers/gpu/drm/drm_internal.h   |  2 +-
>  drivers/gpu/drm/drm_privacy_screen.c |  2 +-
>  drivers/gpu/drm/drm_sysfs.c  | 89 ++--
>  3 files changed, 74 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> index 2215baef9a3e..9a399b03d11c 100644
> --- a/drivers/gpu/drm/drm_internal.h
> +++ b/drivers/gpu/drm/drm_internal.h
> @@ -145,7 +145,7 @@ bool drm_master_internal_acquire(struct drm_device *dev);
>  void drm_master_internal_release(struct drm_device *dev);
>
>  /* drm_sysfs.c */
> -extern struct class *drm_class;
> +extern struct class drm_class;
>
>  int drm_sysfs_init(void);
>  void drm_sysfs_destroy(void);
> diff --git a/drivers/gpu/drm/drm_privacy_screen.c 
> b/drivers/gpu/drm/drm_privacy_screen.c
> index 6cc39e30781f..2fbd24ba5818 100644
> --- a/drivers/gpu/drm/drm_privacy_screen.c
> +++ b/drivers/gpu/drm/drm_privacy_screen.c
> @@ -401,7 +401,7 @@ struct drm_privacy_screen *drm_privacy_screen_register(
> mutex_init(&priv->lock);
> BLOCKING_INIT_NOTIFIER_HEAD(&priv->notifier_head);
>
> -   priv->dev.class = drm_class;
> +   priv->dev.class = &drm_class;
> priv->dev.type = &drm_privacy_screen_type;
> priv->dev.parent = parent;
> priv->dev.release = drm_privacy_screen_device_release;
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index a953f69a34b6..56ca9e22c720 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -58,8 +58,6 @@ static struct device_type drm_sysfs_device_connector = {
> .name = "drm_connector",
>  };
>
> -struct class *drm_class;
> -
>  #ifdef CONFIG_ACPI
>  static bool drm_connector_acpi_bus_match(struct device *dev)
>  {
> @@ -128,6 +126,62 @@ static const struct component_ops typec_connector_ops = {
>
>  static CLASS_ATTR_STRING(version, S_IRUGO, "drm 1.1.0 20060810");
>
> +static ssize_t clients_show(struct device *cd, struct device_attribute 
> *attr, char *buf)
> +{
> +   struct drm_minor *minor = cd->driver_data;
> +   struct drm_device *ddev = minor->dev;
> +   struct drm_file *priv;
> +   ssize_t offset = 0;
> +   void *pid_buf;
> +
> +   if (minor->type != DRM_MINOR_RENDER)
> +   return 0;
> +
> +   pid_buf = kvmalloc(PAGE_SIZE, GFP_KERNEL);
> +   if (!pid_buf)
> +   return 0;
> +
> +   mutex_lock(&ddev->filelist_mutex);
> +   list_for_each_entry_reverse(priv, &ddev->filelist, lhead) {
> +   struct pid *pid;
> +
> +   if (drm_WARN_ON(ddev, (PAGE_SIZE - offset) < sizeof(pid_t)))
> +   break;
> +
> +   rcu_read_lock();
> +   pid = rcu_dereference(priv->pid);
> +   (*(pid_t *)(pid_buf + offset)) = pid_vnr(pid);
> +   rcu_read_unlock();
> +
> +   offset += sizeof(pid_t);
> +   }
> +   mutex_unlock(&ddev->filelist_mutex);
> +
> +   if (offset < PAGE_SIZE)
> +   (*(pid_t *)(pid_buf + offset)) = 0;
> +
> +   memcpy(buf, pid_buf, offset);
> +
> +   kvfree(pid_buf);
> +
> +   return offset;
> +
> +}
> +static DEVICE_ATTR_RO(clients);
> +
> +static struct attribute *drm_device_attrs[] = {
> +   &dev_attr_clients.attr,
> +   NULL,
> +};
> +ATTRIBUTE_GROUPS(drm_device);
> +
> +struct class drm_class = {
> +   .name   = "drm",
> +   .dev_groups = drm_device_groups,
> +};
> +
> +static bool drm_class_initialised;
> +
>  /**
>   * drm_sysfs_init - initialize sysfs helpers
>   *
> @@ -142,18 +196,19 @@ int drm_sysfs_init(void)
>  {
> int err;
>
> -   drm_class = class_create("drm");
> -   if (IS_ERR(drm_class))
> -   return PTR_ERR(drm_class);
> +   err = class_register(&drm_class);
> +   if (err)
> +   return err;
>
> -   err = class_create_file(drm_class, &class_attr_version.attr);
> +   err = class_create_file(&drm_class, &class_attr_version.attr);
> if (err) {
> -   class_destroy(drm_class);
> -   drm_class = NULL;
> +   class_destroy(&drm_class);
> return err;
> }
>
> - 

Re: [PATCH] drm/prime: Unbreak virtgpu dma-buf export

2024-03-28 Thread Rob Clark
On Thu, Mar 28, 2024 at 11:54 AM Simon Ser  wrote:
>
> On Thursday, March 28th, 2024 at 19:47, Rob Clark  wrote:
>
> > any chance I could talk you into pushing to drm-misc-fixes?
>
> Oh sorry, I thought you had access… Pushed with a minor edit to remove
> unnecessary parentheses to make checkpatch happy!

Thanks!

BR,
-R


Re: [PATCH] drm/prime: Unbreak virtgpu dma-buf export

2024-03-28 Thread Rob Clark
On Tue, Mar 26, 2024 at 2:15 AM Simon Ser  wrote:
>
> Makes sense to me!
>
> Reviewed-by: Simon Ser 

Thanks.. any chance I could talk you into pushing to drm-misc-fixes?

BR,
-R


Re: [PATCH] drm/prime: Unbreak virtgpu dma-buf export

2024-03-25 Thread Rob Clark
This is actually a bit concerning.. importing a host page backed
buffer without guest mapping into a passthru device probably doesn't
work and should be rejected earlier.

I do think we should relax the restriction (either taking my patch or
reverting the commit it fixes) until we work this out properly
(because the original patch is a regression), but importing a buffer
without guest pages into a passthru device can't possibly work
properly.  Maybe it works by chance if the host buffer is mapped to
the guest, but that is not guaranteed.

BR,
-R

On Mon, Mar 25, 2024 at 3:35 PM Dominik Behr  wrote:
>
> It also fixes importing virtgpu blobs into real hardware, for instance amdgpu 
> for DRI_PRIME rendering.
>
> On Fri, Mar 22, 2024 at 2:48 PM Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> virtgpu "vram" GEM objects do not implement obj->get_sg_table().  But
>> they also don't use drm_gem_map_dma_buf().  In fact they may not even
>> have guest visible pages.  But it is perfectly fine to export and share
>> with other virtual devices.
>>
>> Reported-by: Dominik Behr 
>> Fixes: 207395da5a97 ("drm/prime: reject DMA-BUF attach when get_sg_table is 
>> missing")
>> Signed-off-by: Rob Clark 
>> ---
>>  drivers/gpu/drm/drm_prime.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 7352bde299d5..64dd6276e828 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -582,7 +582,12 @@ int drm_gem_map_attach(struct dma_buf *dma_buf,
>>  {
>> struct drm_gem_object *obj = dma_buf->priv;
>>
>> -   if (!obj->funcs->get_sg_table)
>> +   /*
>> +* drm_gem_map_dma_buf() requires obj->get_sg_table(), but drivers
>> +* that implement their own ->map_dma_buf() do not.
>> +*/
>> +   if ((dma_buf->ops->map_dma_buf == drm_gem_map_dma_buf) &&
>> +   !obj->funcs->get_sg_table)
>> return -ENOSYS;
>>
>> return drm_gem_pin(obj);
>> --
>> 2.44.0
>>


[PATCH] drm/prime: Unbreak virtgpu dma-buf export

2024-03-22 Thread Rob Clark
From: Rob Clark 

virtgpu "vram" GEM objects do not implement obj->get_sg_table().  But
they also don't use drm_gem_map_dma_buf().  In fact they may not even
have guest visible pages.  But it is perfectly fine to export and share
with other virtual devices.

Reported-by: Dominik Behr 
Fixes: 207395da5a97 ("drm/prime: reject DMA-BUF attach when get_sg_table is 
missing")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/drm_prime.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7352bde299d5..64dd6276e828 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -582,7 +582,12 @@ int drm_gem_map_attach(struct dma_buf *dma_buf,
 {
struct drm_gem_object *obj = dma_buf->priv;
 
-   if (!obj->funcs->get_sg_table)
+   /*
+* drm_gem_map_dma_buf() requires obj->get_sg_table(), but drivers
+* that implement their own ->map_dma_buf() do not.
+*/
+   if ((dma_buf->ops->map_dma_buf == drm_gem_map_dma_buf) &&
+   !obj->funcs->get_sg_table)
return -ENOSYS;
 
return drm_gem_pin(obj);
-- 
2.44.0



Re: [PATCH v2 1/2] drm/prime: reject DMA-BUF attach when get_sg_table is missing

2024-03-20 Thread Rob Clark
On Thu, Mar 2, 2023 at 6:35 AM Simon Ser  wrote:
>
> drm_gem_map_dma_buf() requires drm_gem_object_funcs.get_sg_table
> to be implemented, or else WARNs.
>
> Allow drivers to leave this hook unimplemented to implement purely
> local DMA-BUFs (ie, DMA-BUFs which cannot be imported anywhere
> else but the device which allocated them). In that case, reject
> imports to other devices in drm_gem_map_attach().
>
> v2: new patch
>
> Signed-off-by: Simon Ser 
> Cc: Daniel Vetter 
> Cc: Thomas Zimmermann 
> Cc: Tian Tao 
> Cc: Maxime Ripard 
> Cc: Christian König 
> Cc: Hans de Goede 
> ---
>  drivers/gpu/drm/drm_prime.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index f924b8b4ab6b..ab1d21d63a03 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -544,7 +544,8 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
> void *data,
>   * Optional pinning of buffers is handled at dma-buf attach and detach time 
> in
>   * drm_gem_map_attach() and drm_gem_map_detach(). Backing storage itself is
>   * handled by drm_gem_map_dma_buf() and drm_gem_unmap_dma_buf(), which 
> relies on
> - &drm_gem_object_funcs.get_sg_table.
> + &drm_gem_object_funcs.get_sg_table. If &drm_gem_object_funcs.get_sg_table
> is unimplemented, exports into another device are rejected.
>   *
>   * For kernel-internal access there's drm_gem_dmabuf_vmap() and
>   * drm_gem_dmabuf_vunmap(). Userspace mmap support is provided by
> @@ -583,6 +584,9 @@ int drm_gem_map_attach(struct dma_buf *dma_buf,
>  {
> struct drm_gem_object *obj = dma_buf->priv;
>
> +   if (!obj->funcs->get_sg_table)
> +   return -EOPNOTSUPP;

This breaks virtgpu, where buffers may not necessarily have guest
backing pages, but still may be shared with other virtual devices
(because the host side buffer _does_ have backing pages)

BR,
-R

> +
> return drm_gem_pin(obj);
>  }
>  EXPORT_SYMBOL(drm_gem_map_attach);
> --
> 2.39.2
>
>


Re: [PATCH RFC v3 00/12] drm/msm: generate register header files

2024-03-15 Thread Rob Clark
On Fri, Mar 15, 2024 at 4:46 AM Dmitry Baryshkov
 wrote:
>
> Currently display-related register headers are generated from XML files
> shipped withing Mesa source tree. This is not fully optimal: it requires
> multi-stage process of the changes first being landed to Mesa and only
> then synced to the kernel tree.

I think we'd more or less need to continue following this process for
the gpu .xml so that the kernel and mesa are not diverging.  I guess
we could drop the display related .xml from mesa.  (But it would be
nice to have a decoder tool for display devcoredumps, like we do for
gpu..)

BR,
-R

> Move original XML files to the kernel tree and generate header files
> when required.
>
> NOTE: the gen_header.py script is based on the non-merged Mesa MR [1].
> Once that MR lands, I will update the script and commit messages and
> send the next iteration.
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28193
>
> Signed-off-by: Dmitry Baryshkov 
> ---
> Changes in v3:
> - Split XML and git rm patches in hope to pass ML limitations
> - Link to v2: 
> https://lore.kernel.org/r/20240315-fd-xml-shipped-v2-0-7cd68ecc4...@linaro.org
>
> Changes in v2:
> - Removed the _shipped files, always generating the headers (Masahiro
>   Yamada)
> - Replaced headergen2 with gen_headers.py
> - Simplify Makefile rules, making all Adreno objects depend on Adreno
>   headers and all display objects depend on all display headers
> - Also handle Adreno registers
> - Link to v1: 
> https://lore.kernel.org/r/20240226-fd-xml-shipped-v1-0-86bb6c334...@linaro.org
>
> ---
> Dmitry Baryshkov (12):
>   drm/msm/mdp5: add writeback block bases
>   drm/msm/hdmi: drop qfprom.xml.h
>   drm/msm/dsi: drop mmss_cc.xml.h
>   drm/msm: move msm_gpummu.c to adreno/a2xx_gpummu.c
>   drm/msm: import XML display registers database
>   drm/msm: import A2xx-A4xx XML display registers database
>   drm/msm: import A5xx-A7xx XML display registers database
>   drm/msm: import gen_header.py script from Mesa
>   drm/msm: generate headers on the fly
>   drm/msm: drop display-related headers
>   drm/msm: drop A5xx, A6xx headers
>   drm/msm: drop A2xx-A4xx headers
>
>  drivers/gpu/drm/msm/.gitignore | 6 +
>  drivers/gpu/drm/msm/Makefile   |97 +-
>  drivers/gpu/drm/msm/adreno/a2xx.xml.h  |  3251 -
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c  | 4 +-
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.h  | 4 +
>  .../drm/msm/{msm_gpummu.c => adreno/a2xx_gpummu.c} |45 +-
>  drivers/gpu/drm/msm/adreno/a3xx.xml.h  |  3268 -
>  drivers/gpu/drm/msm/adreno/a4xx.xml.h  |  4379 ---
>  drivers/gpu/drm/msm/adreno/a5xx.xml.h  |  5572 -
>  drivers/gpu/drm/msm/adreno/a6xx.xml.h  | 11858 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h  |   422 -
>  drivers/gpu/drm/msm/adreno/adreno_common.xml.h |   539 -
>  drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h|  2803 -
>  drivers/gpu/drm/msm/disp/mdp4/mdp4.xml.h   |  1181 --
>  drivers/gpu/drm/msm/disp/mdp5/mdp5.xml.h   |  1979 
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h   |11 +
>  drivers/gpu/drm/msm/disp/mdp_common.xml.h  |   111 -
>  drivers/gpu/drm/msm/dsi/dsi.xml.h  |   790 --
>  drivers/gpu/drm/msm/dsi/dsi_phy_10nm.xml.h |   227 -
>  drivers/gpu/drm/msm/dsi/dsi_phy_14nm.xml.h |   309 -
>  drivers/gpu/drm/msm/dsi/dsi_phy_20nm.xml.h |   237 -
>  drivers/gpu/drm/msm/dsi/dsi_phy_28nm.xml.h |   384 -
>  drivers/gpu/drm/msm/dsi/dsi_phy_28nm_8960.xml.h|   286 -
>  drivers/gpu/drm/msm/dsi/dsi_phy_7nm.xml.h  |   483 -
>  drivers/gpu/drm/msm/dsi/mmss_cc.xml.h  |   131 -
>  drivers/gpu/drm/msm/dsi/sfpb.xml.h |70 -
>  drivers/gpu/drm/msm/hdmi/hdmi.xml.h|  1399 ---
>  drivers/gpu/drm/msm/hdmi/qfprom.xml.h  |61 -
>  drivers/gpu/drm/msm/msm_drv.c  | 3 +-
>  drivers/gpu/drm/msm/msm_gpu.c  | 2 +-
>  drivers/gpu/drm/msm/msm_mmu.h  | 5 -
>  drivers/gpu/drm/msm/registers/adreno/a2xx.xml  |  1865 +++
>  drivers/gpu/drm/msm/registers/adreno/a3xx.xml  |  1751 +++
>  drivers/gpu/drm/msm/registers/adreno/a4xx.xml  |  2409 
>  drivers/gpu/drm/msm/registers/adreno/a5xx.xml  |  3039 +
>  drivers/gpu/drm/msm/registers/adreno/a6xx.xml  |  4969 
>  drivers/gpu/drm/msm/registers/adreno/a6xx_gmu.xml  |   228 +
>  .../gpu/drm/msm/registers/adreno/adreno_common.xml |   399 +
>  .../gpu/drm/msm/registers/adreno/adreno_pm4.xml|  2267 
>  drivers/gpu/drm/msm/registers/display/dsi.xml  |   390 +
>  .../gpu/drm/msm/registers/display/dsi_phy_10nm.xml |   102 +
>  .../gpu/drm/msm/registers/display/dsi_phy_14nm.xml |   135 +
>  

Re: Time for drm-ci-next?

2024-03-15 Thread Rob Clark
On Fri, Mar 15, 2024 at 2:28 AM Jani Nikula  wrote:
>
> On Thu, 14 Mar 2024, Rob Clark  wrote:
> > When we first merged drm/ci I was unsure if it would need its own
> > -next branch.  But after using it for a couple releases, a few times
> > I've found myself wanting to backmerge drm/ci changes without
> > necessarily backmerging all of drm-misc-next.
> >
> > So, maybe it makes some sense to have a drm-ci-next branch that
> > driver-maintainers could back-merge as-needed?
>
> That's a crossmerge instead of a backmerge, and I feel that could get
> messy. What if folks crossmerge drm-ci-next but it gets rejected for
> drm-next? Or the baselines are different, and the crossmerge pulls in
> way more stuff than it should?

Yeah, it would defeat the point a bit if drm-ci-next was on too new of
a baseline; the whole point is to be able to merge CI changes without
pulling in unrelated changes.  So drm-ci-next would need to base on
something older, like the previous kernel release tag.

> IMO the route should be drm-ci-next -> pull request to drm-next ->
> backmerge drm-next to drivers and drm-misc-next.
>
> I'm not opposed to having drm-ci-next at all, mainly indifferent, but I
> question the merge flows. And then the question becomes, does my
> suggested merge flow complicate your original goal?
>

I guess we could avoid merging drm-ci-next until it had been merged
into drm-next?

Basically, I often find myself needing to merge CI patches on top of
msm-next in order to run CI, and then after a clean CI run, reset HEAD
back before the merge and force-push.  Which isn't really how things
should work.

BR,
-R


>
> BR,
> Jani.
>
>
> --
> Jani Nikula, Intel


Time for drm-ci-next?

2024-03-14 Thread Rob Clark
When we first merged drm/ci I was unsure if it would need its own
-next branch.  But after using it for a couple releases, a few times
I've found myself wanting to backmerge drm/ci changes without
necessarily backmerging all of drm-misc-next.

So, maybe it makes some sense to have a drm-ci-next branch that
driver-maintainers could back-merge as-needed?

Thoughts?

BR,
-R


[pull] drm/msm: drm-msm-next-2024-03-07 for v6.9

2024-03-07 Thread Rob Clark
Hi Dave,

This is the last bit for v6.9, which was waiting on
drm-misc-next-2024-02-29.  Description below.

The following changes since commit 177bce60cd10a4ffdc9881bf6f2dff7880408c1d:

  Merge tag 'drm-misc-next-2024-02-29' into msm-next (2024-03-03 18:32:11 -0800)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-next-2024-03-07

for you to fetch changes up to 4be445f5b6b6810baf397b2d159bd07c3573fd75:

  drm/msm/dpu: capture snapshot on the first commit_done timeout
(2024-03-04 11:44:03 +0200)


Late updates for v6.9, the main part is CDM (YUV over DP) which was
waiting for drm-misc-next-2024-02-29.

DPU:
- Add support for YUV420 over DP
- Patchset to ease debugging of vblank timeouts
- Small cleanup


Dmitry Baryshkov (3):
  drm/msm/dpu: make "vblank timeout" more useful
  drm/msm/dpu: split dpu_encoder_wait_for_event into two functions
  drm/msm/dpu: capture snapshot on the first commit_done timeout

Kuogee Hsieh (1):
  drm/msm/dpu: add support of new peripheral flush mechanism

Paloma Arellano (18):
  drm/msm/dpu: allow certain formats for CDM for DP
  drm/msm/dpu: add division of drm_display_mode's hskew parameter
  drm/msm/dpu: pass mode dimensions instead of fb size in CDM setup
  drm/msm/dpu: allow dpu_encoder_helper_phys_setup_cdm to work for DP
  drm/msm/dpu: move dpu_encoder_helper_phys_setup_cdm to dpu_encoder
  drm/msm/dp: rename wide_bus_en to wide_bus_supported
  drm/msm/dp: store mode YUV420 information to be used by rest of DP
  drm/msm/dp: check if VSC SDP is supported in DP programming
  drm/msm/dpu: move widebus logic to its own API
  drm/msm/dp: program config ctrl for YUV420 over DP
  drm/msm/dp: change clock related programming for YUV420 over DP
  drm/msm/dp: move parity calculation to dp_utils
  drm/msm/dp: add VSC SDP support for YUV420 over DP
  drm/msm/dp: enable SDP and SDE periph flush update
  drm/msm/dpu: modify encoder programming for CDM over DP
  drm/msm/dpu: modify timing engine programming for YUV420 over DP
  drm/msm/dpu: reserve CDM blocks for DP if mode is YUV420
  drm/msm/dp: allow YUV420 mode for DP connector when CDM available

 drivers/gpu/drm/msm/Makefile   |   3 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c| 244 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h|  26 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h   |  26 ++-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c   |  32 ++-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c| 100 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c |   2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c |  17 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h |  10 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|   6 +-
 drivers/gpu/drm/msm/dp/dp_audio.c  | 101 ++---
 drivers/gpu/drm/msm/dp/dp_catalog.c| 115 +-
 drivers/gpu/drm/msm/dp/dp_catalog.h|   9 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.c   |  17 +-
 drivers/gpu/drm/msm/dp/dp_display.c|  82 +--
 drivers/gpu/drm/msm/dp/dp_drm.c|   6 +-
 drivers/gpu/drm/msm/dp/dp_drm.h|   3 +-
 drivers/gpu/drm/msm/dp/dp_panel.c  |  53 +
 drivers/gpu/drm/msm/dp/dp_panel.h  |   2 +
 drivers/gpu/drm/msm/dp/dp_reg.h|   9 +
 drivers/gpu/drm/msm/dp/dp_utils.c  |  96 
 drivers/gpu/drm/msm/dp/dp_utils.h  |  36 +++
 drivers/gpu/drm/msm/msm_drv.h  |  32 +--
 23 files changed, 736 insertions(+), 291 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/dp/dp_utils.c
 create mode 100644 drivers/gpu/drm/msm/dp/dp_utils.h


Re: [PATCH] drm/udl: Add ARGB8888 as a format

2024-03-06 Thread Rob Clark
On Wed, Mar 6, 2024 at 3:24 PM Ville Syrjälä
 wrote:
>
> On Wed, Mar 06, 2024 at 07:37:16AM -0800, Rob Clark wrote:
> > On Wed, Mar 6, 2024 at 7:06 AM Ville Syrjälä
> >  wrote:
> > >
> > > On Wed, Mar 06, 2024 at 06:49:15AM -0800, Rob Clark wrote:
> > > > On Wed, Mar 6, 2024 at 4:18 AM Thomas Zimmermann  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > sorry that I did not see the patch before.
> > > > >
> > > > > Am 27.02.24 um 23:19 schrieb Douglas Anderson:
> > > > > > Even though the UDL driver converts to RGB565 internally (see
> > > > > > pixel32_to_be16() in udl_transfer.c), it advertises XRGB8888 for
> > > > > > compatibility. Let's add ARGB8888 to that list.
> > > > >
> > > > > We had a heated discussion about the emulation of color formats. It 
> > > > > was
> > > > > decided that XRGB8888 is the only format to support; and that's only
> > > > > because legacy userspace sometimes expects it. Adding other formats to
> > > > > the list should not be done easily.
> > > >
> > > > OTOH it is fixing a kernel change that broke userspace
> > > >
> > > > > >
> > > > > > This makes UDL devices work on ChromeOS again after commit
> > > > > > c91acda3a380 ("drm/gem: Check for valid formats"). Prior to that
> > > > > > commit things were "working" because we'd silently treat the
> > > > > > ARGB8888 that ChromeOS wanted as XRGB8888.
> > > > >
> > > > > This problem has been caused by userspace. Why can it not be fixed 
> > > > > there?
> > > > >
> > > > > And udl is just one driver. Any other driver without ARGB8888, such as
> > > > > simpledrm or ofdrm, would be affected. Do these work?
> > > >
> > > > Probably any driver where ARGB8888 is equivalent to XRGB8888 (ie.
> > > > single primary plane, etc) should advertise both.
> > >
> > > To me that seems likely to trick userspace developers into
> > > assuming that ARGB8888 is always available, and then when they
> > > finally try on hardware that doesn't have ARGB8888 it'll just
> > > fail miserably.
> >
> > I think that ship has sailed already, at least for any drivers that
> > previously silently accepted ARGB8888
>
> Perhaps. Although I don't actually understand what kind of weird
> userspace people are running if it somehow expects ARGB8888 to be there,
> but only for some specific kms drivers. Is said userspace really
> somehow checking which kms driver is present and then just ignoring
> the pixel format list exposed by the driver? Or is it just some
> super hw specific thing where they can just assume a specific kms
> driver?

I think chrome compositor (as in CrOS) always just picks ARGB8888
because, on devices that support overlays/underlays, it will use
underlays in some cases.  Yes, lazy, and a userspace bug.  But this
worked previously until commit c91acda3a380 ("drm/gem: Check for valid
formats"), so it seems to me like a clear case of kernel breaking
userspace.  I don't think we really have a choice other than to allow
ARGB8888.

A lot of drivers like simpledrm will never encounter the chrome
compositor, so it is ofc an option to leave them as-is until someone
reports a regression, which is maybe unlikely.  I suppose udl is a
special case because it can show up anywhere.

BR,
-R

> Anyways, adding ARGB8888 to even more drivers seems like a terrible
> idea to me.
>
> --
> Ville Syrjälä
> Intel


Re: [PATCH] drm/udl: Add ARGB8888 as a format

2024-03-06 Thread Rob Clark
On Wed, Mar 6, 2024 at 7:06 AM Ville Syrjälä
 wrote:
>
> On Wed, Mar 06, 2024 at 06:49:15AM -0800, Rob Clark wrote:
> > On Wed, Mar 6, 2024 at 4:18 AM Thomas Zimmermann  
> > wrote:
> > >
> > > Hi,
> > >
> > > sorry that I did not see the patch before.
> > >
> > > Am 27.02.24 um 23:19 schrieb Douglas Anderson:
> > > > Even though the UDL driver converts to RGB565 internally (see
> > > > pixel32_to_be16() in udl_transfer.c), it advertises XRGB8888 for
> > > > compatibility. Let's add ARGB8888 to that list.
> > >
> > > We had a heated discussion about the emulation of color formats. It was
> > > decided that XRGB8888 is the only format to support; and that's only
> > > because legacy userspace sometimes expects it. Adding other formats to
> > > the list should not be done easily.
> >
> > OTOH it is fixing a kernel change that broke userspace
> >
> > > >
> > > > This makes UDL devices work on ChromeOS again after commit
> > > > c91acda3a380 ("drm/gem: Check for valid formats"). Prior to that
> > > > commit things were "working" because we'd silently treat the ARGB8888
> > > > that ChromeOS wanted as XRGB8888.
> > >
> > > This problem has been caused by userspace. Why can it not be fixed there?
> > >
> > > And udl is just one driver. Any other driver without ARGB8888, such as
> > > simpledrm or ofdrm, would be affected. Do these work?
> >
> > Probably any driver where ARGB8888 is equivalent to XRGB8888 (ie.
> > single primary plane, etc) should advertise both.
>
> To me that seems likely to trick userspace developers into
> assuming that ARGB8888 is always available, and then when they
> finally try on hardware that doesn't have ARGB8888 it'll just
> fail miserably.

I think that ship has sailed already, at least for any drivers that
previously silently accepted ARGB8888

BR,
-R

> --
> Ville Syrjälä
> Intel


Re: [PATCH] drm/udl: Add ARGB8888 as a format

2024-03-06 Thread Rob Clark
On Wed, Mar 6, 2024 at 4:18 AM Thomas Zimmermann  wrote:
>
> Hi,
>
> sorry that I did not see the patch before.
>
> Am 27.02.24 um 23:19 schrieb Douglas Anderson:
> > Even though the UDL driver converts to RGB565 internally (see
> > pixel32_to_be16() in udl_transfer.c), it advertises XRGB8888 for
> > compatibility. Let's add ARGB8888 to that list.
>
> We had a heated discussion about the emulation of color formats. It was
> decided that XRGB8888 is the only format to support; and that's only
> because legacy userspace sometimes expects it. Adding other formats to
> the list should not be done easily.

OTOH it is fixing a kernel change that broke userspace

> >
> > This makes UDL devices work on ChromeOS again after commit
> > c91acda3a380 ("drm/gem: Check for valid formats"). Prior to that
> > commit things were "working" because we'd silently treat the ARGB8888
> > that ChromeOS wanted as XRGB8888.
>
> This problem has been caused by userspace. Why can it not be fixed there?
>
> And udl is just one driver. Any other driver without ARGB8888, such as
> simpledrm or ofdrm, would be affected. Do these work?

Probably any driver where ARGB8888 is equivalent to XRGB8888 (ie.
single primary plane, etc) should advertise both.

BR,
-R

> Best regards
> Thomas
>
> >
> > Fixes: c91acda3a380 ("drm/gem: Check for valid formats")
> > Signed-off-by: Douglas Anderson 
> > ---
> >
> >   drivers/gpu/drm/udl/udl_modeset.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/udl/udl_modeset.c 
> > b/drivers/gpu/drm/udl/udl_modeset.c
> > index 7702359c90c2..0f8d3678770e 100644
> > --- a/drivers/gpu/drm/udl/udl_modeset.c
> > +++ b/drivers/gpu/drm/udl/udl_modeset.c
> > @@ -253,6 +253,7 @@ static int udl_handle_damage(struct drm_framebuffer *fb,
> >   static const uint32_t udl_primary_plane_formats[] = {
> >   DRM_FORMAT_RGB565,
> >   DRM_FORMAT_XRGB8888,
> > + DRM_FORMAT_ARGB8888,
> >   };
> >
> >   static const uint64_t udl_primary_plane_fmtmods[] = {
>
> --
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
>


Re: [RFC] drm/msm: Add GPU memory traces

2024-03-04 Thread Rob Clark
On Mon, Mar 4, 2024 at 5:38 PM Gurchetan Singh
 wrote:
>
>
>
>
> On Fri, Mar 1, 2024 at 10:54 AM Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> Perfetto can use these traces to track global and per-process GPU memory
>> usage.
>>
>> Signed-off-by: Rob Clark 
>> ---
>> I realized the tracepoint that perfetto uses to show GPU memory usage
>> globally and per-process was already upstream, but with no users.
>>
>> This overlaps a bit with fdinfo, but ftrace is a lighter weight
>> mechanism and fits better with perfetto (plus is already supported in
>> trace_processor and perfetto UI, whereas something fdinfo based would
>> require new code to be added in perfetto).
>>
>> We could probably do this more globally (ie. drm_gem_get/put_pages() and
>> drm_gem_handle_create_tail()/drm_gem_object_release_handle() if folks
>> prefer.  Not sure where that leaves the TTM drivers.
>>
>>  drivers/gpu/drm/msm/Kconfig   |  1 +
>>  drivers/gpu/drm/msm/msm_drv.h |  5 +
>>  drivers/gpu/drm/msm/msm_gem.c | 37 +++
>>  drivers/gpu/drm/msm/msm_gpu.h |  8 
>>  4 files changed, 51 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
>> index f202f26adab2..e4c912fcaf22 100644
>> --- a/drivers/gpu/drm/msm/Kconfig
>> +++ b/drivers/gpu/drm/msm/Kconfig
>> @@ -33,6 +33,7 @@ config DRM_MSM
>> select PM_OPP
>> select NVMEM
>> select PM_GENERIC_DOMAINS
>> +   select TRACE_GPU_MEM
>> help
>>   DRM/KMS driver for MSM/snapdragon.
>>
>> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
>> index 16a7cbc0b7dd..cb8f7e804b5b 100644
>> --- a/drivers/gpu/drm/msm/msm_drv.h
>> +++ b/drivers/gpu/drm/msm/msm_drv.h
>> @@ -137,6 +137,11 @@ struct msm_drm_private {
>> struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
>> struct msm_perf_state *perf;
>>
>> +   /**
>> +* total_mem: Total/global amount of memory backing GEM objects.
>> +*/
>> +   atomic64_t total_mem;
>> +
>> /**
>>  * List of all GEM objects (mainly for debugfs, protected by obj_lock
>>  * (acquire before per GEM object lock)
>> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
>> index 175ee4ab8a6f..e04c4af5d154 100644
>> --- a/drivers/gpu/drm/msm/msm_gem.c
>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>> @@ -12,6 +12,9 @@
>>  #include 
>>
>>  #include 
>> +#include 
>> +
>> +#include 
>>
>>  #include "msm_drv.h"
>>  #include "msm_fence.h"
>> @@ -33,6 +36,34 @@ static bool use_pages(struct drm_gem_object *obj)
>> return !msm_obj->vram_node;
>>  }
>>
>> +static void update_device_mem(struct msm_drm_private *priv, ssize_t size)
>> +{
>> +   uint64_t total_mem = atomic64_add_return(size, &priv->total_mem);
>> +   trace_gpu_mem_total(0, 0, total_mem);
>> +}
>> +
>> +static void update_ctx_mem(struct drm_file *file, ssize_t size)
>> +{
>> +   struct msm_file_private *ctx = file->driver_priv;
>> +   uint64_t ctx_mem = atomic64_add_return(size, &ctx->ctx_mem);
>> +
>> +   rcu_read_lock(); /* Locks file->pid! */
>> +   trace_gpu_mem_total(0, pid_nr(file->pid), ctx_mem);
>> +   rcu_read_unlock();
>> +
>> +}
>> +
>> +static int msm_gem_open(struct drm_gem_object *obj, struct drm_file *file)
>> +{
>> +   update_ctx_mem(file, obj->size);
>> +   return 0;
>> +}
>> +
>> +static void msm_gem_close(struct drm_gem_object *obj, struct drm_file *file)
>> +{
>> +   update_ctx_mem(file, -obj->size);
>> +}
>> +
>>  /*
>>   * Cache sync.. this is a bit over-complicated, to fit dma-mapping
>>   * API.  Really GPU cache is out of scope here (handled on cmdstream)
>> @@ -156,6 +187,8 @@ static struct page **get_pages(struct drm_gem_object 
>> *obj)
>> return p;
>> }
>>
>> +   update_device_mem(dev->dev_private, obj->size);
>> +
>> msm_obj->pages = p;
>>
>> msm_obj->sgt = drm_prime_pages_to_sg(obj->dev, p, npages);
>> @@ -209,6 +242,8 @@ static void put_pages(struct drm_gem_object *obj)
>> msm_obj->sgt = NULL;
>> }
>>
>> +   update

Re: [RFC] drm/msm: Add GPU memory traces

2024-03-01 Thread Rob Clark
On Fri, Mar 1, 2024 at 10:53 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Perfetto can use these traces to track global and per-process GPU memory
> usage.
>
> Signed-off-by: Rob Clark 
> ---
> I realized the tracepoint that perfetto uses to show GPU memory usage
> globally and per-process was already upstream, but with no users.
>
> This overlaps a bit with fdinfo, but ftrace is a lighter weight
> mechanism and fits better with perfetto (plus is already supported in
> trace_processor and perfetto UI, whereas something fdinfo based would
> require new code to be added in perfetto).

Side-note, I'm also investigating mesa based perfetto memory traces,
which can give a more granular view (ie. breakdown of memory used for
image/buffer/cmdstream/cache/etc), but not a global view.  And the
userspace based traces made the unfortunate design decision to trace
incremental rather than absolute total values, so results can be
incorrect if traces are dropped.  So neither userspace based nor
kernel based gpu memory traces are an adequate replacement for the
other.

BR,
-R

> We could probably do this more globally (ie. drm_gem_get/put_pages() and
> drm_gem_handle_create_tail()/drm_gem_object_release_handle() if folks
> prefer.  Not sure where that leaves the TTM drivers.
>
>  drivers/gpu/drm/msm/Kconfig   |  1 +
>  drivers/gpu/drm/msm/msm_drv.h |  5 +
>  drivers/gpu/drm/msm/msm_gem.c | 37 +++
>  drivers/gpu/drm/msm/msm_gpu.h |  8 
>  4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
> index f202f26adab2..e4c912fcaf22 100644
> --- a/drivers/gpu/drm/msm/Kconfig
> +++ b/drivers/gpu/drm/msm/Kconfig
> @@ -33,6 +33,7 @@ config DRM_MSM
> select PM_OPP
> select NVMEM
> select PM_GENERIC_DOMAINS
> +   select TRACE_GPU_MEM
> help
>   DRM/KMS driver for MSM/snapdragon.
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> index 16a7cbc0b7dd..cb8f7e804b5b 100644
> --- a/drivers/gpu/drm/msm/msm_drv.h
> +++ b/drivers/gpu/drm/msm/msm_drv.h
> @@ -137,6 +137,11 @@ struct msm_drm_private {
> struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
> struct msm_perf_state *perf;
>
> +   /**
> +* total_mem: Total/global amount of memory backing GEM objects.
> +*/
> +   atomic64_t total_mem;
> +
> /**
>  * List of all GEM objects (mainly for debugfs, protected by obj_lock
>  * (acquire before per GEM object lock)
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 175ee4ab8a6f..e04c4af5d154 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -12,6 +12,9 @@
>  #include 
>
>  #include 
> +#include 
> +
> +#include 
>
>  #include "msm_drv.h"
>  #include "msm_fence.h"
> @@ -33,6 +36,34 @@ static bool use_pages(struct drm_gem_object *obj)
> return !msm_obj->vram_node;
>  }
>
> +static void update_device_mem(struct msm_drm_private *priv, ssize_t size)
> +{
> +   uint64_t total_mem = atomic64_add_return(size, &priv->total_mem);
> +   trace_gpu_mem_total(0, 0, total_mem);
> +}
> +
> +static void update_ctx_mem(struct drm_file *file, ssize_t size)
> +{
> +   struct msm_file_private *ctx = file->driver_priv;
> +   uint64_t ctx_mem = atomic64_add_return(size, &ctx->ctx_mem);
> +
> +   rcu_read_lock(); /* Locks file->pid! */
> +   trace_gpu_mem_total(0, pid_nr(file->pid), ctx_mem);
> +   rcu_read_unlock();
> +
> +}
> +
> +static int msm_gem_open(struct drm_gem_object *obj, struct drm_file *file)
> +{
> +   update_ctx_mem(file, obj->size);
> +   return 0;
> +}
> +
> +static void msm_gem_close(struct drm_gem_object *obj, struct drm_file *file)
> +{
> +   update_ctx_mem(file, -obj->size);
> +}
> +
>  /*
>   * Cache sync.. this is a bit over-complicated, to fit dma-mapping
>   * API.  Really GPU cache is out of scope here (handled on cmdstream)
> @@ -156,6 +187,8 @@ static struct page **get_pages(struct drm_gem_object *obj)
> return p;
> }
>
> +   update_device_mem(dev->dev_private, obj->size);
> +
> msm_obj->pages = p;
>
> msm_obj->sgt = drm_prime_pages_to_sg(obj->dev, p, npages);
> @@ -209,6 +242,8 @@ static void put_pages(struct drm_gem_object *obj)
> msm_obj->sgt = NULL;
> }
>
> +   update_device_mem(obj->dev->dev_private, -obj->size);
> +
>

[RFC] drm/msm: Add GPU memory traces

2024-03-01 Thread Rob Clark
From: Rob Clark 

Perfetto can use these traces to track global and per-process GPU memory
usage.

Signed-off-by: Rob Clark 
---
I realized the tracepoint that perfetto uses to show GPU memory usage
globally and per-process was already upstream, but with no users.

This overlaps a bit with fdinfo, but ftrace is a lighter weight
mechanism and fits better with perfetto (plus is already supported in
trace_processor and perfetto UI, whereas something fdinfo based would
require new code to be added in perfetto).

We could probably do this more globally (ie. drm_gem_get/put_pages() and
drm_gem_handle_create_tail()/drm_gem_object_release_handle() if folks
prefer.  Not sure where that leaves the TTM drivers.

 drivers/gpu/drm/msm/Kconfig   |  1 +
 drivers/gpu/drm/msm/msm_drv.h |  5 +
 drivers/gpu/drm/msm/msm_gem.c | 37 +++
 drivers/gpu/drm/msm/msm_gpu.h |  8 
 4 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index f202f26adab2..e4c912fcaf22 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -33,6 +33,7 @@ config DRM_MSM
select PM_OPP
select NVMEM
select PM_GENERIC_DOMAINS
+   select TRACE_GPU_MEM
help
  DRM/KMS driver for MSM/snapdragon.
 
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 16a7cbc0b7dd..cb8f7e804b5b 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -137,6 +137,11 @@ struct msm_drm_private {
struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
struct msm_perf_state *perf;
 
+   /**
+* total_mem: Total/global amount of memory backing GEM objects.
+*/
+   atomic64_t total_mem;
+
/**
 * List of all GEM objects (mainly for debugfs, protected by obj_lock
 * (acquire before per GEM object lock)
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 175ee4ab8a6f..e04c4af5d154 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -12,6 +12,9 @@
 #include 
 
 #include 
+#include 
+
+#include 
 
 #include "msm_drv.h"
 #include "msm_fence.h"
@@ -33,6 +36,34 @@ static bool use_pages(struct drm_gem_object *obj)
return !msm_obj->vram_node;
 }
 
+static void update_device_mem(struct msm_drm_private *priv, ssize_t size)
+{
+   uint64_t total_mem = atomic64_add_return(size, &priv->total_mem);
+   trace_gpu_mem_total(0, 0, total_mem);
+}
+
+static void update_ctx_mem(struct drm_file *file, ssize_t size)
+{
+   struct msm_file_private *ctx = file->driver_priv;
+   uint64_t ctx_mem = atomic64_add_return(size, &ctx->ctx_mem);
+
+   rcu_read_lock(); /* Locks file->pid! */
+   trace_gpu_mem_total(0, pid_nr(file->pid), ctx_mem);
+   rcu_read_unlock();
+
+}
+
+static int msm_gem_open(struct drm_gem_object *obj, struct drm_file *file)
+{
+   update_ctx_mem(file, obj->size);
+   return 0;
+}
+
+static void msm_gem_close(struct drm_gem_object *obj, struct drm_file *file)
+{
+   update_ctx_mem(file, -obj->size);
+}
+
 /*
  * Cache sync.. this is a bit over-complicated, to fit dma-mapping
  * API.  Really GPU cache is out of scope here (handled on cmdstream)
@@ -156,6 +187,8 @@ static struct page **get_pages(struct drm_gem_object *obj)
return p;
}
 
+   update_device_mem(dev->dev_private, obj->size);
+
msm_obj->pages = p;
 
msm_obj->sgt = drm_prime_pages_to_sg(obj->dev, p, npages);
@@ -209,6 +242,8 @@ static void put_pages(struct drm_gem_object *obj)
msm_obj->sgt = NULL;
}
 
+   update_device_mem(obj->dev->dev_private, -obj->size);
+
if (use_pages(obj))
drm_gem_put_pages(obj, msm_obj->pages, true, false);
else
@@ -1118,6 +1153,8 @@ static const struct vm_operations_struct vm_ops = {
 
 static const struct drm_gem_object_funcs msm_gem_object_funcs = {
.free = msm_gem_free_object,
+   .open = msm_gem_open,
+   .close = msm_gem_close,
.pin = msm_gem_prime_pin,
.unpin = msm_gem_prime_unpin,
.get_sg_table = msm_gem_prime_get_sg_table,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 2bfcb222e353..f7d2a7d6f8cc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -428,6 +428,14 @@ struct msm_file_private {
 * level.
 */
struct drm_sched_entity *entities[NR_SCHED_PRIORITIES * 
MSM_GPU_MAX_RINGS];
+
+   /**
+* ctx_mem:
+*
+* Total amount of memory of GEM buffers with handles attached for
+* this context.
+*/
+   atomic64_t ctx_mem;
 };
 
 /**
-- 
2.44.0
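
For illustration, a minimal userspace sketch for consuming the
gpu_mem_total tracepoint this patch wires up (the tracefs mount point is
an assumption, it may instead be under /sys/kernel/debug/tracing, and the
exact line format comes from the event's TP_printk):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd;

	/* enable the gpu_mem/gpu_mem_total trace event */
	fd = open("/sys/kernel/tracing/events/gpu_mem/gpu_mem_total/enable",
		  O_WRONLY);
	if (fd < 0 || write(fd, "1", 1) != 1)
		return 1;
	close(fd);

	/* stream events; lines look like
	 * "gpu_mem_total: gpu_id=0 pid=<pid> size=<bytes>" */
	fd = open("/sys/kernel/tracing/trace_pipe", O_RDONLY);
	if (fd < 0)
		return 1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);
	return 0;
}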



[pull] drm/msm: drm-msm-next-2024-02-29 for v6.9

2024-02-29 Thread Rob Clark
U if condition
  dt-bindings: arm-smmu: Document SM8650 GPU SMMU
  drm/msm: add support for A750 GPU

Rob Clark (2):
  drm/msm/adreno: Update generated headers
  drm/msm/a7xx: Fix LLC typo

Rob Herring (1):
  dt-bindings: display: msm: sm8650-mdss: Add missing explicit
"additionalProperties"

 .../bindings/display/msm/dsi-controller-main.yaml  |2 +
 .../devicetree/bindings/display/msm/gmu.yaml   |1 +
 .../devicetree/bindings/display/msm/gpu.yaml   |6 +-
 .../devicetree/bindings/display/msm/qcom,mdss.yaml |1 +
 .../bindings/display/msm/qcom,sm8650-dpu.yaml  |4 +-
 .../bindings/display/msm/qcom,sm8650-mdss.yaml |4 +
 .../bindings/display/msm/qcom,x1e80100-mdss.yaml   |  251 +
 .../devicetree/bindings/iommu/arm,smmu.yaml|   17 +-
 drivers/gpu/drm/msm/Makefile   |2 -
 drivers/gpu/drm/msm/adreno/a2xx.xml.h  |   73 +-
 drivers/gpu/drm/msm/adreno/a3xx.xml.h  |  131 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |   13 +-
 drivers/gpu/drm/msm/adreno/a4xx.xml.h  |  182 +-
 drivers/gpu/drm/msm/adreno/a5xx.xml.h  |  666 +--
 drivers/gpu/drm/msm/adreno/a6xx.xml.h  | 5275 
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c  |8 +-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h  |  179 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  220 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c|  727 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h|  311 +-
 drivers/gpu/drm/msm/adreno/adreno_common.xml.h |  260 +-
 drivers/gpu/drm/msm/adreno/adreno_device.c |   69 +-
 .../gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h  |  928 
 .../gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h  |  753 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|   31 +-
 drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h|  573 ++-
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_3_2_sdm660.h |  291 ++
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_3_3_sdm630.h |  225 +
 .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h   |  449 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c|  105 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h|7 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h   |   15 +-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c   |   95 +-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c   |   60 +-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c|   88 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |4 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |3 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c|   15 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h|1 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|  127 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h|1 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c |  154 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h  |   74 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c  |   61 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.h  |3 +-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cmd_encoder.c   |   42 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_encoder.c   |   42 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_irq.c   |2 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c   |   71 +-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.h   |   10 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c   |   12 +-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.h   |4 +-
 drivers/gpu/drm/msm/dp/dp_aux.c|9 +-
 drivers/gpu/drm/msm/dp/dp_aux.h|2 +
 drivers/gpu/drm/msm/dp/dp_catalog.c|  156 +-
 drivers/gpu/drm/msm/dp/dp_catalog.h|6 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.c   |  358 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.h   |   17 +-
 drivers/gpu/drm/msm/dp/dp_debug.c  |3 +-
 drivers/gpu/drm/msm/dp/dp_display.c|  102 +-
 drivers/gpu/drm/msm/dp/dp_display.h|3 +-
 drivers/gpu/drm/msm/dp/dp_link.h   |   23 -
 drivers/gpu/drm/msm/dp/dp_panel.c  |   66 +
 drivers/gpu/drm/msm/dp/dp_parser.c |  327 --
 drivers/gpu/drm/msm/dp/dp_parser.h |  155 -
 drivers/gpu/drm/msm/dp/dp_power.c  |  183 -
 drivers/gpu/drm/msm/dp/dp_power.h  |   95 -
 drivers/gpu/drm/msm/dsi/dsi.c  |   10 +-
 drivers/gpu/drm/msm/dsi/dsi.h  |   22 +-
 drivers/gpu/drm/msm/dsi/dsi_host.c |   51 +-
 drivers/gpu/drm/msm/dsi/dsi_manager.c  |   65 +-
 drivers/gpu/drm/msm/msm_drv.c  |   33 +
 drivers/gpu/drm/msm/msm_drv.h  |4 +
 drivers/gpu/drm/msm/msm_io_utils.c |   13 +
 drivers/gpu/drm/msm/msm_kms.h  |4 -
 drivers/gpu/drm/msm/msm_mdss.c

[pull] drm/msm: drm-msm-fixes-2024-02-28 for v6.8-rc7

2024-02-28 Thread Rob Clark
Hi Dave,

A late revert to address a displayport hpd regression.

The following changes since commit 8c7bfd8262319fd3f127a5380f593ea76f1b88a2:

  drm/msm: Wire up tlb ops (2024-02-15 08:51:31 -0800)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2024-02-28

for you to fetch changes up to 664bad6af3cbe01d6804b7264bee674b3e7dae7e:

  Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD status
changes" (2024-02-28 15:32:29 +0200)


Fixes for v6.8-rc7

DP:
- Revert a change which was causing an HPD regression


Dmitry Baryshkov (1):
  Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD
status changes"

 drivers/gpu/drm/msm/dp/dp_display.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)


Re: [PATCH] drm: ci: uprev IGT

2024-02-23 Thread Rob Clark
On Wed, Feb 21, 2024 at 6:36 PM Dmitry Baryshkov
 wrote:
>
> On Tue, 20 Feb 2024 at 16:31, Helen Koike  wrote:
> >
> >
> >
> > On 20/02/2024 09:17, Dmitry Baryshkov wrote:
> > > Bump IGT revision to pick up Rob Clark's fixes for the msm driver:
> > >
> > > - msm_submit@invalid-duplicate-bo-submit,Fail
> > >
> > > Signed-off-by: Dmitry Baryshkov 
> >
> > Do you have a gitlab pipeline link I can check?
>
> Before uprev: https://gitlab.freedesktop.org/drm/msm/-/pipelines/1109455
>
> After uprev: https://gitlab.freedesktop.org/drm/msm/-/pipelines/1109501

jfyi a couple more fixes landed after this, for kms_plane_cursor
(skips->pass) and kms_universal_plane (fail->pass)..

I have additional fixes for kms_bw, and kms_plane_scaling still
waiting for review

BR,
-R


[pull] drm/msm: drm-msm-fixes-2024-02-15 for v6.8-rc5

2024-02-15 Thread Rob Clark
Hi Dave,

Another fixes pull, this time actually including the GPU fixes left
out of last week's fixes due to a misapplied label, plus a tlb
invalidation fix.  Description below.

The following changes since commit 8d35217149daa33358c284aca6a56d5ab92cfc6c:

  drm/msm/mdss: specify cfg bandwidth for SDM670 (2024-01-25 14:36:04 -0800)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2024-02-15

for you to fetch changes up to 8c7bfd8262319fd3f127a5380f593ea76f1b88a2:

  drm/msm: Wire up tlb ops (2024-02-15 08:51:31 -0800)


Fixes for v6.8-rc5

GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable
- tlb invalidation fix


Dmitry Baryshkov (1):
  drm/msm/a6xx: set highest_bank_bit to 13 for a610

Rob Clark (3):
  drm/msm/gem: Fix double resv lock aquire
  Revert "drm/msm/gpu: Push gpu lock down past runpm"
  drm/msm: Wire up tlb ops

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  2 +-
 drivers/gpu/drm/msm/msm_gem_prime.c   |  4 ++--
 drivers/gpu/drm/msm/msm_gpu.c | 11 +--
 drivers/gpu/drm/msm/msm_iommu.c   | 32 +---
 drivers/gpu/drm/msm/msm_ringbuffer.c  |  7 +--
 5 files changed, 42 insertions(+), 14 deletions(-)


Re: [PATCH] drm/msm: Wire up tlb ops

2024-02-15 Thread Rob Clark
On Wed, Feb 14, 2024 at 11:34 PM Johan Hovold  wrote:
>
> On Tue, Feb 13, 2024 at 09:23:40AM -0800, Rob Clark wrote:
> > From: Rob Clark 
> >
> > The brute force iommu_flush_iotlb_all() was good enough for unmap, but
> > in some cases a map operation could require removing a table pte entry
> > to replace with a block entry.  This also requires tlb invalidation.
> > Missing this was resulting in an obscure iova fault on what should be a
> > valid buffer address.
> >
> > Thanks to Robin Murphy for helping me understand the cause of the fault.
> >
> > Cc: Robin Murphy 
> > Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable")
>
> Sounds like you're missing a
>
> Cc: sta...@vger.kernel.org
>
> here? Or is there some reason not to backport this fix (to 5.9 and later
> kernels)?

No reason, I just expected the Fixes tag to be sufficient

BR,
-R

> > Signed-off-by: Rob Clark 
>
> Johan


[PATCH] drm/msm: Wire up tlb ops

2024-02-13 Thread Rob Clark
From: Rob Clark 

The brute force iommu_flush_iotlb_all() was good enough for unmap, but
in some cases a map operation could require removing a table pte entry
to replace with a block entry.  This also requires tlb invalidation.
Missing this was resulting in an obscure iova fault on what should be a
valid buffer address.

Thanks to Robin Murphy for helping me understand the cause of the fault.

Cc: Robin Murphy 
Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_iommu.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 5cc8d358cc97..d5512037c38b 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -21,6 +21,8 @@ struct msm_iommu_pagetable {
struct msm_mmu base;
struct msm_mmu *parent;
struct io_pgtable_ops *pgtbl_ops;
+   const struct iommu_flush_ops *tlb;
+   struct device *iommu_dev;
unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
phys_addr_t ttbr;
u32 asid;
@@ -201,11 +203,33 @@ static const struct msm_mmu_funcs pagetable_funcs = {
 
 static void msm_iommu_tlb_flush_all(void *cookie)
 {
+   struct msm_iommu_pagetable *pagetable = cookie;
+   struct adreno_smmu_priv *adreno_smmu;
+
+   if (!pm_runtime_get_if_in_use(pagetable->iommu_dev))
+   return;
+
+   adreno_smmu = dev_get_drvdata(pagetable->parent->dev);
+
+   pagetable->tlb->tlb_flush_all((void *)adreno_smmu->cookie);
+
+   pm_runtime_put_autosuspend(pagetable->iommu_dev);
 }
 
 static void msm_iommu_tlb_flush_walk(unsigned long iova, size_t size,
size_t granule, void *cookie)
 {
+   struct msm_iommu_pagetable *pagetable = cookie;
+   struct adreno_smmu_priv *adreno_smmu;
+
+   if (!pm_runtime_get_if_in_use(pagetable->iommu_dev))
+   return;
+
+   adreno_smmu = dev_get_drvdata(pagetable->parent->dev);
+
+   pagetable->tlb->tlb_flush_walk(iova, size, granule, (void 
*)adreno_smmu->cookie);
+
+   pm_runtime_put_autosuspend(pagetable->iommu_dev);
 }
 
 static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
@@ -213,7 +237,7 @@ static void msm_iommu_tlb_add_page(struct 
iommu_iotlb_gather *gather,
 {
 }
 
-static const struct iommu_flush_ops null_tlb_ops = {
+static const struct iommu_flush_ops tlb_ops = {
.tlb_flush_all = msm_iommu_tlb_flush_all,
.tlb_flush_walk = msm_iommu_tlb_flush_walk,
.tlb_add_page = msm_iommu_tlb_add_page,
@@ -254,10 +278,10 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu 
*parent)
 
/* The incoming cfg will have the TTBR1 quirk enabled */
ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
-   ttbr0_cfg.tlb = &null_tlb_ops;
+   ttbr0_cfg.tlb = &tlb_ops;
 
pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1,
-   &ttbr0_cfg, iommu->domain);
+   &ttbr0_cfg, pagetable);
 
if (!pagetable->pgtbl_ops) {
kfree(pagetable);
@@ -279,6 +303,8 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu 
*parent)
 
/* Needed later for TLB flush */
pagetable->parent = parent;
+   pagetable->tlb = ttbr1_cfg->tlb;
+   pagetable->iommu_dev = ttbr1_cfg->iommu_dev;
pagetable->pgsize_bitmap = ttbr0_cfg.pgsize_bitmap;
pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
 
-- 
2.43.0



Re: [PATCH] drm/crtc: fix uninitialized variable use even harder

2024-02-12 Thread Rob Clark
On Mon, Feb 12, 2024 at 1:55 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> DRM_MODESET_LOCK_ALL_BEGIN() has a hidden trap-door (aka retry loop),
> which means we can't rely too much on variable initializers.
>
> Fixes: 6e455f5dcdd1 ("drm/crtc: fix uninitialized variable use")
> Signed-off-by: Rob Clark 
> ---
> I have mixed feelings about DRM_MODESET_LOCK_ALL_BEGIN() (and friends)
> magic.  On one hand it simplifies the deadlock/back dance.  OTOH it
> conceals a nasty sharp edge.  Maybe it is better to have the complicated
> restart path a bit more explicit, like it was originally.

I should also point out, had drm-misc-next been using gitlab MRs and
gitlab CI, we would have caught this ;-)

BR,
-R

>  drivers/gpu/drm/drm_crtc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index cb90e70d85e8..65f9f66933bb 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -904,6 +904,7 @@ int drm_mode_setcrtc(struct drm_device *dev, void *data,
> connector_set = NULL;
> fb = NULL;
> mode = NULL;
> +   num_connectors = 0;
>
> DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
>
> --
> 2.43.0
>


[PATCH] drm/crtc: fix uninitialized variable use even harder

2024-02-12 Thread Rob Clark
From: Rob Clark 

DRM_MODESET_LOCK_ALL_BEGIN() has a hidden trap-door (aka retry loop),
which means we can't rely too much on variable initializers.

Fixes: 6e455f5dcdd1 ("drm/crtc: fix uninitialized variable use")
Signed-off-by: Rob Clark 
---
I have mixed feelings about DRM_MODESET_LOCK_ALL_BEGIN() (and friends)
magic.  On one hand it simplifies the deadlock/back dance.  OTOH it
conceals a nasty sharp edge.  Maybe it is better to have the complicated
restart path a bit more explicit, like it was originally.
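
(For readers unfamiliar with the macros, the BEGIN/END pair expands to
roughly the following retry loop; this is a simplified sketch, see
include/drm/drm_modeset_lock.h for the real definitions:)

	int num_connectors = 0;		/* initializer runs only once */

	drm_modeset_acquire_init(&ctx, flags);
retry:
	ret = drm_modeset_lock_all_ctx(dev, &ctx);	/* BEGIN */
	if (!ret) {
		...
		num_connectors++;	/* body can run more than once */
		...
	}
	if (ret == -EDEADLK) {				/* END */
		ret = drm_modeset_backoff(&ctx);
		if (!ret)
			goto retry;	/* jumps back *past* the initializer */
	}
	drm_modeset_drop_locks(&ctx);
	drm_modeset_acquire_fini(&ctx);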

 drivers/gpu/drm/drm_crtc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index cb90e70d85e8..65f9f66933bb 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -904,6 +904,7 @@ int drm_mode_setcrtc(struct drm_device *dev, void *data,
connector_set = NULL;
fb = NULL;
mode = NULL;
+   num_connectors = 0;
 
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
 
-- 
2.43.0



[pull] drm/msm: drm-msm-fixes-2024-02-07 for v6.8-rc4

2024-02-07 Thread Rob Clark
Hi Dave,

A few fixes for v6.8, description below

The following changes since commit d4ca26ac4be0d9aea7005c40df75e6775749671b:

  drm/msm/dp: call dp_display_get_next_bridge() during probe
(2023-12-14 09:27:46 +0200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2024-02-07

for you to fetch changes up to 8d35217149daa33358c284aca6a56d5ab92cfc6c:

  drm/msm/mdss: specify cfg bandwidth for SDM670 (2024-01-25 14:36:04 -0800)


Fixes for v6.8-rc4

DPU:
- fix for kernel doc warnings and smatch warnings in dpu_encoder
- fix for smatch warning in dpu_encoder
- fix the bus bandwidth value for SDM670

DP:
- fixes to handle the unknown bpc case correctly for DP. The current code
  was spilling over into other bits of the DP configuration register, and
  had to be fixed to avoid the extra shifts which were causing the spill-over
- fix for MISC0 programming in DP driver to program the correct
  colorimetry value

GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable


Abhinav Kumar (1):
  drm/msm/dpu: check for valid hw_pp in dpu_encoder_helper_phys_cleanup

Dmitry Baryshkov (1):
  drm/msm/mdss: specify cfg bandwidth for SDM670

Kuogee Hsieh (2):
  drm/msms/dp: fixed link clock divider bits be over written in
BPC unknown case
  drm/msm/dp: return correct Colorimetry for DP_TEST_DYNAMIC_RANGE_CEA case

Randy Dunlap (1):
  drm/msm/dpu: fix kernel-doc warnings

 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c |  8 ++--
 drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c  |  3 ++-
 drivers/gpu/drm/msm/dp/dp_ctrl.c|  5 -
 drivers/gpu/drm/msm/dp/dp_link.c| 22 ++
 drivers/gpu/drm/msm/dp/dp_reg.h |  3 +++
 drivers/gpu/drm/msm/msm_mdss.c  |  1 +
 6 files changed, 22 insertions(+), 20 deletions(-)



[PATCH v3] drm/msm/gem: Fix double resv lock acquire

2024-01-31 Thread Rob Clark
From: Rob Clark 

Since commit 79e2cf2e7a19 ("drm/gem: Take reservation lock for vmap/vunmap
operations"), the resv lock is already held in the prime vmap path, so
don't try to grab it again.

v2: This applies to vunmap path as well
v3: Fix fixes commit

Fixes: 79e2cf2e7a19 ("drm/gem: Take reservation lock for vmap/vunmap 
operations")
Signed-off-by: Rob Clark 
Acked-by: Christian König 
---
 drivers/gpu/drm/msm/msm_gem_prime.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c 
b/drivers/gpu/drm/msm/msm_gem_prime.c
index 5f68e31a3e4e..0915f3b68752 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -26,7 +26,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 {
void *vaddr;
 
-   vaddr = msm_gem_get_vaddr(obj);
+   vaddr = msm_gem_get_vaddr_locked(obj);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
iosys_map_set_vaddr(map, vaddr);
@@ -36,7 +36,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
 {
-   msm_gem_put_vaddr(obj);
+   msm_gem_put_vaddr_locked(obj);
 }
 
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
-- 
2.43.0



[PATCH v2] drm/msm/gem: Fix double resv lock acquire

2024-01-30 Thread Rob Clark
From: Rob Clark 

Since commit 56e5abba8c3e ("dma-buf: Add unlocked variant of vmapping
functions"), the resv lock is already held in the prime vmap path, so
don't try to grab it again.

v2: This applies to vunmap path as well

Fixes: 56e5abba8c3e ("dma-buf: Add unlocked variant of vmapping functions")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_prime.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c 
b/drivers/gpu/drm/msm/msm_gem_prime.c
index 5f68e31a3e4e..0915f3b68752 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -26,7 +26,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 {
void *vaddr;
 
-   vaddr = msm_gem_get_vaddr(obj);
+   vaddr = msm_gem_get_vaddr_locked(obj);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
iosys_map_set_vaddr(map, vaddr);
@@ -36,7 +36,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
 {
-   msm_gem_put_vaddr(obj);
+   msm_gem_put_vaddr_locked(obj);
 }
 
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
-- 
2.43.0



[PATCH] drm/msm/gem: Fix double resv lock acquire

2024-01-30 Thread Rob Clark
From: Rob Clark 

Since commit 56e5abba8c3e ("dma-buf: Add unlocked variant of vmapping
functions"), the resv lock is already held in the prime vmap path, so
don't try to grab it again.

Fixes: 56e5abba8c3e ("dma-buf: Add unlocked variant of vmapping functions")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_prime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c 
b/drivers/gpu/drm/msm/msm_gem_prime.c
index 5f68e31a3e4e..8a27b57a5bea 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -26,7 +26,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
 {
void *vaddr;
 
-   vaddr = msm_gem_get_vaddr(obj);
+   vaddr = msm_gem_get_vaddr_locked(obj);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
iosys_map_set_vaddr(map, vaddr);
-- 
2.43.0



Re: [PATCH] drm/ci: Add msm tests

2024-01-12 Thread Rob Clark
On Fri, Jan 12, 2024 at 7:57 AM Rob Clark  wrote:
>
> On Fri, Jan 12, 2024 at 3:42 AM Vignesh Raman
>  wrote:
> >
> > Hi Rob,
> >
> >
> > On 09/01/24 01:20, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > The msm tests should skip on non-msm hw, so I think it should be safe to
> > > enable everywhere.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > >   drivers/gpu/drm/ci/testlist.txt | 49 +
> > >   1 file changed, 49 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/ci/testlist.txt 
> > > b/drivers/gpu/drm/ci/testlist.txt
> > > index f82cd90372f4..eaeb751bb0ad 100644
> > > --- a/drivers/gpu/drm/ci/testlist.txt
> > > +++ b/drivers/gpu/drm/ci/testlist.txt
> > > @@ -2910,3 +2910,52 @@ kms_writeback@writeback-invalid-parameters
> > >   kms_writeback@writeback-fb-id
> > >   kms_writeback@writeback-check-output
> > >   prime_mmap_kms@buffer-sharing
> > > +msm_shrink@copy-gpu-sanitycheck-8
> > > +msm_shrink@copy-gpu-sanitycheck-32
> > > +msm_shrink@copy-gpu-8
> > > +msm_shrink@copy-gpu-32
> > > +msm_shrink@copy-gpu-madvise-8
> > > +msm_shrink@copy-gpu-madvise-32
> > > +msm_shrink@copy-gpu-oom-8
> > > +msm_shrink@copy-gpu-oom-32
> > > +msm_shrink@copy-mmap-sanitycheck-8
> > > +msm_shrink@copy-mmap-sanitycheck-32
> > > +msm_shrink@copy-mmap-8
> > > +msm_shrink@copy-mmap-32
> > > +msm_shrink@copy-mmap-madvise-8
> > > +msm_shrink@copy-mmap-madvise-32
> > > +msm_shrink@copy-mmap-oom-8
> > > +msm_shrink@copy-mmap-oom-32
> > > +msm_shrink@copy-mmap-dmabuf-sanitycheck-8
> > > +msm_shrink@copy-mmap-dmabuf-sanitycheck-32
> > > +msm_shrink@copy-mmap-dmabuf-8
> > > +msm_shrink@copy-mmap-dmabuf-32
> > > +msm_shrink@copy-mmap-dmabuf-madvise-8
> > > +msm_shrink@copy-mmap-dmabuf-madvise-32
> > > +msm_shrink@copy-mmap-dmabuf-oom-8
> > > +msm_shrink@copy-mmap-dmabuf-oom-32
> > > +msm_mapping@ring
> > > +msm_mapping@sqefw
> > > +msm_mapping@shadow
> > > +msm_submitoverhead@submitbench-10-bos
> > > +msm_submitoverhead@submitbench-10-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-100-bos
> > > +msm_submitoverhead@submitbench-100-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-250-bos
> > > +msm_submitoverhead@submitbench-250-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-500-bos
> > > +msm_submitoverhead@submitbench-500-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-1000-bos
> > > +msm_submitoverhead@submitbench-1000-bos-no-implicit-sync
> > > +msm_recovery@hangcheck
> > > +msm_recovery@gpu-fault
> > > +msm_recovery@gpu-fault-parallel
> > > +msm_recovery@iova-fault
> > > +msm_submit@empty-submit
> > > +msm_submit@invalid-queue-submit
> > > +msm_submit@invalid-flags-submit
> > > +msm_submit@invalid-in-fence-submit
> > > +msm_submit@invalid-duplicate-bo-submit
> > > +msm_submit@invalid-cmd-idx-submit
> > > +msm_submit@invalid-cmd-type-submit
> > > +msm_submit@valid-submit
> >
> > I tested this patch with latest drm-misc/drm-misc-next and there was
> > some failures seen for the newly added msm tests. I have updated the
> > xfails with below commit,
> >
> > https://gitlab.freedesktop.org/vigneshraman/linux/-/commit/d012893597a661d6ebbb755bf2607dfb055524a1
> >
> > I will notify the maintainers about the flaky tests, update the url in
> > the flakes.txt, and submit a separate patch for this change.

Oh, you should probably move msm_mapping@* to skips on sdm845.  I had
a closer look at those, and they are failing due to a bootloader/fw
issue.  We work around this in mesa CI with these two patches:

https://gitlab.freedesktop.org/gfx-ci/linux/-/commit/4b49f902ec6f2bb382cbbf489870573f4b43371e
https://gitlab.freedesktop.org/gfx-ci/linux/-/commit/38cdf4c5559771e2474ae0fecef8469f65147bc1

But given that sdm845 is similar to sc7180 as far as the kernel gpu
driver goes, it is probably just better to skip these on sdm845 (with a
comment referring to the hack patches we use in mesa CI).

BR,
-R

>
> Thanks, it looks like you also have a relatively recent igt (there
> were some msm_submit fails until I fixed the test)..
>
> BR,
> -R
>
> > Regards,
> > Vignesh


Re: [PATCH] drm/ci: Add msm tests

2024-01-12 Thread Rob Clark
On Fri, Jan 12, 2024 at 3:42 AM Vignesh Raman
 wrote:
>
> Hi Rob,
>
>
> On 09/01/24 01:20, Rob Clark wrote:
> > From: Rob Clark 
> >
> > The msm tests should skip on non-msm hw, so I think it should be safe to
> > enable everywhere.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/ci/testlist.txt | 49 +
> >   1 file changed, 49 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ci/testlist.txt 
> > b/drivers/gpu/drm/ci/testlist.txt
> > index f82cd90372f4..eaeb751bb0ad 100644
> > --- a/drivers/gpu/drm/ci/testlist.txt
> > +++ b/drivers/gpu/drm/ci/testlist.txt
> > @@ -2910,3 +2910,52 @@ kms_writeback@writeback-invalid-parameters
> >   kms_writeback@writeback-fb-id
> >   kms_writeback@writeback-check-output
> >   prime_mmap_kms@buffer-sharing
> > +msm_shrink@copy-gpu-sanitycheck-8
> > +msm_shrink@copy-gpu-sanitycheck-32
> > +msm_shrink@copy-gpu-8
> > +msm_shrink@copy-gpu-32
> > +msm_shrink@copy-gpu-madvise-8
> > +msm_shrink@copy-gpu-madvise-32
> > +msm_shrink@copy-gpu-oom-8
> > +msm_shrink@copy-gpu-oom-32
> > +msm_shrink@copy-mmap-sanitycheck-8
> > +msm_shrink@copy-mmap-sanitycheck-32
> > +msm_shrink@copy-mmap-8
> > +msm_shrink@copy-mmap-32
> > +msm_shrink@copy-mmap-madvise-8
> > +msm_shrink@copy-mmap-madvise-32
> > +msm_shrink@copy-mmap-oom-8
> > +msm_shrink@copy-mmap-oom-32
> > +msm_shrink@copy-mmap-dmabuf-sanitycheck-8
> > +msm_shrink@copy-mmap-dmabuf-sanitycheck-32
> > +msm_shrink@copy-mmap-dmabuf-8
> > +msm_shrink@copy-mmap-dmabuf-32
> > +msm_shrink@copy-mmap-dmabuf-madvise-8
> > +msm_shrink@copy-mmap-dmabuf-madvise-32
> > +msm_shrink@copy-mmap-dmabuf-oom-8
> > +msm_shrink@copy-mmap-dmabuf-oom-32
> > +msm_mapping@ring
> > +msm_mapping@sqefw
> > +msm_mapping@shadow
> > +msm_submitoverhead@submitbench-10-bos
> > +msm_submitoverhead@submitbench-10-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-100-bos
> > +msm_submitoverhead@submitbench-100-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-250-bos
> > +msm_submitoverhead@submitbench-250-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-500-bos
> > +msm_submitoverhead@submitbench-500-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-1000-bos
> > +msm_submitoverhead@submitbench-1000-bos-no-implicit-sync
> > +msm_recovery@hangcheck
> > +msm_recovery@gpu-fault
> > +msm_recovery@gpu-fault-parallel
> > +msm_recovery@iova-fault
> > +msm_submit@empty-submit
> > +msm_submit@invalid-queue-submit
> > +msm_submit@invalid-flags-submit
> > +msm_submit@invalid-in-fence-submit
> > +msm_submit@invalid-duplicate-bo-submit
> > +msm_submit@invalid-cmd-idx-submit
> > +msm_submit@invalid-cmd-type-submit
> > +msm_submit@valid-submit
>
> I tested this patch with latest drm-misc/drm-misc-next and there was
> some failures seen for the newly added msm tests. I have updated the
> xfails with below commit,
>
> https://gitlab.freedesktop.org/vigneshraman/linux/-/commit/d012893597a661d6ebbb755bf2607dfb055524a1
>
> I will notify the maintainers about the flaky tests, update the url in
> the flakes.txt, and submit a separate patch for this change.

Thanks, it looks like you also have a relatively recent igt (there
were some msm_submit fails until I fixed the test)..

BR,
-R

> Regards,
> Vignesh


Re: [PATCH] Revert "drm/msm/gpu: Push gpu lock down past runpm"

2024-01-10 Thread Rob Clark
On Wed, Jan 10, 2024 at 2:50 AM Daniel Vetter  wrote:
>
> On Tue, Jan 09, 2024 at 10:22:17AM -0800, Rob Clark wrote:
> > From: Rob Clark 
> >
> > This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.
> >
> > Changing the locking order means that scheduler/msm_job_run() can race
> > with the recovery kthread worker, with the result that the GPU gets an
> > extra runpm get when we are trying to power it off.  Leaving the GPU in
> > an unrecovered state.
>
> The recovery kthread is supposed to stop all the relevant schedulers,
> which should remove any possible race conditions. So unless there's more
> going on, or you have your own recovery kthread (don't, reuse the one from
> the scheduler with your own work items, that's why you can provide that)
> this looks like an incomplete/incorrect explanation ... ?
>
> Slightly confused

msm still uses its own recovery, which pre-dates the scheduler
conversion.  At one point (a yr or two back?) I started looking at
integrating recovery w/ scheduler.. at the time I think you talked me
out of it, but I don't remember the reason

BR,
-R

> -Sima
>
> >
> > I'll need to come up with a different scheme for appeasing lockdep.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/msm_gpu.c| 11 +--
> >  drivers/gpu/drm/msm/msm_ringbuffer.c |  7 +--
> >  2 files changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index 095390774f22..655002b21b0d 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -751,12 +751,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
> > msm_gem_submit *submit)
> >   struct msm_ringbuffer *ring = submit->ring;
> >   unsigned long flags;
> >
> > - pm_runtime_get_sync(&gpu->pdev->dev);
> > + WARN_ON(!mutex_is_locked(&gpu->lock));
> >
> > - mutex_lock(&gpu->lock);
> > + pm_runtime_get_sync(&gpu->pdev->dev);
> >
> >   msm_gpu_hw_init(gpu);
> >
> > + submit->seqno = submit->hw_fence->seqno;
> > +
> >   update_sw_cntrs(gpu);
> >
> >   /*
> > @@ -781,11 +783,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
> > msm_gem_submit *submit)
> >   gpu->funcs->submit(gpu, submit);
> >   gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
> >
> > - hangcheck_timer_reset(gpu);
> > -
> > - mutex_unlock(&gpu->lock);
> > -
> >   pm_runtime_put(&gpu->pdev->dev);
> > + hangcheck_timer_reset(gpu);
> >  }
> >
> >  /*
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
> > b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index e0ed27739449..548f5266a7d3 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
> > *job)
> >
> >   msm_fence_init(submit->hw_fence, fctx);
> >
> > - submit->seqno = submit->hw_fence->seqno;
> > -
> >   mutex_lock(&priv->lru.lock);
> >
> >   for (i = 0; i < submit->nr_bos; i++) {
> > @@ -35,8 +33,13 @@ static struct dma_fence *msm_job_run(struct 
> > drm_sched_job *job)
> >
> >   mutex_unlock(&priv->lru.lock);
> >
> > + /* TODO move submit path over to using a per-ring lock.. */
> > + mutex_lock(&gpu->lock);
> > +
> >   msm_gpu_submit(gpu, submit);
> >
> > + mutex_unlock(&gpu->lock);
> > +
> >   return dma_fence_get(submit->hw_fence);
> >  }
> >
> > --
> > 2.43.0
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


[PATCH] Revert "drm/msm/gpu: Push gpu lock down past runpm"

2024-01-09 Thread Rob Clark
From: Rob Clark 

This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.

Changing the locking order means that scheduler/msm_job_run() can race
with the recovery kthread worker, with the result that the GPU gets an
extra runpm get when we are trying to power it off.  Leaving the GPU in
an unrecovered state.

I'll need to come up with a different scheme for appeasing lockdep.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_ringbuffer.c |  7 +--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 095390774f22..655002b21b0d 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -751,12 +751,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
struct msm_ringbuffer *ring = submit->ring;
unsigned long flags;
 
-   pm_runtime_get_sync(&gpu->pdev->dev);
+   WARN_ON(!mutex_is_locked(&gpu->lock));
 
-   mutex_lock(&gpu->lock);
+   pm_runtime_get_sync(&gpu->pdev->dev);
 
msm_gpu_hw_init(gpu);
 
+   submit->seqno = submit->hw_fence->seqno;
+
update_sw_cntrs(gpu);
 
/*
@@ -781,11 +783,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
gpu->funcs->submit(gpu, submit);
gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
 
-   hangcheck_timer_reset(gpu);
-
-   mutex_unlock(&gpu->lock);
-
pm_runtime_put(&gpu->pdev->dev);
+   hangcheck_timer_reset(gpu);
 }
 
 /*
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
b/drivers/gpu/drm/msm/msm_ringbuffer.c
index e0ed27739449..548f5266a7d3 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
 
msm_fence_init(submit->hw_fence, fctx);
 
-   submit->seqno = submit->hw_fence->seqno;
-
mutex_lock(&priv->lru.lock);
 
for (i = 0; i < submit->nr_bos; i++) {
@@ -35,8 +33,13 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
 
mutex_unlock(&priv->lru.lock);
 
+   /* TODO move submit path over to using a per-ring lock.. */
+   mutex_lock(&gpu->lock);
+
msm_gpu_submit(gpu, submit);
 
+   mutex_unlock(&gpu->lock);
+
return dma_fence_get(submit->hw_fence);
 }
 
-- 
2.43.0



Re: [PATCH] drm/ci: Add msm tests

2024-01-09 Thread Rob Clark
On Mon, Jan 8, 2024 at 6:13 PM Rob Clark  wrote:
>
> On Mon, Jan 8, 2024 at 2:58 PM Abhinav Kumar  
> wrote:
> >
> >
> >
> > On 1/8/2024 11:50 AM, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > The msm tests should skip on non-msm hw, so I think it should be safe to
> > > enable everywhere.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > >   drivers/gpu/drm/ci/testlist.txt | 49 +
> > >   1 file changed, 49 insertions(+)
> > >
> >
> > I do see that all these tests use igt_msm_dev_open() to make sure it
> > opens only the MSM card.
> >
> > But if igt_msm_dev_open() fails, I dont see a igt_require() on some of
> > the tests to skip them. So how will it safely skip on non-msm HW?
> >
> > Unless I am missing something here 
>
> hmm, at the time I added the initial msm tests, and
> igt_msm_dev_open(), I verified that they skipped on intel.. but since
> then I'd switched from intel to sc8280xp device for primary dev
> device, so I'd need to re-test to remember how it works.  If these
> aren't skipping on !msm, it is a bug

I double checked, these tests skip in drm_open_driver() with "No known
gpu found for chipset flags 0x64 (msm)", so no problem to run them on
all CI runners.
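
(For reference, a rough sketch of where that skip comes from; an assumed
simplification of the igt helpers, not the literal igt source:)

	struct msm_device *igt_msm_dev_open(void)
	{
		struct msm_device *dev = calloc(1, sizeof(*dev));

		/*
		 * drm_open_driver() igt_skip()s the whole test when no
		 * device matching DRIVER_MSM exists, before any subtest
		 * body runs -- hence no explicit igt_require() is needed
		 * in the individual tests.
		 */
		dev->fd = drm_open_driver(DRIVER_MSM);
		return dev;
	}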

BR,
-R


> BR,
> -R
>
> > > diff --git a/drivers/gpu/drm/ci/testlist.txt 
> > > b/drivers/gpu/drm/ci/testlist.txt
> > > index f82cd90372f4..eaeb751bb0ad 100644
> > > --- a/drivers/gpu/drm/ci/testlist.txt
> > > +++ b/drivers/gpu/drm/ci/testlist.txt
> > > @@ -2910,3 +2910,52 @@ kms_writeback@writeback-invalid-parameters
> > >   kms_writeback@writeback-fb-id
> > >   kms_writeback@writeback-check-output
> > >   prime_mmap_kms@buffer-sharing
> > > +msm_shrink@copy-gpu-sanitycheck-8
> > > +msm_shrink@copy-gpu-sanitycheck-32
> > > +msm_shrink@copy-gpu-8
> > > +msm_shrink@copy-gpu-32
> > > +msm_shrink@copy-gpu-madvise-8
> > > +msm_shrink@copy-gpu-madvise-32
> > > +msm_shrink@copy-gpu-oom-8
> > > +msm_shrink@copy-gpu-oom-32
> > > +msm_shrink@copy-mmap-sanitycheck-8
> > > +msm_shrink@copy-mmap-sanitycheck-32
> > > +msm_shrink@copy-mmap-8
> > > +msm_shrink@copy-mmap-32
> > > +msm_shrink@copy-mmap-madvise-8
> > > +msm_shrink@copy-mmap-madvise-32
> > > +msm_shrink@copy-mmap-oom-8
> > > +msm_shrink@copy-mmap-oom-32
> > > +msm_shrink@copy-mmap-dmabuf-sanitycheck-8
> > > +msm_shrink@copy-mmap-dmabuf-sanitycheck-32
> > > +msm_shrink@copy-mmap-dmabuf-8
> > > +msm_shrink@copy-mmap-dmabuf-32
> > > +msm_shrink@copy-mmap-dmabuf-madvise-8
> > > +msm_shrink@copy-mmap-dmabuf-madvise-32
> > > +msm_shrink@copy-mmap-dmabuf-oom-8
> > > +msm_shrink@copy-mmap-dmabuf-oom-32
> > > +msm_mapping@ring
> > > +msm_mapping@sqefw
> > > +msm_mapping@shadow
> > > +msm_submitoverhead@submitbench-10-bos
> > > +msm_submitoverhead@submitbench-10-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-100-bos
> > > +msm_submitoverhead@submitbench-100-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-250-bos
> > > +msm_submitoverhead@submitbench-250-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-500-bos
> > > +msm_submitoverhead@submitbench-500-bos-no-implicit-sync
> > > +msm_submitoverhead@submitbench-1000-bos
> > > +msm_submitoverhead@submitbench-1000-bos-no-implicit-sync
> > > +msm_recovery@hangcheck
> > > +msm_recovery@gpu-fault
> > > +msm_recovery@gpu-fault-parallel
> > > +msm_recovery@iova-fault
> > > +msm_submit@empty-submit
> > > +msm_submit@invalid-queue-submit
> > > +msm_submit@invalid-flags-submit
> > > +msm_submit@invalid-in-fence-submit
> > > +msm_submit@invalid-duplicate-bo-submit
> > > +msm_submit@invalid-cmd-idx-submit
> > > +msm_submit@invalid-cmd-type-submit
> > > +msm_submit@valid-submit


Re: [PATCH] drm/ci: Add msm tests

2024-01-08 Thread Rob Clark
On Mon, Jan 8, 2024 at 2:58 PM Abhinav Kumar  wrote:
>
>
>
> On 1/8/2024 11:50 AM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > The msm tests should skip on non-msm hw, so I think it should be safe to
> > enable everywhere.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/ci/testlist.txt | 49 +
> >   1 file changed, 49 insertions(+)
> >
>
> I do see that all these tests use igt_msm_dev_open() to make sure it
> opens only the MSM card.
>
> But if igt_msm_dev_open() fails, I dont see a igt_require() on some of
> the tests to skip them. So how will it safely skip on non-msm HW?
>
> Unless I am missing something here 

hmm, at the time I added the initial msm tests, and
igt_msm_dev_open(), I verified that they skipped on intel.. but since
then I'd switched from intel to sc8280xp device for primary dev
device, so I'd need to re-test to remember how it works.  If these
aren't skipping on !msm, it is a bug

BR,
-R

> > diff --git a/drivers/gpu/drm/ci/testlist.txt 
> > b/drivers/gpu/drm/ci/testlist.txt
> > index f82cd90372f4..eaeb751bb0ad 100644
> > --- a/drivers/gpu/drm/ci/testlist.txt
> > +++ b/drivers/gpu/drm/ci/testlist.txt
> > @@ -2910,3 +2910,52 @@ kms_writeback@writeback-invalid-parameters
> >   kms_writeback@writeback-fb-id
> >   kms_writeback@writeback-check-output
> >   prime_mmap_kms@buffer-sharing
> > +msm_shrink@copy-gpu-sanitycheck-8
> > +msm_shrink@copy-gpu-sanitycheck-32
> > +msm_shrink@copy-gpu-8
> > +msm_shrink@copy-gpu-32
> > +msm_shrink@copy-gpu-madvise-8
> > +msm_shrink@copy-gpu-madvise-32
> > +msm_shrink@copy-gpu-oom-8
> > +msm_shrink@copy-gpu-oom-32
> > +msm_shrink@copy-mmap-sanitycheck-8
> > +msm_shrink@copy-mmap-sanitycheck-32
> > +msm_shrink@copy-mmap-8
> > +msm_shrink@copy-mmap-32
> > +msm_shrink@copy-mmap-madvise-8
> > +msm_shrink@copy-mmap-madvise-32
> > +msm_shrink@copy-mmap-oom-8
> > +msm_shrink@copy-mmap-oom-32
> > +msm_shrink@copy-mmap-dmabuf-sanitycheck-8
> > +msm_shrink@copy-mmap-dmabuf-sanitycheck-32
> > +msm_shrink@copy-mmap-dmabuf-8
> > +msm_shrink@copy-mmap-dmabuf-32
> > +msm_shrink@copy-mmap-dmabuf-madvise-8
> > +msm_shrink@copy-mmap-dmabuf-madvise-32
> > +msm_shrink@copy-mmap-dmabuf-oom-8
> > +msm_shrink@copy-mmap-dmabuf-oom-32
> > +msm_mapping@ring
> > +msm_mapping@sqefw
> > +msm_mapping@shadow
> > +msm_submitoverhead@submitbench-10-bos
> > +msm_submitoverhead@submitbench-10-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-100-bos
> > +msm_submitoverhead@submitbench-100-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-250-bos
> > +msm_submitoverhead@submitbench-250-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-500-bos
> > +msm_submitoverhead@submitbench-500-bos-no-implicit-sync
> > +msm_submitoverhead@submitbench-1000-bos
> > +msm_submitoverhead@submitbench-1000-bos-no-implicit-sync
> > +msm_recovery@hangcheck
> > +msm_recovery@gpu-fault
> > +msm_recovery@gpu-fault-parallel
> > +msm_recovery@iova-fault
> > +msm_submit@empty-submit
> > +msm_submit@invalid-queue-submit
> > +msm_submit@invalid-flags-submit
> > +msm_submit@invalid-in-fence-submit
> > +msm_submit@invalid-duplicate-bo-submit
> > +msm_submit@invalid-cmd-idx-submit
> > +msm_submit@invalid-cmd-type-submit
> > +msm_submit@valid-submit


[PATCH] drm/ci: Add msm tests

2024-01-08 Thread Rob Clark
From: Rob Clark 

The msm tests should skip on non-msm hw, so I think it should be safe to
enable everywhere.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/ci/testlist.txt | 49 +
 1 file changed, 49 insertions(+)

diff --git a/drivers/gpu/drm/ci/testlist.txt b/drivers/gpu/drm/ci/testlist.txt
index f82cd90372f4..eaeb751bb0ad 100644
--- a/drivers/gpu/drm/ci/testlist.txt
+++ b/drivers/gpu/drm/ci/testlist.txt
@@ -2910,3 +2910,52 @@ kms_writeback@writeback-invalid-parameters
 kms_writeback@writeback-fb-id
 kms_writeback@writeback-check-output
 prime_mmap_kms@buffer-sharing
+msm_shrink@copy-gpu-sanitycheck-8
+msm_shrink@copy-gpu-sanitycheck-32
+msm_shrink@copy-gpu-8
+msm_shrink@copy-gpu-32
+msm_shrink@copy-gpu-madvise-8
+msm_shrink@copy-gpu-madvise-32
+msm_shrink@copy-gpu-oom-8
+msm_shrink@copy-gpu-oom-32
+msm_shrink@copy-mmap-sanitycheck-8
+msm_shrink@copy-mmap-sanitycheck-32
+msm_shrink@copy-mmap-8
+msm_shrink@copy-mmap-32
+msm_shrink@copy-mmap-madvise-8
+msm_shrink@copy-mmap-madvise-32
+msm_shrink@copy-mmap-oom-8
+msm_shrink@copy-mmap-oom-32
+msm_shrink@copy-mmap-dmabuf-sanitycheck-8
+msm_shrink@copy-mmap-dmabuf-sanitycheck-32
+msm_shrink@copy-mmap-dmabuf-8
+msm_shrink@copy-mmap-dmabuf-32
+msm_shrink@copy-mmap-dmabuf-madvise-8
+msm_shrink@copy-mmap-dmabuf-madvise-32
+msm_shrink@copy-mmap-dmabuf-oom-8
+msm_shrink@copy-mmap-dmabuf-oom-32
+msm_mapping@ring
+msm_mapping@sqefw
+msm_mapping@shadow
+msm_submitoverhead@submitbench-10-bos
+msm_submitoverhead@submitbench-10-bos-no-implicit-sync
+msm_submitoverhead@submitbench-100-bos
+msm_submitoverhead@submitbench-100-bos-no-implicit-sync
+msm_submitoverhead@submitbench-250-bos
+msm_submitoverhead@submitbench-250-bos-no-implicit-sync
+msm_submitoverhead@submitbench-500-bos
+msm_submitoverhead@submitbench-500-bos-no-implicit-sync
+msm_submitoverhead@submitbench-1000-bos
+msm_submitoverhead@submitbench-1000-bos-no-implicit-sync
+msm_recovery@hangcheck
+msm_recovery@gpu-fault
+msm_recovery@gpu-fault-parallel
+msm_recovery@iova-fault
+msm_submit@empty-submit
+msm_submit@invalid-queue-submit
+msm_submit@invalid-flags-submit
+msm_submit@invalid-in-fence-submit
+msm_submit@invalid-duplicate-bo-submit
+msm_submit@invalid-cmd-idx-submit
+msm_submit@invalid-cmd-type-submit
+msm_submit@valid-submit
-- 
2.43.0



Re: [PATCH 1/2] drm: update drm_show_memory_stats() for dma-bufs

2024-01-08 Thread Rob Clark
On Thu, Dec 7, 2023 at 10:02 AM Alex Deucher  wrote:
>
> Show buffers as shared if they are shared via dma-buf as well
> (e.g., shared with v4l or some other subsystem).
>
> Signed-off-by: Alex Deucher 
> Cc: Rob Clark 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/drm_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 5ddaffd32586..5d5f93b9c263 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -973,7 +973,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
> drm_file *file)
> DRM_GEM_OBJECT_PURGEABLE;
> }
>
> -   if (obj->handle_count > 1) {
> +   if ((obj->handle_count > 1) || obj->dma_buf) {
> status.shared += obj->size;
> } else {
> status.private += obj->size;
> --
> 2.42.0
>


Re: [PATCH] drm/msm/a7xx: Fix LLC typo

2024-01-03 Thread Rob Clark
On Tue, Jan 2, 2024 at 12:12 PM Konrad Dybcio  wrote:
>
> On 2.01.2024 20:33, Rob Clark wrote:
> > From: Rob Clark 
> >
> > We'd miss actually activating LLC.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index a5660d63535b..54dc5eb37f70 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1646,7 +1646,7 @@ static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> >
> >   msm_devfreq_resume(gpu);
> >
> > - adreno_is_a7xx(adreno_gpu) ? a7xx_llc_activate : a6xx_llc_activate(a6xx_gpu);
> > + adreno_is_a7xx(adreno_gpu) ? a7xx_llc_activate(a6xx_gpu) : a6xx_llc_activate(a6xx_gpu);
>
> /me cleans glasses
>
> oh..
>
> Reviewed-by: Konrad Dybcio 

I suppose I should also add,

Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support")

> Konrad


[PATCH] drm/msm/a7xx: Fix LLC typo

2024-01-02 Thread Rob Clark
From: Rob Clark 

We'd miss actually activating LLC.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index a5660d63535b..54dc5eb37f70 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1646,7 +1646,7 @@ static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
 
msm_devfreq_resume(gpu);
 
-   adreno_is_a7xx(adreno_gpu) ? a7xx_llc_activate : a6xx_llc_activate(a6xx_gpu);
+   adreno_is_a7xx(adreno_gpu) ? a7xx_llc_activate(a6xx_gpu) : a6xx_llc_activate(a6xx_gpu);
 
return ret;
 }
-- 
2.43.0



Re: [PATCH] drm/msm/a6xx: Fix recovery vs runpm race

2023-12-22 Thread Rob Clark
On Fri, Dec 22, 2023 at 11:58 AM Akhil P Oommen
 wrote:
>
> On Mon, Dec 18, 2023 at 07:59:24AM -0800, Rob Clark wrote:
> >
> > From: Rob Clark 
> >
> > a6xx_recover() is relying on the gpu lock to serialize against incoming
> > submits doing a runpm get, as it tries to temporarily balance out the
> > runpm gets with puts in order to power off the GPU.  Unfortunately this
> > gets worse when we (in a later patch) will move the runpm get out of the
> > scheduler thread/work to move it out of the fence signaling path.
> >
> > Instead we can just simplify the whole thing by using force_suspend() /
> > force_resume() instead of trying to be clever.
>
> At some places, we take a pm_runtime vote and access the gpu
> registers assuming it will be powered until we drop the vote.
> a6xx_get_timestamp() is an example. If we do a force suspend, it may
> cause bus errors from those threads. Now you have to serialize every
> place we do runtime_get/put with a mutex. Or is there a better way to
> handle the 'later patch' you mentioned?

So I was running into issues when I started adding an igt test to
stress-test recovery vs multi-threaded submit: cxpd was not always
suspending and I was getting "cx gdsc did not collapse", which may be
related.

I was considering using force_suspend() on the gmu and cxpd if
gpu->hang==true, but I'm not sure.  I ran out of time to play with this
when I was in the office.

The issue the 'later patch' is trying to deal with is getting memory
allocations out of the "fence signaling path", ie. out from the
drm/sched kthread/worker.  One way to do that, without dragging all of
runpm/device-link/etc into it is to do the runpm get in the submit
ioctl before enqueuing the job to the scheduler.  But then we can hold
a lock to protect against racing with recovery.
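
(A rough sketch of that idea, illustrative only and not the actual msm
patch; bodies are elided to the relevant calls:)

	/* submit ioctl: normal process context, so allocations and
	 * sleeping locks are fine here */
	int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
				 struct drm_file *file)
	{
		...
		pm_runtime_get_sync(&gpu->pdev->dev);
		drm_sched_entity_push_job(&submit->base);
		...
	}

	/* scheduler run_job: fence signaling path, so no runpm get
	 * (and no allocation) allowed here anymore */
	static struct dma_fence *msm_job_run(struct drm_sched_job *job)
	{
		...
	}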

BR,
-R

> -Akhil.
>
> >
> > Reported-by: David Heidelberg 
> > Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10272
> > Fixes: abe2023b4cea ("drm/msm/gpu: Push gpu lock down past runpm")
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++--
> >  1 file changed, 2 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 268737e59131..a5660d63535b 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1244,12 +1244,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >   dev_pm_genpd_add_notifier(gmu->cxpd, &a6xx_gpu->pd_nb);
> >   dev_pm_genpd_synced_poweroff(gmu->cxpd);
> >
> > - /* Drop the rpm refcount from active submits */
> > - if (active_submits)
> > - pm_runtime_put(&gpu->pdev->dev);
> > -
> > - /* And the final one from recover worker */
> > - pm_runtime_put_sync(&gpu->pdev->dev);
> > + pm_runtime_force_suspend(&gpu->pdev->dev);
> >
> >   if (!wait_for_completion_timeout(&a6xx_gpu->pd_gate, msecs_to_jiffies(1000)))
> >   DRM_DEV_ERROR(&gpu->pdev->dev, "cx gdsc didn't collapse\n");
> > @@ -1258,10 +1253,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >
> >   pm_runtime_use_autosuspend(&gpu->pdev->dev);
> >
> > - if (active_submits)
> > - pm_runtime_get(&gpu->pdev->dev);
> > -
> > - pm_runtime_get_sync(&gpu->pdev->dev);
> > + pm_runtime_force_resume(&gpu->pdev->dev);
> >
> >   gpu->active_submits = active_submits;
> >   mutex_unlock(&gpu->active_lock);
> > --
> > 2.43.0
> >


[PATCH] drm/msm/a6xx: Fix recovery vs runpm race

2023-12-18 Thread Rob Clark
From: Rob Clark 

a6xx_recover() is relying on the gpu lock to serialize against incoming
submits doing a runpm get, as it tries to temporarily balance out the
runpm gets with puts in order to power off the GPU.  Unfortunately this
gets worse when we (in a later patch) move the runpm get out of the
scheduler thread/work, taking it out of the fence signaling path.

Instead we can just simplify the whole thing by using force_suspend() /
force_resume() instead of trying to be clever.

Reported-by: David Heidelberg 
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10272
Fixes: abe2023b4cea ("drm/msm/gpu: Push gpu lock down past runpm")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 268737e59131..a5660d63535b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1244,12 +1244,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
dev_pm_genpd_add_notifier(gmu->cxpd, &a6xx_gpu->pd_nb);
dev_pm_genpd_synced_poweroff(gmu->cxpd);
 
-   /* Drop the rpm refcount from active submits */
-   if (active_submits)
-   pm_runtime_put(&gpu->pdev->dev);
-
-   /* And the final one from recover worker */
-   pm_runtime_put_sync(&gpu->pdev->dev);
+   pm_runtime_force_suspend(&gpu->pdev->dev);
 
if (!wait_for_completion_timeout(&a6xx_gpu->pd_gate, msecs_to_jiffies(1000)))
DRM_DEV_ERROR(&gpu->pdev->dev, "cx gdsc didn't collapse\n");
@@ -1258,10 +1253,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
 
pm_runtime_use_autosuspend(&gpu->pdev->dev);
 
-   if (active_submits)
-   pm_runtime_get(&gpu->pdev->dev);
-
-   pm_runtime_get_sync(&gpu->pdev->dev);
+   pm_runtime_force_resume(&gpu->pdev->dev);
 
gpu->active_submits = active_submits;
mutex_unlock(&gpu->active_lock);
-- 
2.43.0



Re: [PATCH] drm/msm/dpu: Ratelimit framedone timeout msgs

2023-12-11 Thread Rob Clark
On Mon, Dec 11, 2023 at 2:09 PM Marijn Suijten
 wrote:
>
> On 2023-12-11 10:19:55, Rob Clark wrote:
> > From: Rob Clark 
> >
> > When we start getting these, we get a *lot*.  So ratelimit it to not
> > flood dmesg.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >
> > dpu should probably stop rolling its own trace macros, but that would
> > be a larger cleanup.
>
> That would be lovely, use is currently all over the place.
>
> Should this patch also ratelimit the corresponding:
>
> [drm:dpu_encoder_phys_cmd_prepare_for_kickoff] *ERROR* failed 
> wait_for_idle: id:31 ret:-110 pp:0
>
> On CMD-mode panels?

Probably it should for consistency.  But I think you normally wouldn't
get this error at 60Hz with a cmd mode panel, so probably ok to make
it ratelimited for cmd mode later.

BR,
-R

> Note that this is a prime example of using DRM_ERROR over DPU_ERROR*,
> resulting in unnecessary divergence (and un-readability) between error
> messages and the code (DPU_DEBUG_CMDENC, which has a corresponding
> DPU_ERROR variant, is also used within that function...)
>
> Reviewed-by: Marijn Suijten 
>
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 5 -
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 1 +
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > index 82538844614b..7c22235d0eba 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > @@ -39,6 +39,9 @@
> >  #define DPU_ERROR_ENC(e, fmt, ...) DPU_ERROR("enc%d " fmt,\
> >   (e) ? (e)->base.base.id : -1, ##__VA_ARGS__)
> >
> > +#define DPU_ERROR_ENC_RATELIMITED(e, fmt, ...) DPU_ERROR_RATELIMITED("enc%d " fmt,\
> > + (e) ? (e)->base.base.id : -1, ##__VA_ARGS__)
> > +
> >  /*
> >   * Two to anticipate panels that can do cmd/vid dynamic switching
> >   * plan is to create all possible physical encoder types, and switch 
> > between
> > @@ -2339,7 +2342,7 @@ static void dpu_encoder_frame_done_timeout(struct 
> > timer_list *t)
> >   return;
> >   }
> >
> > - DPU_ERROR_ENC(dpu_enc, "frame done timeout\n");
> > + DPU_ERROR_ENC_RATELIMITED(dpu_enc, "frame done timeout\n");
> >
> >   event = DPU_ENCODER_FRAME_EVENT_ERROR;
> >   trace_dpu_enc_frame_done_timeout(DRMID(drm_enc), event);
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > index b6f53ca6e962..f5473d4dea92 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > @@ -51,6 +51,7 @@
> >   } while (0)
> >
> >  #define DPU_ERROR(fmt, ...) pr_err("[dpu error]" fmt, ##__VA_ARGS__)
> > +#define DPU_ERROR_RATELIMITED(fmt, ...) pr_err_ratelimited("[dpu error]" fmt, ##__VA_ARGS__)
> >
> >  /**
> >   * ktime_compare_safe - compare two ktime structures
> > --
> > 2.43.0
> >


[PATCH] drm/msm/dpu: Ratelimit framedone timeout msgs

2023-12-11 Thread Rob Clark
From: Rob Clark 

When we start getting these, we get a *lot*.  So ratelimit it to not
flood dmesg.

Signed-off-by: Rob Clark 
---

dpu should probably stop rolling its own trace macros, but that would
be a larger cleanup.

 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 5 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 82538844614b..7c22235d0eba 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -39,6 +39,9 @@
 #define DPU_ERROR_ENC(e, fmt, ...) DPU_ERROR("enc%d " fmt,\
(e) ? (e)->base.base.id : -1, ##__VA_ARGS__)
 
+#define DPU_ERROR_ENC_RATELIMITED(e, fmt, ...) DPU_ERROR_RATELIMITED("enc%d " fmt,\
+   (e) ? (e)->base.base.id : -1, ##__VA_ARGS__)
+
 /*
  * Two to anticipate panels that can do cmd/vid dynamic switching
  * plan is to create all possible physical encoder types, and switch between
@@ -2339,7 +2342,7 @@ static void dpu_encoder_frame_done_timeout(struct 
timer_list *t)
return;
}
 
-   DPU_ERROR_ENC(dpu_enc, "frame done timeout\n");
+   DPU_ERROR_ENC_RATELIMITED(dpu_enc, "frame done timeout\n");
 
event = DPU_ENCODER_FRAME_EVENT_ERROR;
trace_dpu_enc_frame_done_timeout(DRMID(drm_enc), event);
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
index b6f53ca6e962..f5473d4dea92 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
@@ -51,6 +51,7 @@
} while (0)
 
 #define DPU_ERROR(fmt, ...) pr_err("[dpu error]" fmt, ##__VA_ARGS__)
+#define DPU_ERROR_RATELIMITED(fmt, ...) pr_err_ratelimited("[dpu error]" fmt, ##__VA_ARGS__)
 
 /**
  * ktime_compare_safe - compare two ktime structures
-- 
2.43.0



Re: [PATCH 1/5] drm/msm/adreno: Split up giant device table

2023-12-06 Thread Rob Clark
On Wed, Dec 6, 2023 at 4:29 AM Konrad Dybcio  wrote:
>
>
>
> On 12/5/23 23:03, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Split into a separate table per generation, in preparation to move each
> > gen's device table to it's own file.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/msm/adreno/adreno_device.c | 59 +++---
> >   1 file changed, 51 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> > b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index 41b13dec9bef..36392801f929 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > @@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
> >   MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in 
> > place of IOMMU");
> >   module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
> >
> > -static const struct adreno_info gpulist[] = {
> > +static const struct adreno_info a2xx_gpus[] = {
> >   {
> >   .chip_ids = ADRENO_CHIP_IDS(0x0200),
> >   .family = ADRENO_2XX_GEN1,
> > @@ -55,6 +55,12 @@ static const struct adreno_info gpulist[] = {
> >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> >   .init  = a2xx_gpu_init,
> >   }, {
> > + /* sentinal */
> sentinel?
>
> > + }
> > +};
> > +
> > +static const struct adreno_info a3xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(
> >   0x03000512,
> >   0x03000520
> > @@ -110,6 +116,12 @@ static const struct adreno_info gpulist[] = {
> >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> >   .init  = a3xx_gpu_init,
> >   }, {
> > + /* sentinal */
> > + }
> > +};
> > +
> > +static const struct adreno_info a4xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x04000500),
> >   .family = ADRENO_4XX,
> >   .revn  = 405,
> > @@ -143,6 +155,12 @@ static const struct adreno_info gpulist[] = {
> >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> >   .init  = a4xx_gpu_init,
> >   }, {
> > + /* sentinal */
> > + }
> > +};
> > +
> > +static const struct adreno_info a5xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x05000600),
> >   .family = ADRENO_5XX,
> >   .revn = 506,
> > @@ -268,6 +286,12 @@ static const struct adreno_info gpulist[] = {
> >   .init = a5xx_gpu_init,
> >   .zapfw = "a540_zap.mdt",
> >   }, {
> > + /* sentinal */
> > + }
> > +};
> > +
> > +static const struct adreno_info a6xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x0601),
> >   .family = ADRENO_6XX_GEN1,
> >   .revn = 610,
> > @@ -493,6 +517,12 @@ static const struct adreno_info gpulist[] = {
> >   .hwcg = a690_hwcg,
> >   .address_space_size = SZ_16G,
> >   }, {
> > + /* sentinal */
> > + }
> > +};
> > +
> > +static const struct adreno_info a7xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x07030001),
> >   .family = ADRENO_7XX_GEN1,
> >   .fw = {
> > @@ -522,7 +552,18 @@ static const struct adreno_info gpulist[] = {
> >   .zapfw = "a740_zap.mdt",
> >   .hwcg = a740_hwcg,
> >   .address_space_size = SZ_16G,
> > - },
> > + }, {
> > + /* sentinal */
> > + }
> > +};
> > +
> > +static const struct adreno_info *gpulist[] = {
> > + a2xx_gpus,
> > + a3xx_gpus,
> > + a4xx_gpus,
> > + a5xx_gpus,
> > + a6xx_gpus,
> > + a7xx_gpus,
> >   };
> >
> >   MODULE_FIRMWARE("qcom/a300_pm4.fw");
> > @@ -557,12 +598,14 @@ static const struct adreno_info *adreno_info(uint32_t 
> > chip_id)
> >   {
> >   /* identify gpu: */
> >   for (int i = 0; i < ARRAY_SIZE(gpulist); i++) {
> > - const struct adreno_info *info = [i];
> > - if (info->machine && !of_machine_is_compatible(info->machine))
> > - continue;
> > - for (int j = 0; info->chip_ids[j]; j++)
> I'm not sure using sentinels here is a good idea, it adds a
> whole lot of stack size. Perhaps gpulist could be a struct
> of array pointers and an array of sizes?

I guess you meant text size..

But with 30 devices currently, that array would be (30 + 7) * 8 = 296
bytes.. each sentinel is ~112 bytes (and arguably we could move a bit
more out of adreno_info).  So it isn't that big of a difference.

Being able to have aNxx_info subclass adreno_info might be a more
compelling reason to go for an array of pointers.  I'd have to see how
awkward that looks.
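
(A rough sketch of the array-of-pointers variant, illustrative only,
with made-up names, not a committed design:)

	struct adreno_gpulist {
		const struct adreno_info *gpus;
		unsigned count;
	};

	#define DECLARE_ADRENO_GPULIST(name)				\
		static const struct adreno_gpulist name ## _gpulist = {	\
			name ## _gpus, ARRAY_SIZE(name ## _gpus)	\
		}

	DECLARE_ADRENO_GPULIST(a2xx);
	/* ... one per generation ... */

	static const struct adreno_gpulist *gpulists[] = {
		&a2xx_gpulist,
		/* ... */
	};

	/* lookup walks each sub-list by explicit count, no sentinels
	 * (adreno_info_matches() is a stand-in for the chip_ids check) */
	for (int i = 0; i < ARRAY_SIZE(gpulists); i++)
		for (int j = 0; j < gpulists[i]->count; j++)
			if (adreno_info_matches(&gpulists[i]->gpus[j], chip_id))
				return &gpulists[i]->gpus[j];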

BR,
-R


[PATCH 5/5] drm/msm/adreno: Move CP_PROTECT settings to hw catalog

2023-12-05 Thread Rob Clark
From: Rob Clark 

Move the CP_PROTECT settings into the hw catalog.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 246 -
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 255 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   2 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  13 ++
 4 files changed, 266 insertions(+), 250 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 3fb9e249567a..b56e43282ce6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -454,6 +454,173 @@ static const struct adreno_reglist a690_hwcg[] = {
{}
 };
 
+/* For a615, a616, a618, a619, a630, a640 and a680 */
+static const u32 a630_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e70, 0x0001),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x11c00, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a630_protect, 32);
+
+/* These are for a620 and a650 */
+static const u32 a650_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x08e80, 0x027f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e60, 0x0011),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0b608, 0x0007),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x18400, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1a800, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1f400, 0x0443),
+   A6XX_PROTECT_RDONLY(0x1f844, 0x007b),
+   A6XX_PROTECT_NORDWR(0x1f887, 0x001b),
+   A6XX_PROTECT_NORDWR(0x1f8c0, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a650_protect, 48);
+
+/* These are for a635 and a660 */
+static const u32 a660_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001

[PATCH 4/5] drm/msm/adreno: Move hwcg table into a6xx specific info

2023-12-05 Thread Rob Clark
From: Rob Clark 

Introduce a6xx_info where we can stash gen specific stuff without
polluting the toplevel adreno_info struct.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 55 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  9 
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  6 ++-
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index a35d4c112a61..3fb9e249567a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,7 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx_gpu.h"
 #include "a6xx.xml.h"
 #include "a6xx_gmu.xml.h"
 
@@ -465,7 +466,9 @@ const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a610_zap.mdt",
-   .hwcg = a612_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a612_hwcg,
+   },
/*
 * There are (at least) three SoCs implementing A610: SM6125
 * (trinket), SM6115 (bengal) and SM6225 (khaje). Trinket does
@@ -492,6 +495,8 @@ const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
+   .a6xx = &(struct a6xx_info) {
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 169, 1 },
@@ -510,7 +515,9 @@ const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 138, 1 },
@@ -529,7 +536,9 @@ const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 190, 1 },
@@ -548,7 +557,9 @@ const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 120, 4 },
@@ -572,7 +583,9 @@ const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a630_zap.mdt",
-   .hwcg = a630_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a630_hwcg,
+   },
}, {
.chip_ids = ADRENO_CHIP_IDS(0x06040001),
.family = ADRENO_6XX_GEN2,
@@ -586,7 +599,9 @@ const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a640_zap.mdt",
-   .hwcg = a640_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a640_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0, 0 },
{ 1, 1 },
@@ -605,7 +620,9 @@ const struct adreno_info a6xx_gpus[] = {
ADRENO_QUIRK_HAS_HW_APRIV,
.init = a6xx_gpu_init,
.zapfw = "a650_zap.mdt",
-   .hwcg = a650_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a650_hwcg,
+   },
.address_space_size = SZ_16G,
.speedbins = ADRENO_SPEEDBINS(
{ 0, 0 },
@@ -627,7 +644,9 @@ const struct adreno_info a6xx_gpus[] = {
ADRENO_QUIRK_HAS_HW_APRIV,
.init = a6xx_gpu_init,
.zapfw = "a660_zap.mdt",
-   .hwcg = a660_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a660_hwcg,
+   },
  
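
One C detail worth noting in the .a6xx = &(struct a6xx_info) { ... }
initializers above: at file scope a compound literal has static storage
duration, so taking its address in a const initializer is legal.  A
self-contained sketch of the idiom (types redeclared locally for the
sketch; the demo_* names are made up):

struct adreno_reglist { unsigned int offset; unsigned int value; };

struct a6xx_info {
        const struct adreno_reglist *hwcg;
};

struct adreno_info_sketch {
        /* ... generic fields elided ... */
        const struct a6xx_info *a6xx;
};

static const struct adreno_reglist demo_hwcg[] = {
        { 0x100, 0x22222222 },
        { /* zero terminated */ },
};

/* the compound literal below has static storage duration, so the
 * initializer is an address constant, the same trick as in the patch */
static const struct adreno_info_sketch demo_entry = {
        .a6xx = &(const struct a6xx_info) {
                .hwcg = demo_hwcg,
        },
};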

[PATCH 3/5] drm/msm/adreno: Move hwcg regs to a6xx hw catalog

2023-12-05 Thread Rob Clark
From: Rob Clark 

Move the hwcg tables into the hw catalog.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 560 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 558 -
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   3 -
 3 files changed, 560 insertions(+), 561 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 5c1199eab82b..a35d4c112a61 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,451 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx.xml.h"
+#include "a6xx_gmu.xml.h"
+
+static const struct adreno_reglist a612_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0081},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0xf3cf},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x0120},
+   {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x2220},
+   {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x0011},
+   {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x0422},
+   {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x4000},
+   {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x0200},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+   {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x0004},
+   {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x0002},
+   {REG_A6XX_RBBM_ISDB_CNT, 0x0182},
+   {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x},
+   {REG_A6XX_RBBM_SP_HYST_CNT, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
+   {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
+   {},
+};
+
+/* For a615 family (a615, a616, a618 and a619) */
+static const struct adreno_reglist a615_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0080},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0,  0xF3CF},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP1,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP1, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP1,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP1, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP1, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE,  0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_UCHE, 0x},
+   {REG_A6XX_RBB

[PATCH 2/5] drm/msm/adreno: Split catalog into separate files

2023-12-05 Thread Rob Clark
From: Rob Clark 

Split each gen's gpu table into its own file.  Only code motion, no
functional change.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/Makefile   |   5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |  53 ++
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |  75 +++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |  51 ++
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  | 145 ++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 285 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c | 570 +
 7 files changed, 620 insertions(+), 564 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 49671364fdcf..32f2fd980452 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -7,12 +7,17 @@ ccflags-$(CONFIG_DRM_MSM_DP) += -I $(srctree)/$(src)/dp
 msm-y := \
adreno/adreno_device.o \
adreno/adreno_gpu.o \
+   adreno/a2xx_catalog.o \
adreno/a2xx_gpu.o \
+   adreno/a3xx_catalog.o \
adreno/a3xx_gpu.o \
+   adreno/a4xx_catalog.o \
adreno/a4xx_gpu.o \
+   adreno/a5xx_catalog.o \
adreno/a5xx_gpu.o \
adreno/a5xx_power.o \
adreno/a5xx_preempt.o \
+   adreno/a6xx_catalog.o \
adreno/a6xx_gpu.o \
adreno/a6xx_gmu.o \
adreno/a6xx_hfi.o \
diff --git a/drivers/gpu/drm/msm/adreno/a2xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
new file mode 100644
index ..1a4d182279fc
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+const struct adreno_info a2xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x0200),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 200,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_256K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, { /* a200 on i.mx51 has only 128kib gmem */
+   .chip_ids = ADRENO_CHIP_IDS(0x0201),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 201,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x0202),
+   .family = ADRENO_2XX_GEN2,
+   .revn  = 220,
+   .fw = {
+   [ADRENO_FW_PM4] = "leia_pm4_470.fw",
+   [ADRENO_FW_PFP] = "leia_pfp_470.fw",
+   },
+   .gmem  = SZ_512K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, {
+   /* sentinel */
+   }
+};
+
+MODULE_FIRMWARE("qcom/leia_pfp_470.fw");
+MODULE_FIRMWARE("qcom/leia_pm4_470.fw");
+MODULE_FIRMWARE("qcom/yamato_pfp.fw");
+MODULE_FIRMWARE("qcom/yamato_pm4.fw");
\ No newline at end of file
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
new file mode 100644
index ..1f1fa70c5e5e
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+const struct adreno_info a3xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(
+   0x03000512,
+   0x03000520
+   ),
+   .family = ADRENO_3XX,
+   .revn  = 305,
+   .fw = {
+   [ADRENO_FW_PM4] = "a300_pm4.fw",
+   [ADRENO_FW_PFP] = "a300_pfp.fw",
+   },
+   .gmem  = SZ_256K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a3xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000600),
+   .family = ADRENO_3XX,

[PATCH 1/5] drm/msm/adreno: Split up giant device table

2023-12-05 Thread Rob Clark
From: Rob Clark 

Split into a separate table per generation, in preparation to move each
gen's device table to its own file.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 59 +++---
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 41b13dec9bef..36392801f929 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
 MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in place of 
IOMMU");
 module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
 
-static const struct adreno_info gpulist[] = {
+static const struct adreno_info a2xx_gpus[] = {
{
.chip_ids = ADRENO_CHIP_IDS(0x0200),
.family = ADRENO_2XX_GEN1,
@@ -55,6 +55,12 @@ static const struct adreno_info gpulist[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a2xx_gpu_init,
}, {
+   /* sentinel */
+   }
+};
+
+static const struct adreno_info a3xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(
0x03000512,
0x03000520
@@ -110,6 +116,12 @@ static const struct adreno_info gpulist[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a3xx_gpu_init,
}, {
+   /* sentinel */
+   }
+};
+
+static const struct adreno_info a4xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x04000500),
.family = ADRENO_4XX,
.revn  = 405,
@@ -143,6 +155,12 @@ static const struct adreno_info gpulist[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a4xx_gpu_init,
}, {
+   /* sentinel */
+   }
+};
+
+static const struct adreno_info a5xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x05000600),
.family = ADRENO_5XX,
.revn = 506,
@@ -268,6 +286,12 @@ static const struct adreno_info gpulist[] = {
.init = a5xx_gpu_init,
.zapfw = "a540_zap.mdt",
}, {
+   /* sentinel */
+   }
+};
+
+static const struct adreno_info a6xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x0601),
.family = ADRENO_6XX_GEN1,
.revn = 610,
@@ -493,6 +517,12 @@ static const struct adreno_info gpulist[] = {
.hwcg = a690_hwcg,
.address_space_size = SZ_16G,
}, {
+   /* sentinel */
+   }
+};
+
+static const struct adreno_info a7xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x07030001),
.family = ADRENO_7XX_GEN1,
.fw = {
@@ -522,7 +552,18 @@ static const struct adreno_info gpulist[] = {
.zapfw = "a740_zap.mdt",
.hwcg = a740_hwcg,
.address_space_size = SZ_16G,
-   },
+   }, {
+   /* sentinel */
+   }
+};
+
+static const struct adreno_info *gpulist[] = {
+   a2xx_gpus,
+   a3xx_gpus,
+   a4xx_gpus,
+   a5xx_gpus,
+   a6xx_gpus,
+   a7xx_gpus,
 };
 
 MODULE_FIRMWARE("qcom/a300_pm4.fw");
@@ -557,12 +598,14 @@ static const struct adreno_info *adreno_info(uint32_t 
chip_id)
 {
/* identify gpu: */
for (int i = 0; i < ARRAY_SIZE(gpulist); i++) {
-   const struct adreno_info *info = &gpulist[i];
-   if (info->machine && !of_machine_is_compatible(info->machine))
-   continue;
-   for (int j = 0; info->chip_ids[j]; j++)
-   if (info->chip_ids[j] == chip_id)
-   return info;
+   for (int j = 0; gpulist[i][j].chip_ids; j++) {
+   const struct adreno_info *info = &gpulist[i][j];
+   if (info->machine && 
!of_machine_is_compatible(info->machine))
+   continue;
+   for (int k = 0; info->chip_ids[k]; k++)
+   if (info->chip_ids[k] == chip_id)
+   return info;
+   }
}
 
return NULL;
-- 
2.42.0



[PATCH 0/5] drm/msm/adreno: Introduce/rework device hw catalog

2023-12-05 Thread Rob Clark
From: Rob Clark 

Split the single flat gpulist table into per-gen tables that exist in
their own per-gen files, and start moving more info into the device
table.  This at least gets all the big tables of register settings out
of the heart of the a6xx_gpu code.  Probably more could be moved, to
remove at least some of the per-gen if/else ladders, but this seemed
like a reasonably good start.

Rob Clark (5):
  drm/msm/adreno: Split up giant device table
  drm/msm/adreno: Split catalog into separate files
  drm/msm/adreno: Move hwcg regs to a6xx hw catalog
  drm/msm/adreno: Move hwcg table into a6xx specific info
  drm/msm/adreno: Move CP_PROTECT settings to hw catalog

 drivers/gpu/drm/msm/Makefile   |5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |   53 +
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |   75 ++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |   51 +
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  |  145 +++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 1118 
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  817 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |   11 +
 drivers/gpu/drm/msm/adreno/adreno_device.c |  559 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|   22 +-
 10 files changed, 1506 insertions(+), 1350 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

-- 
2.42.0



[PATCH] drm/msm: Expose syncobj timeline support

2023-12-05 Thread Rob Clark
From: Rob Clark 

This does unfortunately require a mesa fix to avoid turnip hanging, but
we don't have a good way to know the userspace version.  Fortunately
that fix is now in mesa-23.3.0-rc3 and later[1].

[1] 
https://gitlab.freedesktop.org/mesa/mesa/-/commit/2bd7e293bfed5d2956a5dcb3e17555d0f6817986

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c|  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c | 13 +
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 50b65ffc24b1..c7ac2c0a7e27 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -893,6 +893,7 @@ static const struct drm_driver msm_driver = {
DRIVER_RENDER |
DRIVER_ATOMIC |
DRIVER_MODESET |
+   DRIVER_SYNCOBJ_TIMELINE |
DRIVER_SYNCOBJ,
.open   = msm_open,
.postclose  = msm_postclose,
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 40878c26a749..9cffa4b50c39 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -511,12 +511,6 @@ static struct drm_syncobj **msm_parse_deps(struct 
msm_gem_submit *submit,
break;
}
 
-   if (syncobj_desc.point &&
-   !drm_core_check_feature(submit->dev, 
DRIVER_SYNCOBJ_TIMELINE)) {
-   ret = -EOPNOTSUPP;
-   break;
-   }
-
if (syncobj_desc.flags & ~MSM_SUBMIT_SYNCOBJ_FLAGS) {
ret = -EINVAL;
break;
@@ -593,12 +587,6 @@ static struct msm_submit_post_dep 
*msm_parse_post_deps(struct drm_device *dev,
}
 
if (syncobj_desc.point) {
-   if (!drm_core_check_feature(dev,
-   DRIVER_SYNCOBJ_TIMELINE)) {
-   ret = -EOPNOTSUPP;
-   break;
-   }
-
post_deps[i].chain = dma_fence_chain_alloc();
if (!post_deps[i].chain) {
ret = -ENOMEM;
@@ -617,6 +605,7 @@ static struct msm_submit_post_dep 
*msm_parse_post_deps(struct drm_device *dev,
if (ret) {
for (j = 0; j <= i; ++j) {
dma_fence_chain_free(post_deps[j].chain);
+   post_deps[j].chain = NULL;
if (post_deps[j].syncobj)
drm_syncobj_put(post_deps[j].syncobj);
}
-- 
2.42.0



[PATCH] drm/scheduler: Unwrap job dependencies

2023-12-05 Thread Rob Clark
From: Rob Clark 

Container fences have burner contexts, which makes the trick to store at
most one fence per context somewhat useless if we don't unwrap array or
chain fences.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/scheduler/sched_main.c | 47 ++
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 9762464e3f99..16b550949c57 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/dma-fence-unwrap.h>
 #include 
 #include 
 
@@ -684,27 +685,14 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 }
 EXPORT_SYMBOL(drm_sched_job_arm);
 
-/**
- * drm_sched_job_add_dependency - adds the fence as a job dependency
- * @job: scheduler job to add the dependencies to
- * @fence: the dma_fence to add to the list of dependencies.
- *
- * Note that @fence is consumed in both the success and error cases.
- *
- * Returns:
- * 0 on success, or an error on failing to expand the array.
- */
-int drm_sched_job_add_dependency(struct drm_sched_job *job,
-struct dma_fence *fence)
+static int drm_sched_job_add_single_dependency(struct drm_sched_job *job,
+  struct dma_fence *fence)
 {
struct dma_fence *entry;
unsigned long index;
u32 id = 0;
int ret;
 
-   if (!fence)
-   return 0;
-
/* Deduplicate if we already depend on a fence from the same context.
 * This lets the size of the array of deps scale with the number of
 * engines involved, rather than the number of BOs.
@@ -728,6 +716,35 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
 
return ret;
 }
+
+/**
+ * drm_sched_job_add_dependency - adds the fence as a job dependency
+ * @job: scheduler job to add the dependencies to
+ * @fence: the dma_fence to add to the list of dependencies.
+ *
+ * Note that @fence is consumed in both the success and error cases.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_add_dependency(struct drm_sched_job *job,
+struct dma_fence *fence)
+{
+   struct dma_fence_unwrap iter;
+   struct dma_fence *f;
+   int ret = 0;
+
+   dma_fence_unwrap_for_each (f, &iter, fence) {
+   dma_fence_get(f);
+   ret = drm_sched_job_add_single_dependency(job, f);
+   if (ret)
+   break;
+   }
+
+   dma_fence_put(fence);
+
+   return ret;
+}
 EXPORT_SYMBOL(drm_sched_job_add_dependency);
 
 /**
-- 
2.42.0
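
To illustrate the "burner contexts" the commit message mentions:
dma_fence_array_create() takes a context id, and callers typically pass
a freshly allocated one, so two containers never share a context even
when their leaf fences do.  That defeats the one-slot-per-context dedup
unless the container is unwrapped first.  A minimal sketch, not taken
from the patch:

#include <linux/dma-fence.h>
#include <linux/dma-fence-array.h>

/* wrap @n fences in a container; the array gets a brand new context
 * from dma_fence_context_alloc(), unrelated to any leaf's context */
static struct dma_fence *wrap_fences(struct dma_fence **fences, int n)
{
        struct dma_fence_array *array;

        array = dma_fence_array_create(n, fences,
                                       dma_fence_context_alloc(1),
                                       1, false);

        return array ? &array->base : NULL;
}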



Re: [RFC] drm/scheduler: Unwrap job dependencies

2023-12-05 Thread Rob Clark
On Tue, Dec 5, 2023 at 8:56 AM Rob Clark  wrote:
>
> On Tue, Dec 5, 2023 at 7:58 AM Christian König  
> wrote:
> >
> > Am 05.12.23 um 16:41 schrieb Rob Clark:
> > > On Mon, Dec 4, 2023 at 10:46 PM Christian König
> > >  wrote:
> > >> Am 04.12.23 um 22:54 schrieb Rob Clark:
> > >>> On Thu, Mar 23, 2023 at 2:30 PM Rob Clark  wrote:
> > >>>> [SNIP]
> > >>> So, this patch turns out to blow up spectacularly with dma_fence
> > >>> refcnt underflows when I enable DRIVER_SYNCOBJ_TIMELINE .. I think,
> > >>> because it starts unwrapping fence chains, possibly in parallel with
> > >>> fence signaling on the retire path.  Is it supposed to be permissible
> > >>> to unwrap a fence chain concurrently?
> > >> The DMA-fence chain object and helper functions were designed so that
> > >> concurrent accesses to all elements are always possible.
> > >>
> > >> See dma_fence_chain_walk() and dma_fence_chain_get_prev() for example.
> > >> dma_fence_chain_walk() starts with a reference to the current fence (the
> > >> anchor of the walk) and tries to grab an up to date reference on the
> > >> previous fence in the chain. Only after that reference is successfully
> > >> acquired we drop the reference to the anchor where we started.
> > >>
> > >> Same for dma_fence_array_first(), dma_fence_array_next(). Here we hold a
> > >> reference to the array which in turn holds references to each fence
> > >> inside the array until it is destroyed itself.
> > >>
> > >> When this blows up we have somehow mixed up the references somewhere.
> > > That's what it looked like to me, but wanted to make sure I wasn't
> > > overlooking something subtle.  And in this case, the fence actually
> > > should be the syncobj timeline point fence, not the fence chain.
> > > Virtgpu has essentially the same logic (there we really do want to
> > > unwrap fences so we can pass host fences back to host rather than
> > > waiting in guest), I'm not sure if it would blow up in the same way.
> >
> > Well do you have a backtrace of what exactly happens?
> >
> > Maybe we have some _put() before _get() or something like this.
>
> I hacked up something to store the backtrace in dma_fence_release()
> (and leak the block so the backtrace would still be around later when
> dma_fence_get/put was later called) and ended up with:
>
> [  152.811360] freed at:
> [  152.813718]  dma_fence_release+0x30/0x134
> [  152.817865]  dma_fence_put+0x38/0x98 [gpu_sched]
> [  152.822657]  drm_sched_job_add_dependency+0x160/0x18c [gpu_sched]
> [  152.828948]  drm_sched_job_add_syncobj_dependency+0x58/0x88 [gpu_sched]
> [  152.835770]  msm_ioctl_gem_submit+0x580/0x1160 [msm]
> [  152.841070]  drm_ioctl_kernel+0xec/0x16c
> [  152.845132]  drm_ioctl+0x2e8/0x3f4
> [  152.848646]  vfs_ioctl+0x30/0x50
> [  152.851982]  __arm64_sys_ioctl+0x80/0xb4
> [  152.856039]  invoke_syscall+0x8c/0x120
> [  152.859919]  el0_svc_common.constprop.0+0xc0/0xdc
> [  152.864777]  do_el0_svc+0x24/0x30
> [  152.868207]  el0_svc+0x8c/0xd8
> [  152.871365]  el0t_64_sync_handler+0x84/0x12c
> [  152.875771]  el0t_64_sync+0x190/0x194
>
> I suppose that doesn't guarantee that this was the problematic put.
> But dropping this patch to unwrap the fence makes the problem go
> away..

Oh, hmm, _add_dependency() is consuming the fence reference

BR,
-R

> BR,
> -R
>
> > Thanks,
> > Christian.
> >
> > >
> > > BR,
> > > -R
> > >
> > >> Regards,
> > >> Christian.
> > >>
> > >>> BR,
> > >>> -R
> >
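
A rough reconstruction of the debug hack described above, for anyone
who wants to repeat the experiment.  The hook point and how the record
is stored are assumptions, and CONFIG_STACKTRACE is assumed enabled:

#include <linux/dma-fence.h>
#include <linux/slab.h>
#include <linux/stacktrace.h>

struct fence_free_record {
        unsigned int nr_entries;
        unsigned long entries[16];
};

/* call from dma_fence_release(); the record is deliberately leaked so
 * the trace is still around when a buggy get()/put() later touches the
 * freed fence */
static void record_fence_free(struct dma_fence *fence)
{
        struct fence_free_record *rec = kzalloc(sizeof(*rec), GFP_ATOMIC);

        if (!rec)
                return;

        rec->nr_entries = stack_trace_save(rec->entries,
                                           ARRAY_SIZE(rec->entries), 1);
        /* stash rec somewhere reachable from dma_fence_get()/put(),
         * e.g. an xarray keyed by the fence pointer (omitted here) */
}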


Re: [RFC] drm/scheduler: Unwrap job dependencies

2023-12-05 Thread Rob Clark
On Tue, Dec 5, 2023 at 7:58 AM Christian König  wrote:
>
> Am 05.12.23 um 16:41 schrieb Rob Clark:
> > On Mon, Dec 4, 2023 at 10:46 PM Christian König
> >  wrote:
> >> Am 04.12.23 um 22:54 schrieb Rob Clark:
> >>> On Thu, Mar 23, 2023 at 2:30 PM Rob Clark  wrote:
> >>>> [SNIP]
> >>> So, this patch turns out to blow up spectacularly with dma_fence
> >>> refcnt underflows when I enable DRIVER_SYNCOBJ_TIMELINE .. I think,
> >>> because it starts unwrapping fence chains, possibly in parallel with
> >>> fence signaling on the retire path.  Is it supposed to be permissible
> >>> to unwrap a fence chain concurrently?
> >> The DMA-fence chain object and helper functions were designed so that
> >> concurrent accesses to all elements are always possible.
> >>
> >> See dma_fence_chain_walk() and dma_fence_chain_get_prev() for example.
> >> dma_fence_chain_walk() starts with a reference to the current fence (the
> >> anchor of the walk) and tries to grab an up to date reference on the
> >> previous fence in the chain. Only after that reference is successfully
> >> acquired we drop the reference to the anchor where we started.
> >>
> >> Same for dma_fence_array_first(), dma_fence_array_next(). Here we hold a
> >> reference to the array which in turn holds references to each fence
> >> inside the array until it is destroyed itself.
> >>
> >> When this blows up we have somehow mixed up the references somewhere.
> > That's what it looked like to me, but wanted to make sure I wasn't
> > overlooking something subtle.  And in this case, the fence actually
> > should be the syncobj timeline point fence, not the fence chain.
> > Virtgpu has essentially the same logic (there we really do want to
> > unwrap fences so we can pass host fences back to host rather than
> > waiting in guest), I'm not sure if it would blow up in the same way.
>
> Well do you have a backtrace of what exactly happens?
>
> Maybe we have some _put() before _get() or something like this.

I hacked up something to store the backtrace in dma_fence_release()
(and leak the block so the backtrace would still be around later when
dma_fence_get/put was later called) and ended up with:

[  152.811360] freed at:
[  152.813718]  dma_fence_release+0x30/0x134
[  152.817865]  dma_fence_put+0x38/0x98 [gpu_sched]
[  152.822657]  drm_sched_job_add_dependency+0x160/0x18c [gpu_sched]
[  152.828948]  drm_sched_job_add_syncobj_dependency+0x58/0x88 [gpu_sched]
[  152.835770]  msm_ioctl_gem_submit+0x580/0x1160 [msm]
[  152.841070]  drm_ioctl_kernel+0xec/0x16c
[  152.845132]  drm_ioctl+0x2e8/0x3f4
[  152.848646]  vfs_ioctl+0x30/0x50
[  152.851982]  __arm64_sys_ioctl+0x80/0xb4
[  152.856039]  invoke_syscall+0x8c/0x120
[  152.859919]  el0_svc_common.constprop.0+0xc0/0xdc
[  152.864777]  do_el0_svc+0x24/0x30
[  152.868207]  el0_svc+0x8c/0xd8
[  152.871365]  el0t_64_sync_handler+0x84/0x12c
[  152.875771]  el0t_64_sync+0x190/0x194

I suppose that doesn't guarantee that this was the problematic put.
But dropping this patch to unwrap the fence makes the problem go
away..

BR,
-R

> Thanks,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
>


Re: [RFC] drm/scheduler: Unwrap job dependencies

2023-12-05 Thread Rob Clark
On Mon, Dec 4, 2023 at 10:46 PM Christian König
 wrote:
>
> Am 04.12.23 um 22:54 schrieb Rob Clark:
> > On Thu, Mar 23, 2023 at 2:30 PM Rob Clark  wrote:
> >> [SNIP]
> > So, this patch turns out to blow up spectacularly with dma_fence
> > refcnt underflows when I enable DRIVER_SYNCOBJ_TIMELINE .. I think,
> > because it starts unwrapping fence chains, possibly in parallel with
> > fence signaling on the retire path.  Is it supposed to be permissible
> > to unwrap a fence chain concurrently?
>
> The DMA-fence chain object and helper functions were designed so that
> concurrent accesses to all elements are always possible.
>
> See dma_fence_chain_walk() and dma_fence_chain_get_prev() for example.
> dma_fence_chain_walk() starts with a reference to the current fence (the
> anchor of the walk) and tries to grab an up to date reference on the
> previous fence in the chain. Only after that reference is successfully
> acquired we drop the reference to the anchor where we started.
>
> Same for dma_fence_array_first(), dma_fence_array_next(). Here we hold a
> reference to the array which in turn holds references to each fence
> inside the array until it is destroyed itself.
>
> When this blows up we have somehow mixed up the references somewhere.

That's what it looked like to me, but wanted to make sure I wasn't
overlooking something subtle.  And in this case, the fence actually
should be the syncobj timeline point fence, not the fence chain.
Virtgpu has essentially the same logic (there we really do want to
unwrap fences so we can pass host fences back to host rather than
waiting in guest), I'm not sure if it would blow up in the same way.

BR,
-R

> Regards,
> Christian.
>
> >
> > BR,
> > -R
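
To make the reference pattern Christian describes concrete, a minimal
sketch using the stock helpers from <linux/dma-fence-chain.h>; purely
illustrative:

#include <linux/dma-fence.h>
#include <linux/dma-fence-chain.h>

/* dma_fence_chain_for_each() takes its own reference on the head and,
 * at each step, acquires the next link's reference before dropping the
 * current one, which is what makes walking safe against concurrent
 * signaling and link garbage collection */
static void dump_chain(struct dma_fence *head)
{
        struct dma_fence *iter;

        dma_fence_chain_for_each(iter, head) {
                if (to_dma_fence_chain(iter))
                        pr_info("chain link, point %llu\n", iter->seqno);
                else
                        pr_info("terminal fence, seqno %llu\n", iter->seqno);
        }
}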


Re: [RFC] drm/scheduler: Unwrap job dependencies

2023-12-04 Thread Rob Clark
On Thu, Mar 23, 2023 at 2:30 PM Rob Clark  wrote:
>
> On Thu, Mar 23, 2023 at 7:03 AM Christian König
>  wrote:
> >
> > Am 23.03.23 um 14:54 schrieb Rob Clark:
> > > On Thu, Mar 23, 2023 at 12:35 AM Christian König
> > >  wrote:
> > >> Am 22.03.23 um 23:44 schrieb Rob Clark:
> > >>> From: Rob Clark 
> > >>>
> > >>> Container fences have burner contexts, which makes the trick to store at
> > >>> most one fence per context somewhat useless if we don't unwrap array or
> > >>> chain fences.
> > >> Mhm, we intentionally kept them not unwrapped since this way they only
> > >> occupy one fence slot.
> > >>
> > >> But it might be better to unwrap them if you add many of those 
> > >> dependencies.
> > >>
> > >>> Signed-off-by: Rob Clark 
> > >>> ---
> > >>> tbh, I'm not sure why we weren't doing this already, unless there is
> > >>> something I'm overlooking
> > >>>
> > >>>drivers/gpu/drm/scheduler/sched_main.c | 43 
> > >>> +-
> > >>>1 file changed, 28 insertions(+), 15 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > >>> b/drivers/gpu/drm/scheduler/sched_main.c
> > >>> index c2ee44d6224b..f59e5335afbb 100644
> > >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> > >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > >>> @@ -41,20 +41,21 @@
> > >>> * 4. Entities themselves maintain a queue of jobs that will be 
> > >>> scheduled on
> > >>> *the hardware.
> > >>> *
> > >>> * The jobs in a entity are always scheduled in the order that they 
> > >>> were pushed.
> > >>> */
> > >>>
> > >>>#include 
> > >>>#include 
> > >>>#include 
> > >>>#include 
> > >>> +#include <linux/dma-fence-unwrap.h>
> > >>>#include 
> > >>>#include 
> > >>>
> > >>>#include 
> > >>>#include 
> > >>>#include 
> > >>>#include 
> > >>>
> > >>>#define CREATE_TRACE_POINTS
> > >>>#include "gpu_scheduler_trace.h"
> > >>> @@ -665,41 +666,27 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> > >>>sched = entity->rq->sched;
> > >>>
> > >>>job->sched = sched;
> > >>>job->s_priority = entity->rq - sched->sched_rq;
> > >>>>job->id = atomic64_inc_return(&sched->job_id_count);
> > >>>
> > >>>drm_sched_fence_init(job->s_fence, job->entity);
> > >>>}
> > >>>EXPORT_SYMBOL(drm_sched_job_arm);
> > >>>
> > >>> -/**
> > >>> - * drm_sched_job_add_dependency - adds the fence as a job dependency
> > >>> - * @job: scheduler job to add the dependencies to
> > >>> - * @fence: the dma_fence to add to the list of dependencies.
> > >>> - *
> > >>> - * Note that @fence is consumed in both the success and error cases.
> > >>> - *
> > >>> - * Returns:
> > >>> - * 0 on success, or an error on failing to expand the array.
> > >>> - */
> > >>> -int drm_sched_job_add_dependency(struct drm_sched_job *job,
> > >>> -  struct dma_fence *fence)
> > >>> +static int _add_dependency(struct drm_sched_job *job, struct dma_fence 
> > >>> *fence)
> > >> Please keep the drm_sched_job_ prefix here even for static functions.
> > >> The symbol _add_dependency just sucks in a backtrace, especially when
> > >> it's tail optimized.
> > >>
> > >>>{
> > >>>struct dma_fence *entry;
> > >>>unsigned long index;
> > >>>u32 id = 0;
> > >>>int ret;
> > >>>
> > >>> - if (!fence)
> > >>> - return 0;
> > >>> -
> > >>>/* Deduplicate if we already depend on a fence from the same 
> > >>> context.
> > >>> * This lets the size of the array of deps scale with the number 
>

[PATCH] drm/msm/dpu: Correct UBWC settings for sc8280xp

2023-11-30 Thread Rob Clark
From: Rob Clark 

The UBWC settings need to match between the display and GPU.  When we
updated the GPU settings, we forgot to make the corresponding update on
the display side.

Reported-by: Steev Klimaszewski 
Fixes: 07e6de738aa6 ("drm/msm/a690: Fix reg values for a690")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_mdss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
index 6865db1e3ce8..29bb38f0bb2c 100644
--- a/drivers/gpu/drm/msm/msm_mdss.c
+++ b/drivers/gpu/drm/msm/msm_mdss.c
@@ -545,7 +545,7 @@ static const struct msm_mdss_data sc8280xp_data = {
.ubwc_dec_version = UBWC_4_0,
.ubwc_swizzle = 6,
.ubwc_static = 1,
-   .highest_bank_bit = 2,
+   .highest_bank_bit = 3,
.macrotile_mode = 1,
 };
 
-- 
2.42.0



Re: [PATCH] drm/amdgpu: add shared fdinfo stats

2023-11-30 Thread Rob Clark
On Thu, Nov 30, 2023 at 5:13 AM Christian König
 wrote:
>
> Am 28.11.23 um 18:52 schrieb Rob Clark:
> > On Tue, Nov 28, 2023 at 6:28 AM Alex Deucher  wrote:
> >> On Tue, Nov 28, 2023 at 9:17 AM Christian König
> >>  wrote:
> >>> Am 17.11.23 um 20:56 schrieb Alex Deucher:
> >>>> Add shared stats.  Useful for seeing shared memory.
> >>>>
> >>>> Signed-off-by: Alex Deucher 
> >>>> ---
> >>>>drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
> >>>>drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
> >>>>drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
> >>>>3 files changed, 21 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >>>> index 5706b282a0c7..c7df7fa3459f 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> >>>> @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
> >>>> drm_file *file)
> >>>>   stats.requested_visible_vram/1024UL);
> >>>>drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
> >>>>   stats.requested_gtt/1024UL);
> >>>> + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", 
> >>>> stats.vram_shared/1024UL);
> >>>> + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", 
> >>>> stats.gtt_shared/1024UL);
> >>>> + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", 
> >>>> stats.cpu_shared/1024UL);
> >>>> +
> >>>>for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
> >>>>if (!usage[hw_ip])
> >>>>continue;
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>> index d79b4ca1ecfc..c24f7b2c04c1 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>> @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
> >>>>  struct amdgpu_mem_stats *stats)
> >>>>{
> >>>>uint64_t size = amdgpu_bo_size(bo);
> >>>> + struct drm_gem_object *obj;
> >>>>unsigned int domain;
> >>>> + bool shared;
> >>>>
> >>>>/* Abort if the BO doesn't currently have a backing store */
> >>>>if (!bo->tbo.resource)
> >>>>return;
> >>>>
> >>>> + obj = &bo->tbo.base;
> >>>> + shared = obj->handle_count > 1;
> >>> Interesting approach but I don't think that this is correct.
> >>>
> >>> The handle_count is basically how many GEM handles are there for BO, so
> >>> for example it doesn't catch sharing things with V4L.
> >>>
> >>> What we should probably rather do is to take a look if
> >>> bo->tbo.base.dma_buf is NULL or not.
> >> +Rob, dri-devel
> >>
> >> This is what the generic drm helper code does.  See
> >> drm_show_memory_stats().  If that is not correct that code should
> >> probably be fixed too.
> > OTOH, v4l doesn't expose fdinfo.  What "shared" is telling you is
> > whether the BO is counted multiple times when you look at all
> > processes' fdinfo.
>
> Oh, then that's not fully correct either.
>
> You can have multiple handles for the same GEM object in a single client
> as well.
>
> This for example happens when you interact with KMS to get a handle for
> a displayed BO.

So, the handle is unique per drm_file, which is (at least usually)
unique per process.  The handle_count is agnostic to _how_ you got the
handle (i.e. via flink or dma-buf).

> DRM flink was one of the major other reasons, but I hope we are not
> using that widely any more.
>
> What exactly is the purpose? To avoid counting a BO multiple times
> because you go over the handles in the common code?
>
> If yes then I would say use obj->handle_count in the common code and
> obj->dma_buf in amdgpu because that is certainly unique.

Because the drm_file is (usually) unique per process, the purpose was
to show the amount of memory that is 

Re: [PATCH] drm/amdgpu: add shared fdinfo stats

2023-11-28 Thread Rob Clark
On Tue, Nov 28, 2023 at 6:28 AM Alex Deucher  wrote:
>
> On Tue, Nov 28, 2023 at 9:17 AM Christian König
>  wrote:
> >
> > Am 17.11.23 um 20:56 schrieb Alex Deucher:
> > > Add shared stats.  Useful for seeing shared memory.
> > >
> > > Signed-off-by: Alex Deucher 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
> > >   3 files changed, 21 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> > > index 5706b282a0c7..c7df7fa3459f 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> > > @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
> > > drm_file *file)
> > >  stats.requested_visible_vram/1024UL);
> > >   drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
> > >  stats.requested_gtt/1024UL);
> > > + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", 
> > > stats.vram_shared/1024UL);
> > > + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", 
> > > stats.gtt_shared/1024UL);
> > > + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", 
> > > stats.cpu_shared/1024UL);
> > > +
> > >   for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
> > >   if (!usage[hw_ip])
> > >   continue;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > index d79b4ca1ecfc..c24f7b2c04c1 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
> > > struct amdgpu_mem_stats *stats)
> > >   {
> > >   uint64_t size = amdgpu_bo_size(bo);
> > > + struct drm_gem_object *obj;
> > >   unsigned int domain;
> > > + bool shared;
> > >
> > >   /* Abort if the BO doesn't currently have a backing store */
> > >   if (!bo->tbo.resource)
> > >   return;
> > >
> > > + obj = &bo->tbo.base;
> > > + shared = obj->handle_count > 1;
> >
> > Interesting approach but I don't think that this is correct.
> >
> > The handle_count is basically how many GEM handles are there for BO, so
> > for example it doesn't catch sharing things with V4L.
> >
> > What we should probably rather do is to take a look if
> > bo->tbo.base.dma_buf is NULL or not.
>
> +Rob, dri-devel
>
> This is what the generic drm helper code does.  See
> drm_show_memory_stats().  If that is not correct that code should
> probably be fixed too.

OTOH, v4l doesn't expose fdinfo.  What "shared" is telling you is
whether the BO is counted multiple times when you look at all
processes' fdinfo.

But I guess it would be ok to look for obj->handle_count > 1 || obj->dma_buf

BR,
-R

>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> >
> > > +
> > >   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
> > >   switch (domain) {
> > >   case AMDGPU_GEM_DOMAIN_VRAM:
> > >   stats->vram += size;
> > >   if (amdgpu_bo_in_cpu_visible_vram(bo))
> > >   stats->visible_vram += size;
> > > + if (shared)
> > > + stats->vram_shared += size;
> > >   break;
> > >   case AMDGPU_GEM_DOMAIN_GTT:
> > >   stats->gtt += size;
> > > + if (shared)
> > > + stats->gtt_shared += size;
> > >   break;
> > >   case AMDGPU_GEM_DOMAIN_CPU:
> > >   default:
> > >   stats->cpu += size;
> > > + if (shared)
> > > + stats->cpu_shared += size;
> > >   break;
> > >   }
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > index d28e21baef16..0503af75dc26 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > @@ -138,12 +138,18 @@ struct amdgpu_bo_vm {
> > >   struct amdgpu_mem_stats {
> > >   /* current VRAM usage, includes visible VRAM */
> > >   uint64_t vram;
> > > + /* current shared VRAM usage, includes visible VRAM */
> > > + uint64_t vram_shared;
> > >   /* current visible VRAM usage */
> > >   uint64_t visible_vram;
> > >   /* current GTT usage */
> > >   uint64_t gtt;
> > > + /* current shared GTT usage */
> > > + uint64_t gtt_shared;
> > >   /* current system memory usage */
> > >   uint64_t cpu;
> > > + /* current shared system memory usage */
> > > + uint64_t cpu_shared;
> > >   /* sum of evicted buffers, includes visible VRAM */
> > >   uint64_t evicted_vram;
> > >   /* sum of evicted buffers due to CPU access */
> >
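
Distilling where the thread lands: a sketch of the heuristic suggested
above, treating a BO as shared if it has more than one GEM handle or
has been exported as a dma-buf.  Both fields already exist on struct
drm_gem_object; the helper name is made up:

#include <drm/drm_gem.h>

/* "shared" here means: would be counted more than once if you summed
 * every process's fdinfo, or is visible outside DRM via dma-buf */
static bool bo_is_shared(const struct drm_gem_object *obj)
{
        return obj->handle_count > 1 || obj->dma_buf;
}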


Re: [PATCH v2] Remove custom dumb_map_offset implementation in msm driver

2023-11-27 Thread Rob Clark
On Tue, Nov 21, 2023 at 5:14 AM Dmitry Baryshkov
 wrote:
>
> On Tue, 21 Nov 2023 at 04:26, Rob Clark  wrote:
> >
> > On Wed, Nov 15, 2023 at 11:33 AM Dmitry Baryshkov
> >  wrote:
> > >
> > > On Wed, 15 Nov 2023 at 20:46, Dipam Turkar  wrote:
> > > >
> > > > They are not outdated, my bad. I went through the locks' code and saw 
> > > > that they have been updated. But they are probably not necessary here 
> > > > as most of the drivers do not use any form of locking in their 
> > > > implementations. The generic implementations drm_gem_dumb_map_offset() 
> > > > and drm_gem_ttm_dumb_map_offset() do not have any locking mechanisms 
> > > > either.
> > >
> > > Excuse me, but this doesn't sound right to me. There are different
> > > drivers with different implementations. So either we'd need a good
> > > explanation of why it is not necessary, or this patch is NAKed.
> >
> > Digging a bit thru history, it looks like commit 0de23977cfeb
> > ("drm/gem: convert to new unified vma manager") made external locking
> > unnecessary, since the vma mgr already had its own internal locking.
>
> So, should we drop our own locking system?

specifically for _just_ vma_offset_manager/vma_node, we could.  But I
think that only amounts to mmap_offset().

BR,
-R

> >
> > BR,
> > -R
> >
> > > >
> > > > Thanks and regards
> > > > Dipam Turkar
> > > >
> > > > On Wed, Nov 15, 2023 at 8:37 PM Dmitry Baryshkov 
> > > >  wrote:
> > > >>
> > > >> On Wed, 15 Nov 2023 at 16:30, Dipam Turkar  
> > > >> wrote:
> > > >> >
> > > >> > Make msm use drm_gem_dumb_map_offset() instead of its custom
> > > >> > implementation for associating GEM object with a fake offset. Since,
> > > >> > we already have this generic implementation, we don't need the custom
> > > >> > implementation and it is better to standardize the code for GEM based
> > > >> > drivers. This also removes the outdated locking leftovers.
> > > >>
> > > >> Why are they outdated?
> > > >>
> > > >> >
> > > >> > Signed-off-by: Dipam Turkar 
> > > >> > ---
> > > >> >  drivers/gpu/drm/msm/msm_drv.c |  2 +-
> > > >> >  drivers/gpu/drm/msm/msm_gem.c | 21 -
> > > >> >  drivers/gpu/drm/msm/msm_gem.h |  2 --
> > > >> >  3 files changed, 1 insertion(+), 24 deletions(-)
> > > >> >
> > > >> > Changes in v2:
> > > >> > Modify commit message to include the absence of internal locking 
> > > >> > leftovers
> > > >> > around allocating a fake offset in msm_gem_mmap_offset() in the 
> > > >> > generic
> > > >> > implementation drm_gem_create_map_offset().
> > > >> >
> > > >> > diff --git a/drivers/gpu/drm/msm/msm_drv.c 
> > > >> > b/drivers/gpu/drm/msm/msm_drv.c
> > > >> > index a428951ee539..86a15992c717 100644
> > > >> > --- a/drivers/gpu/drm/msm/msm_drv.c
> > > >> > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > > >> > @@ -1085,7 +1085,7 @@ static const struct drm_driver msm_driver = {
> > > >> > .open   = msm_open,
> > > >> > .postclose  = msm_postclose,
> > > >> > .dumb_create= msm_gem_dumb_create,
> > > >> > -   .dumb_map_offset= msm_gem_dumb_map_offset,
> > > >> > +   .dumb_map_offset= drm_gem_dumb_map_offset,
> > > >> > .gem_prime_import_sg_table = msm_gem_prime_import_sg_table,
> > > >> >  #ifdef CONFIG_DEBUG_FS
> > > >> > .debugfs_init   = msm_debugfs_init,
> > > >> > diff --git a/drivers/gpu/drm/msm/msm_gem.c 
> > > >> > b/drivers/gpu/drm/msm/msm_gem.c
> > > >> > index db1e748daa75..489694ef79cb 100644
> > > >> > --- a/drivers/gpu/drm/msm/msm_gem.c
> > > >> > +++ b/drivers/gpu/drm/msm/msm_gem.c
> > > >> > @@ -671,27 +671,6 @@ int msm_gem_dumb_create(struct drm_file *file, 
> > > >> > struct drm_device *dev,
> >> > MSM_BO_SCANOUT | MSM_BO_WC, &args->handle, 
> > > >> > "dumb");
> > &
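
For reference, roughly what the generic helper the patch switches to
does.  This is a simplified sketch of drm_gem_dumb_map_offset() from
drivers/gpu/drm/drm_gem.c (the real one also rejects imported objects),
not a verbatim copy:

static int dumb_map_offset_sketch(struct drm_file *file,
                                  struct drm_device *dev,
                                  u32 handle, u64 *offset)
{
        struct drm_gem_object *obj;
        int ret;

        obj = drm_gem_object_lookup(file, handle);
        if (!obj)
                return -ENOENT;

        /* the vma offset manager does its own locking internally,
         * which is what made the driver-side lock removable */
        ret = drm_gem_create_mmap_offset(obj);
        if (!ret)
                *offset = drm_vma_node_offset_addr(&obj->vma_node);

        drm_gem_object_put(obj);
        return ret;
}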

[PATCH v2 2/2] drm/msm/a690: Fix reg values for a690

2023-11-25 Thread Rob Clark
From: Danylo Piliaiev 

KGSL doesn't support a690 so all reg values were the same as
on a660. Now we know the values and they are different from the
Windows driver.

This fixes hangs on D3D12 games and some CTS tests.

Signed-off-by: Danylo Piliaiev 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index d10b22eeda74..7784d7d39192 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1312,6 +1312,7 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 
if (adreno_is_a650(adreno_gpu) ||
adreno_is_a660(adreno_gpu) ||
+   adreno_is_a690(adreno_gpu) ||
adreno_is_a730(adreno_gpu) ||
adreno_is_a740_family(adreno_gpu)) {
/* TODO: get ddr type from bootloader and use 2 for LPDDR4 */
@@ -1321,13 +1322,6 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
uavflagprd_inv = 2;
}
 
-   if (adreno_is_a690(adreno_gpu)) {
-   hbb_lo = 2;
-   amsbc = 1;
-   rgb565_predicator = 1;
-   uavflagprd_inv = 2;
-   }
-
if (adreno_is_7c3(adreno_gpu)) {
hbb_lo = 1;
amsbc = 1;
@@ -1741,7 +1735,9 @@ static int hw_init(struct msm_gpu *gpu)
/* Setting the primFifo thresholds default values,
 * and vccCacheSkipDis=1 bit (0x200) for A640 and newer
*/
-   if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu) || 
adreno_is_a690(adreno_gpu))
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00800200);
+   else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
else if (adreno_is_a640_family(adreno_gpu) || adreno_is_7c3(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
@@ -1775,6 +1771,8 @@ static int hw_init(struct msm_gpu *gpu)
if (adreno_is_a730(adreno_gpu) ||
adreno_is_a740_family(adreno_gpu))
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0xcf);
+   else if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0x4f);
else if (adreno_is_a619(adreno_gpu))
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0x3f);
else if (adreno_is_a610(adreno_gpu))
@@ -1808,12 +1806,17 @@ static int hw_init(struct msm_gpu *gpu)
a6xx_set_cp_protect(gpu);
 
if (adreno_is_a660_family(adreno_gpu)) {
-   gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x1);
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x00028801);
+   else
+   gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x1);
gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x0);
}
 
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x90);
/* Set dualQ + disable afull for A660 GPU */
-   if (adreno_is_a660(adreno_gpu))
+   else if (adreno_is_a660(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x66906);
else if (adreno_is_a7xx(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG,
-- 
2.42.0



[PATCH v2 1/2] drm/msm/a6xx: Add missing BIT(7) to REG_A6XX_UCHE_CLIENT_PF

2023-11-25 Thread Rob Clark
From: Danylo Piliaiev 

Downstream always set BIT(7)

Signed-off-by: Danylo Piliaiev 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 8176ea8da7a7..d10b22eeda74 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1782,7 +1782,7 @@ static int hw_init(struct msm_gpu *gpu)
else
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0x1f);
 
-   gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
+   gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
 
/* Set weights for bicubic filtering */
if (adreno_is_a650_family(adreno_gpu)) {
-- 
2.42.0



Re: [PATCH] drm/msm/dpu: Fix encoder CRC to account for CTM enablement

2023-11-21 Thread Rob Clark
On Tue, Nov 21, 2023 at 4:41 PM Abhinav Kumar  wrote:
>
>
>
> On 10/24/2023 12:01 PM, Abhinav Kumar wrote:
> >
> >
> > On 10/23/2023 4:03 PM, Dmitry Baryshkov wrote:
> >> On Tue, 24 Oct 2023 at 01:36, Rob Clark  wrote:
> >>>
> >>> On Mon, Oct 23, 2023 at 3:30 PM Dmitry Baryshkov
> >>>  wrote:
> >>>>
> >>>> On Tue, 24 Oct 2023 at 01:12, Rob Clark  wrote:
> >>>>>
> >>>>> From: Rob Clark 
> >>>>>
> >>>>> Seems like we need to pick INPUT_SEL=1 when CTM is enabled.  But not
> >>>>> otherwise.
> >>>>>
> >>>>> Suggested-by: Dmitry Baryshkov 
> >>>>> Signed-off-by: Rob Clark 
> >>>>> ---
> >
> > I cannot find anything in the docs which suggest this solution is correct.
> >
> > Different blocks in the DPU pipeline have their own CRC (MISR) registers
> > like LM, intf etc.
> >
> > We dont need to change INPUT_SEL to tell DPU from which pipeline to take
> > the CRC from as each of them have their own registers.
> >
> > INPUT_SEL is controlling whether the CRC needs to be calculated over the
> > entire display timings or only the active pixels. I am unable to tell at
> > the moment why this is making a difference in this use-case.
> >
> > Since I am unable to find any documentation proving this solution is
> > correct so far, unfortunately I would hold this back till then.
> >
> > We will investigate this issue and report our findings on this thread on
> > how to proceed.
> >
>
> Alright, we debugged and also found some more answers.
>
> The correct solution is indeed to set INPUT_SEL = 1 but let me explain
> why and what should be the correct way.
>
> INPUT_SEL was indeed telling whether to compute CRC over active pixels
> or active pixels + timings, like I wrote before, but this behavior
> changed starting with some chipsets.
>
> Now, INPUT_SEL = 0 means compute CRC *only* over the timings and not the
> active area (not display + timings like before), and as mentioned before
> this has nothing to do with what the input to the CRC is. Not covering
> the active area will not change the CRC at all, as Rob reported, but
> it's not specific to CTM.
>
> Which means we should have been setting INPUT_SEL=1 whenever we use INTF
> CRC irrespective of whether CTM is used or not.
>
> What this also means is that INTF CRC was not working correctly at all
> so far, irrespective of CTM, because it was always computing the CRC
> only on the timings (non-active area).
>
> This was not caught so far because it looks like IGT's
> kms_pipe_crc_basic test, which was used to validate this, only compares
> the CRCs of two frames of the same content to check that they are equal,
> rather than changing the contents and comparing, like kms_plane does. It
> will pass because the CRC would not have changed.
>
> Now coming to the fix: the reset value of the INTF_MISR_CTRL register
> already sets (or unsets) the INPUT_SEL bit correctly for whichever DPU
> version is used, so we should just change dpu_hw_setup_misr() to read
> the register, OR in the required bits without touching INPUT_SEL, and
> write it back.
>
> That will address this issue and also cover the version differences,
> since the expected value of this bit has changed across DPU revisions.

Ok, thanks for following up on this.  Mind posting a patch to
supersede this one?

BR,
-R

> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c| 2 +-
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 ++--
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h | 3 ++-
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 ++--
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h | 2 +-
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c   | 2 +-
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_util.c | 5 -
> >>>>>   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_util.h | 3 ++-
> >>>>>   8 files changed, 15 insertions(+), 10 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >>>>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >>>>> index 2b83a13b3aa9..d93a92ffd5df 100644
> >>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >>>>> @@ -134,7 +134,7 @@ static void dpu_crtc_setup_encoder_misr(struct
> >>>>> drm_crtc *crtc)
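
A sketch of the read-modify-write Abhinav describes, assuming the
MISR_* masks and register helpers that already live in dpu_hw_util.c;
the actual follow-up patch may differ:

static void dpu_hw_setup_misr_sketch(struct dpu_hw_blk_reg_map *c,
                                     u32 misr_ctrl_offset,
                                     u32 frame_count)
{
        /* start from the current (reset) value so the DPU-revision
         * dependent INPUT_SEL bit stays exactly as hardware set it */
        u32 config = DPU_REG_READ(c, misr_ctrl_offset);

        config |= (frame_count & MISR_FRAME_COUNT_MASK) |
                  MISR_CTRL_ENABLE | MISR_CTRL_FREE_RUN_MASK;

        DPU_REG_WRITE(c, misr_ctrl_offset, config);
}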
> >>>>>  

[pull] drm/msm: drm-msm-fixes-2023-11-21 for v6.7-rc3

2023-11-21 Thread Rob Clark
Hi Dave,

A few fixes for v6.7, description below

The following changes since commit b08d26dac1a1075c874f40ee02ec8ddc39e20146:

  drm/msm/a7xx: actually use a7xx state registers (2023-10-16 09:38:56 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2023-11-21

for you to fetch changes up to 56466f653cb59a8f46e991ad1e285f43afdca7d4:

  drm/msm: remove unnecessary NULL check (2023-11-17 15:32:49 -0800)


Fixes for v6.7-rc3:

- Fix the VREG_CTRL_1 for 4nm CPHY to match downstream
- Remove duplicate call to drm_kms_helper_poll_init() in msm_drm_init()
- Fix the safe_lut_tbl[] for sc8280xp to match downstream
- Don't attach the drm_dp_set_subconnector_property() for eDP
- Fix to attach drm_dp_set_subconnector_property() for DP. Otherwise
  there is a bootup crash on multiple targets
- Remove unnecessary NULL check left behind during cleanup


Abel Vesa (1):
  drm/msm/dp: don't touch DP subconnector property in eDP case

Bjorn Andersson (1):
  drm/msm/dpu: Add missing safe_lut_tbl in sc8280xp catalog

Dan Carpenter (1):
  drm/msm: remove unnecessary NULL check

Dmitry Baryshkov (2):
  drm/msm: remove exra drm_kms_helper_poll_init() call
  drm/msm/dp: attach the DP subconnector property

Jonathan Marek (1):
  drm/msm/dsi: use the correct VREG_CTRL_1 value for 4nm cphy

 drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h |  1 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  3 +--
 drivers/gpu/drm/msm/dp/dp_display.c  | 15 ++-
 drivers/gpu/drm/msm/dp/dp_drm.c  |  3 +++
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c|  2 +-
 drivers/gpu/drm/msm/msm_drv.c|  2 --
 6 files changed, 16 insertions(+), 10 deletions(-)


[PATCH] drm/msm/a690: Fix reg values for a690

2023-11-21 Thread Rob Clark
From: Danylo Piliaiev 

KGSL doesn't support a690 so all reg values were the same as
on a660. Now we know the values and they are different from the
Windows driver.

This fixes hangs on D3D12 games and some CTS tests.

Signed-off-by: Danylo Piliaiev 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 8176ea8da7a7..75e1ea0404d3 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1326,6 +1326,7 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
amsbc = 1;
rgb565_predicator = 1;
uavflagprd_inv = 2;
+   ubwc_mode = 2;
}
 
if (adreno_is_7c3(adreno_gpu)) {
@@ -1741,7 +1742,9 @@ static int hw_init(struct msm_gpu *gpu)
/* Setting the primFifo thresholds default values,
 * and vccCacheSkipDis=1 bit (0x200) for A640 and newer
*/
-   if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu) || 
adreno_is_a690(adreno_gpu))
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00800200);
+   else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
else if (adreno_is_a640_family(adreno_gpu) || adreno_is_7c3(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
@@ -1775,6 +1778,8 @@ static int hw_init(struct msm_gpu *gpu)
if (adreno_is_a730(adreno_gpu) ||
adreno_is_a740_family(adreno_gpu))
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0xcf);
+   else if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0x4f);
else if (adreno_is_a619(adreno_gpu))
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0x3f);
else if (adreno_is_a610(adreno_gpu))
@@ -1782,7 +1787,10 @@ static int hw_init(struct msm_gpu *gpu)
else
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) 
| 0x1f);
 
-   gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 0x81);
+   else
+   gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
 
/* Set weights for bicubic filtering */
if (adreno_is_a650_family(adreno_gpu)) {
@@ -1808,12 +1816,17 @@ static int hw_init(struct msm_gpu *gpu)
a6xx_set_cp_protect(gpu);
 
if (adreno_is_a660_family(adreno_gpu)) {
-   gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x1);
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x00028801);
+   else
+   gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x1);
gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x0);
}
 
+   if (adreno_is_a690(adreno_gpu))
+   gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x90);
/* Set dualQ + disable afull for A660 GPU */
-   if (adreno_is_a660(adreno_gpu))
+   else if (adreno_is_a660(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x66906);
else if (adreno_is_a7xx(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG,
-- 
2.42.0



Re: [PATCH v2] Remove custom dumb_map_offset implementation in msm driver

2023-11-20 Thread Rob Clark
On Wed, Nov 15, 2023 at 11:33 AM Dmitry Baryshkov
 wrote:
>
> On Wed, 15 Nov 2023 at 20:46, Dipam Turkar  wrote:
> >
> > They are not outdated, my bad. I went through the locks' code and saw that 
> > they have been updated. But they are probably not necessary here as most of 
> > the drivers do not use any form of locking in their implementations. The 
> > generic implementations drm_gem_dumb_map_offset() and 
> > drm_gem_ttm_dumb_map_offset() do not have any locking mechanisms either.
>
> Excuse me, but this doesn't sound right to me. There are different
> drivers with different implementations. So either we'd need a good
> explanation of why it is not necessary, or this patch is NAKed.

Digging a bit thru history, it looks like commit 0de23977cfeb
("drm/gem: convert to new unified vma manager") made external locking
unnecessary, since the vma mgr already had its own internal locking.
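
For reference, the generic helper being switched to looks roughly like
this (paraphrased from drm_gem.c, details may vary by kernel version);
note there is no driver-side locking, since the vma offset manager
serializes internally:

    int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
                                u32 handle, u64 *offset)
    {
            struct drm_gem_object *obj;
            int ret;

            obj = drm_gem_object_lookup(file, handle);
            if (!obj)
                    return -ENOENT;

            /* don't allow imported objects to be mapped */
            if (obj->import_attach) {
                    ret = -EINVAL;
                    goto out;
            }

            /* allocates the fake offset; the vma manager locks internally */
            ret = drm_gem_create_mmap_offset(obj);
            if (ret)
                    goto out;

            *offset = drm_vma_node_offset_addr(&obj->vma_node);
    out:
            drm_gem_object_put(obj);
            return ret;
    }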

BR,
-R

> >
> > Thanks and regards
> > Dipam Turkar
> >
> > On Wed, Nov 15, 2023 at 8:37 PM Dmitry Baryshkov 
> >  wrote:
> >>
> >> On Wed, 15 Nov 2023 at 16:30, Dipam Turkar  wrote:
> >> >
> >> > Make msm use drm_gem_dumb_map_offset() instead of its custom
> >> > implementation for associating a GEM object with a fake offset. Since
> >> > we already have this generic implementation, we don't need the custom
> >> > implementation, and it is better to standardize the code for GEM-based
> >> > drivers. This also removes the outdated locking leftovers.
> >>
> >> Why are they outdated?
> >>
> >> >
> >> > Signed-off-by: Dipam Turkar 
> >> > ---
> >> >  drivers/gpu/drm/msm/msm_drv.c |  2 +-
> >> >  drivers/gpu/drm/msm/msm_gem.c | 21 -
> >> >  drivers/gpu/drm/msm/msm_gem.h |  2 --
> >> >  3 files changed, 1 insertion(+), 24 deletions(-)
> >> >
> >> > Changes in v2:
> >> > Modify commit message to include the absence of internal locking 
> >> > leftovers
> >> > around allocating a fake offset in msm_gem_mmap_offset() in the generic
> >> > implementation drm_gem_dumb_map_offset().
> >> >
> >> > diff --git a/drivers/gpu/drm/msm/msm_drv.c 
> >> > b/drivers/gpu/drm/msm/msm_drv.c
> >> > index a428951ee539..86a15992c717 100644
> >> > --- a/drivers/gpu/drm/msm/msm_drv.c
> >> > +++ b/drivers/gpu/drm/msm/msm_drv.c
> >> > @@ -1085,7 +1085,7 @@ static const struct drm_driver msm_driver = {
> >> > .open   = msm_open,
> >> > .postclose  = msm_postclose,
> >> > .dumb_create= msm_gem_dumb_create,
> >> > -   .dumb_map_offset= msm_gem_dumb_map_offset,
> >> > +   .dumb_map_offset= drm_gem_dumb_map_offset,
> >> > .gem_prime_import_sg_table = msm_gem_prime_import_sg_table,
> >> >  #ifdef CONFIG_DEBUG_FS
> >> > .debugfs_init   = msm_debugfs_init,
> >> > diff --git a/drivers/gpu/drm/msm/msm_gem.c 
> >> > b/drivers/gpu/drm/msm/msm_gem.c
> >> > index db1e748daa75..489694ef79cb 100644
> >> > --- a/drivers/gpu/drm/msm/msm_gem.c
> >> > +++ b/drivers/gpu/drm/msm/msm_gem.c
> >> > @@ -671,27 +671,6 @@ int msm_gem_dumb_create(struct drm_file *file, 
> >> > struct drm_device *dev,
> >> > MSM_BO_SCANOUT | MSM_BO_WC, &args->handle, 
> >> > "dumb");
> >> >  }
> >> >
> >> > -int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device 
> >> > *dev,
> >> > -   uint32_t handle, uint64_t *offset)
> >> > -{
> >> > -   struct drm_gem_object *obj;
> >> > -   int ret = 0;
> >> > -
> >> > -   /* GEM does all our handle to object mapping */
> >> > -   obj = drm_gem_object_lookup(file, handle);
> >> > -   if (obj == NULL) {
> >> > -   ret = -ENOENT;
> >> > -   goto fail;
> >> > -   }
> >> > -
> >> > -   *offset = msm_gem_mmap_offset(obj);
> >> > -
> >> > -   drm_gem_object_put(obj);
> >> > -
> >> > -fail:
> >> > -   return ret;
> >> > -}
> >> > -
> >> >  static void *get_vaddr(struct drm_gem_object *obj, unsigned madv)
> >> >  {
> >> > struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >> > diff --git a/drivers/gpu/drm/msm/msm_gem.h 
> >> > b/drivers/gpu/drm/msm/msm_gem.h
> >> > index 8ddef5443140..dc74a0ef865d 100644
> >> > --- a/drivers/gpu/drm/msm/msm_gem.h
> >> > +++ b/drivers/gpu/drm/msm/msm_gem.h
> >> > @@ -139,8 +139,6 @@ struct page **msm_gem_pin_pages(struct 
> >> > drm_gem_object *obj);
> >> >  void msm_gem_unpin_pages(struct drm_gem_object *obj);
> >> >  int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
> >> > struct drm_mode_create_dumb *args);
> >> > -int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device 
> >> > *dev,
> >> > -   uint32_t handle, uint64_t *offset);
> >> >  void *msm_gem_get_vaddr_locked(struct drm_gem_object *obj);
> >> >  void *msm_gem_get_vaddr(struct drm_gem_object *obj);
> >> >  void *msm_gem_get_vaddr_active(struct drm_gem_object *obj);
> >> > --
> >> > 2.34.1
> >> >
> >>
> >>
> >> --
> >> With best wishes
> >> Dmitry
>
>
>
> --
> With best wishes
> Dmitry


[PATCH v2 7/7] drm/msm/gem: Convert to drm_exec

2023-11-20 Thread Rob Clark
From: Rob Clark 

Replace the ww_mutex locking dance with the drm_exec helper.

v2: Error path fixes, move drm_exec_fini so we only call it once (and
only if we have called drm_exec_init())
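
For reference, the resulting locking loop boils down to the standard
drm_exec pattern; a condensed sketch of the new submit_lock_objects()
(not the verbatim diff below):

    drm_exec_init(&submit->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, submit->nr_bos);

    drm_exec_until_all_locked (&submit->exec) {
            for (unsigned i = 0; i < submit->nr_bos; i++) {
                    /* reserve the dma_resv and pre-allocate one fence slot */
                    ret = drm_exec_prepare_obj(&submit->exec, submit->bos[i].obj, 1);
                    /* on ww contention, backs everything off and loops again */
                    drm_exec_retry_on_contention(&submit->exec);
                    if (ret)
                            goto error;
            }
    }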

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/Kconfig  |   1 +
 drivers/gpu/drm/msm/msm_gem.h|   5 +-
 drivers/gpu/drm/msm/msm_gem_submit.c | 119 +--
 3 files changed, 24 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 6309a857ca31..f91d87afc0d3 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -16,6 +16,7 @@ config DRM_MSM
select DRM_DP_AUX_BUS
select DRM_DISPLAY_DP_HELPER
select DRM_DISPLAY_HELPER
+   select DRM_EXEC
select DRM_KMS_HELPER
select DRM_PANEL
select DRM_BRIDGE
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index af884ced7a0d..7f34263048a3 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include "drm/drm_exec.h"
 #include "drm/gpu_scheduler.h"
 #include "msm_drv.h"
 
@@ -254,7 +255,7 @@ struct msm_gem_submit {
struct msm_gpu *gpu;
struct msm_gem_address_space *aspace;
struct list_head node;   /* node in ring submit list */
-   struct ww_acquire_ctx ticket;
+   struct drm_exec exec;
uint32_t seqno; /* Sequence number of the submit on the ring */
 
/* Hw fence, which is created when the scheduler executes the job, and
@@ -287,8 +288,6 @@ struct msm_gem_submit {
struct drm_msm_gem_submit_reloc *relocs;
} *cmd;  /* array of size nr_cmds */
struct {
-/* make sure these don't conflict w/ MSM_SUBMIT_BO_x */
-#define BO_LOCKED  0x4000  /* obj lock is held */
uint32_t flags;
union {
struct drm_gem_object *obj;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 603f04d851d9..40878c26a749 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -248,85 +248,30 @@ static int submit_lookup_cmds(struct msm_gem_submit 
*submit,
return ret;
 }
 
-static void submit_unlock_bo(struct msm_gem_submit *submit, int i)
-{
-   struct drm_gem_object *obj = submit->bos[i].obj;
-   unsigned cleanup_flags = BO_LOCKED;
-   unsigned flags = submit->bos[i].flags & cleanup_flags;
-
-   /*
-* Clear flags bit before dropping lock, so that the msm_job_run()
-* path isn't racing with submit_cleanup() (ie. the read/modify/
-* write is protected by the obj lock in all paths)
-*/
-   submit->bos[i].flags &= ~cleanup_flags;
-
-   if (flags & BO_LOCKED)
-   dma_resv_unlock(obj->resv);
-}
-
 /* This is where we make sure all the bo's are reserved and pin'd: */
 static int submit_lock_objects(struct msm_gem_submit *submit)
 {
-   int contended, slow_locked = -1, i, ret = 0;
-
-retry:
-   for (i = 0; i < submit->nr_bos; i++) {
-   struct drm_gem_object *obj = submit->bos[i].obj;
-
-   if (slow_locked == i)
-   slow_locked = -1;
+   int ret;
 
-   contended = i;
+   drm_exec_init(&submit->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 
submit->nr_bos);
 
-   if (!(submit->bos[i].flags & BO_LOCKED)) {
-   ret = dma_resv_lock_interruptible(obj->resv,
- &submit->ticket);
+   drm_exec_until_all_locked (&submit->exec) {
+   for (unsigned i = 0; i < submit->nr_bos; i++) {
+   struct drm_gem_object *obj = submit->bos[i].obj;
+   ret = drm_exec_prepare_obj(&submit->exec, obj, 1);
+   drm_exec_retry_on_contention(&submit->exec);
if (ret)
-   goto fail;
-   submit->bos[i].flags |= BO_LOCKED;
+   goto error;
}
}
 
-   ww_acquire_done(&submit->ticket);
-
return 0;
 
-fail:
-   if (ret == -EALREADY) {
-   SUBMIT_ERROR(submit, "handle %u at index %u already on submit 
list\n",
-submit->bos[i].handle, i);
-   ret = -EINVAL;
-   }
-
-   for (; i >= 0; i--)
-   submit_unlock_bo(submit, i);
-
-   if (slow_locked > 0)
-   submit_unlock_bo(submit, slow_locked);
-
-   if (ret == -EDEADLK) {
-   struct drm_gem_object *obj = submit->bos[contended].obj;
-   /* we lost out in a seqno race, lock and retry.. */
-   ret = dma_resv_lock_slow_interruptible(obj->resv,
-  &submit->ticke

[PATCH v2 6/7] drm/exec: Pass in initial # of objects

2023-11-20 Thread Rob Clark
From: Rob Clark 

In cases where the # is known ahead of time, it is silly to do the table
resize dance.
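
For example, a caller that already knows how many GEM objects it will
lock can pre-size the table (a minimal sketch, not from the patch;
`num_objects` is a hypothetical caller-known count):

    struct drm_exec exec;

    /* pre-size the objects table; passing 0 keeps the old
     * grow-on-demand behaviour
     */
    drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, num_objects);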

Signed-off-by: Rob Clark 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  2 +-
 drivers/gpu/drm/drm_exec.c   | 13 ++---
 drivers/gpu/drm/nouveau/nouveau_exec.c   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c   |  2 +-
 drivers/gpu/drm/tests/drm_exec_test.c| 16 
 include/drm/drm_exec.h   |  2 +-
 12 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 41fbc4fd0fac..0bd3c4a6267a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1137,7 +1137,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 
ctx->n_vms = 1;
ctx->sync = &mem->sync;
-   drm_exec_init(&ctx->exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&ctx->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&ctx->exec) {
ret = amdgpu_vm_lock_pd(vm, &ctx->exec, 2);
drm_exec_retry_on_contention(&ctx->exec);
@@ -1176,7 +1176,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
int ret;
 
ctx->sync = &mem->sync;
-   drm_exec_init(&ctx->exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&ctx->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&ctx->exec) {
ctx->n_vms = 0;
list_for_each_entry(entry, >attachments, list) {
@@ -2552,7 +2552,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
 
amdgpu_sync_create(&sync_obj);
 
-   drm_exec_init(&exec, 0);
+   drm_exec_init(&exec, 0, 0);
/* Reserve all BOs and page tables for validation */
drm_exec_until_all_locked(&exec) {
/* Reserve all the page directories */
@@ -2793,7 +2793,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
 
mutex_lock(&process_info->lock);
 
-   drm_exec_init(&exec, 0);
+   drm_exec_init(&exec, 0, 0);
drm_exec_until_all_locked(&exec) {
list_for_each_entry(peer_vm, &process_info->vm_list_head,
vm_list_node) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index df3ecfa9e13f..2464606494d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -66,7 +66,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p,
 
amdgpu_sync_create(&p->sync);
drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
- DRM_EXEC_IGNORE_DUPLICATES);
+ DRM_EXEC_IGNORE_DUPLICATES, 0);
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 720011019741..796fa6f1420b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -70,7 +70,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
struct drm_exec exec;
int r;
 
-   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
r = amdgpu_vm_lock_pd(vm, &exec, 0);
if (likely(!r))
@@ -110,7 +110,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
struct drm_exec exec;
int r;
 
-   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
r = amdgpu_vm_lock_pd(vm, &exec, 0);
if (likely(!r))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 84beeaa4d21c..49a5f1c73b3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -203,7 +203,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object 
*obj,
struct drm_exec exec;
long r;
 
-   drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
+   drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
drm_exec_until_all_locked(&exec) {
r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
drm_exec_retry_on_contention(&exec);
@@ -739,7 +739,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
   

[PATCH v2 5/7] drm/msm/gem: Cleanup submit_cleanup_bo()

2023-11-20 Thread Rob Clark
From: Rob Clark 

Now that it only handles unlock duty, drop the superfluous arg and
rename it.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index d001bf286606..603f04d851d9 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -248,14 +248,10 @@ static int submit_lookup_cmds(struct msm_gem_submit 
*submit,
return ret;
 }
 
-/* Unwind bo state, according to cleanup_flags.  In the success case, only
- * the lock is dropped at the end of the submit (and active/pin ref is dropped
- * later when the submit is retired).
- */
-static void submit_cleanup_bo(struct msm_gem_submit *submit, int i,
-   unsigned cleanup_flags)
+static void submit_unlock_bo(struct msm_gem_submit *submit, int i)
 {
struct drm_gem_object *obj = submit->bos[i].obj;
+   unsigned cleanup_flags = BO_LOCKED;
unsigned flags = submit->bos[i].flags & cleanup_flags;
 
/*
@@ -304,10 +300,10 @@ static int submit_lock_objects(struct msm_gem_submit 
*submit)
}
 
for (; i >= 0; i--)
-   submit_cleanup_bo(submit, i, BO_LOCKED);
+   submit_unlock_bo(submit, i);
 
if (slow_locked > 0)
-   submit_cleanup_bo(submit, slow_locked, BO_LOCKED);
+   submit_unlock_bo(submit, slow_locked);
 
if (ret == -EDEADLK) {
struct drm_gem_object *obj = submit->bos[contended].obj;
@@ -533,7 +529,6 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct drm_gem_object *ob
  */
 static void submit_cleanup(struct msm_gem_submit *submit, bool error)
 {
-   unsigned cleanup_flags = BO_LOCKED;
unsigned i;
 
if (error)
@@ -541,7 +536,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, 
bool error)
 
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
-   submit_cleanup_bo(submit, i, cleanup_flags);
+   submit_unlock_bo(submit, i);
if (error)
drm_gem_object_put(obj);
}
-- 
2.42.0



[PATCH v2 1/7] drm/msm/gem: Remove "valid" tracking

2023-11-20 Thread Rob Clark
From: Rob Clark 

This was a small optimization for pre-soft-pin userspace.  But mesa
switched to soft-pin nearly 5yrs ago.  So let's drop the optimization
and simplify the code.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.h|  2 --
 drivers/gpu/drm/msm/msm_gem_submit.c | 44 +---
 2 files changed, 8 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 8ddef5443140..c36c1c1fa222 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -271,7 +271,6 @@ struct msm_gem_submit {
struct msm_gpu_submitqueue *queue;
struct pid *pid;/* submitting process */
bool fault_dumped;  /* Limit devcoredump dumping to one per submit */
-   bool valid; /* true if no cmdstream patching needed */
bool in_rb; /* "sudo" mode, copy cmds into RB */
struct msm_ringbuffer *ring;
unsigned int nr_cmds;
@@ -288,7 +287,6 @@ struct msm_gem_submit {
} *cmd;  /* array of size nr_cmds */
struct {
 /* make sure these don't conflict w/ MSM_SUBMIT_BO_x */
-#define BO_VALID   0x8000  /* is current addr in cmdstream correct/valid? 
*/
 #define BO_LOCKED  0x4000  /* obj lock is held */
 #define BO_PINNED  0x2000  /* obj (pages) is pinned and on active list */
uint32_t flags;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 6d8ec1337e8b..996274ef32a6 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -150,8 +150,6 @@ static int submit_lookup_objects(struct msm_gem_submit 
*submit,
 
submit->bos[i].handle = submit_bo.handle;
submit->bos[i].flags = submit_bo.flags;
-   /* in validate_objects() we figure out if this is true: */
-   submit->bos[i].iova  = submit_bo.presumed;
}
 
spin_lock(&file->table_lock);
@@ -278,9 +276,6 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit 
*submit, int i)
 {
unsigned cleanup_flags = BO_PINNED | BO_LOCKED;
submit_cleanup_bo(submit, i, cleanup_flags);
-
-   if (!(submit->bos[i].flags & BO_VALID))
-   submit->bos[i].iova = 0;
 }
 
 /* This is where we make sure all the bo's are reserved and pin'd: */
@@ -390,8 +385,6 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
struct msm_drm_private *priv = submit->dev->dev_private;
int i, ret = 0;
 
-   submit->valid = true;
-
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
struct msm_gem_vma *vma;
@@ -407,14 +400,7 @@ static int submit_pin_objects(struct msm_gem_submit 
*submit)
if (ret)
break;
 
-   if (vma->iova == submit->bos[i].iova) {
-   submit->bos[i].flags |= BO_VALID;
-   } else {
-   submit->bos[i].iova = vma->iova;
-   /* iova changed, so address in cmdstream is not valid: 
*/
-   submit->bos[i].flags &= ~BO_VALID;
-   submit->valid = false;
-   }
+   submit->bos[i].iova = vma->iova;
}
 
/*
@@ -451,7 +437,7 @@ static void submit_attach_object_fences(struct 
msm_gem_submit *submit)
 }
 
 static int submit_bo(struct msm_gem_submit *submit, uint32_t idx,
-   struct drm_gem_object **obj, uint64_t *iova, bool *valid)
+   struct drm_gem_object **obj, uint64_t *iova)
 {
if (idx >= submit->nr_bos) {
SUBMIT_ERROR(submit, "invalid buffer index: %u (out of %u)\n",
@@ -463,8 +449,6 @@ static int submit_bo(struct msm_gem_submit *submit, 
uint32_t idx,
*obj = submit->bos[idx].obj;
if (iova)
*iova = submit->bos[idx].iova;
-   if (valid)
-   *valid = !!(submit->bos[idx].flags & BO_VALID);
 
return 0;
 }
@@ -477,9 +461,6 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct drm_gem_object *ob
uint32_t *ptr;
int ret = 0;
 
-   if (!nr_relocs)
-   return 0;
-
if (offset % 4) {
SUBMIT_ERROR(submit, "non-aligned cmdstream buffer: %u\n", 
offset);
return -EINVAL;
@@ -500,7 +481,6 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct drm_gem_object *ob
struct drm_msm_gem_submit_reloc submit_reloc = relocs[i];
uint32_t off;
uint64_t iova;
-   bool valid;
 
if (submit_reloc.submit_offset % 4) {
SUBMIT_ERROR(submit, "non-aligned reloc offset: %u\n",
@@ -519,13 +499,10 @@ static int submit_reloc(struct msm_ge

[PATCH v2 4/7] drm/msm/gem: Split out submit_unpin_objects() helper

2023-11-20 Thread Rob Clark
From: Rob Clark 

Untangle unpinning from the unlock/unref loop.  The unpin only happens in
error paths so it is easier to decouple from the normal unlock path.

Since we never have an intermediate state where a subset of buffers
are pinned (ie. we never bail out of the pin or unpin loops) we can
replace the bo state flag bit with a global flag in the submit.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.h|  6 +++---
 drivers/gpu/drm/msm/msm_gem_submit.c | 22 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c |  3 ++-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index c36c1c1fa222..af884ced7a0d 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -270,8 +270,9 @@ struct msm_gem_submit {
int fence_id;   /* key into queue->fence_idr */
struct msm_gpu_submitqueue *queue;
struct pid *pid;/* submitting process */
-   bool fault_dumped;  /* Limit devcoredump dumping to one per submit */
-   bool in_rb; /* "sudo" mode, copy cmds into RB */
+   bool bos_pinned : 1;
+   bool fault_dumped:1;/* Limit devcoredump dumping to one per submit */
+   bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
struct msm_ringbuffer *ring;
unsigned int nr_cmds;
unsigned int nr_bos;
@@ -288,7 +289,6 @@ struct msm_gem_submit {
struct {
 /* make sure these don't conflict w/ MSM_SUBMIT_BO_x */
 #define BO_LOCKED  0x4000  /* obj lock is held */
-#define BO_PINNED  0x2000  /* obj (pages) is pinned and on active list */
uint32_t flags;
union {
struct drm_gem_object *obj;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 786b48a55309..d001bf286606 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -265,9 +265,6 @@ static void submit_cleanup_bo(struct msm_gem_submit 
*submit, int i,
 */
submit->bos[i].flags &= ~cleanup_flags;
 
-   if (flags & BO_PINNED)
-   msm_gem_unpin_locked(obj);
-
if (flags & BO_LOCKED)
dma_resv_unlock(obj->resv);
 }
@@ -407,13 +404,28 @@ static int submit_pin_objects(struct msm_gem_submit 
*submit)
mutex_lock(&priv->lru.lock);
for (i = 0; i < submit->nr_bos; i++) {
msm_gem_pin_obj_locked(submit->bos[i].obj);
-   submit->bos[i].flags |= BO_PINNED;
}
mutex_unlock(&priv->lru.lock);
 
+   submit->bos_pinned = true;
+
return ret;
 }
 
+static void submit_unpin_objects(struct msm_gem_submit *submit)
+{
+   if (!submit->bos_pinned)
+   return;
+
+   for (int i = 0; i < submit->nr_bos; i++) {
+   struct drm_gem_object *obj = submit->bos[i].obj;
+
+   msm_gem_unpin_locked(obj);
+   }
+
+   submit->bos_pinned = false;
+}
+
 static void submit_attach_object_fences(struct msm_gem_submit *submit)
 {
int i;
@@ -525,7 +537,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, 
bool error)
unsigned i;
 
if (error)
-   cleanup_flags |= BO_PINNED;
+   submit_unpin_objects(submit);
 
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 929df7243792..bd125ca4d230 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -29,9 +29,10 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
struct drm_gem_object *obj = submit->bos[i].obj;
 
msm_gem_unpin_active(obj);
-   submit->bos[i].flags &= ~BO_PINNED;
}
 
+   submit->bos_pinned = false;
+
mutex_unlock(&priv->lru.lock);
 
msm_gpu_submit(gpu, submit);
-- 
2.42.0



[PATCH v2 3/7] drm/msm/gem: Don't queue job to sched in error cases

2023-11-20 Thread Rob Clark
From: Rob Clark 

We shouldn't be running the job in error cases.  This also avoids having
to think too hard about where the objs get unpinned (and if necessary,
the resv takes over tracking that the obj is busy), ie. in error cases it
always happens synchronously, and in normal cases it happens from the
scheduler job_run() callback.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 2d5527dc3e1a..786b48a55309 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -946,6 +946,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
}
}
 
+   if (ret)
+   goto out;
+
submit_attach_object_fences(submit);
 
/* The scheduler owns a ref now: */
-- 
2.42.0



[PATCH v2 2/7] drm/msm/gem: Remove submit_unlock_unpin_bo()

2023-11-20 Thread Rob Clark
From: Rob Clark 

The only point at which it is called is before pinning objects, so the "unpin"
part of the name is fiction.  Just remove it and call submit_cleanup_bo()
directly.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 996274ef32a6..2d5527dc3e1a 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -272,12 +272,6 @@ static void submit_cleanup_bo(struct msm_gem_submit 
*submit, int i,
dma_resv_unlock(obj->resv);
 }
 
-static void submit_unlock_unpin_bo(struct msm_gem_submit *submit, int i)
-{
-   unsigned cleanup_flags = BO_PINNED | BO_LOCKED;
-   submit_cleanup_bo(submit, i, cleanup_flags);
-}
-
 /* This is where we make sure all the bo's are reserved and pin'd: */
 static int submit_lock_objects(struct msm_gem_submit *submit)
 {
@@ -313,10 +307,10 @@ static int submit_lock_objects(struct msm_gem_submit 
*submit)
}
 
for (; i >= 0; i--)
-   submit_unlock_unpin_bo(submit, i);
+   submit_cleanup_bo(submit, i, BO_LOCKED);
 
if (slow_locked > 0)
-   submit_unlock_unpin_bo(submit, slow_locked);
+   submit_cleanup_bo(submit, slow_locked, BO_LOCKED);
 
if (ret == -EDEADLK) {
struct drm_gem_object *obj = submit->bos[contended].obj;
-- 
2.42.0



[PATCH v2 0/7] drm/msm/gem: drm_exec conversion

2023-11-20 Thread Rob Clark
From: Rob Clark 

Simplify the exec path (removing a legacy optimization) and convert to
drm_exec.  One drm_exec patch to allow passing in the expected # of GEM
objects to avoid re-allocation.

I'd be a bit happier if I could avoid the extra objects table allocation
in drm_exec in the first place, but I wasn't really happy with any of the
approaches I tried for getting rid of it.

v2: updates in 6/7 and other nit-addressing

Rob Clark (7):
  drm/msm/gem: Remove "valid" tracking
  drm/msm/gem: Remove submit_unlock_unpin_bo()
  drm/msm/gem: Don't queue job to sched in error cases
  drm/msm/gem: Split out submit_unpin_objects() helper
  drm/msm/gem: Cleanup submit_cleanup_bo()
  drm/exec: Pass in initial # of objects
  drm/msm/gem: Convert to drm_exec

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c  |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |   2 +-
 drivers/gpu/drm/drm_exec.c|  13 +-
 drivers/gpu/drm/msm/Kconfig   |   1 +
 drivers/gpu/drm/msm/msm_gem.h |  13 +-
 drivers/gpu/drm/msm/msm_gem_submit.c  | 199 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c  |   3 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c|   2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c|   2 +-
 drivers/gpu/drm/tests/drm_exec_test.c |  16 +-
 include/drm/drm_exec.h|   2 +-
 16 files changed, 92 insertions(+), 187 deletions(-)

-- 
2.42.0



[PATCH] drm/msm/gpu: Skip retired submits in recover worker

2023-11-17 Thread Rob Clark
From: Rob Clark 

If we somehow raced with submit retiring, either while waiting for
worker to have a chance to run or acquiring the gpu lock, then the
recover worker should just bail.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.c | 41 +++
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3fad5d58262f..fd3dceed86f8 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -365,29 +365,31 @@ static void recover_worker(struct kthread_work *work)
DRM_DEV_ERROR(dev->dev, "%s: hangcheck recover!\n", gpu->name);
 
submit = find_submit(cur_ring, cur_ring->memptrs->fence + 1);
-   if (submit) {
-   /* Increment the fault counts */
-   submit->queue->faults++;
-   if (submit->aspace)
-   submit->aspace->faults++;
 
-   get_comm_cmdline(submit, &comm, &cmd);
+   /*
+* If the submit retired while we were waiting for the worker to run,
+* or waiting to acquire the gpu lock, then nothing more to do.
+*/
+   if (!submit)
+   goto out_unlock;
 
-   if (comm && cmd) {
-   DRM_DEV_ERROR(dev->dev, "%s: offending task: %s (%s)\n",
-   gpu->name, comm, cmd);
+   /* Increment the fault counts */
+   submit->queue->faults++;
+   if (submit->aspace)
+   submit->aspace->faults++;
 
-   msm_rd_dump_submit(priv->hangrd, submit,
-   "offending task: %s (%s)", comm, cmd);
-   } else {
-   msm_rd_dump_submit(priv->hangrd, submit, NULL);
-   }
+   get_comm_cmdline(submit, &comm, &cmd);
+
+   if (comm && cmd) {
+   DRM_DEV_ERROR(dev->dev, "%s: offending task: %s (%s)\n",
+ gpu->name, comm, cmd);
+
+   msm_rd_dump_submit(priv->hangrd, submit,
+  "offending task: %s (%s)", comm, cmd);
} else {
-   /*
-* We couldn't attribute this fault to any particular context,
-* so increment the global fault count instead.
-*/
-   gpu->global_faults++;
+   DRM_DEV_ERROR(dev->dev, "%s: offending task: unknown\n", 
gpu->name);
+
+   msm_rd_dump_submit(priv->hangrd, submit, NULL);
}
 
/* Record the crash state */
@@ -440,6 +442,7 @@ static void recover_worker(struct kthread_work *work)
 
pm_runtime_put(&gpu->pdev->dev);
 
+out_unlock:
mutex_unlock(>lock);
 
msm_gpu_retire(gpu);
-- 
2.41.0



[PATCH] drm/msm: Reduce fallout of fence signaling vs reclaim hangs

2023-11-17 Thread Rob Clark
From: Rob Clark 

Until various PM devfreq/QoS and interconnect patches land, we could
potentially trigger reclaim from the gpu scheduler thread, and under enough
memory pressure that could trigger a sort of deadlock.  Eventually the
wait will time out and we'll move on to consider other GEM objects.  But
given that there is still a potential for deadlock/stalling, we should
reduce the timeout to contain the damage.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 5a7d48c02c4b..07ca4ddfe4e3 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -75,7 +75,7 @@ static bool
 wait_for_idle(struct drm_gem_object *obj)
 {
enum dma_resv_usage usage = dma_resv_usage_rw(true);
-   return dma_resv_wait_timeout(obj->resv, usage, false, 1000) > 0;
+   return dma_resv_wait_timeout(obj->resv, usage, false, 10) > 0;
 }
 
 static bool
-- 
2.41.0



[PATCH] drm/msm/gpu: Move gpu devcore's to gpu device

2023-11-15 Thread Rob Clark
From: Rob Clark 

The dpu devcores are already associated with the dpu device.  So we
should associate the gpu devcores with the gpu device, for easier
classification.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index cfcb4317afdb..3fad5d58262f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -292,8 +292,7 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
/* Set the active crash state to be dumped on failure */
gpu->crashstate = state;
 
-   /* FIXME: Release the crashstate if this errors out? */
-   dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, GFP_KERNEL,
+   dev_coredumpm(&gpu->pdev->dev, THIS_MODULE, gpu, 0, GFP_KERNEL,
msm_gpu_devcoredump_read, msm_gpu_devcoredump_free);
 }
 #else
-- 
2.41.0



[PATCH v3 2/2] drm/msm/gem: Add metadata

2023-11-06 Thread Rob Clark
From: Rob Clark 

The EXT_external_objects extension is a bit awkward as it doesn't pass
explicit modifiers, leaving the importer to guess with incomplete
information.  In the case of vk (turnip) exporting and gl (freedreno)
importing, the "OPTIMAL_TILING_EXT" layout depends on VkImageCreateInfo
flags (among other things), which the importer does not know.  Which
unfortunately leaves us with the need for a metadata back-channel.

The contents of the metadata are defined by userspace.  The
EXT_external_objects extension is only required to work between
compatible versions of gl and vk drivers, as defined by device and
driver UUIDs.
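
For illustration, the intended userspace usage looks roughly like this
(a hedged sketch; `layout_blob` and friends are hypothetical, and the
blob contents are entirely userspace-defined):

    struct drm_msm_gem_info req = {
            .handle = bo_handle,
            .info   = MSM_INFO_SET_METADATA,
            .value  = (uintptr_t)layout_blob,  /* opaque, driver-defined blob */
            .len    = layout_blob_size,        /* capped at 128 bytes */
    };

    /* exporter (vk) attaches the layout metadata to the BO: */
    ioctl(drm_fd, DRM_IOCTL_MSM_GEM_INFO, &req);

    /* importer (gl): query the size first ... */
    req.info  = MSM_INFO_GET_METADATA;
    req.value = 0;
    ioctl(drm_fd, DRM_IOCTL_MSM_GEM_INFO, &req);  /* req.len now holds the size */

    /* ... then fetch the blob itself */
    req.value = (uintptr_t)out_buf;               /* out_buf of at least req.len bytes */
    ioctl(drm_fd, DRM_IOCTL_MSM_GEM_INFO, &req);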

v2: add missing metadata kfree
v3: Rework to move copy_from/to_user out from under gem obj lock
to avoid angering lockdep about deadlocks against fs-reclaim

Signed-off-by: Rob Clark 
---
Note, I dropped Dmitry's r-b on this version because it was a bit of
a re-write of the original patch.

 drivers/gpu/drm/msm/msm_drv.c | 92 ++-
 drivers/gpu/drm/msm/msm_gem.c |  1 +
 drivers/gpu/drm/msm/msm_gem.h |  4 ++
 include/uapi/drm/msm_drm.h|  2 +
 4 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 781db689fb16..c05c27a70c34 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -49,9 +49,10 @@
  * - 1.9.0 - Add MSM_SUBMIT_FENCE_SN_IN
  * - 1.10.0 - Add MSM_SUBMIT_BO_NO_IMPLICIT
  * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
+ * - 1.12.0 - Add MSM_INFO_SET_METADATA and MSM_INFO_GET_METADATA
  */
 #define MSM_VERSION_MAJOR  1
-#define MSM_VERSION_MINOR  11
+#define MSM_VERSION_MINOR  12
 #define MSM_VERSION_PATCHLEVEL 0
 
 static void msm_deinit_vram(struct drm_device *ddev);
@@ -822,6 +823,85 @@ static int msm_ioctl_gem_info_set_iova(struct drm_device 
*dev,
return msm_gem_set_iova(obj, ctx->aspace, iova);
 }
 
+static int msm_ioctl_gem_info_set_metadata(struct drm_gem_object *obj,
+  __user void *metadata,
+  u32 metadata_size)
+{
+   struct msm_gem_object *msm_obj = to_msm_bo(obj);
+   void *buf;
+   int ret;
+
+   /* Impose a moderate upper bound on metadata size: */
+   if (metadata_size > 128) {
+   return -EOVERFLOW;
+   }
+
+   /* Use a temporary buf to keep copy_from_user() outside of gem obj 
lock: */
+   buf = memdup_user(metadata, metadata_size);
+   if (IS_ERR(buf))
+   return PTR_ERR(buf);
+
+   ret = msm_gem_lock_interruptible(obj);
+   if (ret)
+   goto out;
+
+   msm_obj->metadata =
+   krealloc(msm_obj->metadata, metadata_size, GFP_KERNEL);
+   msm_obj->metadata_size = metadata_size;
+   memcpy(msm_obj->metadata, buf, metadata_size);
+
+   msm_gem_unlock(obj);
+
+out:
+   kfree(buf);
+
+   return ret;
+}
+
+static int msm_ioctl_gem_info_get_metadata(struct drm_gem_object *obj,
+  __user void *metadata,
+  u32 *metadata_size)
+{
+   struct msm_gem_object *msm_obj = to_msm_bo(obj);
+   void *buf;
+   int ret, len;
+
+   if (!metadata) {
+   /*
+* Querying the size is inherently racey, but
+* EXT_external_objects expects the app to confirm
+* via device and driver UUIDs that the exporter and
+* importer versions match.  All we can do from the
+* kernel side is check the length under obj lock
+* when userspace tries to retrieve the metadata
+*/
+   *metadata_size = msm_obj->metadata_size;
+   return 0;
+   }
+
+   ret = msm_gem_lock_interruptible(obj);
+   if (ret)
+   return ret;
+
+   /* Avoid copy_to_user() under gem obj lock: */
+   len = msm_obj->metadata_size;
+   buf = kmemdup(msm_obj->metadata, len, GFP_KERNEL);
+
+   msm_gem_unlock(obj);
+
+   if (*metadata_size < len) {
+   ret = -ETOOSMALL;
+   } else if (copy_to_user(metadata, buf, len)) {
+   ret = -EFAULT;
+   } else {
+   *metadata_size = len;
+   }
+
+   kfree(buf);
+
+   return ret;
+}
+
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
struct drm_file *file)
 {
@@ -844,6 +924,8 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
break;
case MSM_INFO_SET_NAME:
case MSM_INFO_GET_NAME:
+   case MSM_INFO_SET_METADATA:
+   case MSM_INFO_GET_METADATA:
break;
default:
return -EINVAL;
@@ -906,6 +988,14 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
ret = -EFAULT;
 

[PATCH v3 1/2] drm/msm: Small uabi fixes

2023-11-06 Thread Rob Clark
From: Rob Clark 

Correct the minor version exposed and error return value for
MSM_INFO_GET_NAME.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/msm_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4bd028fa7500..781db689fb16 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -51,7 +51,7 @@
  * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
  */
 #define MSM_VERSION_MAJOR  1
-#define MSM_VERSION_MINOR  10
+#define MSM_VERSION_MINOR  11
 #define MSM_VERSION_PATCHLEVEL 0
 
 static void msm_deinit_vram(struct drm_device *ddev);
@@ -896,7 +896,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
break;
case MSM_INFO_GET_NAME:
if (args->value && (args->len < strlen(msm_obj->name))) {
-   ret = -EINVAL;
+   ret = -ETOOSMALL;
break;
}
args->len = strlen(msm_obj->name);
-- 
2.41.0



[PATCH v3 0/2] drm/msm/gem: Add metadata uapi

2023-11-06 Thread Rob Clark
From: Rob Clark 

Add metadata mechanism to provide a back-channel to communicate image
layout information between vk and gl, because EXT_external_objects
doesn't support explicit modifiers and "OPTIMAL_TILING_EXT" is not
enough information for the importer to deduce the layout.

Rob Clark (2):
  drm/msm: Small uabi fixes
  drm/msm/gem: Add metadata

 drivers/gpu/drm/msm/msm_drv.c | 94 ++-
 drivers/gpu/drm/msm/msm_gem.c |  1 +
 drivers/gpu/drm/msm/msm_gem.h |  4 ++
 include/uapi/drm/msm_drm.h|  2 +
 4 files changed, 99 insertions(+), 2 deletions(-)

-- 
2.41.0



Re: [PATCH] drm/msm/dpu: Add missing safe_lut_tbl in sc8280xp catalog

2023-10-31 Thread Rob Clark
On Tue, Oct 31, 2023 at 5:35 AM Johan Hovold  wrote:
>
> On Mon, Oct 30, 2023 at 04:23:20PM -0700, Bjorn Andersson wrote:
> > During USB transfers on the SC8280XP __arm_smmu_tlb_sync() is seen to
> > typically take 1-2ms to complete. As expected this results in poor
> > performance, something that has been mitigated by proposing running the
> > iommu in non-strict mode (boot with iommu.strict=0).
> >
> > This turns out to be related to the SAFE logic, and programming the QOS
> > SAFE values in the DPU (per suggestion from Rob and Doug) reduces the
> > TLB sync time to below 10us, which means significantly less time spent
> > with interrupts disabled and a significant boost in throughput.
>
> I ran some tests with a gigabit ethernet adapter to get an idea of how
> this performs in comparison to using lazy iommu mode ("non-strict"):
>
>                6.6    6.6-lazy   6.6-dpu   6.6-dpu-lazy
> iperf3 recv    114    941        941       941            MBit/s
> iperf3 send    124    891        703       940            MBit/s
>
> scp recv       14.6   110        110       111            MB/s
> scp send       12.5   98.9       91.5      110            MB/s
>
> This patch in itself indeed improves things quite a bit, but there is
> still some performance that can be gained by using lazy iommu mode.
>
> Notably, lazy mode with this patch applied appears to saturate the link
> in both directions.

Maybe there is still room for SoC specific udev rules so dma masters
without firmware can be configured as "lazy", ie. like:

https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/refs/heads/main/baseboard-trogdor/chromeos-base/chromeos-bsp-baseboard-trogdor/files/98-qcom-nonstrict-iommu.rules

BR,
-R

> Tested-by: Johan Hovold 
>
> Johan


Re: [PATCH] drm/msm/dpu: Add missing safe_lut_tbl in sc8280xp catalog

2023-10-31 Thread Rob Clark
On Tue, Oct 31, 2023 at 1:19 AM Manivannan Sadhasivam
 wrote:
>
> On Mon, Oct 30, 2023 at 04:23:20PM -0700, Bjorn Andersson wrote:
> > During USB transfers on the SC8280XP __arm_smmu_tlb_sync() is seen to
> > typically take 1-2ms to complete. As expected this results in poor
> > performance, something that has been mitigated by proposing running the
> > iommu in non-strict mode (boot with iommu.strict=0).
> >
> > This turns out to be related to the SAFE logic, and programming the QOS
> > SAFE values in the DPU (per suggestion from Rob and Doug) reduces the
> > TLB sync time to below 10us, which means significantly less time spent
> > with interrupts disabled and a significant boost in throughput.
> >
> > Fixes: 4a352c2fc15a ("drm/msm/dpu: Introduce SC8280XP")
> > Cc: sta...@vger.kernel.org
> > Suggested-by: Doug Anderson 
> > Suggested-by: Rob Clark 
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h 
> > b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h
> > index 1ccd1edd693c..4c0528794e7a 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h
> > @@ -406,6 +406,7 @@ static const struct dpu_perf_cfg sc8280xp_perf_data = {
> >   .min_llcc_ib = 0,
> >   .min_dram_ib = 800000,
> >   .danger_lut_tbl = {0xf, 0xffff, 0x0},
> > + .safe_lut_tbl = {0xfe00, 0xfe00, 0xffff},
>
> What do these values represent? And how safe is it to override the default QoS
> values?
>
> I'm not too familiar with the MSM DRM driver, so please excuse my ignorance.

for realtime dma (like scanout) there is a sort of "safe" signal from
the dma master to the smmu to indicate when it has enough data
buffered for it to be safe to do tlbinv without risking underflow.
When things aren't "safe" the smmu will stall tlbinv.  This is just
configuring the thresholds for the "safe" signal.
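
Annotated, the new entry reads something like this (a hedged sketch; the
index names come from dpu_hw_catalog.h, and the bit-per-fill-level
reading of the values is an assumption, not spelled out in the patch):

    .safe_lut_tbl = {
            [DPU_QOS_LUT_USAGE_LINEAR]    = 0xfe00, /* "safe" only at high fill levels */
            [DPU_QOS_LUT_USAGE_MACROTILE] = 0xfe00,
            [DPU_QOS_LUT_USAGE_NRT]       = 0xffff, /* non-realtime: always "safe" */
    },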

BR,
-R

> - Mani
>
> >   .qos_lut_tbl = {
> >   {.nentry = ARRAY_SIZE(sc8180x_qos_linear),
> >   .entries = sc8180x_qos_linear
> >
> > ---
> > base-commit: c503e3eec382ac708ee7adf874add37b77c5d312
> > change-id: 20231030-sc8280xp-dpu-safe-lut-9769027b8452
> >
> > Best regards,
> > --
> > Bjorn Andersson 
> >
>
> --
> மணிவண்ணன் சதாசிவம்


Re: [PATCH 6/7] drm/exec: Pass in initial # of objects

2023-10-30 Thread Rob Clark
On Mon, Oct 30, 2023 at 9:01 AM Christian König
 wrote:
>
> Am 30.10.23 um 14:38 schrieb Rob Clark:
> > On Mon, Oct 30, 2023 at 1:05 AM Christian König
> >  wrote:
> >> Am 27.10.23 um 18:58 schrieb Rob Clark:
> >>> From: Rob Clark 
> >>>
> >>> In cases where the # is known ahead of time, it is silly to do the table
> >>> resize dance.
> >> Ah, yes that was my initial implementation as well, but I ditched that
> >> because nobody actually used it.
> >>
> >> One comment below.
> >>
> >>> Signed-off-by: Rob Clark 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c |  4 ++--
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  4 ++--
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |  4 ++--
> >>>drivers/gpu/drm/drm_exec.c  | 15 ---
> >>>drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
> >>>drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  2 +-
> >>>include/drm/drm_exec.h  |  2 +-
> >>>8 files changed, 22 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> index efdb1c48f431..d27ca8f61929 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> @@ -65,7 +65,7 @@ static int amdgpu_cs_parser_init(struct 
> >>> amdgpu_cs_parser *p,
> >>>}
> >>>
> >>>amdgpu_sync_create(&p->sync);
> >>> - drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> >>> + drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> >>>return 0;
> >>>}
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >>> index 720011019741..796fa6f1420b 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> >>> @@ -70,7 +70,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, 
> >>> struct amdgpu_vm *vm,
> >>>struct drm_exec exec;
> >>>int r;
> >>>
> >>> - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> >>> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> >>>drm_exec_until_all_locked(&exec) {
> >>>r = amdgpu_vm_lock_pd(vm, &exec, 0);
> >>>if (likely(!r))
> >>> @@ -110,7 +110,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device 
> >>> *adev, struct amdgpu_vm *vm,
> >>>struct drm_exec exec;
> >>>int r;
> >>>
> >>> - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> >>> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> >>>drm_exec_until_all_locked(&exec) {
> >>>r = amdgpu_vm_lock_pd(vm, &exec, 0);
> >>>if (likely(!r))
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>> index ca4d2d430e28..16f1715148ad 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>> @@ -203,7 +203,7 @@ static void amdgpu_gem_object_close(struct 
> >>> drm_gem_object *obj,
> >>>struct drm_exec exec;
> >>>long r;
> >>>
> >>> - drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
> >>> + drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
> >>>drm_exec_until_all_locked(&exec) {
> >>>r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
> >>>drm_exec_retry_on_contention(&exec);
> >>> @@ -739,7 +739,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void 
> >>> *data,
> >>>}
> >>>
> >>>drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
> >>> -   DRM_EXEC_IGNORE_DUPLICATES);
> >>> +   DRM_EXEC_IGNORE_DUPLICATES, 0);
> >>>drm_exec_until_all_locked(&exec) {
> >>>if (gobj) {
> >>>r = drm_exec_lock_obj(&exec, gobj);
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
> >

[PATCH] drm/gpuva: Fix comment typo

2023-10-30 Thread Rob Clark
From: Rob Clark 

Just something I noticed in passing.

Signed-off-by: Rob Clark 
---
 include/drm/drm_gpuva_mgr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index ed8d50200cc3..26a2c0880bac 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -189,7 +189,7 @@ static inline bool drm_gpuva_invalidated(struct drm_gpuva 
*va)
  * struct drm_gpuva_manager - DRM GPU VA Manager
  *
  * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
- * _tree structures. Typically, this structure is embedded in bigger
+ *  structures. Typically, this structure is embedded in bigger
  * driver structures.
  *
  * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
-- 
2.41.0



Re: [PATCH 6/7] drm/exec: Pass in initial # of objects

2023-10-30 Thread Rob Clark
On Mon, Oct 30, 2023 at 1:05 AM Christian König
 wrote:
>
> Am 27.10.23 um 18:58 schrieb Rob Clark:
> > From: Rob Clark 
> >
> > In cases where the # is known ahead of time, it is silly to do the table
> > resize dance.
>
> Ah, yes that was my initial implementation as well, but I ditched that
> because nobody actually used it.
>
> One comment below.
>
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c |  4 ++--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  4 ++--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |  4 ++--
> >   drivers/gpu/drm/drm_exec.c  | 15 ---
> >   drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
> >   drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  2 +-
> >   include/drm/drm_exec.h  |  2 +-
> >   8 files changed, 22 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index efdb1c48f431..d27ca8f61929 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -65,7 +65,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
> > *p,
> >   }
> >
> >   amdgpu_sync_create(&p->sync);
> > - drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> > + drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> >   return 0;
> >   }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > index 720011019741..796fa6f1420b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > @@ -70,7 +70,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, 
> > struct amdgpu_vm *vm,
> >   struct drm_exec exec;
> >   int r;
> >
> > - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> > + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> >   drm_exec_until_all_locked(&exec) {
> >   r = amdgpu_vm_lock_pd(vm, &exec, 0);
> >   if (likely(!r))
> > @@ -110,7 +110,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, 
> > struct amdgpu_vm *vm,
> >   struct drm_exec exec;
> >   int r;
> >
> > - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> > + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> >   drm_exec_until_all_locked(&exec) {
> >   r = amdgpu_vm_lock_pd(vm, &exec, 0);
> >   if (likely(!r))
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > index ca4d2d430e28..16f1715148ad 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > @@ -203,7 +203,7 @@ static void amdgpu_gem_object_close(struct 
> > drm_gem_object *obj,
> >   struct drm_exec exec;
> >   long r;
> >
> > - drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
> > + drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
> >   drm_exec_until_all_locked(&exec) {
> >   r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
> >   drm_exec_retry_on_contention(&exec);
> > @@ -739,7 +739,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void 
> > *data,
> >   }
> >
> >   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
> > -   DRM_EXEC_IGNORE_DUPLICATES);
> > +   DRM_EXEC_IGNORE_DUPLICATES, 0);
> >   drm_exec_until_all_locked(&exec) {
> >   if (gobj) {
> >   r = drm_exec_lock_obj(&exec, gobj);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > index b6015157763a..3c351941701e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > @@ -1105,7 +1105,7 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device 
> > *adev,
> >
> >   amdgpu_sync_create();
> >
> > - drm_exec_init(&exec, 0);
> > + drm_exec_init(&exec, 0, 0);
> >   drm_exec_until_all_locked(&exec) {
> >   r = drm_exec_lock_obj(&exec,
> > &ctx_data->meta_data_obj->tbo.base);
> > @@ -1176,7 +1176,7 @@ int amdgpu_mes_ctx_unmap_meta_data(struct 
> > amdgpu_device *adev,
> >   struct drm_exec exec;
> >   long r;
> >
> > - drm_exec_init(&exec, 0);
> &

[PATCH v2 2/2] drm/msm/gem: Add metadata

2023-10-29 Thread Rob Clark
From: Rob Clark 

The EXT_external_objects extension is a bit awkward as it doesn't pass
explicit modifiers, leaving the importer to guess with incomplete
information.  In the case of vk (turnip) exporting and gl (freedreno)
importing, the "OPTIMAL_TILING_EXT" layout depends on VkImageCreateInfo
flags (among other things), which the importer does not know.  Which
unfortunately leaves us with the need for a metadata back-channel.

The contents of the metadata are defined by userspace.  The
EXT_external_objects extension is only required to work between
compatible versions of gl and vk drivers, as defined by device and
driver UUIDs.

v2: add missing metadata kfree

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c | 57 ++-
 drivers/gpu/drm/msm/msm_gem.c |  1 +
 drivers/gpu/drm/msm/msm_gem.h |  4 +++
 include/uapi/drm/msm_drm.h|  2 ++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 781db689fb16..9ec74ab4cfea 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -49,9 +49,10 @@
  * - 1.9.0 - Add MSM_SUBMIT_FENCE_SN_IN
  * - 1.10.0 - Add MSM_SUBMIT_BO_NO_IMPLICIT
  * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
+ * - 1.12.0 - Add MSM_INFO_SET_METADATA and MSM_INFO_GET_METADATA
  */
 #define MSM_VERSION_MAJOR  1
-#define MSM_VERSION_MINOR  11
+#define MSM_VERSION_MINOR  12
 #define MSM_VERSION_PATCHLEVEL 0
 
 static void msm_deinit_vram(struct drm_device *ddev);
@@ -844,6 +845,8 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
break;
case MSM_INFO_SET_NAME:
case MSM_INFO_GET_NAME:
+   case MSM_INFO_SET_METADATA:
+   case MSM_INFO_GET_METADATA:
break;
default:
return -EINVAL;
@@ -905,6 +908,58 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
 msm_obj->name, args->len))
ret = -EFAULT;
}
+   break;
+   case MSM_INFO_SET_METADATA:
+   /* Impose a moderate upper bound on metadata size: */
+   if (args->len > 128) {
+   ret = -EOVERFLOW;
+   break;
+   }
+
+   ret = msm_gem_lock_interruptible(obj);
+   if (ret)
+   break;
+
+   msm_obj->metadata =
+   krealloc(msm_obj->metadata, args->len, GFP_KERNEL);
+   msm_obj->metadata_size = args->len;
+
+   if (copy_from_user(msm_obj->metadata, 
u64_to_user_ptr(args->value),
+  args->len)) {
+   msm_obj->metadata_size = 0;
+   ret = -EFAULT;
+   }
+
+   msm_gem_unlock(obj);
+
+   break;
+   case MSM_INFO_GET_METADATA:
+   if (!args->value) {
+   /*
+* Querying the size is inherently racey, but
+* EXT_external_objects expects the app to confirm
+* via device and driver UUIDs that the exporter and
+* importer versions match.  All we can do from the
+* kernel side is check the length under obj lock
+* when userspace tries to retrieve the metadata
+*/
+   args->len = msm_obj->metadata_size;
+   break;
+   }
+
+   ret = msm_gem_lock_interruptible(obj);
+   if (ret)
+   break;
+
+   if (args->len < msm_obj->metadata_size) {
+   ret = -ETOOSMALL;
+   } else if (copy_to_user(u64_to_user_ptr(args->value),
+   msm_obj->metadata, args->len)) {
+   ret = -EFAULT;
+   }
+
+   msm_gem_unlock(obj);
+
break;
}
 
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 1113e6b2ec8e..175ee4ab8a6f 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1058,6 +1058,7 @@ static void msm_gem_free_object(struct drm_gem_object 
*obj)
 
drm_gem_object_release(obj);
 
+   kfree(msm_obj->metadata);
kfree(msm_obj);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7f34263048a3..8d414b072c29 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -109,6 +109,10 @@ struct msm_gem_object {
 
char name[32]; /* Identifier to print for the debugfs files */
 
+   /* userspace metadata backchannel */
+   void *metadata;
+   u32 metadata_size;
+
   

[PATCH v2 1/2] drm/msm: Small uabi fixes

2023-10-29 Thread Rob Clark
From: Rob Clark 

Correct the minor version exposed and error return value for
MSM_INFO_GET_NAME.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4bd028fa7500..781db689fb16 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -51,7 +51,7 @@
  * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
  */
 #define MSM_VERSION_MAJOR  1
-#define MSM_VERSION_MINOR  10
+#define MSM_VERSION_MINOR  11
 #define MSM_VERSION_PATCHLEVEL 0
 
 static void msm_deinit_vram(struct drm_device *ddev);
@@ -896,7 +896,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
break;
case MSM_INFO_GET_NAME:
if (args->value && (args->len < strlen(msm_obj->name))) {
-   ret = -EINVAL;
+   ret = -ETOOSMALL;
break;
}
args->len = strlen(msm_obj->name);
-- 
2.41.0



[PATCH v2 0/2] drm/msm/gem: Add metadata uapi

2023-10-29 Thread Rob Clark
From: Rob Clark 

Add metadata mechanism to provide a back-channel to communicate image
layout information between vk and gl, because EXT_external_objects
doesn't support explicit modifiers and "OPTIMAL_TILING_EXT" is not
enough information for the importer to deduce the layout.

Rob Clark (2):
  drm/msm: Small uabi fixes
  drm/msm/gem: Add metadata

 drivers/gpu/drm/msm/msm_drv.c | 59 +--
 drivers/gpu/drm/msm/msm_gem.c |  1 +
 drivers/gpu/drm/msm/msm_gem.h |  4 +++
 include/uapi/drm/msm_drm.h|  2 ++
 4 files changed, 64 insertions(+), 2 deletions(-)

-- 
2.41.0



Re: [PATCH] drm/msm/gem: Add metadata

2023-10-28 Thread Rob Clark
On Fri, Oct 27, 2023 at 6:16 PM Dmitry Baryshkov
 wrote:
>
> On Fri, 27 Oct 2023 at 22:45, Rob Clark  wrote:
> >
> > From: Rob Clark 
> >
> > The EXT_external_objects extension is a bit awkward as it doesn't pass
> > explicit modifiers, leaving the importer to guess with incomplete
> > information.  In the case of vk (turnip) exporting and gl (freedreno)
> > importing, the "OPTIMAL_TILING_EXT" layout depends on VkImageCreateInfo
> > flags (among other things), which the importer does not know.  Which
> > unfortunately leaves us with the need for a metadata back-channel.
> >
> > The contents of the metadata are defined by userspace.  The
> > EXT_external_objects extension is only required to work between
> > compatible versions of gl and vk drivers, as defined by device and
> > driver UUIDs.

jfyi, userspace side of this at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25945

> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/msm_drv.c | 59 +--
> >  drivers/gpu/drm/msm/msm_gem.h |  4 +++
> >  include/uapi/drm/msm_drm.h|  2 ++
> >  3 files changed, 63 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > index b61ccea05327..8fe2677ea37a 100644
> > --- a/drivers/gpu/drm/msm/msm_drv.c
> > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > @@ -37,9 +37,10 @@
> >   * - 1.9.0 - Add MSM_SUBMIT_FENCE_SN_IN
> >   * - 1.10.0 - Add MSM_SUBMIT_BO_NO_IMPLICIT
> >   * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
> > + * - 1.12.0 - Add MSM_INFO_SET_METADATA and MSM_INFO_GET_METADATA
> >   */
> >  #define MSM_VERSION_MAJOR  1
> > -#define MSM_VERSION_MINOR  10
> > +#define MSM_VERSION_MINOR  12
> >  #define MSM_VERSION_PATCHLEVEL 0
> >
> >  static void msm_deinit_vram(struct drm_device *ddev);
> > @@ -566,6 +567,8 @@ static int msm_ioctl_gem_info(struct drm_device *dev, 
> > void *data,
> > break;
> > case MSM_INFO_SET_NAME:
> > case MSM_INFO_GET_NAME:
> > +   case MSM_INFO_SET_METADATA:
> > +   case MSM_INFO_GET_METADATA:
> > break;
> > default:
> > return -EINVAL;
> > @@ -618,7 +621,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, 
> > void *data,
> > break;
> > case MSM_INFO_GET_NAME:
> > if (args->value && (args->len < strlen(msm_obj->name))) {
> > -   ret = -EINVAL;
> > +   ret = -ETOOSMALL;
>
> This is unrelated and it also slightly changes the user interface, so
> it IMO should come as a separate commit.

fair, although it was changed for consistency with GET_METADATA

> > break;
> > }
> > args->len = strlen(msm_obj->name);
> > @@ -627,6 +630,58 @@ static int msm_ioctl_gem_info(struct drm_device *dev, 
> > void *data,
> >  msm_obj->name, args->len))
> > ret = -EFAULT;
> > }
> > +   break;
> > +   case MSM_INFO_SET_METADATA:
> > +   /* Impose a moderate upper bound on metadata size: */
> > +   if (args->len > 128) {
> > +   ret = -EOVERFLOW;
> > +   break;
> > +   }
> > +
> > +   ret = msm_gem_lock_interruptible(obj);
> > +   if (ret)
> > +   break;
> > +
> > +   msm_obj->metadata =
> > +   krealloc(msm_obj->metadata, args->len, GFP_KERNEL);
> > +   msm_obj->metadata_size = args->len;
> > +
> > +   if (copy_from_user(msm_obj->metadata, 
> > u64_to_user_ptr(args->value),
> > +  args->len)) {
> > +   msm_obj->metadata_size = 0;
> > +   ret = -EFAULT;
> > +   }
> > +
> > +   msm_gem_unlock(obj);
> > +
> > +   break;
> > +   case MSM_INFO_GET_METADATA:
> > +   if (!args->value) {
> > +   /*
> > +* Querying the size is inherently racey, but
> > +* EXT_external_objects expects the app to confirm
> > +* via device and driver UUIDs that the exporter and
> > +* importer versions match.  All we can do from the
> > +* kernel side is check the length under obj lock
> > +* when userspace tries to retrieve the metadata
> > +*/

[PATCH] drm/msm/gem: Add metadata

2023-10-27 Thread Rob Clark
From: Rob Clark 

The EXT_external_objects extension is a bit awkward as it doesn't pass
explicit modifiers, leaving the importer to guess with incomplete
information.  In the case of vk (turnip) exporting and gl (freedreno)
importing, the "OPTIMAL_TILING_EXT" layout depends on VkImageCreateInfo
flags (among other things), which the importer does not know.  Which
unfortunately leaves us with the need for a metadata back-channel.

The contents of the metadata are defined by userspace.  The
EXT_external_objects extension is only required to work between
compatible versions of gl and vk drivers, as defined by device and
driver UUIDs.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c | 59 +--
 drivers/gpu/drm/msm/msm_gem.h |  4 +++
 include/uapi/drm/msm_drm.h|  2 ++
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index b61ccea05327..8fe2677ea37a 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -37,9 +37,10 @@
  * - 1.9.0 - Add MSM_SUBMIT_FENCE_SN_IN
  * - 1.10.0 - Add MSM_SUBMIT_BO_NO_IMPLICIT
  * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
+ * - 1.12.0 - Add MSM_INFO_SET_METADATA and MSM_INFO_GET_METADATA
  */
 #define MSM_VERSION_MAJOR  1
-#define MSM_VERSION_MINOR  10
+#define MSM_VERSION_MINOR  12
 #define MSM_VERSION_PATCHLEVEL 0
 
 static void msm_deinit_vram(struct drm_device *ddev);
@@ -566,6 +567,8 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
break;
case MSM_INFO_SET_NAME:
case MSM_INFO_GET_NAME:
+   case MSM_INFO_SET_METADATA:
+   case MSM_INFO_GET_METADATA:
break;
default:
return -EINVAL;
@@ -618,7 +621,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
break;
case MSM_INFO_GET_NAME:
if (args->value && (args->len < strlen(msm_obj->name))) {
-   ret = -EINVAL;
+   ret = -ETOOSMALL;
break;
}
args->len = strlen(msm_obj->name);
@@ -627,6 +630,58 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
 msm_obj->name, args->len))
ret = -EFAULT;
}
+   break;
+   case MSM_INFO_SET_METADATA:
+   /* Impose a moderate upper bound on metadata size: */
+   if (args->len > 128) {
+   ret = -EOVERFLOW;
+   break;
+   }
+
+   ret = msm_gem_lock_interruptible(obj);
+   if (ret)
+   break;
+
+   msm_obj->metadata =
+   krealloc(msm_obj->metadata, args->len, GFP_KERNEL);
+   msm_obj->metadata_size = args->len;
+
+   if (copy_from_user(msm_obj->metadata, 
u64_to_user_ptr(args->value),
+  args->len)) {
+   msm_obj->metadata_size = 0;
+   ret = -EFAULT;
+   }
+
+   msm_gem_unlock(obj);
+
+   break;
+   case MSM_INFO_GET_METADATA:
+   if (!args->value) {
+   /*
+* Querying the size is inherently racey, but
+* EXT_external_objects expects the app to confirm
+* via device and driver UUIDs that the exporter and
+* importer versions match.  All we can do from the
+* kernel side is check the length under obj lock
+* when userspace tries to retrieve the metadata
+*/
+   args->len = msm_obj->metadata_size;
+   break;
+   }
+
+   ret = msm_gem_lock_interruptible(obj);
+   if (ret)
+   break;
+
+   if (args->len < msm_obj->metadata_size) {
+   ret = -ETOOSMALL;
+   } else if (copy_to_user(u64_to_user_ptr(args->value),
+   msm_obj->metadata, args->len)) {
+   ret = -EFAULT;
+   }
+
+   msm_gem_unlock(obj);
+
break;
}
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7f34263048a3..8d414b072c29 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -109,6 +109,10 @@ struct msm_gem_object {
 
char name[32]; /* Identifier to print for the debugfs files */
 
+   /* userspace metadata backchannel */
+   void *metadata;
+   u32 metadata_size;
+
/**
 * pin_count: 

[PATCH 6/7] drm/exec: Pass in initial # of objects

2023-10-27 Thread Rob Clark
From: Rob Clark 

In cases where the # is known ahead of time, it is silly to do the table
resize dance.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |  4 ++--
 drivers/gpu/drm/drm_exec.c  | 15 ---
 drivers/gpu/drm/nouveau/nouveau_exec.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  2 +-
 include/drm/drm_exec.h  |  2 +-
 8 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index efdb1c48f431..d27ca8f61929 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -65,7 +65,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p,
}
 
	amdgpu_sync_create(&p->sync);
-   drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 720011019741..796fa6f1420b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -70,7 +70,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
struct drm_exec exec;
int r;
 
-   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
	drm_exec_until_all_locked(&exec) {
		r = amdgpu_vm_lock_pd(vm, &exec, 0);
if (likely(!r))
@@ -110,7 +110,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
struct drm_exec exec;
int r;
 
-   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+   drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
	drm_exec_until_all_locked(&exec) {
		r = amdgpu_vm_lock_pd(vm, &exec, 0);
if (likely(!r))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index ca4d2d430e28..16f1715148ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -203,7 +203,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object 
*obj,
struct drm_exec exec;
long r;
 
-   drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
+   drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
	drm_exec_until_all_locked(&exec) {
		r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
drm_exec_retry_on_contention();
@@ -739,7 +739,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
}
 
	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
- DRM_EXEC_IGNORE_DUPLICATES);
+ DRM_EXEC_IGNORE_DUPLICATES, 0);
	drm_exec_until_all_locked(&exec) {
		if (gobj) {
			r = drm_exec_lock_obj(&exec, gobj);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index b6015157763a..3c351941701e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1105,7 +1105,7 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device 
*adev,
 
	amdgpu_sync_create(&sync);
 
-   drm_exec_init(&exec, 0);
+   drm_exec_init(&exec, 0, 0);
	drm_exec_until_all_locked(&exec) {
		r = drm_exec_lock_obj(&exec,
		  &ctx_data->meta_data_obj->tbo.base);
@@ -1176,7 +1176,7 @@ int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device 
*adev,
struct drm_exec exec;
long r;
 
-   drm_exec_init(&exec, 0);
+   drm_exec_init(&exec, 0, 0);
	drm_exec_until_all_locked(&exec) {
		r = drm_exec_lock_obj(&exec,
		  &ctx_data->meta_data_obj->tbo.base);
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
index 5d2809de4517..27d11c20d148 100644
--- a/drivers/gpu/drm/drm_exec.c
+++ b/drivers/gpu/drm/drm_exec.c
@@ -69,16 +69,25 @@ static void drm_exec_unlock_all(struct drm_exec *exec)
  * drm_exec_init - initialize a drm_exec object
  * @exec: the drm_exec object to initialize
  * @flags: controls locking behavior, see DRM_EXEC_* defines
+ * @nr: the initial # of objects
  *
  * Initialize the object and make sure that we can track locked objects.
+ *
+ * If nr is non-zero then it is used as the initial objects table size.
+ * In either case, the table will grow (be re-allocated) on demand.
  */
-void drm_exec_init(struct drm_exec *exec, uint32_t flags)
+void drm_exec_init(struct drm_exec *exec, uint32_t flags, unsigned nr)
 {
+   size_t sz = PAGE_SIZE;
+
+   if (nr)
+   sz = (size_t)nr * sizeof(void *);
+
exec->flags = flags;
-   exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+   exec->objects = kmalloc(sz, GFP_KERNEL);
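
For reference, the call pattern this enables on the driver side: when the
caller knows its buffer count up front, the initial table is right-sized
and the locking loop never hits a reallocation.  A minimal sketch, where
lock_all() and its parameters are illustrative:

#include <drm/drm_exec.h>
#include <drm/drm_gem.h>

static int lock_all(struct drm_exec *exec, struct drm_gem_object **bos,
		    unsigned int num_bos)
{
	int ret;

	/* num_bos seeds the objects table, so no mid-loop krealloc */
	drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, num_bos);

	drm_exec_until_all_locked(exec) {
		for (unsigned int i = 0; i < num_bos; i++) {
			/* lock the resv and reserve one fence slot */
			ret = drm_exec_prepare_obj(exec, bos[i], 1);
			drm_exec_retry_on_contention(exec);
			if (ret)
				goto error;
		}
	}

	return 0;

error:
	drm_exec_fini(exec);	/* drops all locks taken so far */
	return ret;
}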

[PATCH 7/7] drm/msm/gem: Convert to drm_exec

2023-10-27 Thread Rob Clark
From: Rob Clark 

Replace the ww_mutex locking dance with the drm_exec helper.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/Kconfig  |   1 +
 drivers/gpu/drm/msm/msm_gem.h|   5 +-
 drivers/gpu/drm/msm/msm_gem_submit.c | 117 +--
 3 files changed, 24 insertions(+), 99 deletions(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 6309a857ca31..f91d87afc0d3 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -16,6 +16,7 @@ config DRM_MSM
select DRM_DP_AUX_BUS
select DRM_DISPLAY_DP_HELPER
select DRM_DISPLAY_HELPER
+   select DRM_EXEC
select DRM_KMS_HELPER
select DRM_PANEL
select DRM_BRIDGE
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index af884ced7a0d..7f34263048a3 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include "drm/drm_exec.h"
 #include "drm/gpu_scheduler.h"
 #include "msm_drv.h"
 
@@ -254,7 +255,7 @@ struct msm_gem_submit {
struct msm_gpu *gpu;
struct msm_gem_address_space *aspace;
struct list_head node;   /* node in ring submit list */
-   struct ww_acquire_ctx ticket;
+   struct drm_exec exec;
uint32_t seqno; /* Sequence number of the submit on the ring */
 
/* Hw fence, which is created when the scheduler executes the job, and
@@ -287,8 +288,6 @@ struct msm_gem_submit {
struct drm_msm_gem_submit_reloc *relocs;
} *cmd;  /* array of size nr_cmds */
struct {
-/* make sure these don't conflict w/ MSM_SUBMIT_BO_x */
-#define BO_LOCKED  0x4000  /* obj lock is held */
uint32_t flags;
union {
struct drm_gem_object *obj;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 603f04d851d9..f8d14d4ccfef 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -248,85 +248,31 @@ static int submit_lookup_cmds(struct msm_gem_submit 
*submit,
return ret;
 }
 
-static void submit_unlock_bo(struct msm_gem_submit *submit, int i)
-{
-   struct drm_gem_object *obj = submit->bos[i].obj;
-   unsigned cleanup_flags = BO_LOCKED;
-   unsigned flags = submit->bos[i].flags & cleanup_flags;
-
-   /*
-* Clear flags bit before dropping lock, so that the msm_job_run()
-* path isn't racing with submit_cleanup() (ie. the read/modify/
-* write is protected by the obj lock in all paths)
-*/
-   submit->bos[i].flags &= ~cleanup_flags;
-
-   if (flags & BO_LOCKED)
-   dma_resv_unlock(obj->resv);
-}
-
 /* This is where we make sure all the bo's are reserved and pin'd: */
 static int submit_lock_objects(struct msm_gem_submit *submit)
 {
-   int contended, slow_locked = -1, i, ret = 0;
-
-retry:
-   for (i = 0; i < submit->nr_bos; i++) {
-   struct drm_gem_object *obj = submit->bos[i].obj;
-
-   if (slow_locked == i)
-   slow_locked = -1;
+   int ret;
 
-   contended = i;
+   drm_exec_init(&submit->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 
submit->nr_bos);
 
-   if (!(submit->bos[i].flags & BO_LOCKED)) {
-   ret = dma_resv_lock_interruptible(obj->resv,
- &submit->ticket);
+   drm_exec_until_all_locked (&submit->exec) {
+   for (unsigned i = 0; i < submit->nr_bos; i++) {
+   struct drm_gem_object *obj = submit->bos[i].obj;
+   ret = drm_exec_prepare_obj(&submit->exec, obj, 1);
+   drm_exec_retry_on_contention(&submit->exec);
if (ret)
-   goto fail;
-   submit->bos[i].flags |= BO_LOCKED;
+   goto error;
}
}
 
-   ww_acquire_done(&submit->ticket);
-
return 0;
 
-fail:
-   if (ret == -EALREADY) {
-   SUBMIT_ERROR(submit, "handle %u at index %u already on submit 
list\n",
-submit->bos[i].handle, i);
-   ret = -EINVAL;
-   }
-
-   for (; i >= 0; i--)
-   submit_unlock_bo(submit, i);
-
-   if (slow_locked > 0)
-   submit_unlock_bo(submit, slow_locked);
-
-   if (ret == -EDEADLK) {
-   struct drm_gem_object *obj = submit->bos[contended].obj;
-   /* we lost out in a seqno race, lock and retry.. */
-   ret = dma_resv_lock_slow_interruptible(obj->resv,
-  &submit->ticket);
-   if (!ret) {
-   submit->bos[contended].flags |= BO_LOCKED;
- 

[PATCH 5/7] drm/msm/gem: Cleanup submit_cleanup_bo()

2023-10-27 Thread Rob Clark
From: Rob Clark 

Now that it only handles unlock duty, drop the superfluous arg and
rename it.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index d001bf286606..603f04d851d9 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -248,14 +248,10 @@ static int submit_lookup_cmds(struct msm_gem_submit 
*submit,
return ret;
 }
 
-/* Unwind bo state, according to cleanup_flags.  In the success case, only
- * the lock is dropped at the end of the submit (and active/pin ref is dropped
- * later when the submit is retired).
- */
-static void submit_cleanup_bo(struct msm_gem_submit *submit, int i,
-   unsigned cleanup_flags)
+static void submit_unlock_bo(struct msm_gem_submit *submit, int i)
 {
struct drm_gem_object *obj = submit->bos[i].obj;
+   unsigned cleanup_flags = BO_LOCKED;
unsigned flags = submit->bos[i].flags & cleanup_flags;
 
/*
@@ -304,10 +300,10 @@ static int submit_lock_objects(struct msm_gem_submit 
*submit)
}
 
for (; i >= 0; i--)
-   submit_cleanup_bo(submit, i, BO_LOCKED);
+   submit_unlock_bo(submit, i);
 
if (slow_locked > 0)
-   submit_cleanup_bo(submit, slow_locked, BO_LOCKED);
+   submit_unlock_bo(submit, slow_locked);
 
if (ret == -EDEADLK) {
struct drm_gem_object *obj = submit->bos[contended].obj;
@@ -533,7 +529,6 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct drm_gem_object *ob
  */
 static void submit_cleanup(struct msm_gem_submit *submit, bool error)
 {
-   unsigned cleanup_flags = BO_LOCKED;
unsigned i;
 
if (error)
@@ -541,7 +536,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, 
bool error)
 
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
-   submit_cleanup_bo(submit, i, cleanup_flags);
+   submit_unlock_bo(submit, i);
if (error)
drm_gem_object_put(obj);
}
-- 
2.41.0



[PATCH 4/7] drm/msm/gem: Split out submit_unpin_objects() helper

2023-10-27 Thread Rob Clark
From: Rob Clark 

Untangle unpinning from the unlock/unref loop.  The unpin only happens
in error paths, so it is easier to decouple it from the normal unlock
path.

Since we never have an intermediate state where only a subset of the
buffers is pinned (i.e. we never bail out of the pin or unpin loops),
we can replace the per-bo state flag bit with a global flag in the
submit.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.h|  6 +++---
 drivers/gpu/drm/msm/msm_gem_submit.c | 22 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c |  3 ++-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index c36c1c1fa222..af884ced7a0d 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -270,8 +270,9 @@ struct msm_gem_submit {
int fence_id;   /* key into queue->fence_idr */
struct msm_gpu_submitqueue *queue;
struct pid *pid;/* submitting process */
-   bool fault_dumped;  /* Limit devcoredump dumping to one per submit */
-   bool in_rb; /* "sudo" mode, copy cmds into RB */
+   bool bos_pinned : 1;
+   bool fault_dumped:1;/* Limit devcoredump dumping to one per submit */
+   bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
struct msm_ringbuffer *ring;
unsigned int nr_cmds;
unsigned int nr_bos;
@@ -288,7 +289,6 @@ struct msm_gem_submit {
struct {
 /* make sure these don't conflict w/ MSM_SUBMIT_BO_x */
 #define BO_LOCKED  0x4000  /* obj lock is held */
-#define BO_PINNED  0x2000  /* obj (pages) is pinned and on active list */
uint32_t flags;
union {
struct drm_gem_object *obj;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 786b48a55309..d001bf286606 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -265,9 +265,6 @@ static void submit_cleanup_bo(struct msm_gem_submit 
*submit, int i,
 */
submit->bos[i].flags &= ~cleanup_flags;
 
-   if (flags & BO_PINNED)
-   msm_gem_unpin_locked(obj);
-
if (flags & BO_LOCKED)
dma_resv_unlock(obj->resv);
 }
@@ -407,13 +404,28 @@ static int submit_pin_objects(struct msm_gem_submit 
*submit)
	mutex_lock(&priv->lru.lock);
for (i = 0; i < submit->nr_bos; i++) {
msm_gem_pin_obj_locked(submit->bos[i].obj);
-   submit->bos[i].flags |= BO_PINNED;
}
	mutex_unlock(&priv->lru.lock);
 
+   submit->bos_pinned = true;
+
return ret;
 }
 
+static void submit_unpin_objects(struct msm_gem_submit *submit)
+{
+   if (!submit->bos_pinned)
+   return;
+
+   for (int i = 0; i < submit->nr_bos; i++) {
+   struct drm_gem_object *obj = submit->bos[i].obj;
+
+   msm_gem_unpin_locked(obj);
+   }
+
+   submit->bos_pinned = false;
+}
+
 static void submit_attach_object_fences(struct msm_gem_submit *submit)
 {
int i;
@@ -525,7 +537,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, 
bool error)
unsigned i;
 
if (error)
-   cleanup_flags |= BO_PINNED;
+   submit_unpin_objects(submit);
 
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 9d6e2e10d25a..7ea5eca118eb 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -29,9 +29,10 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
struct drm_gem_object *obj = submit->bos[i].obj;
 
msm_gem_unpin_active(obj);
-   submit->bos[i].flags &= ~BO_PINNED;
}
 
+   submit->bos_pinned = false;
+
	mutex_unlock(&priv->lru.lock);
 
msm_gpu_submit(gpu, submit);
-- 
2.41.0



[PATCH 3/7] drm/msm/gem: Don't queue job to sched in error cases

2023-10-27 Thread Rob Clark
From: Rob Clark 

We shouldn't be running the job in error cases.  This also avoids having
to think too hard about where the objs get unpinned (and, if necessary,
the resv takes over tracking that the obj is busy): in error cases it
always happens synchronously, and in normal cases it happens from the
scheduler job_run() callback.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 2d5527dc3e1a..786b48a55309 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -946,6 +946,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
}
}
 
+   if (ret)
+   goto out;
+
submit_attach_object_fences(submit);
 
/* The scheduler owns a ref now: */
-- 
2.41.0



[PATCH 2/7] drm/msm/gem: Remove submit_unlock_unpin_bo()

2023-10-27 Thread Rob Clark
From: Rob Clark 

The only point it is called is before pinning objects, so the "unpin"
part of the name is fiction.  Just remove it and call submit_cleanup_bo()
directly.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 996274ef32a6..2d5527dc3e1a 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -272,12 +272,6 @@ static void submit_cleanup_bo(struct msm_gem_submit 
*submit, int i,
dma_resv_unlock(obj->resv);
 }
 
-static void submit_unlock_unpin_bo(struct msm_gem_submit *submit, int i)
-{
-   unsigned cleanup_flags = BO_PINNED | BO_LOCKED;
-   submit_cleanup_bo(submit, i, cleanup_flags);
-}
-
 /* This is where we make sure all the bo's are reserved and pin'd: */
 static int submit_lock_objects(struct msm_gem_submit *submit)
 {
@@ -313,10 +307,10 @@ static int submit_lock_objects(struct msm_gem_submit 
*submit)
}
 
for (; i >= 0; i--)
-   submit_unlock_unpin_bo(submit, i);
+   submit_cleanup_bo(submit, i, BO_LOCKED);
 
if (slow_locked > 0)
-   submit_unlock_unpin_bo(submit, slow_locked);
+   submit_cleanup_bo(submit, slow_locked, BO_LOCKED);
 
if (ret == -EDEADLK) {
struct drm_gem_object *obj = submit->bos[contended].obj;
-- 
2.41.0



[PATCH 1/7] drm/msm/gem: Remove "valid" tracking

2023-10-27 Thread Rob Clark
From: Rob Clark 

This was a small optimization for pre-soft-pin userspace.  But mesa
switched to soft-pin nearly 5 years ago, so let's drop the optimization
and simplify the code.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.h|  2 --
 drivers/gpu/drm/msm/msm_gem_submit.c | 44 +---
 2 files changed, 8 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 8ddef5443140..c36c1c1fa222 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -271,7 +271,6 @@ struct msm_gem_submit {
struct msm_gpu_submitqueue *queue;
struct pid *pid;/* submitting process */
bool fault_dumped;  /* Limit devcoredump dumping to one per submit */
-   bool valid; /* true if no cmdstream patching needed */
bool in_rb; /* "sudo" mode, copy cmds into RB */
struct msm_ringbuffer *ring;
unsigned int nr_cmds;
@@ -288,7 +287,6 @@ struct msm_gem_submit {
} *cmd;  /* array of size nr_cmds */
struct {
 /* make sure these don't conflict w/ MSM_SUBMIT_BO_x */
-#define BO_VALID   0x8000  /* is current addr in cmdstream correct/valid? 
*/
 #define BO_LOCKED  0x4000  /* obj lock is held */
 #define BO_PINNED  0x2000  /* obj (pages) is pinned and on active list */
uint32_t flags;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 6d8ec1337e8b..996274ef32a6 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -150,8 +150,6 @@ static int submit_lookup_objects(struct msm_gem_submit 
*submit,
 
submit->bos[i].handle = submit_bo.handle;
submit->bos[i].flags = submit_bo.flags;
-   /* in validate_objects() we figure out if this is true: */
-   submit->bos[i].iova  = submit_bo.presumed;
}
 
	spin_lock(&file->table_lock);
@@ -278,9 +276,6 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit 
*submit, int i)
 {
unsigned cleanup_flags = BO_PINNED | BO_LOCKED;
submit_cleanup_bo(submit, i, cleanup_flags);
-
-   if (!(submit->bos[i].flags & BO_VALID))
-   submit->bos[i].iova = 0;
 }
 
 /* This is where we make sure all the bo's are reserved and pin'd: */
@@ -390,8 +385,6 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
struct msm_drm_private *priv = submit->dev->dev_private;
int i, ret = 0;
 
-   submit->valid = true;
-
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = submit->bos[i].obj;
struct msm_gem_vma *vma;
@@ -407,14 +400,7 @@ static int submit_pin_objects(struct msm_gem_submit 
*submit)
if (ret)
break;
 
-   if (vma->iova == submit->bos[i].iova) {
-   submit->bos[i].flags |= BO_VALID;
-   } else {
-   submit->bos[i].iova = vma->iova;
-   /* iova changed, so address in cmdstream is not valid: 
*/
-   submit->bos[i].flags &= ~BO_VALID;
-   submit->valid = false;
-   }
+   submit->bos[i].iova = vma->iova;
}
 
/*
@@ -451,7 +437,7 @@ static void submit_attach_object_fences(struct 
msm_gem_submit *submit)
 }
 
 static int submit_bo(struct msm_gem_submit *submit, uint32_t idx,
-   struct drm_gem_object **obj, uint64_t *iova, bool *valid)
+   struct drm_gem_object **obj, uint64_t *iova)
 {
if (idx >= submit->nr_bos) {
SUBMIT_ERROR(submit, "invalid buffer index: %u (out of %u)\n",
@@ -463,8 +449,6 @@ static int submit_bo(struct msm_gem_submit *submit, 
uint32_t idx,
*obj = submit->bos[idx].obj;
if (iova)
*iova = submit->bos[idx].iova;
-   if (valid)
-   *valid = !!(submit->bos[idx].flags & BO_VALID);
 
return 0;
 }
@@ -477,9 +461,6 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct drm_gem_object *ob
uint32_t *ptr;
int ret = 0;
 
-   if (!nr_relocs)
-   return 0;
-
if (offset % 4) {
SUBMIT_ERROR(submit, "non-aligned cmdstream buffer: %u\n", 
offset);
return -EINVAL;
@@ -500,7 +481,6 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct drm_gem_object *ob
struct drm_msm_gem_submit_reloc submit_reloc = relocs[i];
uint32_t off;
uint64_t iova;
-   bool valid;
 
if (submit_reloc.submit_offset % 4) {
SUBMIT_ERROR(submit, "non-aligned reloc offset: %u\n",
@@ -519,13 +499,10 @@ static int submit_reloc(struct msm_ge

[PATCH 0/7] drm/msm/gem: drm_exec conversion

2023-10-27 Thread Rob Clark
From: Rob Clark 

Simplify the exec path (removing a legacy optimization) and convert to
drm_exec.  One drm_exec patch to allow passing in the expected # of GEM
objects to avoid re-allocation.

I'd be a bit happier if I could avoid the extra objects table allocation
in drm_exec in the first place, but I wasn't really happy with any of
the approaches I tried for getting rid of it.

Rob Clark (7):
  drm/msm/gem: Remove "valid" tracking
  drm/msm/gem: Remove submit_unlock_unpin_bo()
  drm/msm/gem: Don't queue job to sched in error cases
  drm/msm/gem: Split out submit_unpin_objects() helper
  drm/msm/gem: Cleanup submit_cleanup_bo()
  drm/exec: Pass in initial # of objects
  drm/msm/gem: Convert to drm_exec

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |   4 +-
 drivers/gpu/drm/drm_exec.c  |  15 +-
 drivers/gpu/drm/msm/Kconfig |   1 +
 drivers/gpu/drm/msm/msm_gem.h   |  13 +-
 drivers/gpu/drm/msm/msm_gem_submit.c| 197 ++--
 drivers/gpu/drm/msm/msm_ringbuffer.c|   3 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  |   2 +-
 include/drm/drm_exec.h  |   2 +-
 12 files changed, 79 insertions(+), 170 deletions(-)

-- 
2.41.0


