Re: [PATCH] drm/bridge: ti-sn65dsi83: Fix null pointer dereference in remove callback

2021-06-17 Thread Marek Vasut

On 6/18/21 5:06 AM, Jonathan Liu wrote:
> Hi Marek,

Hi,

> > Hi Jonathan,
> >
> > Thank you for the patch.
> >
> > On Thu, Jun 17, 2021 at 09:19:25PM +1000, Jonathan Liu wrote:
> > > If attach has not been called, unloading the driver can result in a null
> > > pointer dereference in mipi_dsi_detach as ctx->dsi has not been assigned
> > > yet.
> >
> > Shouldn't this be done in a bridge .detach() operation instead?
>
> Could you please take a look?
> I don't have a working setup to test moving the code to detach.

I just replied to your other email regarding bringing the chip up, so
please bring your setup up first, then test this patch again, and then
let's revisit this topic.
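For readers following the thread, the guard under discussion is the usual "teardown must tolerate a never-run attach" pattern. Below is a minimal user-space sketch of it; the struct and function names are made up for illustration and are not the real driver API:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel types: if .attach() never ran,
 * ctx->dsi is still NULL, so the remove path must skip the detach call.
 */
struct fake_dsi { int attached; };
struct fake_ctx { struct fake_dsi *dsi; };

static int detach_calls;

static void fake_dsi_detach(struct fake_dsi *dsi)
{
	/* Dereferences dsi, so calling it with NULL would crash. */
	dsi->attached = 0;
	detach_calls++;
}

static void fake_remove(struct fake_ctx *ctx)
{
	/* Guarded teardown: only detach if attach ever assigned ctx->dsi. */
	if (ctx->dsi)
		fake_dsi_detach(ctx->dsi);
}
```

The point is simply that anything released in the remove callback must either be unconditionally initialized in probe or guarded by a check like the one above.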


[PATCH v2] drm/meson: fix potential NULL pointer exception in meson_drv_unbind()

2021-06-17 Thread Jiajun Cao
Fix a potential NULL pointer dereference when meson_drv_unbind()
attempts to operate on the driver data priv, which may be NULL.
Add a NULL pointer check on the priv struct after calling
dev_get_drvdata(), matching the checks already done in
meson_drv_shutdown(), meson_drv_pm_suspend() and meson_drv_pm_resume().

Signed-off-by: Jiajun Cao 
Signed-off-by: Xin Tan 
---
 drivers/gpu/drm/meson/meson_drv.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/meson/meson_drv.c 
b/drivers/gpu/drm/meson/meson_drv.c
index 07fcd12dca16..f544fba8c44e 100644
--- a/drivers/gpu/drm/meson/meson_drv.c
+++ b/drivers/gpu/drm/meson/meson_drv.c
@@ -380,7 +380,10 @@ static int meson_drv_bind(struct device *dev)
 static void meson_drv_unbind(struct device *dev)
 {
struct meson_drm *priv = dev_get_drvdata(dev);
-   struct drm_device *drm = priv->drm;
+   struct drm_device *drm;
+
+   if (!priv)
+   return;
 
if (priv->canvas) {
meson_canvas_free(priv->canvas, priv->canvas_id_osd1);
@@ -389,6 +392,7 @@ static void meson_drv_unbind(struct device *dev)
meson_canvas_free(priv->canvas, priv->canvas_id_vd1_2);
}
 
+   drm = priv->drm;
drm_dev_unregister(drm);
drm_kms_helper_poll_fini(drm);
drm_atomic_helper_shutdown(drm);
-- 
2.17.1



Re: [PATCH] drm/bridge: ti-sn65dsi83: Fix null pointer dereference in remove callback

2021-06-17 Thread Jonathan Liu
Hi Marek,

On Fri, 18 Jun 2021 at 00:14, Laurent Pinchart wrote:
>
> Hi Jonathan,
>
> Thank you for the patch.
>
> On Thu, Jun 17, 2021 at 09:19:25PM +1000, Jonathan Liu wrote:
> > If attach has not been called, unloading the driver can result in a null
> > pointer dereference in mipi_dsi_detach as ctx->dsi has not been assigned
> > yet.
>
> Shouldn't this be done in a bridge .detach() operation instead?
>

Could you please take a look?
I don't have a working setup to test moving the code to detach.

> > Fixes: ceb515ba29ba6b ("drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and 
> > SN65DSI84 driver")
> > Signed-off-by: Jonathan Liu 
> > ---
> >  drivers/gpu/drm/bridge/ti-sn65dsi83.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c 
> > b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
> > index 750f2172ef08..8e9f45c5c7c1 100644
> > --- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
> > +++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
> > @@ -671,8 +671,11 @@ static int sn65dsi83_remove(struct i2c_client *client)
> >  {
> >   struct sn65dsi83 *ctx = i2c_get_clientdata(client);
> >
> > - mipi_dsi_detach(ctx->dsi);
> > - mipi_dsi_device_unregister(ctx->dsi);
> > + if (ctx->dsi) {
> > + mipi_dsi_detach(ctx->dsi);
> > + mipi_dsi_device_unregister(ctx->dsi);
> > + }
> > +
> >   drm_bridge_remove(&ctx->bridge);
> >   of_node_put(ctx->host_node);
> >

Thanks.

Regards,
Jonathan


Re: [PATCH v2 2/2] drm: Protect drm_master pointers in drm_lease.c

2021-06-17 Thread Desmond Cheong Zhi Xi

On 18/6/21 1:12 am, Daniel Vetter wrote:

On Tue, Jun 15, 2021 at 10:36:45AM +0800, Desmond Cheong Zhi Xi wrote:

This patch ensures that the device's master mutex is acquired before
accessing pointers to struct drm_master that are subsequently
dereferenced. Without the mutex, the struct drm_master may be freed
concurrently by another process calling drm_setmaster_ioctl(). This
could then lead to use-after-free errors.

Reported-by: Daniel Vetter 
Signed-off-by: Desmond Cheong Zhi Xi 
Reviewed-by: Emil Velikov 
---
  drivers/gpu/drm/drm_lease.c | 58 +++--
  1 file changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c
index da4f085fc09e..3e6f689236e5 100644
--- a/drivers/gpu/drm/drm_lease.c
+++ b/drivers/gpu/drm/drm_lease.c
@@ -107,10 +107,16 @@ static bool _drm_has_leased(struct drm_master *master, int id)
   */
  bool _drm_lease_held(struct drm_file *file_priv, int id)
  {
+   bool ret;
+
if (!file_priv || !file_priv->master)
return true;
  
-	return _drm_lease_held_master(file_priv->master, id);
+	mutex_lock(&file_priv->master->dev->master_mutex);


So maybe we have a bug somewhere, and the kerneldoc isn't 100% clear, but
I thought file_priv->master is invariant over the lifetime of file_priv.
So we don't need a lock to check anything here.

It's the drm_device->master dereference that gets us into trouble. Also,
file_priv->is_owner is protected by dev->master_mutex.

So I think with your previous patch all the access here in drm_lease.c is
ok and already protected? Or am I missing something?

Thanks, Daniel



My thinking was that file_priv->master is invariant only if file_priv is
the creator of the master. If file_priv->is_master is false, then a call
to drm_setmaster_ioctl() will invoke drm_new_set_master(), which
allocates a new master for file_priv and puts the old master.


This could be an issue in _drm_lease_held_master, because we dereference 
master to get master->dev, master->lessor, and master->leases.


By the same reasoning, wherever drm_lease.c accesses a drm_file->master
that is subsequently dereferenced, I added a lock around the access.


I could definitely be mistaken on this, so apologies if this scenario 
doesn't arise.


Best wishes,
Desmond
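The use-after-free scenario described above can be illustrated with a small sketch: a shared pointer that may be swapped out by another thread must be dereferenced entirely under the mutex, not merely loaded under it. The names below are illustrative, not the DRM API:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A shared "master" pointer that another thread may replace.  Readers
 * hold the mutex across the whole dereference, so the object cannot be
 * dropped out from under them mid-access. */
struct master { int lessee_id; };

static pthread_mutex_t master_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct master *current_master;

static int read_lessee_id(void)
{
	int id = -1;

	pthread_mutex_lock(&master_mutex);
	if (current_master)
		id = current_master->lessee_id;  /* safe: cannot be freed here */
	pthread_mutex_unlock(&master_mutex);
	return id;
}

static void replace_master(struct master *new_master)
{
	pthread_mutex_lock(&master_mutex);
	current_master = new_master;  /* old pointer dropped under the lock */
	pthread_mutex_unlock(&master_mutex);
}
```

Loading the pointer under the lock but dereferencing it after unlock would reintroduce exactly the race the patch is closing.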





Re: Re: [PATCH] drm/auth: Move master pointer from drm_device to drm_file

2021-06-17 Thread 马强 (Qiang Ma)

> That sounds like a bug. drm_file->master should be always the same -
> either you become a new stand-alone thing, or you get linked to the
> current master.
>
> Or I'm completely missing what you're trying to fix here.

Now I have a bug: the soft cursor disappears when switching users. When
debugging it, fpriv->is_master is found to be 0. It is 0 because
switching the user frees the old drm_file pointer, creates a new
drm_file, and then sets the master. However, since dev->master is
non-NULL, drm_new_set_master() is never executed while setting the
master.

--
Qiang Ma

On Thu, Jun 17, 2021 at 05:47:33PM +0800, Qiang Ma wrote:
> The drm_file pointer clears to zero during multi-user switching,
> so it needs to call drm_new_set_master for master pointer from drm_file.

That sounds like a bug. drm_file->master should be always the same -
either you become a new stand-alone thing, or you get linked to the
current master.

Or I'm completely missing what you're trying to fix here.
-Daniel

> Signed-off-by: Qiang Ma 
> ---
>  drivers/gpu/drm/drm_auth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index f2d46b7ac6f9..02431af6d0c5 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -302,7 +302,7 @@ int drm_master_open(struct drm_file *file_priv)
>  	/* if there is no current master make this fd it, but do not create
>  	 * any master object for render clients */
>  	mutex_lock(&dev->master_mutex);
> -	if (!dev->master)
> +	if (!file_priv->master)
>  		ret = drm_new_set_master(dev, file_priv);
>  	else
>  		file_priv->master = drm_master_get(dev->master);
> --
> 2.20.1

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH v2 1/2] drm: Add a locked version of drm_is_current_master

2021-06-17 Thread Desmond Cheong Zhi Xi

On 18/6/21 1:03 am, Daniel Vetter wrote:

On Tue, Jun 15, 2021 at 10:36:44AM +0800, Desmond Cheong Zhi Xi wrote:

While checking the master status of the DRM file in
drm_is_current_master(), the device's master mutex should be
held. Without the mutex, the pointer fpriv->master may be freed
concurrently by another process calling drm_setmaster_ioctl(). This
could lead to use-after-free errors when the pointer is subsequently
dereferenced in drm_lease_owner().

The callers of drm_is_current_master() from drm_auth.c hold the
device's master mutex, but external callers do not. Hence, we implement
drm_is_current_master_locked() to be used within drm_auth.c, and
modify drm_is_current_master() to grab the device's master mutex
before checking the master status.

Reported-by: Daniel Vetter 
Signed-off-by: Desmond Cheong Zhi Xi 
Reviewed-by: Emil Velikov 
---
  drivers/gpu/drm/drm_auth.c | 23 +++
  1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 232abbba3686..c6bf52c310a9 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -61,6 +61,8 @@
   * trusted clients.
   */
  
+static bool drm_is_current_master_locked(struct drm_file *fpriv);


A bit a bikeshed, but we try to avoid forward declarations when they're
not needed. If you don't want to tear apart drm_is_current_master and the
_locked version then just move them together.

Can you pls do that and respin?

Otherwise looks all great.
-Daniel




Yeah, I was trying to keep the logic in _locked close to 
drm_is_current_master. But got it, will do.




Re: [PULL] drm-misc-next-fixes

2021-06-17 Thread Dave Airlie
On Fri, 18 Jun 2021 at 12:26, Dave Airlie  wrote:
>
> when I pulled this in drm-next I got these.
>
> were the mst fixes meant for next or fixes btw? I'm not really sure,
> but either way I don't think this is a local reason it doesn't build
> or did I miss something?

Hi Thomas,

Please resend with the fix Lyude has pushed (just keep building, just
keep building).

Thanks,
Dave.
>
> Dave.
>
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:
> In function ‘drm_dp_update_payload_part1’:
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:450:27:
> error: request for member ‘dev’ in something not a structure or union
>   450 |  drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_KMS, fmt, 
> ##__VA_ARGS__)
>   |   ^~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3392:5:
> note: in expansion of macro ‘drm_dbg_kms’
>  3392 | drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
>   | ^~~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3392:68:
> warning: passing argument 3 of ‘drm_dev_dbg’ makes pointer from
> integer without a cast [-Wint-conversion]
>  3392 | drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
>   |^
>   ||
>   |int
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:450:53:
> note: in definition of macro ‘drm_dbg_kms’
>   450 |  drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_KMS, fmt, 
> ##__VA_ARGS__)
>   | ^~~
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:338:16:
> note: expected ‘const char *’ but argument is of type ‘int’
>   338 |const char *format, ...);
>   |^~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:53:
> error: macro "drm_dbg_kms" requires 3 arguments, but only 1 given
>  3407 |   drm_dbg_kms("Fail:set payload to invalid sink");
>   | ^
> In file included from
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:45:
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:449: note:
> macro "drm_dbg_kms" defined here
>   449 | #define drm_dbg_kms(drm, fmt, ...) \
>   |
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:7:
> error: ‘drm_dbg_kms’ undeclared (first use in this function)
>  3407 |   drm_dbg_kms("Fail:set payload to invalid sink");
>   |   ^~~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:7:
> note: each undeclared identifier is reported only once for each
> function it appears in
> make[4]: *** [/home/airlied/devel/kernel/dim/src/scripts/Makefile.build:272:
> drivers/gpu/drm/drm_dp_mst_topology.o] Error 1
> make[4]: *** Waiting for unfinished jobs
>
> On Thu, 17 Jun 2021 at 04:30, Thomas Zimmermann  wrote:
> >
> > Hi Dave and Daniel,
> >
> > here's this week's PR for drm-misc-next-fixes.
> >
> > Best regards
> > Thomas
> >
> > drm-misc-next-fixes-2021-06-16:
> > Short summary of fixes pull:
> >
> >  * hyperv: advertise the correct formatmodifiers for its primary plane
> >  * dp_mst: VCPI fixes to make it work with StarTech hub
> >
> > The following changes since commit 1bd8a7dc28c1c410f1ceefae1f2a97c06d1a67c2:
> >
> >   Merge tag 'exynos-drm-next-for-v5.14' of 
> > git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into 
> > drm-next (2021-06-11 14:19:12 +1000)
> >
> > are available in the Git repository at:
> >
> >   git://anongit.freedesktop.org/drm/drm-misc 
> > tags/drm-misc-next-fixes-2021-06-16
> >
> > for you to fetch changes up to 3769e4c0af5b82c8ea21d037013cb9564dfaa51f:
> >
> >   drm/dp_mst: Avoid to mess up payload table by ports in stale topology 
> > (2021-06-16 12:57:46 -0400)
> >
> > 
> > Short summary of fixes pull:
> >
> >  * hyperv: advertise the correct formatmodifiers for its primary plane
> >  * dp_mst: VCPI fixes to make it work with StarTech hub
> >
> > 
> > Pu Lehui (1):
> >   drm/hyperv: Fix unused const variable 'hyperv_modifiers'
> >
> > Wayne Lin (2):
> >   drm/dp_mst: Do not set proposed vcpi directly
> >   drm/dp_mst: Avoid to mess up payload table by ports in stale topology
> >
> >  drivers/gpu/drm/drm_dp_mst_topology.c   | 65 
> > +
> >  drivers/gpu/drm/hyperv/hyperv_drm_modeset.c |  2 +-
> >  2 files changed, 40 insertions(+), 27 deletions(-)
> >
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Software Solutions Germany GmbH
> > 

Re: [PULL] drm-misc-next-fixes

2021-06-17 Thread Lyude Paul
22:27  airlied: re: the pull, I should have pushed a fix for the
compilation error today. that was something I pulled in from amd that they
didn't compile check and I missed :S
22:28  airlied: 24ff3dc18b99c4b912ab1746e803ddb3be5ced4c in drm-
misc/drm-misc-next-fixes

sorry about this - I already talked to hwentlan the other day about trying to
make sure that AMD is more on top of actually making sure things compile
before submitting them, was my fault for missing this during the initial
review of that fix.

On Fri, 2021-06-18 at 12:26 +1000, Dave Airlie wrote:
> when I pulled this in drm-next I got these.
> 
> were the mst fixes meant for next or fixes btw? I'm not really sure,
> but either way I don't think this is a local reason it doesn't build
> or did I miss something?
> 
> Dave.
> 
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:
> In function ‘drm_dp_update_payload_part1’:
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:450:27:
> error: request for member ‘dev’ in something not a structure or union
>   450 |  drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_KMS, fmt,
> ##__VA_ARGS__)
>   |   ^~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3392:
> 5:
> note: in expansion of macro ‘drm_dbg_kms’
>  3392 | drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
>   | ^~~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3392:
> 68:
> warning: passing argument 3 of ‘drm_dev_dbg’ makes pointer from
> integer without a cast [-Wint-conversion]
>  3392 | drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
>   |    ^
>   |    |
>   |    int
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:450:53:
> note: in definition of macro ‘drm_dbg_kms’
>   450 |  drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_KMS, fmt,
> ##__VA_ARGS__)
>   | ^~~
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:338:16:
> note: expected ‘const char *’ but argument is of type ‘int’
>   338 |    const char *format, ...);
>   |    ^~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:
> 53:
> error: macro "drm_dbg_kms" requires 3 arguments, but only 1 given
>  3407 |   drm_dbg_kms("Fail:set payload to invalid sink");
>   | ^
> In file included from
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:45:
> /home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:449: note:
> macro "drm_dbg_kms" defined here
>   449 | #define drm_dbg_kms(drm, fmt, ...) \
>   |
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:
> 7:
> error: ‘drm_dbg_kms’ undeclared (first use in this function)
>  3407 |   drm_dbg_kms("Fail:set payload to invalid sink");
>   |   ^~~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:
> 7:
> note: each undeclared identifier is reported only once for each
> function it appears in
> make[4]: *** [/home/airlied/devel/kernel/dim/src/scripts/Makefile.build:272:
> drivers/gpu/drm/drm_dp_mst_topology.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> 
> On Thu, 17 Jun 2021 at 04:30, Thomas Zimmermann  wrote:
> > 
> > Hi Dave and Daniel,
> > 
> > here's this week's PR for drm-misc-next-fixes.
> > 
> > Best regards
> > Thomas
> > 
> > drm-misc-next-fixes-2021-06-16:
> > Short summary of fixes pull:
> > 
> >  * hyperv: advertise the correct formatmodifiers for its primary plane
> >  * dp_mst: VCPI fixes to make it work with StarTech hub
> > 
> > The following changes since commit 1bd8a7dc28c1c410f1ceefae1f2a97c06d1a67c2:
> > 
> >   Merge tag 'exynos-drm-next-for-v5.14' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-
> > next (2021-06-11 14:19:12 +1000)
> > 
> > are available in the Git repository at:
> > 
> >   git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-next-fixes-2021-
> > 06-16
> > 
> > for you to fetch changes up to 3769e4c0af5b82c8ea21d037013cb9564dfaa51f:
> > 
> >   drm/dp_mst: Avoid to mess up payload table by ports in stale topology
> > (2021-06-16 12:57:46 -0400)
> > 
> > 
> > Short summary of fixes pull:
> > 
> >  * hyperv: advertise the correct formatmodifiers for its primary plane
> >  * dp_mst: VCPI fixes to make it work with StarTech hub
> > 
> > 
> > Pu Lehui (1):
> >   drm/hyperv: Fix unused const variable 'hyperv_modifiers'
> > 
> > Wayne Lin (2):
> >   drm/dp_mst: Do not 

Re: drm/i915: __GFP_RETRY_MAYFAIL allocations in stable kernels

2021-06-17 Thread Sergey Senozhatsky
On (21/06/17 19:27), Daniel Vetter wrote:
> > 
> > So can all allocations in gen8_init_scratch() use
> > GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN
> 
> Yeah that looks all fairly broken tbh. The only thing I didn't know was
> that GFP_DMA32 wasn't a full gfp mask with reclaim bits set as needed. I
> guess it would be clearer if we use GFP_KERNEL | __GFP_DMA32 for these.

Looks good.

> The commit that introduced a lot of this, including I915_GFP_ALLOW_FAIL
> seems to be
> 
> commit 1abb70f5955d1a9021f96359a2c6502ca569b68d
> Author: Chris Wilson 
> Date:   Tue May 22 09:36:43 2018 +0100
> 
> drm/i915/gtt: Allow pagedirectory allocations to fail
> 
> which used a selftest as justification, not real world workloads, so looks
> rather dubious.

Exactly, the commit we landed internally partially reverts 1abb70f5955
in 4.19 and 5.4 kernels. I don't mind I915_GFP_ALLOW_FAIL and so on, I
kept those bits, but we need reclaim. I can reproduce cases when order:0
allocation fails with
__GFP_HIGHMEM|__GFP_RETRY_MAYFAIL
but succeeds with
GFP_KERNEL|__GFP_HIGHMEM|__GFP_RETRY_MAYFAIL


On a side note, I'm not sure __GFP_RETRY_MAYFAIL is actually
needed; seeing it in syscall paths is a bit uncommon:

  drm_ioctl()
   i915_gem_context_create_ioctl()
i915_gem_create_context()
 i915_ppgtt_create()
  setup_scratch_page()   // __GFP_RETRY_MAYFAIL

But with GFP_KERNEL at least it tries to make some reclaim progress
between retries, so it seems to be good enough.


Re: [PULL] drm-misc-next-fixes

2021-06-17 Thread Dave Airlie
when I pulled this in drm-next I got these.

were the mst fixes meant for next or fixes btw? I'm not really sure,
but either way I don't think this is a local reason it doesn't build
or did I miss something?

Dave.

/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:
In function ‘drm_dp_update_payload_part1’:
/home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:450:27:
error: request for member ‘dev’ in something not a structure or union
  450 |  drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_KMS, fmt, ##__VA_ARGS__)
  |   ^~
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3392:5:
note: in expansion of macro ‘drm_dbg_kms’
 3392 | drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
  | ^~~
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3392:68:
warning: passing argument 3 of ‘drm_dev_dbg’ makes pointer from
integer without a cast [-Wint-conversion]
 3392 | drm_dbg_kms("Virtual channel %d is not in current topology\n", i);
  |^
  ||
  |int
/home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:450:53:
note: in definition of macro ‘drm_dbg_kms’
  450 |  drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_KMS, fmt, ##__VA_ARGS__)
  | ^~~
/home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:338:16:
note: expected ‘const char *’ but argument is of type ‘int’
  338 |const char *format, ...);
  |^~
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:53:
error: macro "drm_dbg_kms" requires 3 arguments, but only 1 given
 3407 |   drm_dbg_kms("Fail:set payload to invalid sink");
  | ^
In file included from
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:45:
/home/airlied/devel/kernel/dim/src/include/drm/drm_print.h:449: note:
macro "drm_dbg_kms" defined here
  449 | #define drm_dbg_kms(drm, fmt, ...) \
  |
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:7:
error: ‘drm_dbg_kms’ undeclared (first use in this function)
 3407 |   drm_dbg_kms("Fail:set payload to invalid sink");
  |   ^~~
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/drm_dp_mst_topology.c:3407:7:
note: each undeclared identifier is reported only once for each
function it appears in
make[4]: *** [/home/airlied/devel/kernel/dim/src/scripts/Makefile.build:272:
drivers/gpu/drm/drm_dp_mst_topology.o] Error 1
make[4]: *** Waiting for unfinished jobs

On Thu, 17 Jun 2021 at 04:30, Thomas Zimmermann  wrote:
>
> Hi Dave and Daniel,
>
> here's this week's PR for drm-misc-next-fixes.
>
> Best regards
> Thomas
>
> drm-misc-next-fixes-2021-06-16:
> Short summary of fixes pull:
>
>  * hyperv: advertise the correct formatmodifiers for its primary plane
>  * dp_mst: VCPI fixes to make it work with StarTech hub
>
> The following changes since commit 1bd8a7dc28c1c410f1ceefae1f2a97c06d1a67c2:
>
>   Merge tag 'exynos-drm-next-for-v5.14' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into 
> drm-next (2021-06-11 14:19:12 +1000)
>
> are available in the Git repository at:
>
>   git://anongit.freedesktop.org/drm/drm-misc 
> tags/drm-misc-next-fixes-2021-06-16
>
> for you to fetch changes up to 3769e4c0af5b82c8ea21d037013cb9564dfaa51f:
>
>   drm/dp_mst: Avoid to mess up payload table by ports in stale topology 
> (2021-06-16 12:57:46 -0400)
>
> 
> Short summary of fixes pull:
>
>  * hyperv: advertise the correct formatmodifiers for its primary plane
>  * dp_mst: VCPI fixes to make it work with StarTech hub
>
> 
> Pu Lehui (1):
>   drm/hyperv: Fix unused const variable 'hyperv_modifiers'
>
> Wayne Lin (2):
>   drm/dp_mst: Do not set proposed vcpi directly
>   drm/dp_mst: Avoid to mess up payload table by ports in stale topology
>
>  drivers/gpu/drm/drm_dp_mst_topology.c   | 65 
> +
>  drivers/gpu/drm/hyperv/hyperv_drm_modeset.c |  2 +-
>  2 files changed, 40 insertions(+), 27 deletions(-)
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer


[PATCH] drm/amd/display: Remove the repeated dpp1_full_bypass declaration

2021-06-17 Thread Shaokun Zhang
Function 'dpp1_full_bypass' is declared twice, so remove the repeated
declaration and the unnecessary blank line.

Cc: Harry Wentland 
Cc: Leo Li 
Cc: Alex Deucher 
Signed-off-by: Shaokun Zhang 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
index 9a1f40eb5c47..71b3a6949001 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
@@ -1497,8 +1497,6 @@ void dpp1_cnv_setup (
enum dc_color_space input_color_space,
struct cnv_alpha_2bit_lut *alpha_2bit_lut);
 
-void dpp1_full_bypass(struct dpp *dpp_base);
-
 void dpp1_dppclk_control(
struct dpp *dpp_base,
bool dppclk_div,
-- 
2.7.4



[PATCH 4/8] drm/i915: Move active tracking to i915_sched_engine

2021-06-17 Thread Matthew Brost
Move active request tracking and its lock to i915_sched_engine. This
lock is also the submission lock so having it in the i915_sched_engine
is the correct place.

v3:
 (Jason Ekstrand)
  Add kernel doc
v6:
  Rebase

Signed-off-by: Matthew Brost 
Reviewed-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/gt/intel_engine.h|  2 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 43 +++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 --
 .../drm/i915/gt/intel_execlists_submission.c  | 98 ++-
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 12 +--
 drivers/gpu/drm/i915/gt/mock_engine.c |  7 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 20 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c |  4 +-
 drivers/gpu/drm/i915/i915_request.c   | 32 +++---
 drivers/gpu/drm/i915/i915_request.h   |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 30 --
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 16 +++
 13 files changed, 140 insertions(+), 136 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 62f7440bc111..45ee08fc40a1 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -269,8 +269,6 @@ intel_engine_create_pinned_context(struct intel_engine_cs 
*engine,
 
 void intel_engine_destroy_pinned_context(struct intel_context *ce);
 
-void intel_engine_init_active(struct intel_engine_cs *engine,
- unsigned int subclass);
 #define ENGINE_PHYSICAL0
 #define ENGINE_MOCK1
 #define ENGINE_VIRTUAL 2
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index aeaf5c6f94b2..b7efc4e399c5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -721,7 +721,6 @@ static int engine_setup_common(struct intel_engine_cs 
*engine)
if (err)
goto err_cmd_parser;
 
-   intel_engine_init_active(engine, ENGINE_PHYSICAL);
intel_engine_init_execlists(engine);
intel_engine_init__pm(engine);
intel_engine_init_retire(engine);
@@ -780,11 +779,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
	frame->rq.ring = &frame->ring;
 
	mutex_lock(&ce->timeline->mutex);
-   spin_lock_irq(&engine->active.lock);
+   spin_lock_irq(&engine->sched_engine->lock);
 
	dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;
 
-   spin_unlock_irq(&engine->active.lock);
+   spin_unlock_irq(&engine->sched_engine->lock);
	mutex_unlock(&ce->timeline->mutex);
 
GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */
@@ -793,28 +792,6 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
return dw;
 }
 
-void
-intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
-{
-   INIT_LIST_HEAD(&engine->active.requests);
-   INIT_LIST_HEAD(&engine->active.hold);
-
-   spin_lock_init(&engine->active.lock);
-   lockdep_set_subclass(&engine->active.lock, subclass);
-
-   /*
-    * Due to an interesting quirk in lockdep's internal debug tracking,
-    * after setting a subclass we must ensure the lock is used. Otherwise,
-    * nr_unused_locks is incremented once too often.
-    */
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-   local_irq_disable();
-   lock_map_acquire(&engine->active.lock.dep_map);
-   lock_map_release(&engine->active.lock.dep_map);
-   local_irq_enable();
-#endif
-}
-
 struct intel_context *
 intel_engine_create_pinned_context(struct intel_engine_cs *engine,
   struct i915_address_space *vm,
@@ -969,7 +946,7 @@ int intel_engines_init(struct intel_gt *gt)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
-   GEM_BUG_ON(!list_empty(&engine->active.requests));
+   GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
	tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
i915_sched_engine_put(engine->sched_engine);
@@ -1362,7 +1339,7 @@ static struct intel_timeline *get_timeline(struct 
i915_request *rq)
struct intel_timeline *tl;
 
/*
-* Even though we are holding the engine->active.lock here, there
+* Even though we are holding the engine->sched_engine->lock here, there
 * is no control over the submission queue per-se and we are
 * inspecting the active state at a random point in time, with an
 * unknown queue. Play safe and make sure the timeline remains valid.
@@ -1709,7 +1686,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
drm_printf(m, "\tRequests:\n");
 
-   spin_lock_irqsave(&engine->active.lock, flags);
+   spin_lock_irqsave(&engine->sched_engine->lock, flags);
rq = intel_engine_find_active_request(engine);
if (rq) {
struct intel_timeline *tl = get_timeline(rq);
@@ -1740,8 +1717,9 @@ void 

[PATCH 6/8] drm/i915: Add kick_backend function to i915_sched_engine

2021-06-17 Thread Matthew Brost
Not all back-ends require a kick after a scheduling update, so make the
kick a callback function that the back-end can opt in to. Also move
the current kick function from the scheduler to the execlists file, as
it is specific to that back-end.
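The opt-in described in the commit message is a plain optional-callback pattern: the scheduler invokes the kick only when the back-end installed one. A hypothetical sketch (not the real i915 types):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the scheduling engine: back-ends that need
 * no kick simply leave the function pointer NULL. */
struct fake_sched_engine {
	int queue_priority_hint;
	void (*kick_backend)(struct fake_sched_engine *se, int prio);
};

static int kicks;

static void kick_execlists_stub(struct fake_sched_engine *se, int prio)
{
	se->queue_priority_hint = prio;
	kicks++;
}

static void schedule_request(struct fake_sched_engine *se, int prio)
{
	/* The back-end kick is optional: call it only when opted in. */
	if (se->kick_backend)
		se->kick_backend(se, prio);
}
```

This keeps back-end-specific behaviour out of the common scheduler: execlists installs `kick_execlists`, while a back-end with no such need installs nothing and the common path stays a no-op.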

Signed-off-by: Matthew Brost 
Reviewed-by: Daniele Ceraolo Spurio 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 52 
 drivers/gpu/drm/i915/i915_scheduler.c | 62 +--
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  6 ++
 3 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 8a3d4014fd2c..9487d9e0be62 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3116,10 +3116,61 @@ static bool can_preempt(struct intel_engine_cs *engine)
return engine->class != RENDER_CLASS;
 }
 
+static void kick_execlists(const struct i915_request *rq, int prio)
+{
+   struct intel_engine_cs *engine = rq->engine;
+   struct i915_sched_engine *sched_engine = engine->sched_engine;
+   const struct i915_request *inflight;
+
+   /*
+* We only need to kick the tasklet once for the high priority
+* new context we add into the queue.
+*/
+   if (prio <= sched_engine->queue_priority_hint)
+   return;
+
+   rcu_read_lock();
+
+   /* Nothing currently active? We're overdue for a submission! */
+   inflight = execlists_active(&engine->execlists);
+   if (!inflight)
+   goto unlock;
+
+   /*
+* If we are already the currently executing context, don't
+* bother evaluating if we should preempt ourselves.
+*/
+   if (inflight->context == rq->context)
+   goto unlock;
+
+   ENGINE_TRACE(engine,
+"bumping queue-priority-hint:%d for rq:%llx:%lld, 
inflight:%llx:%lld prio %d\n",
+prio,
+rq->fence.context, rq->fence.seqno,
+inflight->fence.context, inflight->fence.seqno,
+inflight->sched.attr.priority);
+
+   sched_engine->queue_priority_hint = prio;
+
+   /*
+* Allow preemption of low -> normal -> high, but we do
+* not allow low priority tasks to preempt other low priority
+* tasks under the impression that latency for low priority
+* tasks does not matter (as much as background throughput),
+* so kiss.
+*/
+   if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight)))
+   tasklet_hi_schedule(&engine->execlists.tasklet);
+
+unlock:
+   rcu_read_unlock();
+}
+
 static void execlists_set_default_submission(struct intel_engine_cs *engine)
 {
engine->submit_request = execlists_submit_request;
engine->sched_engine->schedule = i915_schedule;
+   engine->sched_engine->kick_backend = kick_execlists;
engine->execlists.tasklet.callback = execlists_submission_tasklet;
 }
 
@@ -3702,6 +3753,7 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
ve->base.request_alloc = execlists_request_alloc;
 
ve->base.sched_engine->schedule = i915_schedule;
+   ve->base.sched_engine->kick_backend = kick_execlists;
ve->base.submit_request = virtual_submit_request;
ve->base.bond_execute = virtual_bond_execute;
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 4bc6969f6a97..035b88f2e4aa 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -157,65 +157,6 @@ sched_lock_engine(const struct i915_sched_node *node,
return locked;
 }
 
-static inline int rq_prio(const struct i915_request *rq)
-{
-   return rq->sched.attr.priority;
-}
-
-static inline bool need_preempt(int prio, int active)
-{
-   /*
-* Allow preemption of low -> normal -> high, but we do
-* not allow low priority tasks to preempt other low priority
-* tasks under the impression that latency for low priority
-* tasks does not matter (as much as background throughput),
-* so kiss.
-*/
-   return prio >= max(I915_PRIORITY_NORMAL, active);
-}
-
-static void kick_submission(struct intel_engine_cs *engine,
-   const struct i915_request *rq,
-   int prio)
-{
-   const struct i915_request *inflight;
-
-   /*
-* We only need to kick the tasklet once for the high priority
-* new context we add into the queue.
-*/
-   if (prio <= engine->sched_engine->queue_priority_hint)
-   return;
-
-   rcu_read_lock();
-
-   /* Nothing currently active? We're overdue for a submission! */
-   inflight = execlists_active(&engine->execlists);
-   if (!inflight)
-   goto unlock;
-
-   /*
-* If we are 

[PATCH 5/8] drm/i915: Move engine->schedule to i915_sched_engine

2021-06-17 Thread Matthew Brost
The schedule function should be in the schedule object.

v3:
 (Jason Ekstrand)
  Add kernel doc

Signed-off-by: Matthew Brost 
Reviewed-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/gem/i915_gem_wait.c |  4 ++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c|  3 ---
 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c |  4 ++--
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  8 
 drivers/gpu/drm/i915/gt/intel_engine_user.c  |  2 +-
 .../gpu/drm/i915/gt/intel_execlists_submission.c |  4 ++--
 drivers/gpu/drm/i915/gt/selftest_execlists.c | 16 
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c |  4 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c|  2 +-
 drivers/gpu/drm/i915/i915_request.c  | 10 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h  | 10 ++
 11 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 1e97520c62b2..1070d3afdce7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
engine = rq->engine;
 
rcu_read_lock(); /* RCU serialisation for set-wedged protection */
-   if (engine->schedule)
-   engine->schedule(rq, attr);
+   if (engine->sched_engine->schedule)
+   engine->sched_engine->schedule(rq, attr);
rcu_read_unlock();
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index b7efc4e399c5..354c1c260726 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -328,9 +328,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id)
if (engine->context_size)
DRIVER_CAPS(i915)->has_logical_contexts = true;
 
-   /* Nothing to do here, execute in order of dependencies */
-   engine->schedule = NULL;
-
ewma__engine_latency_init(&engine->latency);
seqcount_init(&engine->stats.lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index b99ac41695f3..b6a305e6a974 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -121,7 +121,7 @@ static void heartbeat(struct work_struct *wrk)
 * but all other contexts, including the kernel
 * context are stuck waiting for the signal.
 */
-   } else if (engine->schedule &&
+   } else if (engine->sched_engine->schedule &&
   rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
/*
 * Gradually raise the priority of the heartbeat to
@@ -136,7 +136,7 @@ static void heartbeat(struct work_struct *wrk)
attr.priority = I915_PRIORITY_BARRIER;
 
local_bh_disable();
-   engine->schedule(rq, &attr);
+   engine->sched_engine->schedule(rq, &attr);
local_bh_enable();
} else {
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 5e0f39d202ef..0bb65c57d274 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -428,14 +428,6 @@ struct intel_engine_cs {
void(*bond_execute)(struct i915_request *rq,
struct dma_fence *signal);
 
-   /*
-* Call when the priority on a request has changed and it and its
-* dependencies may need rescheduling. Note the request itself may
-* not be ready to run!
-*/
-   void(*schedule)(struct i915_request *request,
-   const struct i915_sched_attr *attr);
-
void(*release)(struct intel_engine_cs *engine);
 
struct intel_engine_execlists execlists;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 3cca7ea2d6ea..84142127ebd8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -108,7 +108,7 @@ static void set_scheduler_caps(struct drm_i915_private 
*i915)
for_each_uabi_engine(engine, i915) { /* all engines must agree! */
int i;
 
-   if (engine->schedule)
+   if (engine->sched_engine->schedule)
enabled |= (I915_SCHEDULER_CAP_ENABLED |
I915_SCHEDULER_CAP_PRIORITY);
else
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 

[PATCH 3/8] drm/i915: Reset sched_engine.no_priolist immediately after dequeue

2021-06-17 Thread Matthew Brost
Rather than touching schedule state in the generic PM code, reset the
priolist allocation when empty in the submission code. Add a wrapper
function to do this and update the backends to call it in the correct
place.

v3:
 (Jason Ekstrand)
  Update patch commit message with a better description

Signed-off-by: Matthew Brost 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c| 2 --
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c| 2 ++
 drivers/gpu/drm/i915/i915_scheduler.h| 7 +++
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index b6a00dd72808..1f07ac4e0672 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -280,8 +280,6 @@ static int __engine_park(struct intel_wakeref *wf)
if (engine->park)
engine->park(engine);
 
-   engine->sched_engine->no_priolist = false;
-
/* While gt calls i915_vma_parked(), we have to break the lock cycle */
intel_gt_pm_put_async(engine->gt);
return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index e36b0e81876a..47a43aafa39f 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1553,6 +1553,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * interrupt for secondary ports).
 */
sched_engine->queue_priority_hint = queue_prio(sched_engine);
+   i915_sched_engine_reset_on_empty(sched_engine);
spin_unlock(&engine->active.lock);
 
/*
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d65a7665b38e..9887a514a4d5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -263,6 +263,8 @@ static void guc_submission_tasklet(struct tasklet_struct *t)
 
__guc_dequeue(engine);
 
+   i915_sched_engine_reset_on_empty(engine->sched_engine);
+
spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 5bec7b3b8456..713c38c99de9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -72,6 +72,13 @@ i915_sched_engine_is_empty(struct i915_sched_engine 
*sched_engine)
return RB_EMPTY_ROOT(&sched_engine->queue.rb_root);
 }
 
+static inline void
+i915_sched_engine_reset_on_empty(struct i915_sched_engine *sched_engine)
+{
+   if (i915_sched_engine_is_empty(sched_engine))
+   sched_engine->no_priolist = false;
+}
+
 void i915_request_show_with_schedule(struct drm_printer *m,
 const struct i915_request *rq,
 const char *prefix,
-- 
2.28.0



[PATCH 8/8] drm/i915: Move submission tasklet to i915_sched_engine

2021-06-17 Thread Matthew Brost
The submission tasklet operates on i915_sched_engine, thus it is the
correct place for it.

v3:
 (Jason Ekstrand)
  Change sched_engine->engine to a void* private data pointer
  Add kernel doc
v4:
 (Daniele)
  Update private_data comment
  Set queue_priority_hint in kick_execlists
v5:
 (CI)
  Rebase and fix build error

Signed-off-by: Matthew Brost 
Reviewed-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/gt/intel_engine.h| 14 
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 12 +--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  5 --
 .../drm/i915/gt/intel_execlists_submission.c  | 84 ++-
 drivers/gpu/drm/i915/gt/mock_engine.c |  1 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 14 ++--
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|  6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 25 +++---
 drivers/gpu/drm/i915/i915_scheduler.c |  1 +
 drivers/gpu/drm/i915/i915_scheduler.h | 14 
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 10 +++
 13 files changed, 100 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 45ee08fc40a1..f911c1224ab2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -125,20 +125,6 @@ execlists_active(const struct intel_engine_execlists 
*execlists)
return active;
 }
 
-static inline void
-execlists_active_lock_bh(struct intel_engine_execlists *execlists)
-{
-   local_bh_disable(); /* prevent local softirq and lock recursion */
-   tasklet_lock(&execlists->tasklet);
-}
-
-static inline void
-execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
-{
-   tasklet_unlock(&execlists->tasklet);
-   local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
-}
-
 struct i915_request *
 execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 354c1c260726..14b92bfd321b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -713,6 +713,7 @@ static int engine_setup_common(struct intel_engine_cs 
*engine)
err = -ENOMEM;
goto err_sched_engine;
}
+   engine->sched_engine->private_data = engine;
 
err = intel_engine_init_cmd_parser(engine);
if (err)
@@ -944,7 +945,6 @@ int intel_engines_init(struct intel_gt *gt)
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
-   tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
i915_sched_engine_put(engine->sched_engine);
intel_breadcrumbs_free(engine->breadcrumbs);
@@ -1230,7 +1230,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
 
 void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync)
 {
-   struct tasklet_struct *t = &engine->execlists.tasklet;
+   struct tasklet_struct *t = &engine->sched_engine->tasklet;
 
if (!t->callback)
return;
@@ -1491,8 +1491,8 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
 
drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
   yesno(test_bit(TASKLET_STATE_SCHED,
- &engine->execlists.tasklet.state)),
-  enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
+ &engine->sched_engine->tasklet.state)),
+  enableddisabled(!atomic_read(&engine->sched_engine->tasklet.count)),
 
@@ -1516,7 +1516,7 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
   idx, hws[idx * 2], hws[idx * 2 + 1]);
}
 
-   execlists_active_lock_bh(execlists);
+   i915_sched_engine_active_lock_bh(engine->sched_engine);
rcu_read_lock();
for (port = execlists->active; (rq = *port); port++) {
char hdr[160];
@@ -1547,7 +1547,7 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
i915_request_show(m, rq, hdr, 0);
}
rcu_read_unlock();
-   execlists_active_unlock_bh(execlists);
+   i915_sched_engine_active_unlock_bh(engine->sched_engine);
} else if (GRAPHICS_VER(dev_priv) > 6) {
drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
   ENGINE_READ(engine, RING_PP_DIR_BASE));
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 

[PATCH 7/8] drm/i915: Update i915_scheduler to operate on i915_sched_engine

2021-06-17 Thread Matthew Brost
Rather than passing around an intel_engine_cs in the scheduling code,
pass around an i915_sched_engine.

v3:
 (Jason Ekstrand)
  Add READ_ONCE around rq->engine in lock_sched_engine

Signed-off-by: Matthew Brost 
Reviewed-by: Jason Ekstrand 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 11 +++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 46 +--
 drivers/gpu/drm/i915/i915_scheduler.h |  2 +-
 4 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 9487d9e0be62..ffad4d98cec0 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -382,7 +382,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
if (rq_prio(rq) != prio) {
prio = rq_prio(rq);
-   pl = i915_sched_lookup_priolist(engine, prio);
+   pl = i915_sched_lookup_priolist(engine->sched_engine,
+   prio);
}
GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
 
@@ -1096,7 +1097,8 @@ static void defer_active(struct intel_engine_cs *engine)
if (!rq)
return;
 
-   defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
+   defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine,
+rq_prio(rq)));
 }
 
 static bool
@@ -2083,7 +2085,7 @@ static void __execlists_unhold(struct i915_request *rq)
 
i915_request_clear_hold(rq);
list_move_tail(&rq->sched.link,
-  i915_sched_lookup_priolist(rq->engine,
+  i915_sched_lookup_priolist(rq->engine->sched_engine,
  rq_prio(rq)));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
@@ -2452,7 +2454,8 @@ static void queue_request(struct intel_engine_cs *engine,
 {
GEM_BUG_ON(!list_empty(&rq->sched.link));
list_add_tail(&rq->sched.link,
- i915_sched_lookup_priolist(engine, rq_prio(rq)));
+ i915_sched_lookup_priolist(engine->sched_engine,
+rq_prio(rq)));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 60121809e6e2..cb13cc586c67 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -503,7 +503,7 @@ static inline void queue_request(struct intel_engine_cs 
*engine,
 {
GEM_BUG_ON(!list_empty(&rq->sched.link));
list_add_tail(&rq->sched.link,
- i915_sched_lookup_priolist(engine, prio));
+ i915_sched_lookup_priolist(engine->sched_engine, prio));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 035b88f2e4aa..fa8863df9513 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -61,14 +61,13 @@ static void assert_priolists(struct i915_sched_engine * const sched_engine)
 }
 
 struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
+i915_sched_lookup_priolist(struct i915_sched_engine *sched_engine, int prio)
 {
-   struct i915_sched_engine * const sched_engine = engine->sched_engine;
struct i915_priolist *p;
struct rb_node **parent, *rb;
bool first = true;
 
-   lockdep_assert_held(&engine->sched_engine->lock);
+   lockdep_assert_held(&sched_engine->lock);
assert_priolists(sched_engine);
 
if (unlikely(sched_engine->no_priolist))
@@ -130,13 +129,13 @@ struct sched_cache {
struct list_head *priolist;
 };
 
-static struct intel_engine_cs *
-sched_lock_engine(const struct i915_sched_node *node,
- struct intel_engine_cs *locked,
+static struct i915_sched_engine *
+lock_sched_engine(struct i915_sched_node *node,
+ struct i915_sched_engine *locked,
  struct sched_cache *cache)
 {
const struct i915_request *rq = node_to_request(node);
-   struct intel_engine_cs *engine;
+   struct i915_sched_engine *sched_engine;
 
GEM_BUG_ON(!locked);
 
@@ -146,14 +145,14 @@ sched_lock_engine(const struct i915_sched_node *node,
 * engine lock. The simple ploy we use is to take the lock then
 * check that the rq still belongs to the newly locked engine.
 */
-   while (locked != (engine = READ_ONCE(rq->engine))) {
-

[PATCH 1/8] drm/i915: Move priolist to new i915_sched_engine object

2021-06-17 Thread Matthew Brost
Introduce the i915_sched_engine object, a lower-level data structure
that i915_scheduler / generic code can operate on without touching
execlists-specific structures. This allows additional submission
backends to be added without breaking the layering. Currently the
execlists backend uses one of these objects per engine (physical or
virtual), but future backends like the GuC will point to fewer
instances, utilizing the reference counting.

This is a bit of a detour on the way to integrating the i915 with the
DRM scheduler, but this object will still exist when the DRM scheduler
lands in the i915. It will however look a bit different: it will
encapsulate the drm_gpu_scheduler object plus the variables common to
the backends that relate to scheduling. Regardless, this is a step in
the right direction.

This patch starts the aforementioned transition by moving the priolist
into the i915_sched_engine object.

v3:
 (Jason Ekstrand)
  Update comment next to intel_engine_cs.virtual
  Add kernel doc
 (Checkpatch)
  Fix double the in commit message
v4:
 (Daniele)
  Update comment message.
  Add comment about subclass field

Signed-off-by: Matthew Brost 
Reviewed-by: Daniele Ceraolo Spurio 
---
 Documentation/gpu/i915.rst|  5 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 14 +++-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |  4 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 32 ++--
 .../drm/i915/gt/intel_execlists_submission.c  | 81 +++
 drivers/gpu/drm/i915/gt/mock_engine.c |  9 ++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 19 ++---
 drivers/gpu/drm/i915/i915_scheduler.c | 53 +---
 drivers/gpu/drm/i915/i915_scheduler.h | 18 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 47 +++
 10 files changed, 192 insertions(+), 90 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 42ce0196930a..1d5ce5676d35 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -425,6 +425,11 @@ User Batchbuffer Execution
 .. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
:doc: User command execution
 
+Scheduling
+--
+.. kernel-doc:: drivers/gpu/drm/i915/i915_scheduler_types.h
+   :functions: i915_sched_engine
+
 Logical Rings, Logical Ring Contexts and Execlists
 --
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index fcbaad18ac91..4772ed1a1f3a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -585,9 +585,6 @@ void intel_engine_init_execlists(struct intel_engine_cs 
*engine)
memset(execlists->pending, 0, sizeof(execlists->pending));
execlists->active =
memset(execlists->inflight, 0, sizeof(execlists->inflight));
-
-   execlists->queue_priority_hint = INT_MIN;
-   execlists->queue = RB_ROOT_CACHED;
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
@@ -714,6 +711,12 @@ static int engine_setup_common(struct intel_engine_cs 
*engine)
goto err_status;
}
 
+   engine->sched_engine = i915_sched_engine_create(ENGINE_PHYSICAL);
+   if (!engine->sched_engine) {
+   err = -ENOMEM;
+   goto err_sched_engine;
+   }
+
err = intel_engine_init_cmd_parser(engine);
if (err)
goto err_cmd_parser;
@@ -737,6 +740,8 @@ static int engine_setup_common(struct intel_engine_cs 
*engine)
return 0;
 
 err_cmd_parser:
+   i915_sched_engine_put(engine->sched_engine);
+err_sched_engine:
intel_breadcrumbs_free(engine->breadcrumbs);
 err_status:
cleanup_status_page(engine);
@@ -967,6 +972,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs 
*engine)
GEM_BUG_ON(!list_empty(&engine->active.requests));
tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
+   i915_sched_engine_put(engine->sched_engine);
intel_breadcrumbs_free(engine->breadcrumbs);
 
intel_engine_fini_retire(engine);
@@ -1290,7 +1296,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
intel_engine_flush_submission(engine);
 
/* ELSP is empty, but there are ready requests? E.g. after reset */
-   if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
+   if (!RB_EMPTY_ROOT(&engine->sched_engine->queue.rb_root))
return false;
 
/* Ring stopped? */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 47f4397095e5..b6a00dd72808 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -275,12 +275,12 @@ static int __engine_park(struct intel_wakeref *wf)
intel_breadcrumbs_park(engine->breadcrumbs);
 
/* Must be reset upon idling, or we may miss the busy wakeup. */
-   

[PATCH 2/8] drm/i915: Add i915_sched_engine_is_empty function

2021-06-17 Thread Matthew Brost
Add a wrapper function around the RB tree to determine if the
i915_sched_engine is empty.

Signed-off-by: Matthew Brost 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c| 2 +-
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c| 2 +-
 drivers/gpu/drm/i915/i915_scheduler.h| 6 ++
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4772ed1a1f3a..aeaf5c6f94b2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1296,7 +1296,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
intel_engine_flush_submission(engine);
 
/* ELSP is empty, but there are ready requests? E.g. after reset */
-   if (!RB_EMPTY_ROOT(&engine->sched_engine->queue.rb_root))
+   if (!i915_sched_engine_is_empty(engine->sched_engine))
return false;
 
/* Ring stopped? */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4f759559a792..e36b0e81876a 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -384,7 +384,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
prio = rq_prio(rq);
pl = i915_sched_lookup_priolist(engine, prio);
}
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->sched_engine->queue.rb_root));
+   GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
 
list_move(&rq->sched.link, pl);
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
@@ -1139,7 +1139,7 @@ static bool needs_timeslice(const struct intel_engine_cs 
*engine,
}
 
/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
-   if (!RB_EMPTY_ROOT(&engine->sched_engine->queue.rb_root)) {
+   if (!i915_sched_engine_is_empty(engine->sched_engine)) {
ENGINE_TRACE(engine, "timeslice required for queue\n");
return true;
}
@@ -2487,7 +2487,7 @@ static void execlists_submit_request(struct i915_request 
*request)
} else {
queue_request(engine, request);
 
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->sched_engine->queue.rb_root));
+   GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
GEM_BUG_ON(list_empty(&request->sched.link));
 
if (submit_queue(engine, request))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 5c5f33f40055..d65a7665b38e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -515,7 +515,7 @@ static void guc_submit_request(struct i915_request *rq)
 
queue_request(engine, rq, rq_prio(rq));
 
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->sched_engine->queue.rb_root));
+   GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
GEM_BUG_ON(list_empty(&rq->sched.link));
 
tasklet_hi_schedule(&engine->execlists.tasklet);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 91a04e34cac5..5bec7b3b8456 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -66,6 +66,12 @@ i915_sched_engine_put(struct i915_sched_engine *sched_engine)
kref_put(&sched_engine->ref, i915_sched_engine_free);
 }
 
+static inline bool
+i915_sched_engine_is_empty(struct i915_sched_engine *sched_engine)
+{
+   return RB_EMPTY_ROOT(&sched_engine->queue.rb_root);
+}
+
 void i915_request_show_with_schedule(struct drm_printer *m,
 const struct i915_request *rq,
 const char *prefix,
-- 
2.28.0



[PATCH 0/8] Introduce i915_sched_engine object

2021-06-17 Thread Matthew Brost
As discussed in [1] we are breaking that large series into several
smaller ones. This series is a standalone part of step #4 and has no
other dependencies or patches relevant to it.

v2:
 (Daniel Vetter):
  - Split into several smaller patches
  - Add kernel doc for i915_sched_engine
 (Matthew Brost):
  - Drop wrapper functions for tasklet as eventually tasklet will be
dropped 

v3:
 (Jason Ekstrand)
  - Address his comments, change logs in individual patches
  - Squash documentation patch into previous patches as needed
 (Checkpatch)
  - Fix warnings
 (Docs)
  - Fix warnings

v4:
 (Daniele)
  - Update a comments / commit messages
  - Set queue_priority_hint

v5:
 (CI)
  - Rebase and fix build error

v6:
  - Rebase

Signed-off-by: Matthew Brost 

[1] https://patchwork.freedesktop.org/series/89844/

Matthew Brost (8):
  drm/i915: Move priolist to new i915_sched_engine object
  drm/i915: Add i915_sched_engine_is_empty function
  drm/i915: Reset sched_engine.no_priolist immediately after dequeue
  drm/i915: Move active tracking to i915_sched_engine
  drm/i915: Move engine->schedule to i915_sched_engine
  drm/i915: Add kick_backend function to i915_sched_engine
  drm/i915: Update i915_scheduler to operate on i915_sched_engine
  drm/i915: Move submission tasklet to i915_sched_engine

 Documentation/gpu/i915.rst|   5 +
 drivers/gpu/drm/i915/gem/i915_gem_wait.c  |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine.h|  16 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  72 ++--
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  49 +--
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |   2 +-
 .../drm/i915/gt/intel_execlists_submission.c  | 325 +++---
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  12 +-
 drivers/gpu/drm/i915/gt/mock_engine.c |  17 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  34 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|   6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  70 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c |   4 +-
 drivers/gpu/drm/i915/i915_request.c   |  42 +--
 drivers/gpu/drm/i915/i915_request.h   |   2 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 168 +
 drivers/gpu/drm/i915/i915_scheduler.h |  47 ++-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  89 +
 22 files changed, 556 insertions(+), 424 deletions(-)

-- 
2.28.0



Re: [PATCH v13 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
> support the memory allocation from restricted DMA pool.
> 
> The restricted DMA pool is preferred if available.
> 
> Note that since coherent allocation needs remapping, one must set up
> another device coherent pool by shared-dma-pool and use
> dma_alloc_from_dev_coherent instead for atomic coherent allocation.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  include/linux/swiotlb.h | 26 ++
>  kernel/dma/direct.c | 49 +++--
>  kernel/dma/swiotlb.c| 38 ++--
>  3 files changed, 99 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 8d8855c77d9a..a73fad460162 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
>   * @debugfs: The dentry to debugfs.
>   * @late_alloc:  %true if allocated using the page allocator
>   * @force_bounce: %true if swiotlb bouncing is forced
> + * @for_alloc:  %true if the pool is used for memory allocation
>   */
>  struct io_tlb_mem {
>   phys_addr_t start;
> @@ -96,6 +97,7 @@ struct io_tlb_mem {
>   struct dentry *debugfs;
>   bool late_alloc;
>   bool force_bounce;
> + bool for_alloc;
>   struct io_tlb_slot {
>   phys_addr_t orig_addr;
>   size_t alloc_size;
> @@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long 
> size)
>  extern void swiotlb_print_info(void);
>  extern void swiotlb_set_max_segment(unsigned int);
>  
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +struct page *swiotlb_alloc(struct device *dev, size_t size);
> +bool swiotlb_free(struct device *dev, struct page *page, size_t size);
> +
> +static inline bool is_swiotlb_for_alloc(struct device *dev)
> +{
> + return dev->dma_io_tlb_mem->for_alloc;
> +}
> +#else
> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> +{
> + return NULL;
> +}
> +static inline bool swiotlb_free(struct device *dev, struct page *page,
> + size_t size)
> +{
> + return false;
> +}
> +static inline bool is_swiotlb_for_alloc(struct device *dev)
> +{
> + return false;
> +}
> +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index a92465b4eb12..2de33e5d302b 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, 
> phys_addr_t phys, size_t size)
>   min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
>  }
>  
> +static void __dma_direct_free_pages(struct device *dev, struct page *page,
> + size_t size)
> +{
> + if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
> + swiotlb_free(dev, page, size))
> + return;
> + dma_free_contiguous(dev, page, size);
> +}
> +
>  static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   gfp_t gfp)
>  {
> @@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device 
> *dev, size_t size,
>  
>   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
>  &phys_limit);
> + if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
> + is_swiotlb_for_alloc(dev)) {
> + page = swiotlb_alloc(dev, size);
> + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> + __dma_direct_free_pages(dev, page, size);
> + return NULL;
> + }
> + return page;
> + }
> +
>   page = dma_alloc_contiguous(dev, size, gfp);
>   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   dma_free_contiguous(dev, page, size);
> @@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   gfp |= __GFP_NOWARN;
>  
>   if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> - !force_dma_unencrypted(dev)) {
> + !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
>   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
>   if (!page)
>   return NULL;
> @@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   }
>  
>   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
> - !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> - !dev_is_dma_coherent(dev))
> + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
> + !is_swiotlb_for_alloc(dev))
>   return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
>  
>   /*
>* 

Re: [PATCH v13 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   | 11 +++
>  kernel/dma/direct.c   |  2 +-
>  kernel/dma/direct.h   |  2 +-
>  kernel/dma/swiotlb.c  |  4 
>  5 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 0c6ed09f8513..4730a146fa35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device 
> *dev, struct page *page,
>   if (dma_capable(dev, dev_addr, size, true) &&
>   !range_straddles_page_boundary(phys, size) &&
>   !xen_arch_need_swiotlb(dev, phys, dev_addr) &&
> - swiotlb_force != SWIOTLB_FORCE)
> + !is_swiotlb_force_bounce(dev))
>   goto done;
>  
>   /*
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index dd1c30a83058..8d8855c77d9a 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
>   *   unmap calls.
>   * @debugfs: The dentry to debugfs.
>   * @late_alloc:  %true if allocated using the page allocator
> + * @force_bounce: %true if swiotlb bouncing is forced
>   */
>  struct io_tlb_mem {
>   phys_addr_t start;
> @@ -94,6 +95,7 @@ struct io_tlb_mem {
>   spinlock_t lock;
>   struct dentry *debugfs;
>   bool late_alloc;
> + bool force_bounce;
>   struct io_tlb_slot {
>   phys_addr_t orig_addr;
>   size_t alloc_size;
> @@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>   return mem && paddr >= mem->start && paddr < mem->end;
>  }
>  
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> + return dev->dma_io_tlb_mem->force_bounce;
> +}
> +
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> @@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  {
>   return false;
>  }
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> + return false;
> +}
>  static inline void swiotlb_exit(void)
>  {
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 7a88c34d0867..a92465b4eb12 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>  {
>   /* If SWIOTLB is active, use its maximum mapping size */
>   if (is_swiotlb_active(dev) &&
> - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> + (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
>   return swiotlb_max_mapping_size(dev);
>   return SIZE_MAX;
>  }
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 13e9e7158d94..4632b0f4f72e 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
> *dev,
>   phys_addr_t phys = page_to_phys(page) + offset;
>   dma_addr_t dma_addr = phys_to_dma(dev, phys);
>  
> - if (unlikely(swiotlb_force == SWIOTLB_FORCE))
> + if (is_swiotlb_force_bounce(dev))
>   return swiotlb_map(dev, phys, size, dir, attrs);
>  
>   if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 409694d7a8ad..13891d5de8c9 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->end = mem->start + bytes;
>   mem->index = 0;
>   mem->late_alloc = late_alloc;
> +
> + if (swiotlb_force == SWIOTLB_FORCE)
> + mem->force_bounce = true;
> +
>   spin_lock_init(&mem->lock);
>   for (i = 0; i < mem->nslabs; i++) {
>   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH v13 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Update is_swiotlb_active to add a struct device argument. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
>  drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
>  drivers/pci/xen-pcifront.c   | 2 +-
>  include/linux/swiotlb.h  | 4 ++--
>  kernel/dma/direct.c  | 2 +-
>  kernel/dma/swiotlb.c | 4 ++--
>  6 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> index a9d65fc8aa0e..4b7afa0fc85d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> @@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
> drm_i915_gem_object *obj)
>  
>   max_order = MAX_ORDER;
>  #ifdef CONFIG_SWIOTLB
> - if (is_swiotlb_active()) {
> + if (is_swiotlb_active(obj->base.dev->dev)) {
>   unsigned int max_segment;
>  
>   max_segment = swiotlb_max_segment();
> diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
> b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> index 9662522aa066..be15bfd9e0ee 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> @@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
>   }
>  
>  #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
> - need_swiotlb = is_swiotlb_active();
> + need_swiotlb = is_swiotlb_active(dev->dev);
>  #endif
>  
>   ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
> diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
> index b7a8f3a1921f..0d56985bfe81 100644
> --- a/drivers/pci/xen-pcifront.c
> +++ b/drivers/pci/xen-pcifront.c
> @@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
> pcifront_device *pdev)
>  
>   spin_unlock(&pcifront_dev_lock);
>  
> - if (!err && !is_swiotlb_active()) {
> + if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
>   err = pci_xen_swiotlb_init_late();
>   if (err)
>   dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index d1f3d95881cd..dd1c30a83058 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> -bool is_swiotlb_active(void);
> +bool is_swiotlb_active(struct device *dev);
>  void __init swiotlb_adjust_size(unsigned long size);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> @@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct 
> device *dev)
>   return SIZE_MAX;
>  }
>  
> -static inline bool is_swiotlb_active(void)
> +static inline bool is_swiotlb_active(struct device *dev)
>  {
>   return false;
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 84c9feb5474a..7a88c34d0867 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  size_t dma_direct_max_mapping_size(struct device *dev)
>  {
>   /* If SWIOTLB is active, use its maximum mapping size */
> - if (is_swiotlb_active() &&
> + if (is_swiotlb_active(dev) &&
>   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
>   return swiotlb_max_mapping_size(dev);
>   return SIZE_MAX;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index de79e9437030..409694d7a8ad 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
>   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
>  }
>  
> -bool is_swiotlb_active(void)
> +bool is_swiotlb_active(struct device *dev)
>  {
> - return io_tlb_default_mem != NULL;
> + return dev->dma_io_tlb_mem != NULL;
>  }
>  EXPORT_SYMBOL_GPL(is_swiotlb_active);
>  
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH v13 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Update is_swiotlb_buffer to add a struct device argument. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  drivers/iommu/dma-iommu.c | 12 ++--
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   |  7 ---
>  kernel/dma/direct.c   |  6 +++---
>  kernel/dma/direct.h   |  6 +++---
>  5 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 3087d9fa6065..10997ef541f8 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
> dma_addr_t dma_addr,
>  
>   __iommu_dma_unmap(dev, dma_addr, size);
>  
> - if (unlikely(is_swiotlb_buffer(phys)))
> + if (unlikely(is_swiotlb_buffer(dev, phys)))
>   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  
> @@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> *dev, phys_addr_t phys,
>   }
>  
>   iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> - if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> + if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
>   swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>   return iova;
>  }
> @@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
> *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(phys, size, dir);
>  
> - if (is_swiotlb_buffer(phys))
> + if (is_swiotlb_buffer(dev, phys))
>   swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>  }
>  
> @@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct 
> device *dev,
>   return;
>  
>   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> - if (is_swiotlb_buffer(phys))
> + if (is_swiotlb_buffer(dev, phys))
>   swiotlb_sync_single_for_device(dev, phys, size, dir);
>  
>   if (!dev_is_dma_coherent(dev))
> @@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>  
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (is_swiotlb_buffer(dev, sg_phys(sg)))
>   swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>   sg->length, dir);
>   }
> @@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device 
> *dev,
>   return;
>  
>   for_each_sg(sgl, sg, nelems, i) {
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (is_swiotlb_buffer(dev, sg_phys(sg)))
>   swiotlb_sync_single_for_device(dev, sg_phys(sg),
>  sg->length, dir);
>  
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 4c89afc0df62..0c6ed09f8513 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
> dma_addr_t dma_addr)
>* in our domain. Therefore _only_ check address within our domain.
>*/
>   if (pfn_valid(PFN_DOWN(paddr)))
> - return is_swiotlb_buffer(paddr);
> + return is_swiotlb_buffer(dev, paddr);
>   return 0;
>  }
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..d1f3d95881cd 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -2,6 +2,7 @@
>  #ifndef __LINUX_SWIOTLB_H
>  #define __LINUX_SWIOTLB_H
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -101,9 +102,9 @@ struct io_tlb_mem {
>  };
>  extern struct io_tlb_mem *io_tlb_default_mem;
>  
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>  
>   return mem && paddr >= mem->start && paddr < mem->end;
>  }
> @@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long size);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
>   return false;
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index f737e3347059..84c9feb5474a 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>   for_each_sg(sgl, sg, 

Re: [PATCH v13 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Always have the pointer to the swiotlb pool used in struct device. This
> could help simplify the code for other pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 

> ---
>  drivers/base/core.c| 4 
>  include/linux/device.h | 4 
>  kernel/dma/swiotlb.c   | 8 
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index f29839382f81..cb3123e3954d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include  /* for dma_default_coherent */
>  
> @@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
>  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>   dev->dma_coherent = dma_default_coherent;
>  #endif
> +#ifdef CONFIG_SWIOTLB
> + dev->dma_io_tlb_mem = io_tlb_default_mem;
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> diff --git a/include/linux/device.h b/include/linux/device.h
> index ba660731bd25..240d652a0696 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -416,6 +416,7 @@ struct dev_links_info {
>   * @dma_pools:   Dma pools (if dma'ble device).
>   * @dma_mem: Internal for coherent mem override.
>   * @cma_area:Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> @@ -518,6 +519,9 @@ struct device {
>  #ifdef CONFIG_DMA_CMA
>   struct cma *cma_area;   /* contiguous memory area for dma
>  allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
>  #endif
>   /* arch specific additions */
>   struct dev_archdata archdata;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 2dba659a1e73..de79e9437030 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
>  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
> size,
>  enum dma_data_direction dir)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
>   unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
>   phys_addr_t orig_addr = mem->slots[index].orig_addr;
> @@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
> unsigned int index)
>  static int find_slots(struct device *dev, phys_addr_t orig_addr,
>   size_t alloc_size)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned long boundary_mask = dma_get_seg_boundary(dev);
>   dma_addr_t tbl_dma_addr =
>   phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
> @@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
> phys_addr_t orig_addr,
>   size_t mapping_size, size_t alloc_size,
>   enum dma_data_direction dir, unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>   unsigned int i;
>   int index;
> @@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> size_t mapping_size, enum dma_data_direction dir,
> unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
>   unsigned long flags;
>   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
>   int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 
> ---
>  kernel/dma/swiotlb.c | 50 ++--
>  1 file changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 52e2ac526757..47bb2a766798 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
>  
> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> start,
> + unsigned long nslabs, bool late_alloc)
>  {
> + void *vaddr = phys_to_virt(start);
>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> +
> + mem->nslabs = nslabs;
> + mem->start = start;
> + mem->end = mem->start + bytes;
> + mem->index = 0;
> + mem->late_alloc = late_alloc;
> + spin_lock_init(&mem->lock);
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + mem->slots[i].alloc_size = 0;
> + }
> + memset(vaddr, 0, bytes);
> +}
> +
> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +{
>   struct io_tlb_mem *mem;
>   size_t alloc_size;
>  
> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
> long nslabs, int verbose)
>   if (!mem)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
> - mem->nslabs = nslabs;
> - mem->start = __pa(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - spin_lock_init(&mem->lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> +
> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  
>   io_tlb_default_mem = mem;
>   if (verbose)
> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>  int
>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  {
> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>   struct io_tlb_mem *mem;
> + unsigned long bytes = nslabs << IO_TLB_SHIFT;
>  
>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>   return 0;
> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
> nslabs)
>   if (!mem)
>   return -ENOMEM;
>  
> - mem->nslabs = nslabs;
> - mem->start = virt_to_phys(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - mem->late_alloc = 1;
> - spin_lock_init(&mem->lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> -
> + memset(mem, 0, sizeof(*mem));
> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> - memset(tlb, 0, bytes);
 
This is good for swiotlb_late_init_with_tbl. However I have just noticed
that mem could also be allocated from swiotlb_init_with_tbl, in which
case the zeroing is missing. I think we need another memset in
swiotlb_init_with_tbl as well. Or maybe it could be better to have a
single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to
you.


[pull] drm/msm: drm-msm-next-2021-06-17 for v5.14

2021-06-17 Thread Rob Clark
Hi Dave & Daniel,

Here is msm-next for v5.14

Notable additions this time around:

* devcoredump support for display errors
* dpu: irq cleanup/refactor
* dpu: dt bindings conversion to yaml
* dsi: dt bindings conversion to yaml
* mdp5: alpha/blend_mode/zpos support
* mdp5: dynamic bandwidth management
* a6xx: cached coherent buffer support
* a660 support
* gpu iova fault improvements:
   - info about which block triggered the fault, etc
   - generation of gpu devcoredump on fault
* assortment of other cleanups and fixes

The following changes since commit c4681547bcce777daf576925a966ffa824edd09d:

  Linux 5.13-rc3 (2021-05-23 11:42:48 -1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git drm-msm-next-2021-06-17

for you to fetch changes up to 7e0230fd096c03e9662e66150f951075dd16e496:

  drm/msm/mdp5: provide dynamic bandwidth management (2021-06-17 09:51:44 -0700)


Abhinav Kumar (7):
  drm: allow drm_atomic_print_state() to accept any drm_printer
  drm/msm: add support to take dpu snapshot
  drm/msm/dsi: add API to take DSI register snapshot
  drm/msm/dp: add API to take DP register snapshot
  drm/msm/disp/dpu1: add API to take DPU register snapshot
  drm/msm: add support to take dsi, dp and dpu snapshot
  drm/msm: add disp snapshot points across dpu driver

Alexey Minnekhanov (1):
  drm/msm: Init mm_list before accessing it for use_vram path

Arnd Bergmann (1):
  drm/msm/dsi: fix 32-bit clang warning

Bernard Zhao (1):
  drm/msm: remove unneeded variable ret

Bhaskar Chowdhury (3):
  drm/msm/dpu: Fix a typo
  drm/msm/dpu: Fix a typo
  drm/msm/dp: Fixed couple of typos

Bjorn Andersson (1):
  drm/msm/dpu: Avoid ABBA deadlock between IRQ modules

Dmitry Baryshkov (23):
  drm/msm: pass dump state as a function argument
  drm/msm: make msm_disp_state transient data struct
  drm/msm: get rid of msm_iomap_size
  drm/msm/dsi: add DSI PHY registers to snapshot data
  drm/msm: fix display snapshotting if DP or DSI is disabled
  drm/msm/dpu: merge dpu_hw_intr_get_interrupt_statuses into
dpu_hw_intr_dispatch_irqs
  drm/msm/dpu: hw_intr: always call dpu_hw_intr_clear_intr_status_nolock
  drm/msm/dpu: define interrupt register names
  drm/msm/dpu: replace IRQ lookup with the data in hw catalog
  drm/msm/dpu: drop remains of old irq lookup subsystem
  drm/msm/dpu: simplify IRQ enabling/disabling
  drm/msm/dsi: print error code when MIPI DSI host registration fails
  drm/msm/dpu: remove unused dpu_hw_blk features
  drm/msm/dpu: drop dpu_hw_blk_destroy function
  drm/msm/dpu: use struct dpu_hw_merge_3d in dpu_hw_pingpong
  drm/msm/dpu: hw_blk: make dpu_hw_blk empty opaque structure
  drm/msm/dsi: do not enable PHYs when called for the slave DSI interface
  drm/msm/mdp5: use drm atomic helpers to handle base drm plane state
  drm/msm/mdp5: use drm_plane_state for storing alpha value
  drm/msm/mdp5: use drm_plane_state for pixel blend mode
  drm/msm/mdp5: add support for alpha/blend_mode properties
  drm/msm/mdp5: switch to standard zpos property
  drm/msm/mdp5: provide dynamic bandwidth management

Guenter Roeck (2):
  drm/msm/dp: Drop unnecessary NULL checks after container_of
  drm/msm/dpu: Drop unnecessary NULL checks after container_of in
dpu_encoder

James Willcox (1):
  drm/msm/mdp5: add perf blocks for holding fudge factors

Jonathan Marek (13):
  drm/msm: remove unnecessary mmap logic for cached BOs
  drm/msm: replace MSM_BO_UNCACHED with MSM_BO_WC for internal objects
  drm/msm: use the right pgprot when mapping BOs in the kernel
  drm/msm: add MSM_BO_CACHED_COHERENT
  drm/msm: deprecate MSM_BO_UNCACHED (map as writecombine instead)
  drm/msm/a6xx: update/fix CP_PROTECT initialization
  drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650
  drm/msm/a6xx: avoid shadow NULL reference in failure path
  drm/msm: remove unused icc_path/ocmem_icc_path
  drm/msm/a6xx: use AOP-initialized PDC for a650
  drm/msm/a6xx: add GMU_CX_GMU_CX_FALNEXT_INTF write for a650
  drm/msm/a6xx: add missing PC_DBG_ECO_CNTL bit for a640/a650
  drm/msm/a6xx: add support for Adreno 660 GPU

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get
pagefault info
  drm/msm: Improve the a6xx page fault handler

Krishna Manikandan (5):
  dt-bindings: msm: disp: add yaml schemas for DPU bindings
  dt-bindings: msm: dsi: add yaml schemas for DSI bindings
  dt-bindings: msm: dsi: add yaml schemas for DSI PHY bindings
  dt-bindings: msm/dp: Add bindings of MSM DisplayPort controller
  drm/msm/disp/dpu1: avoid perf update in frame done event

Kuogee Hsieh (2):
  drm/msm/dp: handle irq_hpd with sink_count = 

Re: [PATCH] drm/msm/dp: add logs across DP driver for ease of debugging

2021-06-17 Thread Stephen Boyd
Quoting maitreye (2021-06-16 18:08:54)
> From: Maitreyee Rao 
>
> Add trace points across the MSM DP driver to help debug
> interop issues.
>
> Signed-off-by: Maitreyee Rao 
> ---
>  drivers/gpu/drm/msm/dp/dp_aux.c |  5 +++--
>  drivers/gpu/drm/msm/dp/dp_catalog.c |  4 
>  drivers/gpu/drm/msm/dp/dp_ctrl.c|  7 +++
>  drivers/gpu/drm/msm/dp/dp_display.c | 16 
>  drivers/gpu/drm/msm/dp/dp_link.c| 20 +---
>  drivers/gpu/drm/msm/dp/dp_panel.c   |  2 ++
>  drivers/gpu/drm/msm/dp/dp_power.c   |  3 +++
>  7 files changed, 48 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/dp/dp_aux.c b/drivers/gpu/drm/msm/dp/dp_aux.c
> index 4a3293b..5fdff18d 100644
> --- a/drivers/gpu/drm/msm/dp/dp_aux.c
> +++ b/drivers/gpu/drm/msm/dp/dp_aux.c
> @@ -121,9 +121,10 @@ static ssize_t dp_aux_cmd_fifo_tx(struct dp_aux_private 
> *aux,
>
> time_left = wait_for_completion_timeout(&aux->comp,
> msecs_to_jiffies(250));
> -   if (!time_left)
> +   if (!time_left) {
> +   DRM_DEBUG_DP("%s aux timeout error timeout:%lu\n", __func__, 
> time_left);

This will always print 0 for "no time left". Is that useful to know? I'd
rather we just drop that. Also, __func__ shouldn't be needed given that
__drm_dbg() uses builtin_return_address(). And then, I believe the DP
aux core code already adds logs on the transfer to indicate how it
failed, so probably this whole line can be dropped.

> return -ETIMEDOUT;
> -
> +   }
> return ret;
>  }
>
> diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
> b/drivers/gpu/drm/msm/dp/dp_catalog.c
> index 32f3575..5de5dcd 100644
> --- a/drivers/gpu/drm/msm/dp/dp_catalog.c
> +++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
> @@ -372,6 +372,7 @@ void dp_catalog_ctrl_mainlink_ctrl(struct dp_catalog 
> *dp_catalog,
> struct dp_catalog_private *catalog = container_of(dp_catalog,
> struct dp_catalog_private, dp_catalog);
>
> +   DRM_DEBUG_DP("%s enable=0x%x\n", __func__, enable);

Again, drop __func__. 'enable' is a bool, why is printed in hex format?

> if (enable) {
> /*
>  * To make sure link reg writes happens before other 
> operation,
> @@ -580,6 +581,7 @@ void dp_catalog_hpd_config_intr(struct dp_catalog 
> *dp_catalog,
>
> config = (en ? config | intr_mask : config & ~intr_mask);
>
> +   DRM_DEBUG_DP("%s intr_mask=0x%x config=0x%x\n", __func__, intr_mask, 
> config);
> dp_write_aux(catalog, REG_DP_DP_HPD_INT_MASK,
> config & DP_DP_HPD_INT_MASK);
>  }
> @@ -610,6 +612,7 @@ u32 dp_catalog_link_is_connected(struct dp_catalog 
> *dp_catalog)
> u32 status;
>
> status = dp_read_aux(catalog, REG_DP_DP_HPD_INT_STATUS);
> +   DRM_DEBUG_DP("%s aux status:0x%x\n", __func__, status);
> status >>= DP_DP_HPD_STATE_STATUS_BITS_SHIFT;
> status &= DP_DP_HPD_STATE_STATUS_BITS_MASK;
>
> @@ -685,6 +688,7 @@ void dp_catalog_ctrl_send_phy_pattern(struct dp_catalog 
> *dp_catalog,
> /* Make sure to clear the current pattern before starting a new one */
> dp_write_link(catalog, REG_DP_STATE_CTRL, 0x0);
>
> +   DRM_DEBUG_DP("%s pattern:0x%x\n", __func__, pattern);
> switch (pattern) {
> case DP_PHY_TEST_PATTERN_D10_2:
> dp_write_link(catalog, REG_DP_STATE_CTRL,
> diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c 
> b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> index 2a8955c..7fd1e3f 100644
> --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
> +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> @@ -99,6 +99,7 @@ static int dp_aux_link_configure(struct drm_dp_aux *aux,
> values[0] = drm_dp_link_rate_to_bw_code(link->rate);
> values[1] = link->num_lanes;
>
> +   DRM_DEBUG_DP("%s value0:0x%x value1:0x%x\n", __func__, values[0], 
> values[1]);

The drm_dp_dpcd_write() soon after should tell us what this is, so is
this necessary?

> if (link->capabilities & DP_LINK_CAP_ENHANCED_FRAMING)
> values[1] |= DP_LANE_COUNT_ENHANCED_FRAME_EN;
>
> @@ -122,6 +123,7 @@ void dp_ctrl_push_idle(struct dp_ctrl *dp_ctrl)
> IDLE_PATTERN_COMPLETION_TIMEOUT_JIFFIES))
> pr_warn("PUSH_IDLE pattern timedout\n");
>
> +   DRM_DEBUG_DP("PUSH IDLE\n");
> pr_debug("mainlink off done\n");

Can these two printks be combined into one DRM_DEBUG_DP()?

>  }
>
> @@ -1013,6 +1015,8 @@ static int dp_ctrl_update_vx_px(struct dp_ctrl_private 
> *ctrl)
> u32 voltage_swing_level = link->phy_params.v_level;
> u32 pre_emphasis_level = link->phy_params.p_level;
>
> +   DRM_DEBUG_DP("%s: voltage level:%d emphasis level:%d\n", __func__,

Can we unstick the colon : from the printk format?

voltage level: %d emphasis level: %d

> +   voltage_swing_level, pre_emphasis_level);
> ret = 

[PATCH v2 7/7] drm/msm/dpu: remove struct dpu_encoder_irq and enum dpu_intr_idx

2021-06-17 Thread Dmitry Baryshkov
Drop the wrapping structures and the enum used to index them in
dpu_kms. Use IRQ indices and callback functions directly instead.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 47 +-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  | 48 +++---
 .../drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c  | 94 +++
 .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  | 53 ---
 drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h | 12 +--
 5 files changed, 92 insertions(+), 162 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 3d8864df8605..55ae3ede5846 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -241,11 +241,11 @@ static void _dpu_encoder_setup_dither(struct 
dpu_hw_pingpong *hw_pp, unsigned bp
 }
 
 void dpu_encoder_helper_report_irq_timeout(struct dpu_encoder_phys *phys_enc,
-   enum dpu_intr_idx intr_idx)
+   int irq_idx)
 {
DRM_ERROR("irq timeout id=%u, intf=%d, pp=%d, intr=%d\n",
  DRMID(phys_enc->parent), phys_enc->intf_idx - INTF_0,
- phys_enc->hw_pp->idx - PINGPONG_0, intr_idx);
+ phys_enc->hw_pp->idx - PINGPONG_0, irq_idx);
 
if (phys_enc->parent_ops->handle_frame_done)
phys_enc->parent_ops->handle_frame_done(
@@ -257,75 +257,70 @@ static int dpu_encoder_helper_wait_event_timeout(int32_t 
drm_id,
u32 irq_idx, struct dpu_encoder_wait_info *info);
 
 int dpu_encoder_helper_wait_for_irq(struct dpu_encoder_phys *phys_enc,
-   enum dpu_intr_idx intr_idx,
+   int irq_idx, void (*irq_cb)(void *, int),
struct dpu_encoder_wait_info *wait_info)
 {
-   struct dpu_encoder_irq *irq;
u32 irq_status;
int ret;
 
-   if (!wait_info || intr_idx >= INTR_IDX_MAX) {
+   if (!wait_info || irq_idx < 0) {
DPU_ERROR("invalid params\n");
return -EINVAL;
}
-   irq = &phys_enc->irq[intr_idx];
 
/* note: do master / slave checking outside */
 
/* return EWOULDBLOCK since we know the wait isn't necessary */
if (phys_enc->enable_state == DPU_ENC_DISABLED) {
-   DRM_ERROR("encoder is disabled id=%u, intr=%d, irq=%d",
- DRMID(phys_enc->parent), intr_idx,
- irq->irq_idx);
+   DRM_ERROR("encoder is disabled id=%u, irq=%d",
+ DRMID(phys_enc->parent), irq_idx);
return -EWOULDBLOCK;
}
 
-   if (irq->irq_idx < 0) {
-   DRM_DEBUG_KMS("skip irq wait id=%u, intr=%d, irq=%s",
- DRMID(phys_enc->parent), intr_idx,
- irq->name);
+   if (irq_idx < 0) {
+   DRM_DEBUG_KMS("skip irq wait id=%u", DRMID(phys_enc->parent));
return 0;
}
 
-   DRM_DEBUG_KMS("id=%u, intr=%d, irq=%d, pp=%d, pending_cnt=%d",
- DRMID(phys_enc->parent), intr_idx,
- irq->irq_idx, phys_enc->hw_pp->idx - PINGPONG_0,
+   DRM_DEBUG_KMS("id=%u, irq=%d, pp=%d, pending_cnt=%d",
+ DRMID(phys_enc->parent),
+ irq_idx, phys_enc->hw_pp->idx - PINGPONG_0,
  atomic_read(wait_info->atomic_cnt));
 
ret = dpu_encoder_helper_wait_event_timeout(
DRMID(phys_enc->parent),
-   irq->irq_idx,
+   irq_idx,
wait_info);
 
if (ret <= 0) {
irq_status = dpu_core_irq_read(phys_enc->dpu_kms,
-   irq->irq_idx, true);
+   irq_idx, true);
if (irq_status) {
unsigned long flags;
 
-   DRM_DEBUG_KMS("irq not triggered id=%u, intr=%d, "
+   DRM_DEBUG_KMS("irq not triggered id=%u, "
  "irq=%d, pp=%d, atomic_cnt=%d",
- DRMID(phys_enc->parent), intr_idx,
- irq->irq_idx,
+ DRMID(phys_enc->parent),
+ irq_idx,
  phys_enc->hw_pp->idx - PINGPONG_0,
  atomic_read(wait_info->atomic_cnt));
local_irq_save(flags);
-   irq->func(phys_enc, irq->irq_idx);
+   irq_cb(phys_enc, irq_idx);
local_irq_restore(flags);
ret = 0;
} else {
ret = -ETIMEDOUT;
-   DRM_DEBUG_KMS("irq timeout id=%u, intr=%d, "
+   DRM_DEBUG_KMS("irq timeout id=%u, "

[PATCH v2 4/7] drm/msm/dpu: allow just single IRQ callback

2021-06-17 Thread Dmitry Baryshkov
The DPU interrupts code allows multiple callbacks per interrupt. In
reality none of the interrupts is shared between blocks (and probably
never will be). Drop support for registering multiple callbacks per
interrupt to simplify the interrupt handling code.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h  |  18 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   |   6 +-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |   2 +-
 .../drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c  |  10 +-
 .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  |   6 +-
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 144 +++---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.h |  12 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   |  12 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h |  10 +-
 9 files changed, 86 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
index 90ae6c9ccc95..44ab97fb2964 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
@@ -46,10 +46,8 @@ u32 dpu_core_irq_read(
  * interrupt
  * @dpu_kms:   DPU handle
  * @irq_idx:   irq index
- * @irq_cb:IRQ callback structure, containing callback function
- * and argument. Passing NULL for irq_cb will unregister
- * the callback for the given irq_idx
- * This must exist until un-registration.
+ * @irq_cb:IRQ callback function.
+ * @irq_arg:   IRQ callback argument.
  * @return:0 for success registering callback, otherwise failure
  *
  * This function supports registration of multiple callbacks for each 
interrupt.
@@ -57,17 +55,16 @@ u32 dpu_core_irq_read(
 int dpu_core_irq_register_callback(
struct dpu_kms *dpu_kms,
int irq_idx,
-   struct dpu_irq_callback *irq_cb);
+   void (*irq_cb)(void *arg, int irq_idx),
+   void *irq_arg);
 
 /**
  * dpu_core_irq_unregister_callback - For unregistering callback function on 
IRQ
  * interrupt
  * @dpu_kms:   DPU handle
  * @irq_idx:   irq index
- * @irq_cb:IRQ callback structure, containing callback function
- * and argument. Passing NULL for irq_cb will unregister
- * the callback for the given irq_idx
- * This must match with registration.
+ * @irq_cb:IRQ callback function.
+ * @irq_arg:   IRQ callback argument.
  * @return:0 for success registering callback, otherwise failure
  *
  * This function supports registration of multiple callbacks for each 
interrupt.
@@ -75,7 +72,8 @@ int dpu_core_irq_register_callback(
 int dpu_core_irq_unregister_callback(
struct dpu_kms *dpu_kms,
int irq_idx,
-   struct dpu_irq_callback *irq_cb);
+   void (*irq_cb)(void *arg, int irq_idx),
+   void *irq_arg);
 
 /**
  * dpu_debugfs_core_irq_init - register core irq debugfs
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 1c04b7cce43e..d3557b0f4db9 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -310,7 +310,7 @@ int dpu_encoder_helper_wait_for_irq(struct dpu_encoder_phys 
*phys_enc,
  phys_enc->hw_pp->idx - PINGPONG_0,
  atomic_read(wait_info->atomic_cnt));
local_irq_save(flags);
-   irq->cb.func(phys_enc, irq->irq_idx);
+   irq->func(phys_enc, irq->irq_idx);
local_irq_restore(flags);
ret = 0;
} else {
@@ -352,7 +352,7 @@ int dpu_encoder_helper_register_irq(struct dpu_encoder_phys 
*phys_enc,
}
 
ret = dpu_core_irq_register_callback(phys_enc->dpu_kms, irq->irq_idx,
-   &irq->cb);
+   irq->func, phys_enc);
if (ret) {
DPU_ERROR_PHYS(phys_enc,
"failed to register IRQ callback for %s\n",
@@ -384,7 +384,7 @@ int dpu_encoder_helper_unregister_irq(struct 
dpu_encoder_phys *phys_enc,
}
 
ret = dpu_core_irq_unregister_callback(phys_enc->dpu_kms, irq->irq_idx,
-   &irq->cb);
+   irq->func, phys_enc);
if (ret) {
DRM_ERROR("unreg cb fail id=%u, intr=%d, irq=%d ret=%d",
  DRMID(phys_enc->parent), intr_idx,
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index e7270eb6b84b..80d87871fd94 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ 

[PATCH v2 6/7] drm/msm/dpu: get rid of dpu_encoder_helper_(un)register_irq

2021-06-17 Thread Dmitry Baryshkov
Get rid of the dpu_encoder_helper_register_irq/unregister_irq helpers
and call dpu_core_irq_register_callback/unregister_callback directly.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 64 ---
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  | 18 --
 .../drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c  | 39 +++
 .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  | 21 --
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c |  4 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h | 29 +++--
 6 files changed, 56 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index d3557b0f4db9..3d8864df8605 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -333,70 +333,6 @@ int dpu_encoder_helper_wait_for_irq(struct 
dpu_encoder_phys *phys_enc,
return ret;
 }
 
-int dpu_encoder_helper_register_irq(struct dpu_encoder_phys *phys_enc,
-   enum dpu_intr_idx intr_idx)
-{
-   struct dpu_encoder_irq *irq;
-   int ret = 0;
-
-   if (intr_idx >= INTR_IDX_MAX) {
-   DPU_ERROR("invalid params\n");
-   return -EINVAL;
-   }
-   irq = &phys_enc->irq[intr_idx];
-
-   if (irq->irq_idx < 0) {
-   DPU_ERROR_PHYS(phys_enc,
-   "invalid IRQ index:%d\n", irq->irq_idx);
-   return -EINVAL;
-   }
-
-   ret = dpu_core_irq_register_callback(phys_enc->dpu_kms, irq->irq_idx,
-   irq->func, phys_enc);
-   if (ret) {
-   DPU_ERROR_PHYS(phys_enc,
-   "failed to register IRQ callback for %s\n",
-   irq->name);
-   irq->irq_idx = -EINVAL;
-   return ret;
-   }
-
-   trace_dpu_enc_irq_register_success(DRMID(phys_enc->parent), intr_idx,
-   irq->irq_idx);
-
-   return ret;
-}
-
-int dpu_encoder_helper_unregister_irq(struct dpu_encoder_phys *phys_enc,
-   enum dpu_intr_idx intr_idx)
-{
-   struct dpu_encoder_irq *irq;
-   int ret;
-
-   irq = &phys_enc->irq[intr_idx];
-
-   /* silently skip irqs that weren't registered */
-   if (irq->irq_idx < 0) {
-   DRM_ERROR("duplicate unregister id=%u, intr=%d, irq=%d",
- DRMID(phys_enc->parent), intr_idx,
- irq->irq_idx);
-   return 0;
-   }
-
-   ret = dpu_core_irq_unregister_callback(phys_enc->dpu_kms, irq->irq_idx,
-   irq->func, phys_enc);
-   if (ret) {
-   DRM_ERROR("unreg cb fail id=%u, intr=%d, irq=%d ret=%d",
- DRMID(phys_enc->parent), intr_idx,
- irq->irq_idx, ret);
-   }
-
-   trace_dpu_enc_irq_unregister_success(DRMID(phys_enc->parent), intr_idx,
-irq->irq_idx);
-
-   return 0;
-}
-
 int dpu_encoder_get_frame_count(struct drm_encoder *drm_enc)
 {
struct dpu_encoder_virt *dpu_enc;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 80d87871fd94..ff2218155b44 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -364,22 +364,4 @@ int dpu_encoder_helper_wait_for_irq(struct 
dpu_encoder_phys *phys_enc,
enum dpu_intr_idx intr_idx,
struct dpu_encoder_wait_info *wait_info);
 
-/**
- * dpu_encoder_helper_register_irq - register and enable an irq
- * @phys_enc: Pointer to physical encoder structure
- * @intr_idx: encoder interrupt index
- * @Return: 0 or -ERROR
- */
-int dpu_encoder_helper_register_irq(struct dpu_encoder_phys *phys_enc,
-   enum dpu_intr_idx intr_idx);
-
-/**
- * dpu_encoder_helper_unregister_irq - unregister and disable an irq
- * @phys_enc: Pointer to physical encoder structure
- * @intr_idx: encoder interrupt index
- * @Return: 0 or -ERROR
- */
-int dpu_encoder_helper_unregister_irq(struct dpu_encoder_phys *phys_enc,
-   enum dpu_intr_idx intr_idx);
-
 #endif /* __dpu_encoder_phys_H__ */
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c
index f921a5c99456..4bfeac821f51 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c
@@ -211,7 +211,9 @@ static int _dpu_encoder_phys_cmd_handle_ppdone_timeout(
  cmd_enc->pp_timeout_report_cnt,
  atomic_read(&cmd_enc->pending_kickoff_cnt));
msm_disp_snapshot_state(drm_enc->dev);
-   dpu_encoder_helper_unregister_irq(phys_enc, INTR_IDX_RDPTR);
+   dpu_core_irq_unregister_callback(phys_enc->dpu_kms,
+   

[PATCH v2 5/7] drm/msm/dpu: remove extra wrappers around dpu_core_irq

2021-06-17 Thread Dmitry Baryshkov
Remove the extra dpu_irq_* wrappers from dpu_kms.c, merging them
directly into the dpu_core_irq_* functions.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h  | 12 -
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c |  9 ---
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c   | 27 +++
 3 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
index 44ab97fb2964..afc8cd546368 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
@@ -10,24 +10,24 @@
 
 /**
  * dpu_core_irq_preinstall - perform pre-installation of core IRQ handler
- * @dpu_kms:   DPU handle
+ * @kms:   MSM KMS handle
  * @return:none
  */
-void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms);
+void dpu_core_irq_preinstall(struct msm_kms *kms);
 
 /**
  * dpu_core_irq_uninstall - uninstall core IRQ handler
- * @dpu_kms:   DPU handle
+ * @kms:   MSM KMS handle
  * @return:none
  */
-void dpu_core_irq_uninstall(struct dpu_kms *dpu_kms);
+void dpu_core_irq_uninstall(struct msm_kms *kms);
 
 /**
  * dpu_core_irq - core IRQ handler
- * @dpu_kms:   DPU handle
+ * @kms:   MSM KMS handle
  * @return:interrupt handling status
  */
-irqreturn_t dpu_core_irq(struct dpu_kms *dpu_kms);
+irqreturn_t dpu_core_irq(struct msm_kms *kms);
 
 /**
  * dpu_core_irq_read - IRQ helper function for reading IRQ status
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
index 7062e7f0e860..9b74cfdf5355 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
@@ -140,8 +140,9 @@ static void dpu_core_irq_callback_handler(struct dpu_kms 
*dpu_kms, int irq_idx)

dpu_kms->hw_intr->irq_tbl[irq_idx].cb(dpu_kms->hw_intr->irq_tbl[irq_idx].arg, 
irq_idx);
 }
 
-irqreturn_t dpu_core_irq(struct dpu_kms *dpu_kms)
+irqreturn_t dpu_core_irq(struct msm_kms *kms)
 {
+   struct dpu_kms *dpu_kms = to_dpu_kms(kms);
struct dpu_hw_intr *intr = dpu_kms->hw_intr;
int reg_idx;
int irq_idx;
@@ -526,8 +527,9 @@ void dpu_debugfs_core_irq_init(struct dpu_kms *dpu_kms,
 }
 #endif
 
-void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms)
+void dpu_core_irq_preinstall(struct msm_kms *kms)
 {
+   struct dpu_kms *dpu_kms = to_dpu_kms(kms);
int i;
 
pm_runtime_get_sync(&dpu_kms->pdev->dev);
@@ -539,8 +541,9 @@ void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms)
atomic_set(&dpu_kms->hw_intr->irq_tbl[i].count, 0);
 }
 
-void dpu_core_irq_uninstall(struct dpu_kms *dpu_kms)
+void dpu_core_irq_uninstall(struct msm_kms *kms)
 {
+   struct dpu_kms *dpu_kms = to_dpu_kms(kms);
int i;
 
pm_runtime_get_sync(&dpu_kms->pdev->dev);
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 1d3a4f395e74..c1706205a514 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -761,20 +761,6 @@ static void _dpu_kms_set_encoder_mode(struct msm_kms *kms,
encoder->base.id, rc);
 }
 
-static irqreturn_t dpu_irq(struct msm_kms *kms)
-{
-   struct dpu_kms *dpu_kms = to_dpu_kms(kms);
-
-   return dpu_core_irq(dpu_kms);
-}
-
-static void dpu_irq_preinstall(struct msm_kms *kms)
-{
-   struct dpu_kms *dpu_kms = to_dpu_kms(kms);
-
-   dpu_core_irq_preinstall(dpu_kms);
-}
-
 static int dpu_irq_postinstall(struct msm_kms *kms)
 {
struct msm_drm_private *priv;
@@ -792,13 +778,6 @@ static int dpu_irq_postinstall(struct msm_kms *kms)
return 0;
 }
 
-static void dpu_irq_uninstall(struct msm_kms *kms)
-{
-   struct dpu_kms *dpu_kms = to_dpu_kms(kms);
-
-   dpu_core_irq_uninstall(dpu_kms);
-}
-
 static void dpu_kms_mdp_snapshot(struct msm_disp_state *disp_state, struct 
msm_kms *kms)
 {
int i;
@@ -846,10 +825,10 @@ static void dpu_kms_mdp_snapshot(struct msm_disp_state 
*disp_state, struct msm_k
 
 static const struct msm_kms_funcs kms_funcs = {
.hw_init = dpu_kms_hw_init,
-   .irq_preinstall  = dpu_irq_preinstall,
+   .irq_preinstall  = dpu_core_irq_preinstall,
.irq_postinstall = dpu_irq_postinstall,
-   .irq_uninstall   = dpu_irq_uninstall,
-   .irq = dpu_irq,
+   .irq_uninstall   = dpu_core_irq_uninstall,
+   .irq = dpu_core_irq,
.enable_commit   = dpu_kms_enable_commit,
.disable_commit  = dpu_kms_disable_commit,
.vsync_time  = dpu_kms_vsync_time,
-- 
2.30.2



[PATCH v2 2/7] drm/msm/dpu: don't clear IRQ register twice

2021-06-17 Thread Dmitry Baryshkov
We already clear the IRQ status register before processing IRQs, so do
not clear the register again. In particular, do not clear the IRQ status
_after_ processing the IRQ, as that way we can lose an event.

Signed-off-by: Dmitry Baryshkov 
---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c   | 17 -
 1 file changed, 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
index 2437b0c7c073..28e9b0d448db 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
@@ -120,21 +120,6 @@ static const struct dpu_intr_reg dpu_intr_set[] = {
 #define DPU_IRQ_REG(irq_idx)   (irq_idx / 32)
 #define DPU_IRQ_MASK(irq_idx)  (BIT(irq_idx % 32))
 
-static void dpu_hw_intr_clear_intr_status_nolock(struct dpu_hw_intr *intr,
-   int irq_idx)
-{
-   int reg_idx;
-
-   if (!intr)
-   return;
-
-   reg_idx = DPU_IRQ_REG(irq_idx);
-   DPU_REG_WRITE(&intr->hw, dpu_intr_set[reg_idx].clr_off, 
DPU_IRQ_MASK(irq_idx));
-
-   /* ensure register writes go through */
-   wmb();
-}
-
 /**
  * dpu_core_irq_callback_handler - dispatch core interrupts
  * @arg:   private data of callback handler
@@ -203,8 +188,6 @@ irqreturn_t dpu_core_irq(struct dpu_kms *dpu_kms)
 
dpu_core_irq_callback_handler(dpu_kms, irq_idx);
 
-   dpu_hw_intr_clear_intr_status_nolock(intr, irq_idx);
-
/*
 * When callback finish, clear the irq_status
 * with the matching mask. Once irq_status
-- 
2.30.2



[PATCH v2 3/7] drm/msm/dpu: merge struct dpu_irq into struct dpu_hw_intr

2021-06-17 Thread Dmitry Baryshkov
As dpu_core_irq was merged into dpu_hw_intr, merge the data structures
too, removing the need for the separate struct dpu_irq.

Signed-off-by: Dmitry Baryshkov 
---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 51 +--
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.h |  5 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 13 -
 3 files changed, 28 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
index 28e9b0d448db..d2b6dca487e3 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
@@ -127,20 +127,19 @@ static const struct dpu_intr_reg dpu_intr_set[] = {
  */
 static void dpu_core_irq_callback_handler(struct dpu_kms *dpu_kms, int irq_idx)
 {
-   struct dpu_irq *irq_obj = &dpu_kms->irq_obj;
struct dpu_irq_callback *cb;
 
VERB("irq_idx=%d\n", irq_idx);
 
-   if (list_empty(&irq_obj->irq_cb_tbl[irq_idx]))
+   if (list_empty(&dpu_kms->hw_intr->irq_cb_tbl[irq_idx]))
DRM_ERROR("no registered cb, idx:%d\n", irq_idx);
 
-   atomic_inc(&irq_obj->irq_counts[irq_idx]);
+   atomic_inc(&dpu_kms->hw_intr->irq_counts[irq_idx]);
 
/*
 * Perform registered function callback
 */
-   list_for_each_entry(cb, &irq_obj->irq_cb_tbl[irq_idx], list)
+   list_for_each_entry(cb, &dpu_kms->hw_intr->irq_cb_tbl[irq_idx], list)
if (cb->func)
cb->func(cb->arg, irq_idx);
 }
@@ -420,6 +419,10 @@ void dpu_hw_intr_destroy(struct dpu_hw_intr *intr)
 {
if (intr) {
kfree(intr->cache_irq_mask);
+
+   kfree(intr->irq_cb_tbl);
+   kfree(intr->irq_counts);
+
kfree(intr);
}
 }
@@ -429,7 +432,7 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, 
int irq_idx,
 {
unsigned long irq_flags;
 
-   if (!dpu_kms->irq_obj.irq_cb_tbl) {
+   if (!dpu_kms->hw_intr->irq_cb_tbl) {
DPU_ERROR("invalid params\n");
return -EINVAL;
}
@@ -453,9 +456,9 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, 
int irq_idx,
trace_dpu_core_irq_register_callback(irq_idx, register_irq_cb);
list_del_init(&register_irq_cb->list);
list_add_tail(&register_irq_cb->list,
-   &dpu_kms->irq_obj.irq_cb_tbl[irq_idx]);
+   &dpu_kms->hw_intr->irq_cb_tbl[irq_idx]);
if (list_is_first(&register_irq_cb->list,
-   &dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
+   &dpu_kms->hw_intr->irq_cb_tbl[irq_idx])) {
int ret = dpu_hw_intr_enable_irq_locked(
dpu_kms->hw_intr,
irq_idx);
@@ -473,7 +476,7 @@ int dpu_core_irq_unregister_callback(struct dpu_kms 
*dpu_kms, int irq_idx,
 {
unsigned long irq_flags;
 
-   if (!dpu_kms->irq_obj.irq_cb_tbl) {
+   if (!dpu_kms->hw_intr->irq_cb_tbl) {
DPU_ERROR("invalid params\n");
return -EINVAL;
}
@@ -497,7 +500,7 @@ int dpu_core_irq_unregister_callback(struct dpu_kms 
*dpu_kms, int irq_idx,
trace_dpu_core_irq_unregister_callback(irq_idx, register_irq_cb);
list_del_init(&register_irq_cb->list);
/* empty callback list but interrupt is still enabled */
-   if (list_empty(&dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
+   if (list_empty(&dpu_kms->hw_intr->irq_cb_tbl[irq_idx])) {
int ret = dpu_hw_intr_disable_irq_locked(
dpu_kms->hw_intr,
irq_idx);
@@ -515,19 +518,18 @@ int dpu_core_irq_unregister_callback(struct dpu_kms 
*dpu_kms, int irq_idx,
 static int dpu_debugfs_core_irq_show(struct seq_file *s, void *v)
 {
struct dpu_kms *dpu_kms = s->private;
-   struct dpu_irq *irq_obj = &dpu_kms->irq_obj;
struct dpu_irq_callback *cb;
unsigned long irq_flags;
int i, irq_count, cb_count;
 
-   if (WARN_ON(!irq_obj->irq_cb_tbl))
+   if (WARN_ON(!dpu_kms->hw_intr->irq_cb_tbl))
return 0;
 
-   for (i = 0; i < irq_obj->total_irqs; i++) {
+   for (i = 0; i < dpu_kms->hw_intr->total_irqs; i++) {
spin_lock_irqsave(&dpu_kms->hw_intr->irq_lock, irq_flags);
cb_count = 0;
-   irq_count = atomic_read(&irq_obj->irq_counts[i]);
-   list_for_each_entry(cb, &irq_obj->irq_cb_tbl[i], list)
+   irq_count = atomic_read(&dpu_kms->hw_intr->irq_counts[i]);
+   list_for_each_entry(cb, &dpu_kms->hw_intr->irq_cb_tbl[i], list)
cb_count++;
spin_unlock_irqrestore(&dpu_kms->hw_intr->irq_lock, irq_flags);
 
@@ -559,14 +561,13 @@ void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms)
pm_runtime_put_sync(&dpu_kms->pdev->dev);
 
/* Create irq callbacks for all possible irq_idx */
-   dpu_kms->irq_obj.total_irqs = dpu_kms->hw_intr->total_irqs;

[PATCH v2 1/7] drm/msm/dpu: squash dpu_core_irq into dpu_hw_interrupts

2021-06-17 Thread Dmitry Baryshkov
With dpu_core_irq being just a wrapper around dpu_hw_interrupts, there
is little sense in keeping them separate. Squash them together to remove
another layer of abstraction (the hw_intr ops).

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/Makefile  |   1 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c  | 256 -
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 269 ++
 .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.h |  87 --
 4 files changed, 214 insertions(+), 399 deletions(-)
 delete mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 2c00aa70b708..a5245e8d0f14 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -51,7 +51,6 @@ msm-y := \
disp/mdp5/mdp5_mixer.o \
disp/mdp5/mdp5_plane.o \
disp/mdp5/mdp5_smp.o \
-   disp/dpu1/dpu_core_irq.o \
disp/dpu1/dpu_core_perf.o \
disp/dpu1/dpu_crtc.o \
disp/dpu1/dpu_encoder.o \
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
deleted file mode 100644
index d2457490930b..
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
+++ /dev/null
@@ -1,256 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/* Copyright (c) 2015-2018, The Linux Foundation. All rights reserved.
- */
-
-#define pr_fmt(fmt)"[drm:%s:%d] " fmt, __func__, __LINE__
-
-#include 
-#include 
-#include 
-#include 
-
-#include "dpu_core_irq.h"
-#include "dpu_trace.h"
-
-/**
- * dpu_core_irq_callback_handler - dispatch core interrupts
- * @arg:   private data of callback handler
- * @irq_idx:   interrupt index
- */
-static void dpu_core_irq_callback_handler(void *arg, int irq_idx)
-{
-   struct dpu_kms *dpu_kms = arg;
-   struct dpu_irq *irq_obj = &dpu_kms->irq_obj;
-   struct dpu_irq_callback *cb;
-
-   VERB("irq_idx=%d\n", irq_idx);
-
-   if (list_empty(&irq_obj->irq_cb_tbl[irq_idx]))
-   DRM_ERROR("no registered cb, idx:%d\n", irq_idx);
-
-   atomic_inc(&irq_obj->irq_counts[irq_idx]);
-
-   /*
-* Perform registered function callback
-*/
-   list_for_each_entry(cb, &irq_obj->irq_cb_tbl[irq_idx], list)
-   if (cb->func)
-   cb->func(cb->arg, irq_idx);
-}
-
-u32 dpu_core_irq_read(struct dpu_kms *dpu_kms, int irq_idx, bool clear)
-{
-   if (!dpu_kms->hw_intr ||
-   !dpu_kms->hw_intr->ops.get_interrupt_status)
-   return 0;
-
-   if (irq_idx < 0) {
-   DPU_ERROR("[%pS] invalid irq_idx=%d\n",
-   __builtin_return_address(0), irq_idx);
-   return 0;
-   }
-
-   return dpu_kms->hw_intr->ops.get_interrupt_status(dpu_kms->hw_intr,
-   irq_idx, clear);
-}
-
-int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx,
-   struct dpu_irq_callback *register_irq_cb)
-{
-   unsigned long irq_flags;
-
-   if (!dpu_kms->irq_obj.irq_cb_tbl) {
-   DPU_ERROR("invalid params\n");
-   return -EINVAL;
-   }
-
-   if (!register_irq_cb || !register_irq_cb->func) {
-   DPU_ERROR("invalid irq_cb:%d func:%d\n",
-   register_irq_cb != NULL,
-   register_irq_cb ?
-   register_irq_cb->func != NULL : -1);
-   return -EINVAL;
-   }
-
-   if (irq_idx < 0 || irq_idx >= dpu_kms->hw_intr->total_irqs) {
-   DPU_ERROR("invalid IRQ index: [%d]\n", irq_idx);
-   return -EINVAL;
-   }
-
-   VERB("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx);
-
-   irq_flags = dpu_kms->hw_intr->ops.lock(dpu_kms->hw_intr);
-   trace_dpu_core_irq_register_callback(irq_idx, register_irq_cb);
-   list_del_init(&register_irq_cb->list);
-   list_add_tail(&register_irq_cb->list,
-   &dpu_kms->irq_obj.irq_cb_tbl[irq_idx]);
-   if (list_is_first(&register_irq_cb->list,
-   &dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
-   int ret = dpu_kms->hw_intr->ops.enable_irq_locked(
-   dpu_kms->hw_intr,
-   irq_idx);
-   if (ret)
-   DPU_ERROR("Fail to enable IRQ for irq_idx:%d\n",
-   irq_idx);
-   }
-   dpu_kms->hw_intr->ops.unlock(dpu_kms->hw_intr, irq_flags);
-
-   return 0;
-}
-
-int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx,
-   struct dpu_irq_callback *register_irq_cb)
-{
-   unsigned long irq_flags;
-
-   if (!dpu_kms->irq_obj.irq_cb_tbl) {
-   DPU_ERROR("invalid params\n");
-   return -EINVAL;
-   }
-
-   if (!register_irq_cb || !register_irq_cb->func) {
-   DPU_ERROR("invalid irq_cb:%d 

[PATCH v2 0/7] drm/msm/dpu: merge dpu_core_irq into dpu_hw_interrupts

2021-06-17 Thread Dmitry Baryshkov
This patch series reworks DPU's irq handling code by merging
dpu_core_irq into dpu_hw_intr, reworking/dropping irq-related helpers
and wrappers, etc.

Changes since v1:
 - Rework the callback registration code to allow just a single callback
   per interrupt. This removes the need for any memory allocation in the
   reg/unreg code and simplifies interrupt handling.

The following changes since commit 7e0230fd096c03e9662e66150f951075dd16e496:

  drm/msm/mdp5: provide dynamic bandwidth management (2021-06-17 09:51:44 -0700)

are available in the Git repository at:

  https://git.linaro.org/people/dmitry.baryshkov/kernel.git dpu-irq-simplify-5

for you to fetch changes up to b2ae835c61b2065037c55b4596e16053484f4904:

  drm/msm/dpu: remove struct dpu_encoder_irq and enum dpu_intr_idx (2021-06-18 
01:12:04 +0300)


Dmitry Baryshkov (7):
  drm/msm/dpu: squash dpu_core_irq into dpu_hw_interrupts
  drm/msm/dpu: don't clear IRQ register twice
  drm/msm/dpu: merge struct dpu_irq into struct dpu_hw_intr
  drm/msm/dpu: allow just single IRQ callback
  drm/msm/dpu: remove extra wrappers around dpu_core_irq
  drm/msm/dpu: get rid of dpu_encoder_helper_(un)register_irq
  drm/msm/dpu: remove struct dpu_encoder_irq and enum dpu_intr_idx

 drivers/gpu/drm/msm/Makefile   |   1 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c   | 256 
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h   |  30 ++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c| 111 ++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h   |  66 +-
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c   |  99 
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c   |  56 ++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c  | 264 +++--
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.h  |  96 +---
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|  27 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h|  25 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h  |  51 ++--
 12 files changed, 334 insertions(+), 748 deletions(-)
 delete mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c




Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-17 Thread Rob Herring
On Thu, Jun 17, 2021 at 10:06 AM Suman Anna  wrote:
>
> Hi Rob,
>
> On 6/15/21 2:15 PM, Rob Herring wrote:
> > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> > same size as the list is redundant and can be dropped. Note that is DT
> > schema specific behavior and not standard json-schema behavior. The tooling
> > will fixup the final schema adding any unspecified minItems/maxItems.
> >
> > This condition is partially checked with the meta-schema already, but
> > only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> > An improved meta-schema is pending.
> >
> > Cc: Jens Axboe 
> > Cc: Stephen Boyd 
> > Cc: Herbert Xu 
> > Cc: "David S. Miller" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Vinod Koul 
> > Cc: Bartosz Golaszewski 
> > Cc: Kamal Dasu 
> > Cc: Jonathan Cameron 
> > Cc: Lars-Peter Clausen 
> > Cc: Thomas Gleixner 
> > Cc: Marc Zyngier 
> > Cc: Joerg Roedel 
> > Cc: Jassi Brar 
> > Cc: Mauro Carvalho Chehab 
> > Cc: Krzysztof Kozlowski 
> > Cc: Ulf Hansson 
> > Cc: Jakub Kicinski 
> > Cc: Wolfgang Grandegger 
> > Cc: Marc Kleine-Budde 
> > Cc: Andrew Lunn 
> > Cc: Vivien Didelot 
> > Cc: Vladimir Oltean 
> > Cc: Bjorn Helgaas 
> > Cc: Kishon Vijay Abraham I 
> > Cc: Linus Walleij 
> > Cc: "Uwe Kleine-König" 
> > Cc: Lee Jones 
> > Cc: Ohad Ben-Cohen 
> > Cc: Mathieu Poirier 
> > Cc: Philipp Zabel 
> > Cc: Paul Walmsley 
> > Cc: Palmer Dabbelt 
> > Cc: Albert Ou 
> > Cc: Alessandro Zummo 
> > Cc: Alexandre Belloni 
> > Cc: Greg Kroah-Hartman 
> > Cc: Mark Brown 
> > Cc: Zhang Rui 
> > Cc: Daniel Lezcano 
> > Cc: Wim Van Sebroeck 
> > Cc: Guenter Roeck 
> > Signed-off-by: Rob Herring 
> > ---
> >  .../devicetree/bindings/ata/nvidia,tegra-ahci.yaml  | 1 -
> >  .../devicetree/bindings/clock/allwinner,sun4i-a10-ccu.yaml  | 2 --
> >  .../devicetree/bindings/clock/qcom,gcc-apq8064.yaml | 1 -
> >  Documentation/devicetree/bindings/clock/qcom,gcc-sdx55.yaml | 2 --
> >  .../devicetree/bindings/clock/qcom,gcc-sm8350.yaml  | 2 --
> >  .../devicetree/bindings/clock/sprd,sc9863a-clk.yaml | 1 -
> >  .../devicetree/bindings/crypto/allwinner,sun8i-ce.yaml  | 2 --
> >  Documentation/devicetree/bindings/crypto/fsl-dcp.yaml   | 1 -
> >  .../display/allwinner,sun4i-a10-display-backend.yaml| 6 --
> >  .../bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml  | 1 -
> >  .../bindings/display/allwinner,sun8i-a83t-dw-hdmi.yaml  | 4 
> >  .../bindings/display/allwinner,sun8i-a83t-hdmi-phy.yaml | 2 --
> >  .../bindings/display/allwinner,sun8i-r40-tcon-top.yaml  | 2 --
> >  .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml   | 2 --
> >  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 2 --
> >  Documentation/devicetree/bindings/display/st,stm32-dsi.yaml | 2 --
> >  .../devicetree/bindings/display/st,stm32-ltdc.yaml  | 1 -
> >  .../devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 4 
> >  .../devicetree/bindings/dma/renesas,rcar-dmac.yaml  | 1 -
> >  .../devicetree/bindings/edac/amazon,al-mc-edac.yaml | 2 --
> >  Documentation/devicetree/bindings/eeprom/at24.yaml  | 1 -
> >  Documentation/devicetree/bindings/example-schema.yaml   | 2 --
> >  Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 -
> >  Documentation/devicetree/bindings/gpu/vivante,gc.yaml   | 1 -
> >  Documentation/devicetree/bindings/i2c/brcm,brcmstb-i2c.yaml | 1 -
> >  .../devicetree/bindings/i2c/marvell,mv64xxx-i2c.yaml| 2 --
> >  .../devicetree/bindings/i2c/mellanox,i2c-mlxbf.yaml | 1 -
> >  .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml   | 1 -
> >  .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 --
> >  .../bindings/interrupt-controller/fsl,irqsteer.yaml | 1 -
> >  .../bindings/interrupt-controller/loongson,liointc.yaml | 1 -
> >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml| 1 -
> >  .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml   | 1 -
> >  .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml  | 2 --
> >  .../devicetree/bindings/media/amlogic,gx-vdec.yaml  | 1 -
> >  Documentation/devicetree/bindings/media/i2c/adv7604.yaml| 1 -
> >  .../devicetree/bindings/media/marvell,mmp2-ccic.yaml| 1 -
> >  .../devicetree/bindings/media/qcom,sc7180-venus.yaml| 1 -
> >  .../devicetree/bindings/media/qcom,sdm845-venus-v2.yaml | 1 -
> >  .../devicetree/bindings/media/qcom,sm8250-venus.yaml| 1 -
> >  Documentation/devicetree/bindings/media/renesas,drif.yaml   | 1 -
> >  .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 ++
> >  .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 1 -
> >  .../devicetree/bindings/mmc/allwinner,sun4i-a10-mmc.yaml| 2 --
> >  Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml| 1 -
> >  Documentation/devicetree/bindings/mmc/mtk-sd.yaml   | 2 --
> >  

[PATCH] drm/i915/display: Do not zero past infoframes.vsc

2021-06-17 Thread Kees Cook
intel_dp_vsc_sdp_unpack() was using a memset() size (36, struct dp_sdp)
larger than the destination (24, struct drm_dp_vsc_sdp), clobbering
fields in struct intel_crtc_state after infoframes.vsc. Use the actual
target size for the memset().

Fixes: 1b404b7dbb10 ("drm/i915/dp: Read out DP SDPs")
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 5c983044..6cc03b9e4321 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -2868,7 +2868,7 @@ static int intel_dp_vsc_sdp_unpack(struct drm_dp_vsc_sdp 
*vsc,
if (size < sizeof(struct dp_sdp))
return -EINVAL;
 
-   memset(vsc, 0, size);
+   memset(vsc, 0, sizeof(*vsc));
 
if (sdp->sdp_header.HB0 != 0)
return -EINVAL;
-- 
2.25.1
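The class of bug fixed above — sizing a memset() by the source struct rather than the destination — is easy to reproduce outside the driver. A minimal, self-contained sketch; the struct names and the 24/36-byte sizes stand in for drm_dp_vsc_sdp and dp_sdp and are illustrative, not the real layouts:

```c
#include <assert.h>
#include <string.h>

struct small_dst { unsigned char payload[24]; };

struct container {
	struct small_dst dst;      /* 24 bytes, like drm_dp_vsc_sdp */
	unsigned int neighbour;    /* field that follows, like the rest of crtc_state */
	unsigned char slack[16];   /* keeps the oversized memset inside the object */
};

/* Buggy shape: zeroes 36 bytes (the wire-format size) through a pointer
 * to the 24-byte destination, clobbering whatever follows it. */
static void unpack_buggy(struct small_dst *dst)
{
	memset(dst, 0, 36);
}

/* Fixed shape, as in the patch: size the memset by the destination. */
static void unpack_fixed(struct small_dst *dst)
{
	memset(dst, 0, sizeof(*dst));
}
```

With the buggy variant the neighbouring field is silently zeroed; with the fixed variant only the target struct is touched.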



Re: [PATCH 4/7] drm/msm/dpu: hide struct dpu_irq_callback

2021-06-17 Thread Dmitry Baryshkov
Hello,

On Thu, 17 Jun 2021 at 22:01, Bjorn Andersson
 wrote:
>
> On Thu 17 Jun 09:09 CDT 2021, Dmitry Baryshkov wrote:
>
> > The struct dpu_irq_callbacks looks internal to IRQ handling code. Hide
> > it from the rest of the DPU driver.
> >
> > Signed-off-by: Dmitry Baryshkov 
> > ---
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h  | 18 +++---
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   |  6 +-
> >  .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  2 +-
> >  .../drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c  | 10 ++-
> >  .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  |  6 +-
> >  .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 62 ++-
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 12 
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h |  8 +--
> >  8 files changed, 69 insertions(+), 55 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
> > index 90ae6c9ccc95..44ab97fb2964 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
> > @@ -46,10 +46,8 @@ u32 dpu_core_irq_read(
> >   * interrupt
> >   * @dpu_kms: DPU handle
> >   * @irq_idx: irq index
> > - * @irq_cb:  IRQ callback structure, containing callback function
> > - *   and argument. Passing NULL for irq_cb will unregister
> > - *   the callback for the given irq_idx
> > - *   This must exist until un-registration.
> > + * @irq_cb:  IRQ callback function.
> > + * @irq_arg: IRQ callback argument.
> >   * @return:  0 for success registering callback, otherwise failure
> >   *
> >   * This function supports registration of multiple callbacks for each 
> > interrupt.
> > @@ -57,17 +55,16 @@ u32 dpu_core_irq_read(
> >  int dpu_core_irq_register_callback(
> >   struct dpu_kms *dpu_kms,
> >   int irq_idx,
> > - struct dpu_irq_callback *irq_cb);
> > + void (*irq_cb)(void *arg, int irq_idx),
> > + void *irq_arg);
> >
> >  /**
> >   * dpu_core_irq_unregister_callback - For unregistering callback function 
> > on IRQ
> >   * interrupt
> >   * @dpu_kms: DPU handle
> >   * @irq_idx: irq index
> > - * @irq_cb:  IRQ callback structure, containing callback function
> > - *   and argument. Passing NULL for irq_cb will unregister
> > - *   the callback for the given irq_idx
> > - *   This must match with registration.
> > + * @irq_cb:  IRQ callback function.
> > + * @irq_arg: IRQ callback argument.
> >   * @return:  0 for success registering callback, otherwise failure
> >   *
> >   * This function supports registration of multiple callbacks for each 
> > interrupt.
> > @@ -75,7 +72,8 @@ int dpu_core_irq_register_callback(
> >  int dpu_core_irq_unregister_callback(
> >   struct dpu_kms *dpu_kms,
> >   int irq_idx,
> > - struct dpu_irq_callback *irq_cb);
> > + void (*irq_cb)(void *arg, int irq_idx),
> > + void *irq_arg);
> >
> >  /**
> >   * dpu_debugfs_core_irq_init - register core irq debugfs
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > index 7f06238a7c64..186b2f87d193 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > @@ -310,7 +310,7 @@ int dpu_encoder_helper_wait_for_irq(struct 
> > dpu_encoder_phys *phys_enc,
> > phys_enc->hw_pp->idx - PINGPONG_0,
> > atomic_read(wait_info->atomic_cnt));
> >   local_irq_save(flags);
> > - irq->cb.func(phys_enc, irq->irq_idx);
> > + irq->func(phys_enc, irq->irq_idx);
> >   local_irq_restore(flags);
> >   ret = 0;
> >   } else {
> > @@ -352,7 +352,7 @@ int dpu_encoder_helper_register_irq(struct 
> > dpu_encoder_phys *phys_enc,
> >   }
> >
> >   ret = dpu_core_irq_register_callback(phys_enc->dpu_kms, irq->irq_idx,
> > - &irq->cb);
> > + irq->func, phys_enc);
> >   if (ret) {
> >   DPU_ERROR_PHYS(phys_enc,
> >   "failed to register IRQ callback for %s\n",
> > @@ -384,7 +384,7 @@ int dpu_encoder_helper_unregister_irq(struct 
> > dpu_encoder_phys *phys_enc,
> >   }
> >
> >   ret = dpu_core_irq_unregister_callback(phys_enc->dpu_kms, 
> > irq->irq_idx,
> > - &irq->cb);
> > + irq->func, phys_enc);
> >   if (ret) {
> >   DRM_ERROR("unreg cb fail id=%u, intr=%d, irq=%d ret=%d",
> > DRMID(phys_enc->parent), intr_idx,
> > diff --git 
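The shape of the API change in this patch — replacing the caller-owned callback struct with a bare function pointer plus argument — can be sketched as a toy registry. Names here (irq_register_callback, irq_dispatch) are illustrative, not the actual dpu symbols:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IRQS 4

/* The registry owns the (func, arg) pair; callers no longer embed a
 * callback struct that must outlive the registration. */
struct irq_slot {
	void (*func)(void *arg, int irq_idx);
	void *arg;
};

static struct irq_slot irq_table[MAX_IRQS];

static int irq_register_callback(int irq_idx,
				 void (*func)(void *arg, int irq_idx),
				 void *arg)
{
	if (irq_idx < 0 || irq_idx >= MAX_IRQS || !func)
		return -1;
	irq_table[irq_idx].func = func;
	irq_table[irq_idx].arg = arg;
	return 0;
}

static void irq_dispatch(int irq_idx)
{
	struct irq_slot *slot = &irq_table[irq_idx];

	if (slot->func)
		slot->func(slot->arg, irq_idx);
}
```

The design point is lifetime: with the struct-based API the caller had to keep the dpu_irq_callback alive until unregistration, while a registry-owned pair removes that burden from every call site.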

Re: [PATCH v4 1/3] dt-bindings: msm: dsi: add missing 7nm bindings

2021-06-17 Thread Rob Herring
On Thu, 17 Jun 2021 10:43:33 -0400, Jonathan Marek wrote:
> These got lost when going from .txt to .yaml bindings, add them back.
> 
> Signed-off-by: Jonathan Marek 
> ---
>  .../bindings/display/msm/dsi-phy-7nm.yaml | 66 +++
>  1 file changed, 66 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
Unknown file referenced: [Errno 2] No such file or directory: 
'/usr/local/lib/python3.8/dist-packages/dtschema/schemas/display/msm/dsi-phy-common.yaml'
xargs: dt-doc-validate: exited with status 255; aborting
make[1]: *** Deleting file 
'Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.example.dt.yaml'
Unknown file referenced: [Errno 2] No such file or directory: 
'/usr/local/lib/python3.8/dist-packages/dtschema/schemas/display/msm/dsi-phy-common.yaml'
make[1]: *** [scripts/Makefile.lib:380: 
Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.example.dt.yaml] 
Error 255
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1416: dt_binding_check] Error 2
doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/1493583

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



Re: [PATCH 2/2] drm/amdgpu: rework dma_resv handling v3

2021-06-17 Thread Alex Deucher
On Mon, Jun 14, 2021 at 1:45 PM Christian König
 wrote:
>
> Drop the workaround and instead implement a better solution.
>
> Basically we are now chaining all submissions using a dma_fence_chain
> container and adding them as exclusive fence to the dma_resv object.
>
> This way other drivers can still sync to the single exclusive fence
> while amdgpu only sync to fences from different processes.
>
> v3: add the shared fence first before the exclusive one
>
> Signed-off-by: Christian König 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 62 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
>  6 files changed, 55 insertions(+), 79 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> index a130e766cbdb..c905a4cfc173 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
>  struct amdgpu_bo_list_entry {
> struct ttm_validate_buffer  tv;
> struct amdgpu_bo_va *bo_va;
> +   struct dma_fence_chain  *chain;
> uint32_tpriority;
> struct page **user_pages;
> booluser_invalidated;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9ce649a1a8d3..25655414e9c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -572,6 +572,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> goto out;
> }
>
> +   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> +
> +   e->bo_va = amdgpu_vm_bo_find(vm, bo);
> +
> +   if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> +   e->chain = dma_fence_chain_alloc();
> +   if (!e->chain) {
> +   r = -ENOMEM;
> +   goto error_validate;
> +   }
> +   }
> +   }
> +
> amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
>   &p->bytes_moved_vis_threshold);
> p->bytes_moved = 0;
> @@ -599,15 +613,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> gws = p->bo_list->gws_obj;
> oa = p->bo_list->oa_obj;
>
> -   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> -   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> -
> -   /* Make sure we use the exclusive slot for shared BOs */
> -   if (bo->prime_shared_count)
> -   e->tv.num_shared = 0;
> -   e->bo_va = amdgpu_vm_bo_find(vm, bo);
> -   }
> -
> if (gds) {
> p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
> p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> @@ -629,8 +634,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> }
>
>  error_validate:
> -   if (r)
> +   if (r) {
> +   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +   dma_fence_chain_free(e->chain);
> +   e->chain = NULL;
> +   }
> ttm_eu_backoff_reservation(&p->ticket, &p->validated);
> +   }
>  out:
> return r;
>  }
> @@ -670,9 +680,17 @@ static void amdgpu_cs_parser_fini(struct 
> amdgpu_cs_parser *parser, int error,
>  {
> unsigned i;
>
> -   if (error && backoff)
> +   if (error && backoff) {
> +   struct amdgpu_bo_list_entry *e;
> +
> +   amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> +   dma_fence_chain_free(e->chain);
> +   e->chain = NULL;
> +   }
> +
> ttm_eu_backoff_reservation(&parser->ticket,
>&parser->validated);
> +   }
>
> for (i = 0; i < parser->num_post_deps; i++) {
> drm_syncobj_put(parser->post_deps[i].syncobj);
> @@ -1245,6 +1263,28 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>
> amdgpu_vm_move_to_lru_tail(p->adev, >vm);
>
> +   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +   struct dma_resv *resv = e->tv.bo->base.resv;
> +   struct dma_fence_chain *chain = e->chain;
> +
> +   if (!chain)
> +   continue;
> +
> +   /*
> +* Work around dma_resv 
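The allocate-then-unwind pattern this patch uses — optionally allocate a per-entry object up front, and on any later failure free and NULL every entry so teardown stays idempotent — looks like this in miniature. Names and the stubbed chain_alloc()/chain_free() are illustrative, not the amdgpu/dma_fence_chain API:

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	int wants_chain;   /* like "dma-buf shared and not explicitly synced" */
	void *chain;
};

static int alloc_count, free_count;

static void *chain_alloc(void) { alloc_count++; return malloc(1); }

static void chain_free(void *c)
{
	if (c)
		free_count++;
	free(c);   /* free(NULL) is a no-op, so blanket freeing is safe */
}

/* fail_at simulates -ENOMEM at that index; -1 means no failure. */
static int prepare_entries(struct entry *e, int n, int fail_at)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!e[i].wants_chain)
			continue;
		e[i].chain = (i == fail_at) ? NULL : chain_alloc();
		if (i == fail_at)
			goto error;
	}
	return 0;

error:
	/* Walk the whole list, like the error_validate path in the patch:
	 * entries never allocated hold NULL, so this is always safe. */
	for (i = 0; i < n; i++) {
		chain_free(e[i].chain);
		e[i].chain = NULL;
	}
	return -1;
}
```

NULLing each field after freeing is what lets the parser-fini path run the same loop again without double-freeing.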

Re: [PATCH v2] drm/i915: Document the Virtual Engine uAPI

2021-06-17 Thread Tvrtko Ursulin



On 17/06/2021 18:17, Daniel Vetter wrote:

On Mon, Jun 14, 2021 at 10:09:59AM +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

A little bit of documentation covering the topics of engine discovery,
context engine maps and virtual engines. It is not very detailed but
supposed to be a starting point of giving a brief high level overview of
general principles and intended use cases.

v2:
  * Have the text in uapi header and link from there.

Signed-off-by: Tvrtko Ursulin 
Cc: Daniel Vetter 


What I meant was the kerneldoc directly as kerneldoc for the uapi structs,
like Matt has done for e.g. drm_i915_gem_create_ext_memory_regions.


Hm I wanted to add some commentary to give a high level picture of this 
area and not necessarily focus on uapi structs details. Some of them (at 
least one I think) already have their own documentation and the rest 
could be added in detail. But I do think a short "story" in the order of 
chapters I added to i915.rst makes sense as reading material.



But then I also realized that Matt hasn't set up the include for this, so
it's not automatic at all yet :-/


No idea what where how you mean. The fact i915_drm.h docs are not pulled 
in anywhere?


Regards,

Tvrtko


-Daniel


---
  Documentation/gpu/i915.rst  |  18 
  include/uapi/drm/i915_drm.h | 188 
  2 files changed, 206 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 42ce0196930a..00aa55bbe0fd 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -335,6 +335,24 @@ for execution also include a list of all locations within 
buffers that
  refer to GPU-addresses so that the kernel can edit the buffer correctly.
  This process is dubbed relocation.
  
+Engine Discovery uAPI
+---------------------
+
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+   :doc: Engine Discovery uAPI
+
+Context Engine Map uAPI
+-----------------------
+
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+   :doc: Context Engine Map uAPI
+
+Virtual Engine uAPI
+-------------------
+
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+   :doc: Virtual Engine uAPI
+
  Locking Guidelines
  ------------------
  
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h

index a1cb4aa035a9..2f70c48567c0 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1806,6 +1806,69 @@ struct drm_i915_gem_context_param_sseu {
__u32 rsvd;
  };
  
+/**

+ * DOC: Virtual Engine uAPI
+ *
+ * Virtual engine is a concept where userspace is able to configure a set of
+ * physical engines, submit a batch buffer, and let the driver execute it on 
any
+ * engine from the set as it sees fit.
+ *
+ * This is primarily useful on parts which have multiple instances of a same
+ * class engine, like for example GT3+ Skylake parts with their two VCS 
engines.
+ *
+ * For instance userspace can enumerate all engines of a certain class using 
the
+ * previously described `Engine Discovery uAPI`_. After that userspace can
+ * create a GEM context with a placeholder slot for the virtual engine (using
+ * `I915_ENGINE_CLASS_INVALID` and `I915_ENGINE_CLASS_INVALID_NONE` for class
+ * and instance respectively) and finally using the
+ * `I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE` extension place a virtual engine in
+ * the same reserved slot.
+ *
+ * Example of creating a virtual engine and submitting a batch buffer to it:
+ *
+ * .. code-block:: C
+ *
+ * I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(virtual, 2) = {
+ * .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
+ * .engine_index = 0, // Place this virtual engine into engine map 
slot 0
+ * .num_siblings = 2,
+ * .engines = { { I915_ENGINE_CLASS_VIDEO, 0 },
+ *  { I915_ENGINE_CLASS_VIDEO, 1 }, },
+ * };
+ * I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
+ * .engines = { { I915_ENGINE_CLASS_INVALID,
+ *I915_ENGINE_CLASS_INVALID_NONE } },
+ * .extensions = to_user_pointer(&virtual), // Chains after 
load_balance extension
+ * };
+ * struct drm_i915_gem_context_create_ext_setparam p_engines = {
+ * .base = {
+ * .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
+ * },
+ * .param = {
+ * .param = I915_CONTEXT_PARAM_ENGINES,
+ * .value = to_user_pointer(&engines),
+ * .size = sizeof(engines),
+ * },
+ * };
+ * struct drm_i915_gem_context_create_ext create = {
+ * .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
+ * .extensions = to_user_pointer(&p_engines);
+ * };
+ *
+ * ctx_id = gem_context_create_ext(drm_fd, &create);
+ *
+ * // Now we have created a GEM context with its engine map containing a
+ * // single virtual engine. Submissions to this slot can go either to
+ * // vcs0 or vcs1, depending on the load 

Re: [Mesa-dev] [PATCH 0/6] dma-buf: Add an API for exporting sync files (v12)

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 09:37:36AM +0200, Christian König wrote:
> Am 16.06.21 um 20:30 schrieb Jason Ekstrand:
> > On Tue, Jun 15, 2021 at 3:41 AM Christian König
> >  wrote:
> > > Hi Jason & Daniel,
> > > 
> > > maybe I should explain once more where the problem with this approach is
> > > and why I think we need to get that fixed before we can do something
> > > like this here.
> > > 
> > > To summarize what this patch here does is that it copies the exclusive
> > > fence and/or the shared fences into a sync_file. This alone is totally
> > > unproblematic.
> > > 
> > > The problem is what this implies. When you need to copy the exclusive
> > > fence to a sync_file then this means that the driver is at some point
> > > ignoring the exclusive fence on a buffer object.
> > Not necessarily.  Part of the point of this is to allow for CPU waits
> > on a past point in buffers timeline.  Today, we have poll() and
> > GEM_WAIT both of which wait for the buffer to be idle from whatever
> > GPU work is currently happening.  We want to wait on something in the
> > past and ignore anything happening now.
> 
> Good point, yes that is indeed a valid use case.
> 
> > But, to the broader point, maybe?  I'm a little fuzzy on exactly where
> > i915 inserts and/or depends on fences.
> > 
> > > When you combine that with complex drivers which use TTM and buffer
> > > moves underneath you can construct an information leak using this and
> > > give userspace access to memory which is allocated to the driver, but
> > > not yet initialized.
> > > 
> > > This way you can leak things like page tables, passwords, kernel data
> > > etc... in large amounts to userspace and is an absolutely no-go for
> > > security.
> > Ugh...  Unfortunately, I'm really out of my depth on the implications
> > going on here but I think I see your point.
> > 
> > > That's why I'm said we need to get this fixed before we upstream this
> > > patch set here and especially the driver change which is using that.
> > Well, i915 has had uAPI for a while to ignore fences.
> 
> Yeah, exactly that's illegal.

You're a few years too late with closing that barn door. The following
drives have this concept
- i915
- msm
- etnaviv

Because you can't write a competent vulkan driver without this. This was
discussed at absolute epic length in various xdcs iirc. We did ignore a
bit the vram/ttm/bo-moving problem because all the people present were
hacking on integrated gpu (see list above), but that just means we need to
treat the ttm_bo->moving fence properly.

> At least the kernel internal fences like moving or clearing a buffer object
> needs to be taken into account before a driver is allowed to access a
> buffer.

Yes i915 needs to make sure it never ignores ttm_bo->moving.

For dma-buf this isn't actually a problem, because dma-buf are pinned. You
can't move them while other drivers are using them, hence there's not
actually a ttm_bo->moving fence we can ignore.

p2p dma-buf aka dynamic dma-buf is a different beast, and i915 (and fwiw
these other drivers) need to change before they can do dynamic dma-buf.

> Otherwise we have an information leak worth a CVE and that is certainly not
> something we want.

Because yes otherwise we get a CVE. But right now I don't think we have
one.

We do have a quite big confusion on what exactly the signaling ordering is
supposed to be between exclusive and the collective set of shared fences,
and there's some unifying that needs to happen here. But I think what
Jason implements here in the import ioctl is the most defensive version
possible, so really can't break any driver. It really works like you have
an ad-hoc gpu engine that does nothing itself, but waits for the current
exclusive fence and then sets the exclusive fence with its "CS" completion
fence.

That's imo perfectly legit use-case.

Same for the export one. Waiting for a previous snapshot of implicit
fences is imo perfectly ok use-case and useful for compositors - client
might soon start more rendering, and on some drivers that always results
in the exclusive slot being set, so if you dont take a snapshot you
oversync real bad for your atomic flip.

> > Those changes are years in the past.  If we have a real problem here (not 
> > sure on
> > that yet), then we'll have to figure out how to fix it without nuking
> > uAPI.
> 
> Well, that was the basic idea of attaching flags to the fences in the
> dma_resv object.
> 
> In other words you clearly denote when you have to wait for a fence before
> accessing a buffer or you cause a security issue.

Replied somewhere else, and I do kinda like the flag idea. But the problem
is we first need a ton more encapsulation and review of drivers before we
can change the internals. One thing at a time.

And yes for amdgpu this gets triple-hard because you both have the
ttm_bo->moving fence _and_ the current uapi of using fence ownership _and_
you need to figure out how to support vulkan properly with true opt-in
fencing. I'm pretty 

Re: [PATCH] dma-buf: Document DMA_BUF_IOCTL_SYNC (v3)

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 02:42:58PM -0500, Jason Ekstrand wrote:
> This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
> documentation for DMA_BUF_IOCTL_SYNC.
> 
> v2 (Daniel Vetter):
>  - Fix a couple typos
>  - Add commentary about synchronization with other devices
>  - Use item list format for describing flags
> 
> v3 (Pekka Paalanen):
>  - Clarify stalling requirements.
>  - Be more clear that DMA_BUF_IOCTL_SYNC with SYNC_END has to be
>    called before more GPU work happens.
> 
> Signed-off-by: Jason Ekstrand 
> Reviewed-by: Daniel Vetter 
> Acked-by: Christian König 
> Acked-by: Pekka Paalanen 
> Cc: Sumit Semwal 

Merged to drm-misc-next, thanks.
-Daniel

> ---
>  Documentation/driver-api/dma-buf.rst |  8 +
>  include/uapi/linux/dma-buf.h | 50 +++-
>  2 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/driver-api/dma-buf.rst 
> b/Documentation/driver-api/dma-buf.rst
> index 7f21425d9435a..0d4c13ec1a800 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -88,6 +88,9 @@ consider though:
>  - The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ 
> below for
>details.
>  
> +- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
> +  `DMA Buffer ioctls`_ below for details.
> +
>  Basic Operation and Device DMA Access
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> @@ -106,6 +109,11 @@ Implicit Fence Poll Support
>  .. kernel-doc:: drivers/dma-buf/dma-buf.c
> :doc: implicit fence polling
>  
> +DMA Buffer ioctls
> +~~~~~~~~~~~~~~~~~
> +
> +.. kernel-doc:: include/uapi/linux/dma-buf.h
> +
>  Kernel Functions and Structures Reference
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
> index 7f30393b92c3b..8e4a2ca0bcbf7 100644
> --- a/include/uapi/linux/dma-buf.h
> +++ b/include/uapi/linux/dma-buf.h
> @@ -22,8 +22,56 @@
>  
>  #include <linux/types.h>
>  
> -/* begin/end dma-buf functions used for userspace mmap. */
> +/**
> + * struct dma_buf_sync - Synchronize with CPU access.
> + *
> + * When a DMA buffer is accessed from the CPU via mmap, it is not always
> + * possible to guarantee coherency between the CPU-visible map and underlying
> + * memory.  To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket
> + * any CPU access to give the kernel the chance to shuffle memory around if
> + * needed.
> + *
> + * Prior to accessing the map, the client must call DMA_BUF_IOCTL_SYNC
> + * with DMA_BUF_SYNC_START and the appropriate read/write flags.  Once the
> + * access is complete, the client should call DMA_BUF_IOCTL_SYNC with
> + * DMA_BUF_SYNC_END and the same read/write flags.
> + *
> + * The synchronization provided via DMA_BUF_IOCTL_SYNC only provides cache
> + * coherency.  It does not prevent other processes or devices from
> + * accessing the memory at the same time.  If synchronization with a GPU or
> + * other device driver is required, it is the client's responsibility to
> + * wait for buffer to be ready for reading or writing before calling this
> + * ioctl with DMA_BUF_SYNC_START.  Likewise, the client must ensure that
> + * follow-up work is not submitted to GPU or other device driver until
> + * after this ioctl has been called with DMA_BUF_SYNC_END.
> + *
> + * If the driver or API with which the client is interacting uses implicit
> + * synchronization, waiting for prior work to complete can be done via
> + * poll() on the DMA buffer file descriptor.  If the driver or API requires
> + * explicit synchronization, the client may have to wait on a sync_file or
> + * other synchronization primitive outside the scope of the DMA buffer API.
> + */
>  struct dma_buf_sync {
> + /**
> +  * @flags: Set of access flags
> +  *
> +  * DMA_BUF_SYNC_START:
> +  * Indicates the start of a map access session.
> +  *
> +  * DMA_BUF_SYNC_END:
> +  * Indicates the end of a map access session.
> +  *
> +  * DMA_BUF_SYNC_READ:
> +  * Indicates that the mapped DMA buffer will be read by the
> +  * client via the CPU map.
> +  *
> +  * DMA_BUF_SYNC_WRITE:
> +  * Indicates that the mapped DMA buffer will be written by the
> +  * client via the CPU map.
> +  *
> +  * DMA_BUF_SYNC_RW:
> +  * An alias for DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE.
> +  */
>   __u64 flags;
>  };
>  
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
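A user-space sketch of the bracketing protocol the documented ioctl describes: SYNC_START before CPU access through the mmap()ed range, SYNC_END with the same read/write flags afterwards. The flag values are reproduced from the UAPI header so the sketch is self-contained; the helper name is made up:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Reproduced from <linux/dma-buf.h> so this compiles stand-alone. */
struct dma_buf_sync { uint64_t flags; };

#define DMA_BUF_SYNC_READ   (1 << 0)
#define DMA_BUF_SYNC_WRITE  (2 << 0)
#define DMA_BUF_SYNC_RW     (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
#define DMA_BUF_SYNC_START  (0 << 2)
#define DMA_BUF_SYNC_END    (1 << 2)

#define DMA_BUF_BASE        'b'
#define DMA_BUF_IOCTL_SYNC  _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/* Bracket one CPU access session on a mmap()ed dma-buf fd. */
static int dma_buf_cpu_access(int fd, uint64_t rw_flags,
			      void (*touch)(void *), void *map)
{
	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | rw_flags };

	if (ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -errno;

	touch(map);                 /* CPU reads/writes the mapping */

	sync.flags = DMA_BUF_SYNC_END | rw_flags;
	if (ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -errno;
	return 0;
}
```

As the documentation stresses, this only manages CPU cache coherency; waiting for the GPU to finish (via poll() or an explicit sync_file) is still the caller's job before SYNC_START.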


[PATCH] dma-buf: Document DMA_BUF_IOCTL_SYNC (v3)

2021-06-17 Thread Jason Ekstrand
This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
documentation for DMA_BUF_IOCTL_SYNC.

v2 (Daniel Vetter):
 - Fix a couple typos
 - Add commentary about synchronization with other devices
 - Use item list format for describing flags

v3 (Pekka Paalanen):
 - Clarify stalling requirements.
 - Be more clear that DMA_BUF_IOCTL_SYNC with SYNC_END has to be
   called before more GPU work happens.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Acked-by: Christian König 
Acked-by: Pekka Paalanen 
Cc: Sumit Semwal 
---
 Documentation/driver-api/dma-buf.rst |  8 +
 include/uapi/linux/dma-buf.h | 50 +++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/dma-buf.rst 
b/Documentation/driver-api/dma-buf.rst
index 7f21425d9435a..0d4c13ec1a800 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -88,6 +88,9 @@ consider though:
 - The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below 
for
   details.
 
+- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
+  `DMA Buffer ioctls`_ below for details.
+
 Basic Operation and Device DMA Access
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -106,6 +109,11 @@ Implicit Fence Poll Support
 .. kernel-doc:: drivers/dma-buf/dma-buf.c
:doc: implicit fence polling
 
+DMA Buffer ioctls
+~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/uapi/linux/dma-buf.h
+
 Kernel Functions and Structures Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 7f30393b92c3b..8e4a2ca0bcbf7 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -22,8 +22,56 @@
 
 #include <linux/types.h>
 
-/* begin/end dma-buf functions used for userspace mmap. */
+/**
+ * struct dma_buf_sync - Synchronize with CPU access.
+ *
+ * When a DMA buffer is accessed from the CPU via mmap, it is not always
+ * possible to guarantee coherency between the CPU-visible map and underlying
+ * memory.  To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket
+ * any CPU access to give the kernel the chance to shuffle memory around if
+ * needed.
+ *
+ * Prior to accessing the map, the client must call DMA_BUF_IOCTL_SYNC
+ * with DMA_BUF_SYNC_START and the appropriate read/write flags.  Once the
+ * access is complete, the client should call DMA_BUF_IOCTL_SYNC with
+ * DMA_BUF_SYNC_END and the same read/write flags.
+ *
+ * The synchronization provided via DMA_BUF_IOCTL_SYNC only provides cache
+ * coherency.  It does not prevent other processes or devices from
+ * accessing the memory at the same time.  If synchronization with a GPU or
+ * other device driver is required, it is the client's responsibility to
+ * wait for buffer to be ready for reading or writing before calling this
+ * ioctl with DMA_BUF_SYNC_START.  Likewise, the client must ensure that
+ * follow-up work is not submitted to GPU or other device driver until
+ * after this ioctl has been called with DMA_BUF_SYNC_END.
+ *
+ * If the driver or API with which the client is interacting uses implicit
+ * synchronization, waiting for prior work to complete can be done via
+ * poll() on the DMA buffer file descriptor.  If the driver or API requires
+ * explicit synchronization, the client may have to wait on a sync_file or
+ * other synchronization primitive outside the scope of the DMA buffer API.
+ */
 struct dma_buf_sync {
+   /**
+* @flags: Set of access flags
+*
+* DMA_BUF_SYNC_START:
+* Indicates the start of a map access session.
+*
+* DMA_BUF_SYNC_END:
+* Indicates the end of a map access session.
+*
+* DMA_BUF_SYNC_READ:
+* Indicates that the mapped DMA buffer will be read by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_WRITE:
+* Indicates that the mapped DMA buffer will be written by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_RW:
+* An alias for DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE.
+*/
__u64 flags;
 };
 
-- 
2.31.1



[PATCH] dma-buf: Document DMA_BUF_IOCTL_SYNC (v2)

2021-06-17 Thread Jason Ekstrand
This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
documentation for DMA_BUF_IOCTL_SYNC.

v2 (Daniel Vetter):
 - Fix a couple typos
 - Add commentary about synchronization with other devices
 - Use item list format for describing flags

v2 (Pekka Paalanen):
 - Clarify stalling requirements.
 - Be more clear that DMA_BUF_IOCTL_SYNC with SYNC_END has to be
   called before more GPU work happens.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Acked-by: Christian König 
Acked-by: Pekka Paalanen 
Cc: Sumit Semwal 
---
 Documentation/driver-api/dma-buf.rst |  8 +
 include/uapi/linux/dma-buf.h | 50 +++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/dma-buf.rst 
b/Documentation/driver-api/dma-buf.rst
index 7f21425d9435a..0d4c13ec1a800 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -88,6 +88,9 @@ consider though:
 - The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below
   for details.
 
+- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
+  `DMA Buffer ioctls`_ below for details.
+
 Basic Operation and Device DMA Access
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -106,6 +109,11 @@ Implicit Fence Poll Support
 .. kernel-doc:: drivers/dma-buf/dma-buf.c
:doc: implicit fence polling
 
+DMA Buffer ioctls
+~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/uapi/linux/dma-buf.h
+
 Kernel Functions and Structures Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 7f30393b92c3b..8e4a2ca0bcbf7 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -22,8 +22,56 @@
 
 #include 
 
-/* begin/end dma-buf functions used for userspace mmap. */
+/**
+ * struct dma_buf_sync - Synchronize with CPU access.
+ *
+ * When a DMA buffer is accessed from the CPU via mmap, it is not always
+ * possible to guarantee coherency between the CPU-visible map and underlying
+ * memory.  To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket
+ * any CPU access to give the kernel the chance to shuffle memory around if
+ * needed.
+ *
+ * Prior to accessing the map, the client must call DMA_BUF_IOCTL_SYNC
+ * with DMA_BUF_SYNC_START and the appropriate read/write flags.  Once the
+ * access is complete, the client should call DMA_BUF_IOCTL_SYNC with
+ * DMA_BUF_SYNC_END and the same read/write flags.
+ *
+ * The synchronization provided via DMA_BUF_IOCTL_SYNC only provides cache
+ * coherency.  It does not prevent other processes or devices from
+ * accessing the memory at the same time.  If synchronization with a GPU or
+ * other device driver is required, it is the client's responsibility to
+ * wait for the buffer to be ready for reading or writing before calling this
+ * ioctl with DMA_BUF_SYNC_START.  Likewise, the client must ensure that
+ * follow-up work is not submitted to the GPU or other device driver until
+ * after this ioctl has been called with DMA_BUF_SYNC_END.
+ *
+ * If the driver or API with which the client is interacting uses implicit
+ * synchronization, waiting for prior work to complete can be done via
+ * poll() on the DMA buffer file descriptor.  If the driver or API requires
+ * explicit synchronization, the client may have to wait on a sync_file or
+ * other synchronization primitive outside the scope of the DMA buffer API.
+ */
 struct dma_buf_sync {
+   /**
+* @flags: Set of access flags
+*
+* DMA_BUF_SYNC_START:
+* Indicates the start of a map access session.
+*
+* DMA_BUF_SYNC_END:
+* Indicates the end of a map access session.
+*
+* DMA_BUF_SYNC_READ:
+* Indicates that the mapped DMA buffer will be read by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_WRITE:
+* Indicates that the mapped DMA buffer will be written by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_RW:
+* An alias for DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE.
+*/
__u64 flags;
 };
 
-- 
2.31.1



Re: [PATCH 1/2] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 07:30:24PM +0200, Daniel Vetter wrote:
> On Thu, Jun 17, 2021 at 09:44:25AM +0200, Christian König wrote:
> > Alex do want to review those so that we can close the ticket?
> 
> Maybe I'm behind on mails, but 2nd patch still has the issues I think I'm
> seeing ...

Ok with temperatures getting colder towards the night the 2nd patch looks
much better now :-) I replied there.
-Daniel

> -Daniel
> 
> > 
> > Thanks,
> > Christian.
> > 
> > Am 14.06.21 um 19:45 schrieb Christian König:
> > > Unwrap the explicit fence if it is a dma_fence_chain and
> > > sync to the first fence not matching the owner rules.
> > > 
> > > Signed-off-by: Christian König 
> > > Acked-by: Daniel Vetter 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +--
> > >   1 file changed, 68 insertions(+), 50 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > @@ -28,6 +28,8 @@
> > >*Christian König 
> > >*/
> > > +#include 
> > > +
> > >   #include "amdgpu.h"
> > >   #include "amdgpu_trace.h"
> > >   #include "amdgpu_amdkfd.h"
> > > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, 
> > > struct dma_fence *fence)
> > >   return amdgpu_sync_fence(sync, fence);
> > >   }
> > > +/* Determine based on the owner and mode if we should sync to a fence or 
> > > not */
> > > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > > +enum amdgpu_sync_mode mode,
> > > +void *owner, struct dma_fence *f)
> > > +{
> > > + void *fence_owner = amdgpu_sync_get_owner(f);
> > > +
> > > + /* Always sync to moves, no matter what */
> > > + if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > > + return true;
> > > +
> > > + /* We only want to trigger KFD eviction fences on
> > > +  * evict or move jobs. Skip KFD fences otherwise.
> > > +  */
> > > + if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > + return false;
> > > +
> > > + /* Never sync to VM updates either. */
> > > + if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > + return false;
> > > +
> > > + /* Ignore fences depending on the sync mode */
> > > + switch (mode) {
> > > + case AMDGPU_SYNC_ALWAYS:
> > > + return true;
> > > +
> > > + case AMDGPU_SYNC_NE_OWNER:
> > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > + fence_owner == owner)
> > > + return false;
> > > + break;
> > > +
> > > + case AMDGPU_SYNC_EQ_OWNER:
> > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > + fence_owner != owner)
> > > + return false;
> > > + break;
> > > +
> > > + case AMDGPU_SYNC_EXPLICIT:
> > > + return false;
> > > + }
> > > +
> > > + WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > +  "Adding eviction fence to sync obj");
> > > + return true;
> > > +}
> > > +
> > >   /**
> > >* amdgpu_sync_resv - sync to a reservation object
> > >*
> > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, 
> > > struct amdgpu_sync *sync,
> > >   /* always sync to the exclusive fence */
> > >   f = dma_resv_excl_fence(resv);
> > > - r = amdgpu_sync_fence(sync, f);
> > > + dma_fence_chain_for_each(f, f) {
> > > + struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > > +
> > > + if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > > +chain->fence : f)) {
> > > + r = amdgpu_sync_fence(sync, f);
> > > + dma_fence_put(f);
> > > + if (r)
> > > + return r;
> > > + break;
> > > + }
> > > + }
> > >   flist = dma_resv_shared_list(resv);
> > > - if (!flist || r)
> > > - return r;
> > > + if (!flist)
> > > + return 0;
> > >   for (i = 0; i < flist->shared_count; ++i) {
> > > - void *fence_owner;
> > > -
> > >   f = rcu_dereference_protected(flist->shared[i],
> > > dma_resv_held(resv));
> > > - fence_owner = amdgpu_sync_get_owner(f);
> > > -
> > > - /* Always sync to moves, no matter what */
> > > - if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > > + if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > >   r = amdgpu_sync_fence(sync, f);
> > >   if (r)
> > > - break;
> > > - }
> > > -
> > > - /* We only want to trigger KFD eviction fences on
> > > -  * evict or move jobs. 

Re: [PATCH 2/2] drm/amdgpu: rework dma_resv handling v3

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 07:45:36PM +0200, Christian König wrote:
> Drop the workaround and instead implement a better solution.
> 
> Basically we are now chaining all submissions using a dma_fence_chain
> container and adding them as exclusive fence to the dma_resv object.
> 
> This way other drivers can still sync to the single exclusive fence
> while amdgpu only sync to fences from different processes.
> 
> v3: add the shared fence first before the exclusive one
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 62 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
>  6 files changed, 55 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> index a130e766cbdb..c905a4cfc173 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
>  struct amdgpu_bo_list_entry {
>   struct ttm_validate_buffer  tv;
>   struct amdgpu_bo_va *bo_va;
> + struct dma_fence_chain  *chain;
>   uint32_tpriority;
>   struct page **user_pages;
>   booluser_invalidated;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9ce649a1a8d3..25655414e9c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -572,6 +572,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   goto out;
>   }
>  
> + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> +
> + e->bo_va = amdgpu_vm_bo_find(vm, bo);
> +
> + if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> + e->chain = dma_fence_chain_alloc();
> + if (!e->chain) {
> + r = -ENOMEM;
> + goto error_validate;
> + }
> + }
> + }
> +
>   amdgpu_cs_get_threshold_for_moves(p->adev, >bytes_moved_threshold,
> >bytes_moved_vis_threshold);
>   p->bytes_moved = 0;
> @@ -599,15 +613,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   gws = p->bo_list->gws_obj;
>   oa = p->bo_list->oa_obj;
>  
> - amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> - struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> -
> - /* Make sure we use the exclusive slot for shared BOs */
> - if (bo->prime_shared_count)
> - e->tv.num_shared = 0;
> - e->bo_va = amdgpu_vm_bo_find(vm, bo);
> - }
> -
>   if (gds) {
>   p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
>   p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> @@ -629,8 +634,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   }
>  
>  error_validate:
> - if (r)
> + if (r) {
> + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> + dma_fence_chain_free(e->chain);
> + e->chain = NULL;
> + }
>   ttm_eu_backoff_reservation(>ticket, >validated);
> + }
>  out:
>   return r;
>  }
> @@ -670,9 +680,17 @@ static void amdgpu_cs_parser_fini(struct 
> amdgpu_cs_parser *parser, int error,
>  {
>   unsigned i;
>  
> - if (error && backoff)
> + if (error && backoff) {
> + struct amdgpu_bo_list_entry *e;
> +
> + amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> + dma_fence_chain_free(e->chain);
> + e->chain = NULL;
> + }
> +
>   ttm_eu_backoff_reservation(>ticket,
>  >validated);
> + }
>  
>   for (i = 0; i < parser->num_post_deps; i++) {
>   drm_syncobj_put(parser->post_deps[i].syncobj);
> @@ -1245,6 +1263,28 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>  
>   amdgpu_vm_move_to_lru_tail(p->adev, >vm);
>  
> + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> + struct dma_resv *resv = e->tv.bo->base.resv;
> + struct dma_fence_chain *chain = e->chain;
> +
> + if (!chain)
> + continue;
> +
> + /*
> +  * Work around dma_resv shortcommings by wrapping up the
> +  * submission in a dma_fence_chain and add it as exclusive
> +  * fence, but first add the 

Re: [PATCH 1/7] dma-buf: add some more kerneldoc to dma_resv_add_shared_fence

2021-06-17 Thread Daniel Vetter
On Wed, Jun 16, 2021 at 10:26:49AM +0200, Christian König wrote:
> Explicitly document that code can't assume that shared fences
> signal after the exclusive fence.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index f26c71747d43..4ab02b6c387a 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -235,7 +235,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
>   * @fence: the shared fence to add
>   *
>   * Add a fence to a shared slot, obj->lock must be held, and
> - * dma_resv_reserve_shared() has been called.
> + * dma_resv_reserve_shared() has been called. The shared fences can signal in
> + * any order and there is especially no guarantee that shared fences signal
> + * after the exclusive one. Code relying on any signaling order is broken and
> + * needs to be fixed.

So I agree these are reasonable semantics, but you need to audit drivers
first. Because currently that's not how at least a bunch of them work.
There's way more drivers than the handful you've looked at.

Imo gold standard here is what I've tried doing for the "how do we set
fences" side, which is going through all of them. The trouble is that this
is a bit nastier, because a) drivers play many more tricks here and b)
understanding each driver's scheduling logic is more work than how they set
fences for a request/cs.

Unfortunately I haven't gotten around to doing that yet, because it means
a few days of uninterrupted time crawling through way too much code. I
haven't even found time to respin my old series to make the fence setting
more consistent (since I find a few more issues there than just the amdgpu
one that sparked it all).
-Daniel

>   */
>  void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
>  {
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-17 Thread Marek Olšák
Timeline semaphore waits (polling on memory) will be unmonitored and as
fast as the roundtrip to memory. Semaphore writes will be slower because
the copy of those write requests will also be forwarded to the kernel.
Arbitrary writes are not protected by the hw but the kernel will take
action against such behavior because it will receive them too.

I don't know if that would work with dma_fence.

Marek


On Thu, Jun 17, 2021 at 3:04 PM Daniel Vetter  wrote:

> On Thu, Jun 17, 2021 at 02:28:06PM -0400, Marek Olšák wrote:
> > The kernel will know who should touch the implicit-sync semaphore next,
> and
> > at the same time, the copy of all write requests to the implicit-sync
> > semaphore will be forwarded to the kernel for monitoring and bo_wait.
> >
> > Syncobjs could either use the same monitored access as implicit sync or
> be
> > completely unmonitored. We haven't decided yet.
> >
> > Syncfiles could either use one of the above or wait for a syncobj to go
> > idle before converting to a syncfile.
>
> Hm this sounds all like you're planning to completely rewrap everything
> ... I'm assuming the plan is still that this is going to be largely
> wrapped in dma_fence? Maybe with timeline objects being a bit more
> optimized, but I'm not sure how much you can optimize without breaking the
> interfaces.
> -Daniel
>
> >
> > Marek
> >
> >
> >
> > On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter  wrote:
> >
> > > On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> > > > As long as we can figure out who touched to a certain sync object
> last
> > > that
> > > > would indeed work, yes.
> > >
> > > Don't you need to know who will touch it next, i.e. who is holding up
> your
> > > fence? Or maybe I'm just again totally confused.
> > > -Daniel
> > >
> > > >
> > > > Christian.
> > > >
> > > > Am 14.06.21 um 19:10 schrieb Marek Olšák:
> > > > > The call to the hw scheduler has a limitation on the size of all
> > > > > parameters combined. I think we can only pass a 32-bit sequence
> number
> > > > > and a ~16-bit global (per-GPU) syncobj handle in one call and not
> much
> > > > > else.
> > > > >
> > > > > The syncobj handle can be an element index in a global (per-GPU)
> > > syncobj
> > > > > table and it's read only for all processes with the exception of
> the
> > > > > signal command. Syncobjs can either have per VMID write access
> flags
> > > for
> > > > > the signal command (slow), or any process can write to any
> syncobjs and
> > > > > only rely on the kernel checking the write log (fast).
> > > > >
> > > > > In any case, we can execute the memory write in the queue engine
> and
> > > > > only use the hw scheduler for logging, which would be perfect.
> > > > >
> > > > > Marek
> > > > >
> > > > > On Thu, Jun 10, 2021 at 12:33 PM Christian König
> > > > >  > > > > > wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > maybe soften that a bit. Reading from the shared memory of the
> > > > > user fence is ok for everybody. What we need to take more care
> of
> > > > > is the writing side.
> > > > >
> > > > > So my current thinking is that we allow read only access, but
> > > > > writing a new sequence value needs to go through the
> > > scheduler/kernel.
> > > > >
> > > > > So when the CPU wants to signal a timeline fence it needs to
> call
> > > > > an IOCTL. When the GPU wants to signal the timeline fence it
> needs
> > > > > to hand that of to the hardware scheduler.
> > > > >
> > > > > If we lockup the kernel can check with the hardware who did the
> > > > > last write and what value was written.
> > > > >
> > > > > That together with an IOCTL to give out sequence number for
> > > > > implicit sync to applications should be sufficient for the
> kernel
> > > > > to track who is responsible if something bad happens.
> > > > >
> > > > > In other words when the hardware says that the shader wrote
> stuff
> > > > > like 0xdeadbeef 0x0 or 0x into memory we kill the
> process
> > > > > who did that.
> > > > >
> > > > > If the hardware says that seq - 1 was written fine, but seq is
> > > > > missing then the kernel blames whoever was supposed to write
> seq.
> > > > >
> > > > > Just piping the write through a privileged instance should be
> > > > > fine to make sure that we don't run into issues.
> > > > >
> > > > > Christian.
> > > > >
> > > > > Am 10.06.21 um 17:59 schrieb Marek Olšák:
> > > > > > Hi Daniel,
> > > > > >
> > > > > > We just talked about this whole topic internally and we came
> up
> > > > > > to the conclusion that the hardware needs to understand sync
> > > > > > object handles and have high-level wait and signal
> operations in
> > > > > > the command stream. Sync objects will be backed by memory,
> but
> > > > > > they won't be readable or writable by processes directly. The
> > > > > > hardware will log all 

Re: [PATCH][next] drm/amd/display: Fix fall-through warning for Clang

2021-06-17 Thread Harry Wentland



On 2021-06-16 4:52 p.m., Gustavo A. R. Silva wrote:
> In preparation to enable -Wimplicit-fallthrough for Clang, fix
> the following warning by replacing a /* fall through */ comment
> with the new pseudo-keyword macro fallthrough:
> 
> drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:672:4: warning: 
> unannotated fall-through between switch labels [-Wimplicit-fallthrough]
> case AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER:
> ^
> 
> Notice that Clang doesn't recognize /* fall through */ comments as
> implicit fall-through markings, so in order to globally enable
> -Wimplicit-fallthrough for Clang, these comments need to be
> replaced with fallthrough; in the whole codebase.
> 
> Link: https://github.com/KSPP/linux/issues/115
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Harry Wentland 

Harry

> ---
> JFYI: We had thousands of these sorts of warnings and now we are down
>   to just 15 in linux-next. This is one of those last remaining
>   warnings.
> 
>  drivers/gpu/drm/amd/display/dc/dce/dce_aux.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c 
> b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
> index 28631714f697..2fb88e54a4bf 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
> @@ -668,7 +668,7 @@ bool dce_aux_transfer_with_retries(struct ddc_service 
> *ddc,
>   /* polling_timeout_period is in us */
>   defer_time_in_ms += 
> aux110->polling_timeout_period / 1000;
>   ++aux_defer_retries;
> - /* fall through */
> + fallthrough;
>   case AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER:
>   retry_on_defer = true;
>   fallthrough;
> 



Re: [PATCH] drm/auth: Move master pointer from drm_device to drm_file

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 05:47:33PM +0800, Qiang Ma wrote:
> The drm_file master pointer is cleared to zero during multi-user switching,
> so drm_new_set_master() needs to be called based on the master pointer from
> drm_file.

That sounds like a bug. drm_file->master should be always the same -
either you become a new stand-alone thing, our you get linked to the
current master.

Or I'm completely missing what you're trying to fix here.
-Daniel

> 
> Signed-off-by: Qiang Ma 
> ---
>  drivers/gpu/drm/drm_auth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index f2d46b7ac6f9..02431af6d0c5 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -302,7 +302,7 @@ int drm_master_open(struct drm_file *file_priv)
>   /* if there is no current master make this fd it, but do not create
>* any master object for render clients */
>   mutex_lock(>master_mutex);
> - if (!dev->master)
> + if (!file_priv->master)
>   ret = drm_new_set_master(dev, file_priv);
>   else
>   file_priv->master = drm_master_get(dev->master);
> -- 
> 2.20.1
> 
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount

2021-06-17 Thread Ralph Campbell



On 6/17/21 8:16 AM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 44 -
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memremap.c  | 68 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  12 files changed, 45 insertions(+), 147 deletions(-)


I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates MEMORY_DEVICE_GENERIC
struct pages and then:
  dev_dax_fault()
dev_dax_huge_fault()
  __dev_dax_pte_fault()
vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing the page
refcount so it is zero (whereas it was one before). But using get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current notion of
free verses allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that though.

I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File systems have
clear allocate and free states for backing storage, but there are the
complications with the page cache references, etc. to consider. The >1 to 1
reference count seems to be used to tell when a page is idle (no I/O, reclaim
scanners) rather than free (not allocated to any file), but I'm not 100% sure
about that since I don't really understand all the issues around why a file
system needs to have a DAX mount option, besides knowing that the storage
block size has to be a multiple of the page size.



Re: [Intel-gfx] [PATCH v5 01/12] drm/i915: Reference objects on the ww object list

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 08:30:07AM +0200, Thomas Hellström wrote:
> Since the ww transaction endpoint easily end up far out-of-scope of
> the objects on the ww object list, particularly for contending lock
> objects, make sure we reference objects on the list so they don't
> disappear under us.
> 
> This comes with a performance penalty so it's been debated whether this
> is really needed. But I think this is motivated by the fact that locking
> is typically difficult to get right, and whatever we can do to make it
> simpler for developers moving forward should be done, unless the
> performance impact is far too high.
> 
> Signed-off-by: Thomas Hellström 
> Reviewed-by: Matthew Auld 

Acked-by: Daniel Vetter 

I've looked the past 2-3 weeks in-depth at our execbuf code. That has
definitely gone way too far into "very clever" territory, and safe is so
much better than clever.

If there's a fundamental performance issue, we need to fix this in a
fundamental way. E.g. for this one here a possible solution could be
VM_BIND, at least in the fastpath, where we don't need to look-up any
objects, nor refcount them, nor anything else (at least that's the goal).
Only some per vm/request book-keeping and done.

Also I think we can easily claw this back once we get to the cleanup part
of this work: i915_vma_pin has a bunch of atomics (and lots of locks in
slow-paths) of its own, which are largely redundant now that object state
is protected by dma_resv_lock. Once that's cleaned up we can pay our
atomic inc/dec here with the removed atomic ops from the vma side I think.

Anyway just figured I drop some thoughts and my ack on the direction
you're pushing here.
-Daniel

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 ++--
>  drivers/gpu/drm/i915/i915_gem.c| 4 
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index d66aa00d023a..241666931945 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -169,13 +169,17 @@ static inline int __i915_gem_object_lock(struct 
> drm_i915_gem_object *obj,
>   else
>   ret = dma_resv_lock(obj->base.resv, ww ? >ctx : NULL);
>  
> - if (!ret && ww)
> + if (!ret && ww) {
> + i915_gem_object_get(obj);
>   list_add_tail(>obj_link, >obj_list);
> + }
>   if (ret == -EALREADY)
>   ret = 0;
>  
> - if (ret == -EDEADLK)
> + if (ret == -EDEADLK) {
> + i915_gem_object_get(obj);
>   ww->contended = obj;
> + }
>  
>   return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6a0a3f0e36e1..c62dcd0e341a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1222,6 +1222,7 @@ static void i915_gem_ww_ctx_unlock_all(struct 
> i915_gem_ww_ctx *ww)
>   while ((obj = list_first_entry_or_null(>obj_list, struct 
> drm_i915_gem_object, obj_link))) {
>   list_del(>obj_link);
>   i915_gem_object_unlock(obj);
> + i915_gem_object_put(obj);
>   }
>  }
>  
> @@ -1229,6 +1230,7 @@ void i915_gem_ww_unlock_single(struct 
> drm_i915_gem_object *obj)
>  {
>   list_del(>obj_link);
>   i915_gem_object_unlock(obj);
> + i915_gem_object_put(obj);
>  }
>  
>  void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ww)
> @@ -1253,6 +1255,8 @@ int __must_check i915_gem_ww_ctx_backoff(struct 
> i915_gem_ww_ctx *ww)
>  
>   if (!ret)
>   list_add_tail(>contended->obj_link, >obj_list);
> + else
> + i915_gem_object_put(ww->contended);
>  
>   ww->contended = NULL;
>  
> -- 
> 2.31.1
> 
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v4] Documentation: gpu: Mention the requirements for new properties

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 05:38:36PM +0200, Philippe CORNU wrote:
> 
> 
> On 6/16/21 4:38 PM, Maxime Ripard wrote:
> > New KMS properties come with a bunch of requirements to avoid each
> > driver from running their own, inconsistent, set of properties,
> > eventually leading to issues like property conflicts, inconsistencies
> > between drivers and semantics, etc.
> > 
> > Let's document what we expect.
> > 
> > Cc: Alexandre Belloni 
> > Cc: Alexandre Torgue 
> > Cc: Alex Deucher 
> > Cc: Alison Wang 
> > Cc: Alyssa Rosenzweig 
> > Cc: Andrew Jeffery 
> > Cc: Andrzej Hajda 
> > Cc: Anitha Chrisanthus 
> > Cc: Benjamin Gaignard 
> > Cc: Ben Skeggs 
> > Cc: Boris Brezillon 
> > Cc: Brian Starkey 
> > Cc: Chen Feng 
> > Cc: Chen-Yu Tsai 
> > Cc: Christian Gmeiner 
> > Cc: "Christian König" 
> > Cc: Chun-Kuang Hu 
> > Cc: Edmund Dea 
> > Cc: Eric Anholt 
> > Cc: Fabio Estevam 
> > Cc: Gerd Hoffmann 
> > Cc: Haneen Mohammed 
> > Cc: Hans de Goede 
> > Cc: "Heiko Stübner" 
> > Cc: Huang Rui 
> > Cc: Hyun Kwon 
> > Cc: Inki Dae 
> > Cc: Jani Nikula 
> > Cc: Jernej Skrabec 
> > Cc: Jerome Brunet 
> > Cc: Joel Stanley 
> > Cc: John Stultz 
> > Cc: Jonas Karlman 
> > Cc: Jonathan Hunter 
> > Cc: Joonas Lahtinen 
> > Cc: Joonyoung Shim 
> > Cc: Jyri Sarha 
> > Cc: Kevin Hilman 
> > Cc: Kieran Bingham 
> > Cc: Krzysztof Kozlowski 
> > Cc: Kyungmin Park 
> > Cc: Laurent Pinchart 
> > Cc: Linus Walleij 
> > Cc: Liviu Dudau 
> > Cc: Lucas Stach 
> > Cc: Ludovic Desroches 
> > Cc: Marek Vasut 
> > Cc: Martin Blumenstingl 
> > Cc: Matthias Brugger 
> > Cc: Maxime Coquelin 
> > Cc: Maxime Ripard 
> > Cc: Melissa Wen 
> > Cc: Neil Armstrong 
> > Cc: Nicolas Ferre 
> > Cc: "Noralf Trønnes" 
> > Cc: NXP Linux Team 
> > Cc: Oleksandr Andrushchenko 
> > Cc: Patrik Jakobsson 
> > Cc: Paul Cercueil 
> > Cc: Pekka Paalanen 
> > Cc: Pengutronix Kernel Team 
> > Cc: Philippe Cornu 
> > Cc: Philipp Zabel 
> > Cc: Qiang Yu 
> > Cc: Rob Clark 
> > Cc: Robert Foss 
> > Cc: Rob Herring 
> > Cc: Rodrigo Siqueira 
> > Cc: Rodrigo Vivi 
> > Cc: Roland Scheidegger 
> > Cc: Russell King 
> > Cc: Sam Ravnborg 
> > Cc: Sandy Huang 
> > Cc: Sascha Hauer 
> > Cc: Sean Paul 
> > Cc: Seung-Woo Kim 
> > Cc: Shawn Guo 
> > Cc: Simon Ser 
> > Cc: Stefan Agner 
> > Cc: Steven Price 
> > Cc: Sumit Semwal 
> > Cc: Thierry Reding 
> > Cc: Tian Tao 
> > Cc: Tomeu Vizoso 
> > Cc: Tomi Valkeinen 
> > Cc: VMware Graphics 
> > Cc: Xinliang Liu 
> > Cc: Xinwei Kong 
> > Cc: Yannick Fertre 
> > Cc: Zack Rusin 
> > Reviewed-by: Daniel Vetter 
> > Signed-off-by: Maxime Ripard 
> > 
> > ---
> > 
> > Changes from v3:
> >- Roll back to the v2
> >- Add Simon and Pekka in Cc
> > 
> > Changes from v2:
> >- Take into account the feedback from Laurent and Lidiu to no longer
> >  force generic properties, but prefix vendor-specific properties with
> >  the vendor name
> > 
> > Changes from v1:
> >- Typos and wording reported by Daniel and Alex
> > ---
> >   Documentation/gpu/drm-kms.rst | 19 +++
> >   1 file changed, 19 insertions(+)
> > 
> > diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> > index 87e5023e3f55..c28b464dd397 100644
> > --- a/Documentation/gpu/drm-kms.rst
> > +++ b/Documentation/gpu/drm-kms.rst
> > @@ -463,6 +463,25 @@ KMS Properties
> >   This section of the documentation is primarily aimed at user-space 
> > developers.
> >   For the driver APIs, see the other sections.
> > +Requirements
> > +------------
> > +
> > +KMS drivers might need to add extra properties to support new features.
> > +Each new property introduced in a driver needs to meet a few
> > +requirements, in addition to the ones mentioned above:
> > +
> > +- It must be standardized, with some documentation to describe how the
> > +  property can be used.
> > +
> > +- It must provide a generic helper in the core code to register that
> > +  property on the object it attaches to.
> > +
> > +- Its content must be decoded by the core and provided in the object's
> > +  associated state structure. That includes anything drivers might want to
> > +  precompute, like :c:type:`struct drm_clip_rect <drm_clip_rect>` for
> > planes.
> > +
> > +- An IGT test must be submitted where reasonable.
> > +
> >   Property Types and Blob Property Support
> >   
> > 
> 
> Hi,
> 
> Regarding properties, we have a “case study example” related in a certain
> way to this documentation update :-)
> 
> The use case: on a front desk at an exhibition, there is a welcome screen
> you can touch for searching various information. When this welcome screen is
> in idle, a small logo is displayed at its center (around 20% of the
> fullscreen). The logo has a white background color. We want to reduce the
> ddr usage for lowering the power (the board is battery powered) so the idea
> is to use a white background color around this logo, produced by the drm
> CRTC so the image in ddr is only the size of the logo.
> 
> Reading the thread
> 

Re: [PATCH -next] drm/amd/display: remove unused variable 'dc'

2021-06-17 Thread Harry Wentland



On 2021-06-16 9:16 p.m., Pu Lehui wrote:
> GCC reports the following warning with W=1:
> 
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_psr.c:70:13:
> warning:
>  variable ‘dc’ set but not used [-Wunused-but-set-variable]
> 70 |  struct dc *dc = NULL;
>| ^~
> 
> This variable is not used in the function; remove it to fix the
> warning.
> 
> Signed-off-by: Pu Lehui 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> index f7c77ae0d965..70a554f1e725 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> @@ -67,14 +67,12 @@ bool amdgpu_dm_link_setup_psr(struct dc_stream_state 
> *stream)
>   struct dc_link *link = NULL;
>   struct psr_config psr_config = {0};
>   struct psr_context psr_context = {0};
> - struct dc *dc = NULL;
>   bool ret = false;
>  
>   if (stream == NULL)
>   return false;
>  
>   link = stream->link;
> - dc = link->ctx->dc;
>  
>   psr_config.psr_version = link->dpcd_caps.psr_caps.psr_version;
>  
> 



Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 02:28:06PM -0400, Marek Olšák wrote:
> The kernel will know who should touch the implicit-sync semaphore next, and
> at the same time, the copy of all write requests to the implicit-sync
> semaphore will be forwarded to the kernel for monitoring and bo_wait.
> 
> Syncobjs could either use the same monitored access as implicit sync or be
> completely unmonitored. We haven't decided yet.
> 
> Syncfiles could either use one of the above or wait for a syncobj to go
> idle before converting to a syncfile.

Hm this sounds all like you're planning to completely rewrap everything
... I'm assuming the plan is still that this is going to be largely
wrapped in dma_fence? Maybe with timeline objects being a bit more
optimized, but I'm not sure how much you can optimize without breaking the
interfaces.
-Daniel

> 
> Marek
> 
> 
> 
> On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter  wrote:
> 
> > On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> > > As long as we can figure out who touched a certain sync object last
> > that
> > > would indeed work, yes.
> >
> > Don't you need to know who will touch it next, i.e. who is holding up your
> > fence? Or maybe I'm just again totally confused.
> > -Daniel
> >
> > >
> > > Christian.
> > >
> > > On 14.06.21 at 19:10, Marek Olšák wrote:
> > > > The call to the hw scheduler has a limitation on the size of all
> > > > parameters combined. I think we can only pass a 32-bit sequence number
> > > > and a ~16-bit global (per-GPU) syncobj handle in one call and not much
> > > > else.
> > > >
> > > > The syncobj handle can be an element index in a global (per-GPU)
> > syncobj
> > > > table and it's read only for all processes with the exception of the
> > > > signal command. Syncobjs can either have per VMID write access flags
> > for
> > > > the signal command (slow), or any process can write to any syncobjs and
> > > > only rely on the kernel checking the write log (fast).
> > > >
> > > > In any case, we can execute the memory write in the queue engine and
> > > > only use the hw scheduler for logging, which would be perfect.
> > > >
> > > > Marek
> > > >
> > > > On Thu, Jun 10, 2021 at 12:33 PM Christian König
> > > >  > > > > wrote:
> > > >
> > > > Hi guys,
> > > >
> > > > maybe soften that a bit. Reading from the shared memory of the
> > > > user fence is ok for everybody. What we need to take more care of
> > > > is the writing side.
> > > >
> > > > So my current thinking is that we allow read only access, but
> > > > writing a new sequence value needs to go through the
> > scheduler/kernel.
> > > >
> > > > So when the CPU wants to signal a timeline fence it needs to call
> > > > an IOCTL. When the GPU wants to signal the timeline fence it needs
> > > > to hand that of to the hardware scheduler.
> > > >
> > > > If we lockup the kernel can check with the hardware who did the
> > > > last write and what value was written.
> > > >
> > > > That together with an IOCTL to give out sequence number for
> > > > implicit sync to applications should be sufficient for the kernel
> > > > to track who is responsible if something bad happens.
> > > >
> > > > In other words when the hardware says that the shader wrote stuff
> > > > like 0xdeadbeef 0x0 or 0x into memory we kill the process
> > > > who did that.
> > > >
> > > > If the hardware says that seq - 1 was written fine, but seq is
> > > > missing then the kernel blames whoever was supposed to write seq.
> > > >
> > > > Just piping the write through a privileged instance should be
> > > > fine to make sure that we don't run into issues.
> > > >
> > > > Christian.
> > > >
> > > > On 10.06.21 at 17:59, Marek Olšák wrote:
> > > > > Hi Daniel,
> > > > >
> > > > > We just talked about this whole topic internally and we came up
> > > > > to the conclusion that the hardware needs to understand sync
> > > > > object handles and have high-level wait and signal operations in
> > > > > the command stream. Sync objects will be backed by memory, but
> > > > > they won't be readable or writable by processes directly. The
> > > > > hardware will log all accesses to sync objects and will send the
> > > > > log to the kernel periodically. The kernel will identify
> > > > > malicious behavior.
> > > > >
> > > > > Example of a hardware command stream:
> > > > > ...
> > > > > ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence
> > > > > number is assigned by the kernel
> > > > > Draw();
> > > > > ImplicitSyncSignalWhenDone(syncObjHandle);
> > > > > ...
> > > > >
> > > > > I'm afraid we have no other choice because of the TLB
> > > > > invalidation overhead.
> > > > >
> > > > > Marek
> > > > >
> > > > >
> > > > > On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter  > > > > 

Re: [PATCH 4/7] drm/msm/dpu: hide struct dpu_irq_callback

2021-06-17 Thread Bjorn Andersson
On Thu 17 Jun 09:09 CDT 2021, Dmitry Baryshkov wrote:

> The struct dpu_irq_callbacks looks internal to IRQ handling code. Hide
> it from the rest of the DPU driver.
> 
> Signed-off-by: Dmitry Baryshkov 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h  | 18 +++---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   |  6 +-
>  .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  2 +-
>  .../drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c  | 10 ++-
>  .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  |  6 +-
>  .../gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 62 ++-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 12 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_trace.h |  8 +--
>  8 files changed, 69 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
> index 90ae6c9ccc95..44ab97fb2964 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
> @@ -46,10 +46,8 @@ u32 dpu_core_irq_read(
>   * interrupt
>   * @dpu_kms: DPU handle
>   * @irq_idx: irq index
> - * @irq_cb:  IRQ callback structure, containing callback function
> - *   and argument. Passing NULL for irq_cb will unregister
> - *   the callback for the given irq_idx
> - *   This must exist until un-registration.
> > + * @irq_cb:  IRQ callback function.
> + * @irq_arg: IRQ callback argument.
>   * @return:  0 for success registering callback, otherwise failure
>   *
>   * This function supports registration of multiple callbacks for each 
> interrupt.
> @@ -57,17 +55,16 @@ u32 dpu_core_irq_read(
>  int dpu_core_irq_register_callback(
>   struct dpu_kms *dpu_kms,
>   int irq_idx,
> - struct dpu_irq_callback *irq_cb);
> + void (*irq_cb)(void *arg, int irq_idx),
> + void *irq_arg);
>  
>  /**
>   * dpu_core_irq_unregister_callback - For unregistering callback function on 
> IRQ
>   * interrupt
>   * @dpu_kms: DPU handle
>   * @irq_idx: irq index
> - * @irq_cb:  IRQ callback structure, containing callback function
> - *   and argument. Passing NULL for irq_cb will unregister
> - *   the callback for the given irq_idx
> - *   This must match with registration.
> > + * @irq_cb:  IRQ callback function.
> + * @irq_arg: IRQ callback argument.
>   * @return:  0 for success registering callback, otherwise failure
>   *
>   * This function supports registration of multiple callbacks for each 
> interrupt.
> @@ -75,7 +72,8 @@ int dpu_core_irq_register_callback(
>  int dpu_core_irq_unregister_callback(
>   struct dpu_kms *dpu_kms,
>   int irq_idx,
> - struct dpu_irq_callback *irq_cb);
> + void (*irq_cb)(void *arg, int irq_idx),
> + void *irq_arg);
>  
>  /**
>   * dpu_debugfs_core_irq_init - register core irq debugfs
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> index 7f06238a7c64..186b2f87d193 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> @@ -310,7 +310,7 @@ int dpu_encoder_helper_wait_for_irq(struct 
> dpu_encoder_phys *phys_enc,
> phys_enc->hw_pp->idx - PINGPONG_0,
> atomic_read(wait_info->atomic_cnt));
>   local_irq_save(flags);
> - irq->cb.func(phys_enc, irq->irq_idx);
> + irq->func(phys_enc, irq->irq_idx);
>   local_irq_restore(flags);
>   ret = 0;
>   } else {
> @@ -352,7 +352,7 @@ int dpu_encoder_helper_register_irq(struct 
> dpu_encoder_phys *phys_enc,
>   }
>  
>   ret = dpu_core_irq_register_callback(phys_enc->dpu_kms, irq->irq_idx,
> - &irq->cb);
> + irq->func, phys_enc);
>   if (ret) {
>   DPU_ERROR_PHYS(phys_enc,
>   "failed to register IRQ callback for %s\n",
> @@ -384,7 +384,7 @@ int dpu_encoder_helper_unregister_irq(struct 
> dpu_encoder_phys *phys_enc,
>   }
>  
>   ret = dpu_core_irq_unregister_callback(phys_enc->dpu_kms, irq->irq_idx,
> - &irq->cb);
> + irq->func, phys_enc);
>   if (ret) {
>   DRM_ERROR("unreg cb fail id=%u, intr=%d, irq=%d ret=%d",
> DRMID(phys_enc->parent), intr_idx,
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
> index e7270eb6b84b..80d87871fd94 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
> @@ -174,7 

Re: [PATCH v2 1/3] drm/dp_helper: Rework the drm_dp_aux documentation

2021-06-17 Thread Daniel Vetter
On Wed, Jun 16, 2021 at 04:15:27PM +0200, Maxime Ripard wrote:
> Split the existing documentation to move the comments on particular
> fields next to them.
> 
> Suggested-by: Daniel Vetter 
> Signed-off-by: Maxime Ripard 
> 
> ---
> 
> Changes from v1:
>   - New patch

On the series:

Acked-by: Daniel Vetter 

Thanks for doing this polish improvement.
-Daniel
> ---
>  include/drm/drm_dp_helper.h | 84 +
>  1 file changed, 57 insertions(+), 27 deletions(-)
> 
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index 1e85c2021f2f..1c5ae07ff0c7 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -1837,29 +1837,6 @@ struct drm_dp_aux_cec {
>  
>  /**
>   * struct drm_dp_aux - DisplayPort AUX channel
> - * @name: user-visible name of this AUX channel and the I2C-over-AUX adapter
> - * @ddc: I2C adapter that can be used for I2C-over-AUX communication
> - * @dev: pointer to struct device that is the parent for this AUX channel
> - * @crtc: backpointer to the crtc that is currently using this AUX channel
> - * @hw_mutex: internal mutex used for locking transfers
> - * @crc_work: worker that captures CRCs for each frame
> - * @crc_count: counter of captured frame CRCs
> - * @transfer: transfers a message representing a single AUX transaction
> - *
> - * The @dev field should be set to a pointer to the device that implements 
> the
> - * AUX channel.
> - *
> - * The @name field may be used to specify the name of the I2C adapter. If 
> set to
> - * %NULL, dev_name() of @dev will be used.
> - *
> - * Drivers provide a hardware-specific implementation of how transactions are
> > - * executed via the @transfer() function. A pointer to a &drm_dp_aux_msg
> - * structure describing the transaction is passed into this function. Upon
> - * success, the implementation should return the number of payload bytes that
> - * were transferred, or a negative error-code on failure. Helpers propagate
> - * errors from the @transfer() function, with the exception of the %-EBUSY
> - * error, which causes a transaction to be retried. On a short, helpers will
> - * return %-EPROTO to make it simpler to check for failure.
>   *
>   * An AUX channel can also be used to transport I2C messages to a sink. A
>   * typical application of that is to access an EDID that's present in the 
> sink
> @@ -1870,21 +1847,74 @@ struct drm_dp_aux_cec {
>   * transfers by default; if a partial response is received, the adapter will
>   * drop down to the size given by the partial response for this transaction
>   * only.
> - *
> - * Note that the aux helper code assumes that the @transfer() function only
> > - * modifies the reply field of the &drm_dp_aux_msg structure. The retry logic
> - * and i2c helpers assume this is the case.
>   */
>  struct drm_dp_aux {
> + /**
> +  * @name: user-visible name of this AUX channel and the
> +  * I2C-over-AUX adapter.
> +  *
> +  * It's also used to specify the name of the I2C adapter. If set
> +  * to %NULL, dev_name() of @dev will be used.
> +  */
>   const char *name;
> +
> + /**
> +  * @ddc: I2C adapter that can be used for I2C-over-AUX
> +  * communication
> +  */
>   struct i2c_adapter ddc;
> +
> + /**
> +  * @dev: pointer to struct device that is the parent for this
> +  * AUX channel.
> +  */
>   struct device *dev;
> +
> + /**
> +  * @crtc: backpointer to the crtc that is currently using this
> +  * AUX channel
> +  */
>   struct drm_crtc *crtc;
> +
> + /**
> +  * @hw_mutex: internal mutex used for locking transfers.
> +  */
>   struct mutex hw_mutex;
> +
> + /**
> +  * @crc_work: worker that captures CRCs for each frame
> +  */
>   struct work_struct crc_work;
> +
> + /**
> +  * @crc_count: counter of captured frame CRCs
> +  */
>   u8 crc_count;
> +
> + /**
> +  * @transfer: transfers a message representing a single AUX
> +  * transaction.
> +  *
> +  * This is a hardware-specific implementation of how
> +  * transactions are executed that the drivers must provide.
> +  *
> > +  * A pointer to a &drm_dp_aux_msg structure describing the
> +  * transaction is passed into this function. Upon success, the
> +  * implementation should return the number of payload bytes that
> +  * were transferred, or a negative error-code on failure.
> +  *
> +  * Helpers will propagate these errors, with the exception of
> +  * the %-EBUSY error, which causes a transaction to be retried.
> +  * On a short, helpers will return %-EPROTO to make it simpler
> +  * to check for failure.
> +  *
> +  * The @transfer() function must only modify the reply field of
> > +  * the &drm_dp_aux_msg structure. The retry logic and i2c
> +  * helpers assume this is the case.
> +  */
>   ssize_t (*transfer)(struct drm_dp_aux *aux,
>  

Re: [PATCH -next] drm/amd/display: Fix gcc unused variable warning

2021-06-17 Thread Harry Wentland
On 2021-06-16 10:31 p.m., Pu Lehui wrote:
> GCC reports the following warning with W=1:
> 
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:3635:17:
> warning:
>  variable ‘status’ set but not used [-Wunused-but-set-variable]
>   3635 |  enum dc_status status = DC_ERROR_UNEXPECTED;
>| ^~
> 
> The variable should be used for error checking; let's fix it.
> 
> Signed-off-by: Pu Lehui 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> index fcb635c85330..cf29265870c8 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> @@ -3681,6 +3681,10 @@ bool dp_retrieve_lttpr_cap(struct dc_link *link)
>   
> DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV,
>   lttpr_dpcd_data,
>   sizeof(lttpr_dpcd_data));
> + if (status != DC_OK) {
> + dm_error("%s: Read LTTPR caps data failed.\n", 
> __func__);
> + return false;
> + }
>  
>   link->dpcd_caps.lttpr_caps.revision.raw =
>   
> lttpr_dpcd_data[DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV -
> 



Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-17 Thread Marek Olšák
The kernel will know who should touch the implicit-sync semaphore next, and
at the same time, the copy of all write requests to the implicit-sync
semaphore will be forwarded to the kernel for monitoring and bo_wait.

Syncobjs could either use the same monitored access as implicit sync or be
completely unmonitored. We haven't decided yet.

Syncfiles could either use one of the above or wait for a syncobj to go
idle before converting to a syncfile.

Marek



On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter  wrote:

> On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> > As long as we can figure out who touched a certain sync object last
> that
> > would indeed work, yes.
>
> Don't you need to know who will touch it next, i.e. who is holding up your
> fence? Or maybe I'm just again totally confused.
> -Daniel
>
> >
> > Christian.
> >
> > On 14.06.21 at 19:10, Marek Olšák wrote:
> > > The call to the hw scheduler has a limitation on the size of all
> > > parameters combined. I think we can only pass a 32-bit sequence number
> > > and a ~16-bit global (per-GPU) syncobj handle in one call and not much
> > > else.
> > >
> > > The syncobj handle can be an element index in a global (per-GPU)
> syncobj
> > > table and it's read only for all processes with the exception of the
> > > signal command. Syncobjs can either have per VMID write access flags
> for
> > > the signal command (slow), or any process can write to any syncobjs and
> > > only rely on the kernel checking the write log (fast).
> > >
> > > In any case, we can execute the memory write in the queue engine and
> > > only use the hw scheduler for logging, which would be perfect.
> > >
> > > Marek
> > >
> > > On Thu, Jun 10, 2021 at 12:33 PM Christian König
> > >  > > > wrote:
> > >
> > > Hi guys,
> > >
> > > maybe soften that a bit. Reading from the shared memory of the
> > > user fence is ok for everybody. What we need to take more care of
> > > is the writing side.
> > >
> > > So my current thinking is that we allow read only access, but
> > > writing a new sequence value needs to go through the
> scheduler/kernel.
> > >
> > > So when the CPU wants to signal a timeline fence it needs to call
> > > an IOCTL. When the GPU wants to signal the timeline fence it needs
> > > to hand that of to the hardware scheduler.
> > >
> > > If we lockup the kernel can check with the hardware who did the
> > > last write and what value was written.
> > >
> > > That together with an IOCTL to give out sequence number for
> > > implicit sync to applications should be sufficient for the kernel
> > > to track who is responsible if something bad happens.
> > >
> > > In other words when the hardware says that the shader wrote stuff
> > > like 0xdeadbeef 0x0 or 0x into memory we kill the process
> > > who did that.
> > >
> > > If the hardware says that seq - 1 was written fine, but seq is
> > > missing then the kernel blames whoever was supposed to write seq.
> > >
> > > Just piping the write through a privileged instance should be
> > > fine to make sure that we don't run into issues.
> > >
> > > Christian.
> > >
> > > On 10.06.21 at 17:59, Marek Olšák wrote:
> > > > Hi Daniel,
> > > >
> > > > We just talked about this whole topic internally and we came up
> > > > to the conclusion that the hardware needs to understand sync
> > > > object handles and have high-level wait and signal operations in
> > > > the command stream. Sync objects will be backed by memory, but
> > > > they won't be readable or writable by processes directly. The
> > > > hardware will log all accesses to sync objects and will send the
> > > > log to the kernel periodically. The kernel will identify
> > > > malicious behavior.
> > > >
> > > > Example of a hardware command stream:
> > > > ...
> > > > ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence
> > > > number is assigned by the kernel
> > > > Draw();
> > > > ImplicitSyncSignalWhenDone(syncObjHandle);
> > > > ...
> > > >
> > > > I'm afraid we have no other choice because of the TLB
> > > > invalidation overhead.
> > > >
> > > > Marek
> > > >
> > > >
> > > > On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter  > > > > wrote:
> > > >
> > > > On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König
> wrote:
> > > > > On 09.06.21 at 15:19, Daniel Vetter wrote:
> > > > > > [SNIP]
> > > > > > > Yeah, we call this the lightweight and the heavyweight
> > > > tlb flush.
> > > > > > >
> > > > > > > The lighweight can be used when you are sure that you
> > > > don't have any of the
> > > > > > > PTEs currently in flight in the 3D/DMA engine and you
> > > > just need to
> > > > > > > invalidate the TLB.
> > > 

Re: [PATCH v2 2/2] drm/bridge: ti-sn65dsi86: Implement the pwm_chip

2021-06-17 Thread Bjorn Andersson
On Thu 17 Jun 11:54 CDT 2021, Uwe Kleine-König wrote:

> Hello Bjorn,
> 
> On Thu, Jun 17, 2021 at 11:38:26AM -0500, Bjorn Andersson wrote:
> > > On Thu 17 Jun 01:24 CDT 2021, Uwe Kleine-König wrote:
> > > On Wed, Jun 16, 2021 at 10:22:17PM -0500, Bjorn Andersson wrote:
> > > > > > +static int ti_sn_pwm_apply(struct pwm_chip *chip, struct 
> > > > > > pwm_device *pwm,
> > > > > > +  const struct pwm_state *state)
> > > > > > +{
> > > > > > +   struct ti_sn65dsi86 *pdata = pwm_chip_to_ti_sn_bridge(chip);
> > > > > > +   unsigned int pwm_en_inv;
> > > > > > +   unsigned int backlight;
> > > > > > +   unsigned int pre_div;
> > > > > > +   unsigned int scale;
> > > > > > +   int ret;
> > > > > > +
> > > > > > +   if (!pdata->pwm_enabled) {
> > > > > > +   ret = pm_runtime_get_sync(pdata->dev);
> > > > > > +   if (ret < 0)
> > > > > > +   return ret;
> > > > > > +
> > > > > > +   ret = regmap_update_bits(pdata->regmap, 
> > > > > > SN_GPIO_CTRL_REG,
> > > > > > +   SN_GPIO_MUX_MASK << (2 * 
> > > > > > SN_PWM_GPIO_IDX),
> > > > > > +   SN_GPIO_MUX_SPECIAL << (2 * 
> > > > > > SN_PWM_GPIO_IDX));
> > > > > > +   if (ret) {
> > > > > > +   dev_err(pdata->dev, "failed to mux in PWM 
> > > > > > function\n");
> > > > > > +   goto out;
> > > > > > +   }
> > > > > 
> > > > > Do you need to do this even if state->enabled is false?
> > > > 
> > > > I presume I should be able to explicitly mux in the GPIO function and
> > > > configure that to output low. But I am not able to find anything in the
> > > > data sheet that would indicate this to be preferred.
> > > 
> > > My question targetted a different case. If the PWM is off
> > > (!pdata->pwm_enabled) and should remain off (state->enabled is false)
> > > you can shortcut here, can you not?
> > 
> > Right, if we're going off->off then we don't need to touch the hardware.
> > 
> > But am I expected to -EINVAL improper period and duty cycle even though
> > enabled is false?
> > 
> > And also, what is the supposed behavior of enabled = false? Is it
> > supposed to be equivalent to asking for a duty_cycle of 0?
> 
> In my book enabled = false is just syntactic sugar to say:
> "duty_cycle=0, period=something small". So to answer your questions: if
> enabled = false, the consumer doesn't really care about period and
> duty_cycle. Some care that the output becomes inactive, some others
> don't, so from my POV just emit the inactive level on the output and
> ignore period and duty_cycle.
> 
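The off-to-off shortcut discussed above can be sketched in a minimal, userspace-testable form. Every name here is a stand-in, not the real ti-sn65dsi86 driver: `fake_chip` models the bridge's power/mux state and `example_apply()` models the .apply() callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct pwm_state: just the fields this sketch needs. */
struct pwm_state_sketch {
	bool enabled;
	unsigned long duty_cycle;
	unsigned long period;
};

struct fake_chip {
	bool powered;     /* stands in for holding a runtime PM reference */
	bool pwm_enabled; /* mirrors pdata->pwm_enabled in the driver */
};

static int example_apply(struct fake_chip *chip,
			 const struct pwm_state_sketch *state)
{
	/* Off and staying off: nothing to do, don't touch the hardware. */
	if (!chip->pwm_enabled && !state->enabled)
		return 0;

	if (state->enabled && !chip->pwm_enabled)
		chip->powered = true;  /* pm_runtime_get_sync() + mux in PWM */

	if (!state->enabled && chip->pwm_enabled)
		chip->powered = false; /* drop the runtime PM reference */

	chip->pwm_enabled = state->enabled;
	return 0;
}
```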

Giving this some further thought.

In order to have a known state of the PWM signal we need the sn65dsi86
to be powered. The documentation describes a "suspend mode", but this is
currently not implemented in the driver, so there's a large power cost
coming from just keeping the pin low when disabled.

As such in the current implementation I use state->enabled to also
control if the device should be kept powered, which means that this
follows the backlight power status of the pwm-backlight. Which is
sufficient as the backlight won't be powered when !state->enabled.


For the typical use case the pwm-backlight has some independent control
over gpio and power to toggle the actual backlight. But in the event that
this wouldn't be available I think we need to extend the driver to
implement the suspend mode.

In which case muxing in the PWM function should probably happen at the
time the PWM channel is requested.

This does come at an additional power cost (not as high as keeping the
chip awake though), so (in addition to the effort) I think it's
reasonable to document this as a limitation for now and implement this
as the need arise.

Thanks,
Bjorn

> > > > > Does this already modify the output pin?
> > > > 
> > > > Yes, coming out of reset this pin is configured as input, so switching
> > > > the mux here will effectively start driving the pin.
> > > 
> > > So please document this in the format the recently added drivers do,
> > > too. See e.g. drivers/pwm/pwm-sifive.c. (The idea is to start that with
> > > " * Limitations:" to make it easy to grep it.)
> > > 
> > 
> > Okay, will do. Although I believe that for this driver it makes sense to
> > place such comment close to this function, rather than at the top of the
> > driver.
> 
> Yes, agreed.
> 
> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K.   | Uwe Kleine-König|
> Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: vc4_bo_create: Failed to allocate from CMA

2021-06-17 Thread Stefan Wahren
Hi Nicolas,

On 17.06.21 at 11:36, nicolas saenz julienne wrote:
> On Sat, 2021-06-12 at 17:17 +0200, Stefan Wahren wrote:
>> Hi,
>>
>> while testing the mainline kernel (arm64, defconfig) on Raspberry Pi 3 B
>> Plus with Raspberry Pi OS - 64 bit, sometimes X doesn't start into
>> desktop properly (unexpected and unusable login screen instead of auto
>> login, or the mouse pointer is shown shortly and then switches back to black
>> screen in a loop). In that case dmesg shows the following:
>>
>> [   74.737106] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
>> [   74.737558] vc4-drm soc:gpu: [drm]             V3D: 28976kb BOs (10)
>> [   74.737602] vc4-drm soc:gpu: [drm]      V3D shader:    44kb BOs (11)
>> [   74.737632] vc4-drm soc:gpu: [drm]            dumb:  4564kb BOs (5)
>> [   74.737664] vc4-drm soc:gpu: [drm]          binner: 16384kb BOs (1)
>> [   74.737697] vc4-drm soc:gpu: [drm] total purged BO:     4kb BOs (1)
>> [   74.739039] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
>> [   74.739466] vc4-drm soc:gpu: [drm]             V3D: 28972kb BOs (9)
>> [   74.739512] vc4-drm soc:gpu: [drm]      V3D shader:    44kb BOs (11)
>> [   74.739541] vc4-drm soc:gpu: [drm]            dumb:  4564kb BOs (5)
>> [   74.739570] vc4-drm soc:gpu: [drm]          binner: 16384kb BOs (1)
>> [   74.739602] vc4-drm soc:gpu: [drm] total purged BO:     4kb BOs (1)
>> [   74.740718] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
>> [   74.741138] vc4-drm soc:gpu: [drm]             V3D: 28972kb BOs (9)
>> [   74.741171] vc4-drm soc:gpu: [drm]      V3D shader:    44kb BOs (11)
>> [   74.741202] vc4-drm soc:gpu: [drm]            dumb:  4564kb BOs (5)
>> [   74.741231] vc4-drm soc:gpu: [drm]          binner: 16384kb BOs (1)
>> [   74.741263] vc4-drm soc:gpu: [drm] total purged BO:     4kb BOs (1)
>> ...
>>
>> I have only seen this issue on arm64 with latest mainline kernel
>> (5.13.0-rc5-00130-gf21b807c3cf8), but also with older kernel versions.
>> So it's not a regression. It seems 64 bit needs more CMA.
>>
>> In case X started properly i was also able to reproduce these errors
>> above by dis- and reconneting HDMI.
>>
>> So i increased CMA in bcm283x.dtsi and the problem disappeared:
>>
>> diff --git a/arch/arm/boot/dts/bcm283x.dtsi b/arch/arm/boot/dts/bcm283x.dtsi
>> index b83a864..d1304cb 100644
>> --- a/arch/arm/boot/dts/bcm283x.dtsi
>> +++ b/arch/arm/boot/dts/bcm283x.dtsi
>> @@ -37,7 +37,7 @@
>>  
>>      cma: linux,cma {
>>          compatible = "shared-dma-pool";
>> -            size = <0x4000000>; /* 64MB */
>> +            size = <0x6000000>; /* 96MB */
>>          reusable;
>>          linux,cma-default;
>>      };
>>
>> The questions are:
>>
>> Is this the right way (tm) to fix this problem?
> Frankly I don't know if there is a better way. IIRC opensuse and downstream 
> use
> DT overlays to cater for this limitation. It seems reasonable to bump the
> value. But it'll be in detriment of users that don't care much for graphical
> interfaces. Nonetheless, I'm not familiar with how DRM handles CMA/DMA memory.
> So let me have a look at it. Maybe there is a SW fix. At first glance I'm
> surprised they can't defer to normal page allocations when CMA isn't capable 
> of
> honoring the request (like the dma code does).

a compromise might be to increase the CMA size based on the SoC type
(newer generations have more memory)

BCM2835 => 64 MB
BCM2836, BCM2837 => 256 MB
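A rough sketch of how such a bump could be shipped without editing the dtsi, as a devicetree overlay. This is illustrative only: the &cma label comes from the bcm283x.dtsi node quoted above, and 0x10000000 (256 MB) is just the downstream default mentioned in this thread, not a tested value.

```dts
/dts-v1/;
/plugin/;

/* Illustrative only: enlarge the default CMA pool to 256 MB for SoCs
 * with more memory. The &cma label matches the node in bcm283x.dtsi. */
&cma {
	size = <0x10000000>; /* 256 MB */
};
```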

>
>> And what is a sensible value (don't have a 4K display to test)?
> The default for downstream is 256MB. But I've read discussions in the forum
> where people needed even more. IIUC it's use-case dependent, resolution is 
> only
> one variable, you might then try to run a game and run out of memory there.

Sure, this wasn't intended to make everybody happy. But I would expect X
to start reliably at least.

Regards
Stefan

>
> Regards,
> Nicolas
>


Re: [PATCH] drm/i915/gt: Fix duplicate included intel_region_lmem.h

2021-06-17 Thread Daniel Vetter
On Wed, Jun 16, 2021 at 02:01:58PM +0800, Jiapeng Chong wrote:
> Clean up the following includecheck warning:
> 
> ./drivers/gpu/drm/i915/gt/intel_region_lmem.c: intel_region_lmem.h is
> included more than once.
> 
> No functional change.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 

Already merged another one of these:

commit 6796c772850574ec0a9adc977e9889606b23d0f4 (HEAD -> drm-intel-gt-next, 
drm-intel/drm-intel-gt-next)
Author: Wan Jiabing 
Date:   Tue Jun 15 19:35:20 2021 +0800

drm/i915: Remove duplicate include of intel_region_lmem.h

Thanks anyway.

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_region_lmem.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c 
> b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index f7366b0..aa3cfca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -5,7 +5,6 @@
>  
>  #include "i915_drv.h"
>  #include "intel_memory_region.h"
> -#include "intel_region_lmem.h"
>  #include "intel_region_ttm.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_region.h"
> -- 
> 1.8.3.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH -next] apply: use DEFINE_SPINLOCK() instead of spin_lock_init().

2021-06-17 Thread Daniel Vetter
On Tue, Jun 15, 2021 at 07:17:13PM -0800, Yu Jiahua wrote:
> From: Jiahua Yu 
> 
> spinlock can be initialized automatically with DEFINE_SPINLOCK()
> rather than explicitly calling spin_lock_init().
> 
> Signed-off-by: Jiahua Yu 

Stuffed into drm-misc-next. The subject looked a bit strange, so I fixed
that up.
-Daniel

> ---
>  drivers/video/fbdev/omap2/omapfb/dss/apply.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/video/fbdev/omap2/omapfb/dss/apply.c 
> b/drivers/video/fbdev/omap2/omapfb/dss/apply.c
> index c71021091828..acca991c7540 100644
> --- a/drivers/video/fbdev/omap2/omapfb/dss/apply.c
> +++ b/drivers/video/fbdev/omap2/omapfb/dss/apply.c
> @@ -108,7 +108,7 @@ static struct {
>  } dss_data;
>  
>  /* protects dss_data */
> -static spinlock_t data_lock;
> +static DEFINE_SPINLOCK(data_lock);
>  /* lock for blocking functions */
>  static DEFINE_MUTEX(apply_lock);
>  static DECLARE_COMPLETION(extra_updated_completion);
> @@ -131,8 +131,6 @@ static void apply_init_priv(void)
>   struct mgr_priv_data *mp;
>   int i;
>  
> - spin_lock_init(&data_lock);
> -
>   for (i = 0; i < num_ovls; ++i) {
>   struct ovl_priv_data *op;
>  
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: vc4: hdmi: audio: ASoC: error at snd_soc_dai_startup on fef00700.hdmi

2021-06-17 Thread Stefan Wahren
Hi Maxime,

Am 17.06.21 um 17:25 schrieb Maxime Ripard:
> Hi Stefan,
>
> On Sat, Jun 12, 2021 at 12:04:08PM +0200, Stefan Wahren wrote:
>> Hi Maxime,
>>
>> Am 04.06.21 um 11:02 schrieb Maxime Ripard:
>>> Hi Stefan,
>>>
>>> I would assume it's due to this:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/vc4/vc4_hdmi.c#n1083
>>>
>>> It pre-dates my time working on the vc4 driver so I'm not really sure
>>> what this is supposed to prevent, but my guess is that it's there to
>>> avoid someone using the audio card before we have a display detected and
>>> connected, and its capabilities known (the first and more obvious one
>>> being does it support audio in the first place).
>>>
>>> It's nothing new though, maybe it's the error printing itself that is?
>> i'm sorry, i forgot about this discussion here:
>>
>> https://lists.freedesktop.org/archives/dri-devel/2020-December/292701.html
> It looks like there's no discussion on that link, is it the link you wanted 
> to paste?

It was the right patch, but the discussion got lost over the turn of
the year. Next try:

https://www.spinics.net/lists/dri-devel/msg284535.html

>
> Maxime


Re: [PATCH] drm/i915: Remove duplicate include of intel_region_lmem.h

2021-06-17 Thread Daniel Vetter
On Tue, Jun 15, 2021 at 07:35:20PM +0800, Wan Jiabing wrote:
> Fix the following checkinclude.pl warning:
> drivers/gpu/drm/i915/gt/intel_region_lmem.c
> 8 #include "intel_region_lmem.h"
>  12   #include "intel_region_lmem.h"
> 
> Signed-off-by: Wan Jiabing 

Applied to drm-intel-gt-next, thanks for your patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_region_lmem.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c 
> b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index f7366b054f8e..119eeec98837 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -9,7 +9,6 @@
>  #include "intel_region_ttm.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_region.h"
> -#include "intel_region_lmem.h"
>  
>  static int init_fake_lmem_bar(struct intel_memory_region *mem)
>  {
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [RFC PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-06-17 Thread Matthew Brost
On Thu, Jun 17, 2021 at 06:46:48PM +0200, Daniel Vetter wrote:
> Sorry I'm behind on mails  ...
> 

Aren't we all.

> On Fri, Jun 11, 2021 at 12:50:29PM -0700, Matthew Brost wrote:
> > On Fri, Jun 04, 2021 at 07:59:05PM +0200, Daniel Vetter wrote:
> > > On Wed, May 26, 2021 at 04:33:57PM -0700, Matthew Brost wrote:
> > > > Add entry for i915 new parallel submission uAPI plan.
> > > > 
> > > > v2:
> > > >  (Daniel Vetter):
> > > >   - Expand logical order explaination
> > > >   - Add dummy header
> > > >   - Only allow N BBs in execbuf IOCTL
> > > >   - Configure parallel submission per slot not per gem context
> > > > v3:
> > > >  (Marcin Ślusarz):
> > > >   - Lot's of typos / bad english fixed
> > > >  (Tvrtko Ursulin):
> > > >   - Consistent pseudo code, clean up wording in descriptions
> > > > 
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Tony Ye 
> > > > CC: Carl Zhang 
> > > > Cc: Daniel Vetter 
> > > > Cc: Jason Ekstrand 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 145 ++
> > > >  Documentation/gpu/rfc/i915_scheduler.rst  |  55 ++-
> > > >  2 files changed, 198 insertions(+), 2 deletions(-)
> > > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > 
> > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > new file mode 100644
> > > > index ..20de206e3ab4
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > @@ -0,0 +1,145 @@
> > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > i915_context_engines_parallel_submit */
> > > > +
> > > > +/*
> > > > + * i915_context_engines_parallel_submit:
> > > 
> > > So the idea is to make these kerneldoc and pull them into the rfc section.
> > > Then when we merge, move them to the real uapi section, like what Matt has
> > > done for lmem.
> > > 
> > 
> > Yep, will fix in next rev.
> > 
> > > > + *
> > > > + * Setup a slot in the context engine map to allow multiple BBs to be 
> > > > submitted
> > > > + * in a single execbuf IOCTL. Those BBs will then be scheduled to run 
> > > > on the GPU
> > > > + * in parallel. Multiple hardware contexts are created internally in 
> > > > the i915
> > > > + * run these BBs. Once a slot is configured for N BBs only N BBs can be
> > > > + * submitted in each execbuf IOCTL and this is implicit behavior e.g. 
> > > > The user
> > > > + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL 
> > > > know how
> > > > + * many BBs there are based on the slots configuration. The N BBs are 
> > > > the last N
> > > > + * buffer objects for first N if I915_EXEC_BATCH_FIRST is set.
> > > 
> > > s/for/or/
> > > 
> > > > + *
> > > > + * There are two currently defined ways to control the placement of the
> > > > + * hardware contexts on physical engines: default behavior (no flags) 
> > > > and
> > > > + * I915_PARALLEL_IMPLICIT_BONDS (a flag). More flags may be added the 
> > > > in the
> > > > + * future as new hardware / use cases arise. Details of how to use this
> > > > + * interface above the flags field in this structure.
> > > > + *
> > > > + * Returns -EINVAL if hardware context placement configuration is 
> > > > invalid or if
> > > > + * the placement configuration isn't supported on the platform / 
> > > > submission
> > > > + * interface.
> > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > submission
> > > > + * inteface.
> > > > + */
> > > > +struct i915_context_engines_parallel_submit {
> > > > +   struct i915_user_extension base;
> > > > +
> > > > +   __u16 engine_index; /* slot for parallel engine */
> > > 
> > > Kernel doc here for the inline comments too.
> > >
> > 
> > Yep.
> >  
> > > > +   __u16 width;/* number of contexts per parallel 
> > > > engine */
> > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > +   __u16 mbz16;
> > > > +/*
> > > > + * Default placement behavior (currently unsupported):
> > > > + *
> > > > + * Allow BBs to be placed on any available engine instance. In this 
> > > > case each
> > > > + * context's engine mask indicates where that context can be placed. 
> > > > It is
> > > > + * implied in this mode that all contexts have mutual exclusive 
> > > > placement.
> > > > + * e.g. If one context is running CSX[0] no other contexts can run on 
> > > > CSX[0]).
> > > > + *
> > > > + * Example 1 pseudo code:
> > > > + * CSX,Y[N] = generic engine class X or Y, logical instance N
> > > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > > + * set_engines(INVALID)
> > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > + * engines=CSX[0],CSX[1],CSY[0],CSY[1])
> > > > + *
> > > > + * Results in the following valid placements:
> > > > + * CSX[0], CSY[0]
> > > > + * CSX[0], CSY[1]
> > > > + * 

Re: [PATCH] dma-buf: fix and rework dma_buf_poll

2021-06-17 Thread Daniel Vetter
On Tue, Jun 15, 2021 at 01:21:17PM +0200, Christian König wrote:
> Daniel pointed me towards this function and there are multiple obvious 
> problems
> in the implementation.
> 
> First of all the retry loop is not working as intended. In general the retry
> makes only sense if you grab the reference first and then check the retry. 
> Then
> we skipped checking the exclusive fence when shared fences were present. And
> last the whole implementation was unnecessary complex and rather hard to
> understand which could lead to probably unexpected behavior of the IOCTL.
> 
> Fix all this by reworking the implementation from scratch.

Can't we split this a bit?

The other thing I'm wondering, instead of open-coding this and breaking
our heads trying to make sure we got it right. Can't we reuse
dma_resv_get_fences? That's what a lot of drivers use already to get a
consistent copy of the fence set without holding the lock.

I think then the actual semantics, i.e. do we need to include the
exclusive fence or not, stick out more.
-Daniel

> 
> Only mildly tested and needs a thoughtful review of the code.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-buf.c | 132 +++---
>  include/linux/dma-buf.h   |   2 +-
>  2 files changed, 54 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 511fe0d217a0..1bd00e18291f 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -72,7 +72,7 @@ static void dma_buf_release(struct dentry *dentry)
>* If you hit this BUG() it means someone dropped their ref to the
>* dma-buf while still having pending operation to the buffer.
>*/
> - BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
> + BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>  
>   dmabuf->ops->release(dmabuf);
>  
> @@ -206,12 +206,15 @@ static void dma_buf_poll_cb(struct dma_fence *fence, 
> struct dma_fence_cb *cb)
>  
>  static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
>  {
> + struct dma_buf_poll_cb_t *dcb;
>   struct dma_buf *dmabuf;
>   struct dma_resv *resv;
>   struct dma_resv_list *fobj;
>   struct dma_fence *fence_excl;
> - __poll_t events;
>   unsigned shared_count, seq;
> + struct dma_fence *fence;
> + __poll_t events;
> + int r, i;
>  
>   dmabuf = file->private_data;
>   if (!dmabuf || !dmabuf->resv)
> @@ -225,99 +228,70 @@ static __poll_t dma_buf_poll(struct file *file, 
> poll_table *poll)
>   if (!events)
>   return 0;
>  
> + dcb = events & EPOLLOUT ? &dmabuf->cb_out : &dmabuf->cb_in;
> +
> + /* Only queue a new one if we are not still waiting for the old one */
> + spin_lock_irq(&dmabuf->poll.lock);
> + if (dcb->active)
> + events = 0;
> + else
> + dcb->active = events;
> + spin_unlock_irq(&dmabuf->poll.lock);
> + if (!events)
> + return 0;
> +
>  retry:
>   seq = read_seqcount_begin(&resv->seq);
>   rcu_read_lock();
>  
>   fobj = rcu_dereference(resv->fence);
> - if (fobj)
> + if (fobj && events & EPOLLOUT)
>   shared_count = fobj->shared_count;
>   else
>   shared_count = 0;
> - fence_excl = dma_resv_excl_fence(resv);
> - if (read_seqcount_retry(&resv->seq, seq)) {
> - rcu_read_unlock();
> - goto retry;
> - }
>  
> - if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
> - struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
> - __poll_t pevents = EPOLLIN;
> -
> - if (shared_count == 0)
> - pevents |= EPOLLOUT;
> -
> - spin_lock_irq(&dmabuf->poll.lock);
> - if (dcb->active) {
> - dcb->active |= pevents;
> - events &= ~pevents;
> - } else
> - dcb->active = pevents;
> - spin_unlock_irq(&dmabuf->poll.lock);
> -
> - if (events & pevents) {
> - if (!dma_fence_get_rcu(fence_excl)) {
> - /* force a recheck */
> - events &= ~pevents;
> - dma_buf_poll_cb(NULL, &dcb->cb);
> - } else if (!dma_fence_add_callback(fence_excl, &dcb->cb,
> -dma_buf_poll_cb)) {
> - events &= ~pevents;
> - dma_fence_put(fence_excl);
> - } else {
> - /*
> -  * No callback queued, wake up any additional
> -  * waiters.
> -  */
> - dma_fence_put(fence_excl);
> - dma_buf_poll_cb(NULL, &dcb->cb);
> - }
> + for (i = 0; i < shared_count; ++i) {
> + fence = 

Re: [PATCH 1/2] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 09:44:25AM +0200, Christian König wrote:
> Alex do want to review those so that we can close the ticket?

Maybe I'm behind on mails, but 2nd patch still has the issues I think I'm
seeing ...
-Daniel

> 
> Thanks,
> Christian.
> 
> Am 14.06.21 um 19:45 schrieb Christian König:
> > Unwrap the explicit fence if it is a dma_fence_chain and
> > sync to the first fence not matching the owner rules.
> > 
> > Signed-off-by: Christian König 
> > Acked-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +--
> >   1 file changed, 68 insertions(+), 50 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > @@ -28,6 +28,8 @@
> >*Christian König 
> >*/
> > +#include <linux/dma-fence-chain.h>
> > +
> >   #include "amdgpu.h"
> >   #include "amdgpu_trace.h"
> >   #include "amdgpu_amdkfd.h"
> > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, 
> > struct dma_fence *fence)
> > return amdgpu_sync_fence(sync, fence);
> >   }
> > +/* Determine based on the owner and mode if we should sync to a fence or 
> > not */
> > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > +  enum amdgpu_sync_mode mode,
> > +  void *owner, struct dma_fence *f)
> > +{
> > +   void *fence_owner = amdgpu_sync_get_owner(f);
> > +
> > +   /* Always sync to moves, no matter what */
> > +   if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > +   return true;
> > +
> > +   /* We only want to trigger KFD eviction fences on
> > +* evict or move jobs. Skip KFD fences otherwise.
> > +*/
> > +   if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > +   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > +   return false;
> > +
> > +   /* Never sync to VM updates either. */
> > +   if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > +   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > +   return false;
> > +
> > +   /* Ignore fences depending on the sync mode */
> > +   switch (mode) {
> > +   case AMDGPU_SYNC_ALWAYS:
> > +   return true;
> > +
> > +   case AMDGPU_SYNC_NE_OWNER:
> > +   if (amdgpu_sync_same_dev(adev, f) &&
> > +   fence_owner == owner)
> > +   return false;
> > +   break;
> > +
> > +   case AMDGPU_SYNC_EQ_OWNER:
> > +   if (amdgpu_sync_same_dev(adev, f) &&
> > +   fence_owner != owner)
> > +   return false;
> > +   break;
> > +
> > +   case AMDGPU_SYNC_EXPLICIT:
> > +   return false;
> > +   }
> > +
> > +   WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > +"Adding eviction fence to sync obj");
> > +   return true;
> > +}
> > +
> >   /**
> >* amdgpu_sync_resv - sync to a reservation object
> >*
> > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, 
> > struct amdgpu_sync *sync,
> > /* always sync to the exclusive fence */
> > f = dma_resv_excl_fence(resv);
> > -   r = amdgpu_sync_fence(sync, f);
> > +   dma_fence_chain_for_each(f, f) {
> > +   struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > +
> > +   if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > +  chain->fence : f)) {
> > +   r = amdgpu_sync_fence(sync, f);
> > +   dma_fence_put(f);
> > +   if (r)
> > +   return r;
> > +   break;
> > +   }
> > +   }
> > flist = dma_resv_shared_list(resv);
> > -   if (!flist || r)
> > -   return r;
> > +   if (!flist)
> > +   return 0;
> > for (i = 0; i < flist->shared_count; ++i) {
> > -   void *fence_owner;
> > -
> > f = rcu_dereference_protected(flist->shared[i],
> >   dma_resv_held(resv));
> > -   fence_owner = amdgpu_sync_get_owner(f);
> > -
> > -   /* Always sync to moves, no matter what */
> > -   if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > +   if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > r = amdgpu_sync_fence(sync, f);
> > if (r)
> > -   break;
> > -   }
> > -
> > -   /* We only want to trigger KFD eviction fences on
> > -* evict or move jobs. Skip KFD fences otherwise.
> > -*/
> > -   if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > -   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > -   continue;
> > -
> > -   /* Never sync to VM updates either. */
> > -   if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > -   owner != 

Re: drm/i915: __GFP_RETRY_MAYFAIL allocations in stable kernels

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 09:45:37PM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> We are observing some user-space crashes (sigabort, segfaults etc.)
> under moderate memory pressure (pretty far from severe pressure) which
> have one thing in common - restrictive GFP mask in setup_scratch_page().
> 
> For instance, (stable 4.19) drivers/gpu/drm/i915/i915_gem_gtt.c
> 
> (trimmed down version)
> 
> static int gen8_init_scratch(struct i915_address_space *vm)
> {
> setup_scratch_page(vm, __GFP_HIGHMEM);
> 
> vm->scratch_pt = alloc_pt(vm);
> vm->scratch_pd = alloc_pd(vm);
> if (use_4lvl(vm)) {
> vm->scratch_pdp = alloc_pdp(vm);
> }
> }
> 
> gen8_init_scratch() function puts a rather inconsistent restrictions on mm.
> 
> Looking at it line by line:
> 
> setup_scratch_page() uses very restrictive gfp mask:
>   __GFP_HIGHMEM | __GFP_ZERO | __GFP_RETRY_MAYFAIL
> 
> it doesn't try to reclaim anything and fails almost immediately.
> 
> alloc_pt() - uses more permissive gfp mask:
>   GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN
> 
> alloc_pd() - likewise:
>   GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN
> 
> alloc_pdp() - very permissive gfp mask:
>   GFP_KERNEL
> 
> 
> So can all allocations in gen8_init_scratch() use
>   GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN

Yeah that looks all fairly broken tbh. The only thing I didn't know was
that GFP_DMA32 wasn't a full gfp mask with reclaim bits set as needed. I
guess it would be clearer if we use GFP_KERNEL | __GFP_DMA32 for these.

The commit that introduced a lot of this, including I915_GFP_ALLOW_FAIL
seems to be

commit 1abb70f5955d1a9021f96359a2c6502ca569b68d
Author: Chris Wilson 
Date:   Tue May 22 09:36:43 2018 +0100

drm/i915/gtt: Allow pagedirectory allocations to fail

which used a selftest as justification, not real world workloads, so looks
rather dubious.

Adding Matt Auld to this thread, maybe he has ideas.

Thanks, Daniel

> ?
> 
> E.g.
> 
> ---
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index a12430187108..e862680b9c93 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -792,7 +792,7 @@ alloc_pdp(struct i915_address_space *vm)
>  
> GEM_BUG_ON(!use_4lvl(vm));
>  
> -   pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
> +   pdp = kzalloc(sizeof(*pdp), I915_GFP_ALLOW_FAIL);
> if (!pdp)
> return ERR_PTR(-ENOMEM);
>  
> @@ -1262,7 +1262,7 @@ static int gen8_init_scratch(struct i915_address_space 
> *vm)
>  {
> int ret;
>  
> -   ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +   ret = setup_scratch_page(vm, GFP_KERNEL | __GFP_HIGHMEM);
> if (ret)
> return ret;
>  
> @@ -1972,7 +1972,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt 
> *ppgtt)
> u32 pde;
> int ret;
>  
> -   ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +   ret = setup_scratch_page(vm, GFP_KERNEL | __GFP_HIGHMEM);
> if (ret)
> return ret;
>  
> @@ -3078,7 +3078,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, 
> u64 size)
> return -ENOMEM;
> }
>  
> -   ret = setup_scratch_page(&ggtt->vm, GFP_DMA32);
> +   ret = setup_scratch_page(&ggtt->vm, GFP_KERNEL | GFP_DMA32);
> if (ret) {
> DRM_ERROR("Scratch setup failed\n");
> /* iounmap will also get called at remove, but meh */
> ---
> 
> 
> 
> It's quite similar on stable 5.4 - setup_scratch_page() uses restrictive
> gfp mask again.
> 
> ---
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index f614646ed3f9..99d78b1052df 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1378,7 +1378,7 @@ static int gen8_init_scratch(struct i915_address_space 
> *vm)
> return 0;
> }
>  
> -   ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +   ret = setup_scratch_page(vm, GFP_KERNEL | __GFP_HIGHMEM);
> if (ret)
> return ret;
>  
> @@ -1753,7 +1753,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt 
> *ppgtt)
> struct i915_page_directory * const pd = ppgtt->base.pd;
> int ret;
>  
> -   ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> +   ret = setup_scratch_page(vm, GFP_KERNEL | __GFP_HIGHMEM);
> if (ret)
> return ret;
>  
> @@ -2860,7 +2860,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, 
> u64 size)
> return -ENOMEM;
> }
>  
> -   ret = setup_scratch_page(&ggtt->vm, GFP_DMA32);
> +   ret = setup_scratch_page(&ggtt->vm, GFP_KERNEL | GFP_DMA32);
> if (ret) {
> DRM_ERROR("Scratch setup failed\n");
> /* iounmap will also get called at remove, but meh */
> ---

-- 
Daniel Vetter
Software Engineer, Intel Corporation

Re: [PATCH] drm/i915: allow DG1 autoprobe for CONFIG_BROKEN

2021-06-17 Thread Daniel Vetter
On Wed, Jun 16, 2021 at 03:29:26PM +0100, Matthew Auld wrote:
> On Mon, 14 Jun 2021 at 10:22, Matthew Auld  wrote:
> >
> > Purely for CI so we can get some pre-merge results for DG1. This is
> > especially useful for cross driver TTM changes where CI can hopefully
> > catch regressions. This is similar to how we already handle the DG1
> > specific uAPI, which are also hidden behind CONFIG_BROKEN.
> >
> > Signed-off-by: Matthew Auld 
> > Cc: Thomas Hellström 
> > Cc: Daniel Vetter 
> > Cc: Dave Airlie 
> 
> Daniel, any objections to landing this?

I think stuffing this into topic/core-for-CI is fine, lets wait a bit more
until mesa and everything is ready with adding the pciids to an official
tree.

(Catching up on mails, apologies and all that).
-Daniel

> 
> > ---
> >  drivers/gpu/drm/i915/i915_pci.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c 
> > b/drivers/gpu/drm/i915/i915_pci.c
> > index 83b500bb170c..78742157aaa3 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1040,6 +1040,9 @@ static const struct pci_device_id pciidlist[] = {
> > INTEL_RKL_IDS(&rkl_info),
> > INTEL_ADLS_IDS(&adl_s_info),
> > INTEL_ADLP_IDS(&adl_p_info),
> > +#if IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM)
> > +   INTEL_DG1_IDS(&dg1_info),
> > +#endif
> > {0, 0, 0}
> >  };
> >  MODULE_DEVICE_TABLE(pci, pciidlist);
> > --
> > 2.26.3
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2] drm/i915: Document the Virtual Engine uAPI

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 10:09:59AM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> A little bit of documentation covering the topics of engine discovery,
> context engine maps and virtual engines. It is not very detailed but
> supposed to be a starting point of giving a brief high level overview of
> general principles and intended use cases.
> 
> v2:
>  * Have the text in uapi header and link from there.
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Daniel Vetter 

What I meant was the kerneldoc directly as kerneldoc for the uapi structs,
like Matt has done for e.g. drm_i915_gem_create_ext_memory_regions.

But then I also realized that Matt hasn't set up the include for this, so
it's not automatic at all yet :-/
-Daniel

> ---
>  Documentation/gpu/i915.rst  |  18 
>  include/uapi/drm/i915_drm.h | 188 
>  2 files changed, 206 insertions(+)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 42ce0196930a..00aa55bbe0fd 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -335,6 +335,24 @@ for execution also include a list of all locations 
> within buffers that
>  refer to GPU-addresses so that the kernel can edit the buffer correctly.
>  This process is dubbed relocation.
>  
> +Engine Discovery uAPI
> +-
> +
> +.. kernel-doc:: include/uapi/drm/i915_drm.h
> +   :doc: Engine Discovery uAPI
> +
> +Context Engine Map uAPI
> +---
> +
> +.. kernel-doc:: include/uapi/drm/i915_drm.h
> +   :doc: Context Engine Map uAPI
> +
> +Virtual Engine uAPI
> +---
> +
> +.. kernel-doc:: include/uapi/drm/i915_drm.h
> +   :doc: Virtual Engine uAPI
> +
>  Locking Guidelines
>  --
>  
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a1cb4aa035a9..2f70c48567c0 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1806,6 +1806,69 @@ struct drm_i915_gem_context_param_sseu {
>   __u32 rsvd;
>  };
>  
> +/**
> + * DOC: Virtual Engine uAPI
> + *
> + * Virtual engine is a concept where userspace is able to configure a set of
> + * physical engines, submit a batch buffer, and let the driver execute it on 
> any
> + * engine from the set as it sees fit.
> + *
> + * This is primarily useful on parts which have multiple instances of a same
> + * class engine, like for example GT3+ Skylake parts with their two VCS 
> engines.
> + *
> + * For instance userspace can enumerate all engines of a certain class using 
> the
> + * previously described `Engine Discovery uAPI`_. After that userspace can
> + * create a GEM context with a placeholder slot for the virtual engine (using
> + * `I915_ENGINE_CLASS_INVALID` and `I915_ENGINE_CLASS_INVALID_NONE` for class
> + * and instance respectively) and finally using the
> + * `I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE` extension place a virtual engine 
> in
> + * the same reserved slot.
> + *
> + * Example of creating a virtual engine and submitting a batch buffer to it:
> + *
> + * .. code-block:: C
> + *
> + *   I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(virtual, 2) = {
> + *   .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
> + *   .engine_index = 0, // Place this virtual engine into engine map 
> slot 0
> + *   .num_siblings = 2,
> + *   .engines = { { I915_ENGINE_CLASS_VIDEO, 0 },
> + *{ I915_ENGINE_CLASS_VIDEO, 1 }, },
> + *   };
> + *   I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
> + *   .engines = { { I915_ENGINE_CLASS_INVALID,
> + *  I915_ENGINE_CLASS_INVALID_NONE } },
> + *   .extensions = to_user_pointer(&virtual), // Chains after 
> load_balance extension
> + *   };
> + *   struct drm_i915_gem_context_create_ext_setparam p_engines = {
> + *   .base = {
> + *   .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
> + *   },
> + *   .param = {
> + *   .param = I915_CONTEXT_PARAM_ENGINES,
> + *   .value = to_user_pointer(&engines),
> + *   .size = sizeof(engines),
> + *   },
> + *   };
> + *   struct drm_i915_gem_context_create_ext create = {
> + *   .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
> + *   .extensions = to_user_pointer(&p_engines);
> + *   };
> + *
> + *   ctx_id = gem_context_create_ext(drm_fd, &create);
> + *
> + *   // Now we have created a GEM context with its engine map containing a
> + *   // single virtual engine. Submissions to this slot can go either to
> + *   // vcs0 or vcs1, depending on the load balancing algorithm used inside
> + *   // the driver. The load balancing is dynamic from one batch buffer to
> + *   // another and transparent to userspace.
> + *
> + *   ...
> + *   execbuf.rsvd1 = ctx_id;
> + *   execbuf.flags = 0; // Submits to index 0 which is the virtual engine
> + *   gem_execbuf(drm_fd, &execbuf);
> + */
> +
>  /*
>   * 

Re: [PATCH v2 2/2] drm: Protect drm_master pointers in drm_lease.c

2021-06-17 Thread Daniel Vetter
On Tue, Jun 15, 2021 at 10:36:45AM +0800, Desmond Cheong Zhi Xi wrote:
> This patch ensures that the device's master mutex is acquired before
> accessing pointers to struct drm_master that are subsequently
> dereferenced. Without the mutex, the struct drm_master may be freed
> concurrently by another process calling drm_setmaster_ioctl(). This
> could then lead to use-after-free errors.
> 
> Reported-by: Daniel Vetter 
> Signed-off-by: Desmond Cheong Zhi Xi 
> Reviewed-by: Emil Velikov 
> ---
>  drivers/gpu/drm/drm_lease.c | 58 +++--
>  1 file changed, 43 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c
> index da4f085fc09e..3e6f689236e5 100644
> --- a/drivers/gpu/drm/drm_lease.c
> +++ b/drivers/gpu/drm/drm_lease.c
> @@ -107,10 +107,16 @@ static bool _drm_has_leased(struct drm_master *master, 
> int id)
>   */
>  bool _drm_lease_held(struct drm_file *file_priv, int id)
>  {
> + bool ret;
> +
>   if (!file_priv || !file_priv->master)
>   return true;
>  
> - return _drm_lease_held_master(file_priv->master, id);
> + mutex_lock(&file_priv->master->dev->master_mutex);

So maybe we have a bug somewhere, and the kerneldoc isn't 100% clear, but
I thought file_priv->master is invariant over the lifetime of file_priv.
So we don't need a lock to check anything here.

It's the drm_device->master derefence that gets us into trouble. Well also
file_priv->is_owner is protected by dev->master_mutex.

So I think with your previous patch all the access here in drm_lease.c is
ok and already protected? Or am I missing something?

Thanks, Daniel


> + ret = _drm_lease_held_master(file_priv->master, id);
> + mutex_unlock(&file_priv->master->dev->master_mutex);
> +
> + return ret;
>  }
>  
>  /**
> @@ -132,10 +138,12 @@ bool drm_lease_held(struct drm_file *file_priv, int id)
>   if (!file_priv || !file_priv->master || !file_priv->master->lessor)
>   return true;
>  
> + mutex_lock(&file_priv->master->dev->master_mutex);
>   master = file_priv->master;
>   mutex_lock(&master->dev->mode_config.idr_mutex);
>   ret = _drm_lease_held_master(master, id);
>   mutex_unlock(&master->dev->mode_config.idr_mutex);
> + mutex_unlock(&file_priv->master->dev->master_mutex);
>   return ret;
>  }
>  
> @@ -158,6 +166,7 @@ uint32_t drm_lease_filter_crtcs(struct drm_file 
> *file_priv, uint32_t crtcs_in)
>   if (!file_priv || !file_priv->master || !file_priv->master->lessor)
>   return crtcs_in;
>  
> + mutex_lock(&file_priv->master->dev->master_mutex);
>   master = file_priv->master;
>   dev = master->dev;
>  
> @@ -177,6 +186,7 @@ uint32_t drm_lease_filter_crtcs(struct drm_file 
> *file_priv, uint32_t crtcs_in)
>   count_in++;
>   }
>   mutex_unlock(&master->dev->mode_config.idr_mutex);
> + mutex_unlock(&file_priv->master->dev->master_mutex);
>   return crtcs_out;
>  }
>  
> @@ -490,7 +500,7 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
>   size_t object_count;
>   int ret = 0;
>   struct idr leases;
> - struct drm_master *lessor = lessor_priv->master;
> + struct drm_master *lessor;
>   struct drm_master *lessee = NULL;
>   struct file *lessee_file = NULL;
>   struct file *lessor_file = lessor_priv->filp;
> @@ -502,12 +512,6 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
>   if (!drm_core_check_feature(dev, DRIVER_MODESET))
>   return -EOPNOTSUPP;
>  
> - /* Do not allow sub-leases */
> - if (lessor->lessor) {
> - DRM_DEBUG_LEASE("recursive leasing not allowed\n");
> - return -EINVAL;
> - }
> -
>   /* need some objects */
>   if (cl->object_count == 0) {
>   DRM_DEBUG_LEASE("no objects in lease\n");
> @@ -519,12 +523,23 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
>   return -EINVAL;
>   }
>  
> + mutex_lock(&dev->master_mutex);
> + lessor = lessor_priv->master;
> + /* Do not allow sub-leases */
> + if (lessor->lessor) {
> + DRM_DEBUG_LEASE("recursive leasing not allowed\n");
> + ret = -EINVAL;
> + goto unlock;
> + }
> +
>   object_count = cl->object_count;
>  
>   object_ids = memdup_user(u64_to_user_ptr(cl->object_ids),
>   array_size(object_count, sizeof(__u32)));
> - if (IS_ERR(object_ids))
> - return PTR_ERR(object_ids);
> + if (IS_ERR(object_ids)) {
> + ret = PTR_ERR(object_ids);
> + goto unlock;
> + }
>  
>   idr_init(&leases);
>  
> @@ -535,14 +550,15 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
>   if (ret) {
>   DRM_DEBUG_LEASE("lease object lookup failed: %i\n", ret);
>   idr_destroy(&leases);
> - return ret;
> + goto unlock;
>   }
>  
>   /* Allocate a file descriptor for the lease */
>   fd = 

Re: [PATCH v2 1/2] drm: Add a locked version of drm_is_current_master

2021-06-17 Thread Daniel Vetter
On Tue, Jun 15, 2021 at 10:36:44AM +0800, Desmond Cheong Zhi Xi wrote:
> While checking the master status of the DRM file in
> drm_is_current_master(), the device's master mutex should be
> held. Without the mutex, the pointer fpriv->master may be freed
> concurrently by another process calling drm_setmaster_ioctl(). This
> could lead to use-after-free errors when the pointer is subsequently
> dereferenced in drm_lease_owner().
> 
> The callers of drm_is_current_master() from drm_auth.c hold the
> device's master mutex, but external callers do not. Hence, we implement
> drm_is_current_master_locked() to be used within drm_auth.c, and
> modify drm_is_current_master() to grab the device's master mutex
> before checking the master status.
> 
> Reported-by: Daniel Vetter 
> Signed-off-by: Desmond Cheong Zhi Xi 
> Reviewed-by: Emil Velikov 
> ---
>  drivers/gpu/drm/drm_auth.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index 232abbba3686..c6bf52c310a9 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -61,6 +61,8 @@
>   * trusted clients.
>   */
>  
> +static bool drm_is_current_master_locked(struct drm_file *fpriv);

A bit of a bikeshed, but we try to avoid forward declarations when they're
not needed. If you don't want to tear apart drm_is_current_master and the
_locked version then just move them together.

Can you pls do that and respin?

Otherwise looks all great.
-Daniel


> +
>  int drm_getmagic(struct drm_device *dev, void *data, struct drm_file 
> *file_priv)
>  {
>   struct drm_auth *auth = data;
> @@ -223,7 +225,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void 
> *data,
>   if (ret)
>   goto out_unlock;
>  
> - if (drm_is_current_master(file_priv))
> + if (drm_is_current_master_locked(file_priv))
>   goto out_unlock;
>  
>   if (dev->master) {
> @@ -272,7 +274,7 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void 
> *data,
>   if (ret)
>   goto out_unlock;
>  
> - if (!drm_is_current_master(file_priv)) {
> + if (!drm_is_current_master_locked(file_priv)) {
>   ret = -EINVAL;
>   goto out_unlock;
>   }
> @@ -321,7 +323,7 @@ void drm_master_release(struct drm_file *file_priv)
>   if (file_priv->magic)
>   idr_remove(&file_priv->master->magic_map, file_priv->magic);
>  
> - if (!drm_is_current_master(file_priv))
> + if (!drm_is_current_master_locked(file_priv))
>   goto out;
>  
>   drm_legacy_lock_master_cleanup(dev, master);
> @@ -342,6 +344,13 @@ void drm_master_release(struct drm_file *file_priv)
>   mutex_unlock(&dev->master_mutex);
>  }
>  
> +static bool drm_is_current_master_locked(struct drm_file *fpriv)
> +{
> + lockdep_assert_held_once(&fpriv->master->dev->master_mutex);
> +
> + return fpriv->is_master && drm_lease_owner(fpriv->master) == 
> fpriv->minor->dev->master;
> +}
> +
>  /**
>   * drm_is_current_master - checks whether @priv is the current master
>   * @fpriv: DRM file private
> @@ -354,7 +363,13 @@ void drm_master_release(struct drm_file *file_priv)
>   */
>  bool drm_is_current_master(struct drm_file *fpriv)
>  {
> - return fpriv->is_master && drm_lease_owner(fpriv->master) == 
> fpriv->minor->dev->master;
> + bool ret;
> +
> + mutex_lock(&fpriv->master->dev->master_mutex);
> + ret = drm_is_current_master_locked(fpriv);
> + mutex_unlock(&fpriv->master->dev->master_mutex);
> +
> + return ret;
>  }
>  EXPORT_SYMBOL(drm_is_current_master);
>  
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 0/2] GuC submission / DRM scheduler integration plan + new uAPI

2021-06-17 Thread Daniel Vetter
On Fri, Jun 11, 2021 at 04:40:42PM -0700, Matthew Brost wrote:
> Subject and patches say it all.
> 
> v2: Address comments, patches have details of changes
> v3: Address comments, patches have details of changes
> v4: Address comments, patches have details of changes
> 
> Signed-off-by: Matthew Brost 

Imo ready (well overdue) for merging, please annoy Carl or someone from
media for an ack and then ask John or Daniele to merge it into
drm-intel-gt-next.
-Daniel

> 
> Matthew Brost (2):
>   drm/doc/rfc: i915 GuC submission / DRM scheduler
>   drm/doc/rfc: i915 new parallel submission uAPI plan
> 
>  Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++
>  Documentation/gpu/rfc/i915_scheduler.rst  | 148 ++
>  Documentation/gpu/rfc/index.rst   |   4 +
>  3 files changed, 269 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
>  create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst
> 
> -- 
> 2.28.0
> 
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 1/5] dma-buf: fix dma_resv_test_signaled test_all handling

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 07:15:44PM +0200, Christian König wrote:
> Am 11.06.21 um 16:55 schrieb Daniel Vetter:
> > On Fri, Jun 11, 2021 at 04:53:11PM +0200, Christian König wrote:
> > > 
> > > Am 11.06.21 um 16:47 schrieb Daniel Vetter:
> > > > On Fri, Jun 11, 2021 at 02:02:57PM +0200, Christian König wrote:
> > > > > As the name implies if testing all fences is requested we
> > > > > should indeed test all fences and not skip the exclusive
> > > > > one because we see shared ones.
> > > > > 
> > > > > Signed-off-by: Christian König 
> > > > Hm I thought we've had the rule that when both fences exist, then
> > > > collectively the shared ones must signal no earlier than the exclusive
> > > > one.
> > > > 
> > > > That's at least the contract we've implemented in dma_resv.h. But I've
> > > > also found a bunch of drivers who are a lot more yolo on this.
> > > > 
> > > > I think there's a solid case here to just always take all the fences if 
> > > > we
> > > > ask for all the shared ones, but if we go that way then I'd say
> > > > - clear kerneldoc patch to really hammer this in (currently we're not 
> > > > good
> > > > at all in this regard)
> > > > - going through drivers a bit to check for this (I have some of that 
> > > > done
> > > > already in my earlier series, need to respin it and send it out)
> > > > 
> > > > But I'm kinda not seeing why this needs to be in this patch series here.
> > > You mentioned that this is a problem in the last patch and if you ask me
> > > that's just a bug or at least very inconsistent.
> > > 
> > > See dma_resv_wait_timeout() always waits for all fences, including the
> > > exclusive one even if shared ones are present. But 
> > > dma_resv_test_signaled()
> > > ignores the exclusive one if shared ones are present.
> > Hm the only one I thought I've mentioned is that dma_buf_poll doesn't use
> > dma_fence_get_rcu_safe where I think it should. Different problem. I think
> > this is one you spotted.
> > 
> > > The only other driver I could find trying to make use of this is nouveau 
> > > and
> > > I already provided a fix for this as well.
> > i915 also does this, and I think I've found a few more.
> > 
> > > I just think that this is the more defensive approach to fix this and have
> > > at least the core functions consistent on the handling.
> > Oh fully agree, it's just current dma_resv docs aren't the greatest, and
> > hacking on semantics without updating the docs isn't great. Especially
> > when it's ad-hoc.
> 
> Well when the requirement that shared fences should always signal after the
> exclusive fence is not documented anywhere then I would say that it is
> naturally allowed to just add any fence to the list of shared fences, and any
> code assuming something else is just broken and needs fixing.

That's not what I meant. I thought the rule is that the shared fences
_together_ need to signal after the exclusive ones. Not each individual
one.

This means that if you have both exclusive  fences and shared fences, and
you want to wait for just the shared fences, then you can ignore the
exclusive ones.

You have a patch series floating around which "fixes" this, but I think
it's incomplete. And I'm pretty sure it's a change of de facto rules, since
not obeying this breaks a bunch of existing code (as you've noticed).
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > ---
> > > > >drivers/dma-buf/dma-resv.c | 33 -
> > > > >1 file changed, 12 insertions(+), 21 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> > > > > index f26c71747d43..c66bfdde9454 100644
> > > > > --- a/drivers/dma-buf/dma-resv.c
> > > > > +++ b/drivers/dma-buf/dma-resv.c
> > > > > @@ -615,25 +615,21 @@ static inline int 
> > > > > dma_resv_test_signaled_single(struct dma_fence *passed_fence)
> > > > > */
> > > > >bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
> > > > >{
> > > > > - unsigned int seq, shared_count;
> > > > > + struct dma_fence *fence;
> > > > > + unsigned int seq;
> > > > >   int ret;
> > > > >   rcu_read_lock();
> > > > >retry:
> > > > >   ret = true;
> > > > > - shared_count = 0;
> > > > >   seq = read_seqcount_begin(&obj->seq);
> > > > >   if (test_all) {
> > > > >   struct dma_resv_list *fobj = dma_resv_shared_list(obj);
> > > > > - unsigned int i;
> > > > > -
> > > > > - if (fobj)
> > > > > - shared_count = fobj->shared_count;
> > > > > + unsigned int i, shared_count;
> > > > > + shared_count = fobj ? fobj->shared_count : 0;
> > > > >   for (i = 0; i < shared_count; ++i) {
> > > > > - struct dma_fence *fence;
> > > > > -
> > > > >   fence = rcu_dereference(fobj->shared[i]);
> > > > >   ret = 

Re: [PATCH v2 2/2] drm/bridge: ti-sn65dsi86: Implement the pwm_chip

2021-06-17 Thread Uwe Kleine-König
Hello Bjorn,

On Thu, Jun 17, 2021 at 11:38:26AM -0500, Bjorn Andersson wrote:
> On Thu 17 Jun 01:24 CDT 2021, Uwe Kleine-König wrote:
> > On Wed, Jun 16, 2021 at 10:22:17PM -0500, Bjorn Andersson wrote:
> > > > > +static int ti_sn_pwm_apply(struct pwm_chip *chip, struct pwm_device 
> > > > > *pwm,
> > > > > +const struct pwm_state *state)
> > > > > +{
> > > > > + struct ti_sn65dsi86 *pdata = pwm_chip_to_ti_sn_bridge(chip);
> > > > > + unsigned int pwm_en_inv;
> > > > > + unsigned int backlight;
> > > > > + unsigned int pre_div;
> > > > > + unsigned int scale;
> > > > > + int ret;
> > > > > +
> > > > > + if (!pdata->pwm_enabled) {
> > > > > + ret = pm_runtime_get_sync(pdata->dev);
> > > > > + if (ret < 0)
> > > > > + return ret;
> > > > > +
> > > > > + ret = regmap_update_bits(pdata->regmap, 
> > > > > SN_GPIO_CTRL_REG,
> > > > > + SN_GPIO_MUX_MASK << (2 * 
> > > > > SN_PWM_GPIO_IDX),
> > > > > + SN_GPIO_MUX_SPECIAL << (2 * 
> > > > > SN_PWM_GPIO_IDX));
> > > > > + if (ret) {
> > > > > + dev_err(pdata->dev, "failed to mux in PWM 
> > > > > function\n");
> > > > > + goto out;
> > > > > + }
> > > > 
> > > > Do you need to do this even if state->enabled is false?
> > > 
> > > I presume I should be able to explicitly mux in the GPIO function and
> > > configure that to output low. But I am not able to find anything in the
> > > data sheet that would indicate this to be preferred.
> > 
> > My question targeted a different case. If the PWM is off
> > (!pdata->pwm_enabled) and should remain off (state->enabled is false)
> > you can shortcut here, can you not?
> 
> Right, if we're going off->off then we don't need to touch the hardware.
> 
> But am I expected to -EINVAL improper period and duty cycle even though
> enabled is false?
> 
> And also, what is the supposed behavior of enabled = false? Is it
> supposed to be equivalent to asking for a duty_cycle of 0?

In my book enabled = false is just syntactic sugar to say:
"duty_cycle=0, period=something small". So to answer your questions: if
enabled = false, the consumer doesn't really care about period and
duty_cycle. Some care that the output becomes inactive, some others
don't, so from my POV just emit the inactive level on the output and
ignore period and duty_cycle.

> > > > Does this already modify the output pin?
> > > 
> > > Yes, coming out of reset this pin is configured as input, so switching
> > > the mux here will effectively start driving the pin.
> > 
> > So please document this in the format the recently added drivers do,
> > too. See e.g. drivers/pwm/pwm-sifive.c. (The idea is to start that with
> > " * Limitations:" to make it easy to grep it.)
> > 
> 
> Okay, will do. Although I believe that for this driver it makes sense to
> place such comment close to this function, rather than at the top of the
> driver.

Yes, agreed.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 09:41:35AM +0200, Christian König wrote:
> 
> 
> Am 16.06.21 um 21:19 schrieb Dan Carpenter:
> > On Wed, Jun 16, 2021 at 01:00:38PM +0200, Christian König wrote:
> > > 
> > > Am 16.06.21 um 11:36 schrieb Dan Carpenter:
> > > > On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
> > > > > Am 16.06.21 um 10:37 schrieb Dan Carpenter:
> > > > > > On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
> > > > > > > Sending the first message didn't work, so let's try again.
> > > > > > > 
> > > > > > > Am 16.06.21 um 08:30 schrieb Dan Carpenter:
> > > > > > > > There are three bugs here:
> > > > > > > > 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
> > > > > > > > 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" 
> > > > > > > > assignment
> > > > > > > >was wrong and it was really assigning "new_mem = 
> > > > > > > > old_mem;".  There
> > > > > > > >is no need for this assignment anyway as we already have 
> > > > > > > > the value
> > > > > > > >for "new_mem".
> > > > > > > > 3) The (!new_man->use_tt) condition is reversed.
> > > > > > > > 
> > > > > > > > Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager 
> > > > > > > > subsystem.")
> > > > > > > > Signed-off-by: Dan Carpenter 
> > > > > > > > ---
> > > > > > > > This is from reading the code and I can't swear that I have 
> > > > > > > > understood
> > > > > > > > it correctly.  My nouveau driver is currently unusable and this 
> > > > > > > > patch
> > > > > > > > has not helped.  But hopefully if I fix enough bugs eventually 
> > > > > > > > it will
> > > > > > > > start to work.
> > > > > > > Well NAK, the code previously looked quite well and you are 
> > > > > > > breaking it now.
> > > > > > > 
> > > > > > > What's the problem with nouveau?
> > > > > > > 
> > > > > > The new Firefox seems to exercise nouveau more than the old one so
> > > > > > when I start 10 firefox windows it just hangs the graphics.
> > > > > > 
> > > > > > I've added debug code and it seems like the problem is that
> > > > > > nv50_mem_new() is failing.
> > > > > Sounds like it is running out of memory to me.
> > > > > 
> > > > > Do you have a dmesg?
> > > > > 
> > > > At first there was a very straight forward use after free bug which I
> > > > fixed.
> > > > https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
> > > > 
> > > > But now the use after free is gone the only thing in dmesg is:
> > > > "[TTM] Buffer eviction failed".  And I have some firmware missing.
> > > > 
> > > > [  205.489763] rfkill: input handler disabled
> > > > [  205.678292] nouveau :01:00.0: Direct firmware load for 
> > > > nouveau/nva8_fuc084 failed with error -2
> > > > [  205.678300] nouveau :01:00.0: Direct firmware load for 
> > > > nouveau/nva8_fuc084d failed with error -2
> > > > [  205.678302] nouveau :01:00.0: msvld: unable to load firmware data
> > > > [  205.678304] nouveau :01:00.0: msvld: init failed, -19
> > > > [  296.150632] [TTM] Buffer eviction failed
> > > > [  417.084265] [TTM] Buffer eviction failed
> > > > [  447.295961] [TTM] Buffer eviction failed
> > > > [  510.800231] [TTM] Buffer eviction failed
> > > > [  556.101384] [TTM] Buffer eviction failed
> > > > [  616.495790] [TTM] Buffer eviction failed
> > > > [  692.014007] [TTM] Buffer eviction failed
> > > > 
> > > > The eviction failed message only shows up a minute after the hang so it
> > > > seems more like a symptom than a root cause.
> > > Yeah, look at the timing. What happens is that the buffer eviction timed 
> > > out
> > > because the hardware is locked up.
> > > 
> > > No idea what that could be. It might not even be kernel related at all.
> > I don't think it's hardware related...  Using an old version of firefox
> > "fixes" the problem.  I downloaded the firmware so that's not the issue.
> > Here's the dmesg load info with the new firmware.
> 
> Oh, I was not suggesting a hardware problem.
> 
> The most likely cause is a software issue in userspace, e.g. wrong order of
> doing things, doing things too fast without waiting, etc.
> 
> There are tons of ways userspace can crash GPU hardware that you can't
> prevent in the kernel. Detecting an endless loop in particular is the
> well-known Turing halting problem and not even theoretically solvable.
> 
> I suggest to start digging in userspace instead.

I guess nouveau doesn't have reset when the fences time out? That would at
least paper over this, plus it makes debugging the bug in mesa easier.

Also, as Christian points out, because of the halting problem the lack of
TDR (timeout and device reset) is actually a security bug in itself.
-Daniel

> 
> Christian.
> 
> > 
> > [1.412458] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 
> > [1.412527] AMD-Vi: AMD IOMMUv2 functionality not available on this 
> > system
> > [1.412710] nouveau :01:00.0: vgaarb: deactivate vga console
> > [1.417213] Console: switching to colour dummy device 80x25

Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 09:25:44AM +0200, Christian König wrote:
> Am 11.06.21 um 17:18 schrieb Daniel Vetter:
> > On Fri, Jun 11, 2021 at 12:09:19PM +0200, Christian König wrote:
> > > Am 11.06.21 um 11:07 schrieb Daniel Vetter:
> > > > On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
> > > > > Unwrap a the explicit fence if it is a dma_fence_chain and
> > > > > sync to the first fence not matching the owner rules.
> > > > > 
> > > > > Signed-off-by: Christian König 
> > > > > ---
> > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 
> > > > > +--
> > > > >1 file changed, 68 insertions(+), 50 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > @@ -28,6 +28,8 @@
> > > > > *Christian König 
> > > > > */
> > > > > +#include 
> > > > > +
> > > > >#include "amdgpu.h"
> > > > >#include "amdgpu_trace.h"
> > > > >#include "amdgpu_amdkfd.h"
> > > > > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync 
> > > > > *sync, struct dma_fence *fence)
> > > > >   return amdgpu_sync_fence(sync, fence);
> > > > >}
> > > > > +/* Determine based on the owner and mode if we should sync to a 
> > > > > fence or not */
> > > > > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > > > > +enum amdgpu_sync_mode mode,
> > > > > +void *owner, struct dma_fence *f)
> > > > > +{
> > > > > + void *fence_owner = amdgpu_sync_get_owner(f);
> > > > > +
> > > > > + /* Always sync to moves, no matter what */
> > > > > + if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > + return true;
> > > > > +
> > > > > + /* We only want to trigger KFD eviction fences on
> > > > > +  * evict or move jobs. Skip KFD fences otherwise.
> > > > > +  */
> > > > > + if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > + return false;
> > > > > +
> > > > > + /* Never sync to VM updates either. */
> > > > > + if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > + return false;
> > > > > +
> > > > > + /* Ignore fences depending on the sync mode */
> > > > > + switch (mode) {
> > > > > + case AMDGPU_SYNC_ALWAYS:
> > > > > + return true;
> > > > > +
> > > > > + case AMDGPU_SYNC_NE_OWNER:
> > > > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > > > + fence_owner == owner)
> > > > > + return false;
> > > > > + break;
> > > > > +
> > > > > + case AMDGPU_SYNC_EQ_OWNER:
> > > > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > > > + fence_owner != owner)
> > > > > + return false;
> > > > > + break;
> > > > > +
> > > > > + case AMDGPU_SYNC_EXPLICIT:
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > + WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > > > +  "Adding eviction fence to sync obj");
> > > > > + return true;
> > > > > +}
> > > > > +
> > > > >/**
> > > > > * amdgpu_sync_resv - sync to a reservation object
> > > > > *
> > > > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device 
> > > > > *adev, struct amdgpu_sync *sync,
> > > > >   /* always sync to the exclusive fence */
> > > > >   f = dma_resv_excl_fence(resv);
> > > > > - r = amdgpu_sync_fence(sync, f);
> > > > > + dma_fence_chain_for_each(f, f) {
> > > > Jason has some helper for deep-walking fence chains/arrays here I think.
> > > > Might want to look into that, so that we have some consistency in how we
> > > > pile up multiple exclusive fences.
> > > Well those helpers are not from Jason, but from me :)
> > > 
> > > But no, for now the deep inspection is not really helpful here since
> > > grabbing a reference to a certain chain node is what makes the handling
> > > handling
> > > easier and faster here.
> > > 
> > > Thinking more about it that should also make it possible for the garbage
> > > collection to kick in properly.
> > Hm this is tricky to reason about, but yeah with this here it's a true
> > chain, and you just need to connect them. But then if a buffer is on
> > multiple engines, collapsing things down occasionally might be useful.
> > 
> > But maybe we need to do that in the bigger rework where exclusive fences
> > are also just in the dma_fence_list with a "this is an exclusive one btw"
> > tag.
> > 
> > I think for the vk import case doing the deep scan makes more sense, it's
> > a once-per-frame thing, and there's a 

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> As long as we can figure out who touched to a certain sync object last that
> would indeed work, yes.

Don't you need to know who will touch it next, i.e. who is holding up your
fence? Or maybe I'm just again totally confused.
-Daniel

> 
> Christian.
> 
> Am 14.06.21 um 19:10 schrieb Marek Olšák:
> > The call to the hw scheduler has a limitation on the size of all
> > parameters combined. I think we can only pass a 32-bit sequence number
> > and a ~16-bit global (per-GPU) syncobj handle in one call and not much
> > else.
> > 
> > The syncobj handle can be an element index in a global (per-GPU) syncobj
> > table and it's read only for all processes with the exception of the
> > signal command. Syncobjs can either have per VMID write access flags for
> > the signal command (slow), or any process can write to any syncobjs and
> > only rely on the kernel checking the write log (fast).
> > 
> > In any case, we can execute the memory write in the queue engine and
> > only use the hw scheduler for logging, which would be perfect.
> > 
> > Marek
> > 
> > On Thu, Jun 10, 2021 at 12:33 PM Christian König wrote:
> > 
> > Hi guys,
> > 
> > maybe soften that a bit. Reading from the shared memory of the
> > user fence is ok for everybody. What we need to take more care of
> > is the writing side.
> > 
> > So my current thinking is that we allow read only access, but
> > writing a new sequence value needs to go through the scheduler/kernel.
> > 
> > So when the CPU wants to signal a timeline fence it needs to call
> > an IOCTL. When the GPU wants to signal the timeline fence it needs
> > to hand that of to the hardware scheduler.
> > 
> > If we lockup the kernel can check with the hardware who did the
> > last write and what value was written.
> > 
> > That together with an IOCTL to give out sequence number for
> > implicit sync to applications should be sufficient for the kernel
> > to track who is responsible if something bad happens.
> > 
> > In other words when the hardware says that the shader wrote stuff
> > like 0xdeadbeef 0x0 or 0x into memory we kill the process
> > who did that.
> > 
> > If the hardware says that seq - 1 was written fine, but seq is
> > missing then the kernel blames whoever was supposed to write seq.
> > 
> > Just piping the write through a privileged instance should be
> > fine to make sure that we don't run into issues.
> > 
> > Christian.
> > 
> > Am 10.06.21 um 17:59 schrieb Marek Olšák:
> > > Hi Daniel,
> > > 
> > > We just talked about this whole topic internally and we came up
> > > to the conclusion that the hardware needs to understand sync
> > > object handles and have high-level wait and signal operations in
> > > the command stream. Sync objects will be backed by memory, but
> > > they won't be readable or writable by processes directly. The
> > > hardware will log all accesses to sync objects and will send the
> > > log to the kernel periodically. The kernel will identify
> > > malicious behavior.
> > > 
> > > Example of a hardware command stream:
> > > ...
> > > ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence
> > > number is assigned by the kernel
> > > Draw();
> > > ImplicitSyncSignalWhenDone(syncObjHandle);
> > > ...
> > > 
> > > I'm afraid we have no other choice because of the TLB
> > > invalidation overhead.
> > > 
> > > Marek
> > > 
> > > 
> > > On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter wrote:
> > > 
> > > On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
> > > > Am 09.06.21 um 15:19 schrieb Daniel Vetter:
> > > > > [SNIP]
> > > > > > Yeah, we call this the lightweight and the heavyweight
> > > tlb flush.
> > > > > >
> > > > > > The lighweight can be used when you are sure that you
> > > don't have any of the
> > > > > > PTEs currently in flight in the 3D/DMA engine and you
> > > just need to
> > > > > > invalidate the TLB.
> > > > > >
> > > > > > The heavyweight must be used when you need to
> > > invalidate the TLB *AND* make
> > > > > > sure that no concurrently operation moves new stuff
> > > into the TLB.
> > > > > >
> > > > > > The problem is for this use case we have to use the
> > > heavyweight one.
> > > > > Just for my own curiosity: So the lightweight flush is
> > > only for in-between
> > > > > CS when you know access is idle? Or does that also not
> > > work if userspace
> > > > > has a CS on a dma engine going at the same time because
> > > the tlb aren't
> > > > > isolated enough between 

Re: [Intel-gfx] [RFC PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-06-17 Thread Daniel Vetter
Sorry I'm behind on mails  ...

On Fri, Jun 11, 2021 at 12:50:29PM -0700, Matthew Brost wrote:
> On Fri, Jun 04, 2021 at 07:59:05PM +0200, Daniel Vetter wrote:
> > On Wed, May 26, 2021 at 04:33:57PM -0700, Matthew Brost wrote:
> > > Add entry for i915 new parallel submission uAPI plan.
> > > 
> > > v2:
> > >  (Daniel Vetter):
> > >   - Expand logical order explaination
> > >   - Add dummy header
> > >   - Only allow N BBs in execbuf IOCTL
> > >   - Configure parallel submission per slot not per gem context
> > > v3:
> > >  (Marcin Ślusarz):
> > >   - Lot's of typos / bad english fixed
> > >  (Tvrtko Ursulin):
> > >   - Consistent pseudo code, clean up wording in descriptions
> > > 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Tony Ye 
> > > CC: Carl Zhang 
> > > Cc: Daniel Vetter 
> > > Cc: Jason Ekstrand 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 145 ++
> > >  Documentation/gpu/rfc/i915_scheduler.rst  |  55 ++-
> > >  2 files changed, 198 insertions(+), 2 deletions(-)
> > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > 
> > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > new file mode 100644
> > > index ..20de206e3ab4
> > > --- /dev/null
> > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > @@ -0,0 +1,145 @@
> > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > i915_context_engines_parallel_submit */
> > > +
> > > +/*
> > > + * i915_context_engines_parallel_submit:
> > 
> > So the idea is to make these kerneldoc and pull them into the rfc section.
> > Then when we merge, move them to the real uapi section, like what Matt has
> > done for lmem.
> > 
> 
> Yep, will fix in next rev.
> 
> > > + *
> > > + * Setup a slot in the context engine map to allow multiple BBs to be 
> > > submitted
> > > + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on 
> > > the GPU
> > > + * in parallel. Multiple hardware contexts are created internally in the 
> > > i915
> > > + * run these BBs. Once a slot is configured for N BBs only N BBs can be
> > > + * submitted in each execbuf IOCTL and this is implicit behavior e.g. 
> > > The user
> > > + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL 
> > > know how
> > > + * many BBs there are based on the slots configuration. The N BBs are 
> > > the last N
> > > + * buffer objects for first N if I915_EXEC_BATCH_FIRST is set.
> > 
> > s/for/or/
> > 
> > > + *
> > > + * There are two currently defined ways to control the placement of the
> > > + * hardware contexts on physical engines: default behavior (no flags) and
> > > + * I915_PARALLEL_IMPLICIT_BONDS (a flag). More flags may be added the in 
> > > the
> > > + * future as new hardware / use cases arise. Details of how to use this
> > > + * interface above the flags field in this structure.
> > > + *
> > > + * Returns -EINVAL if hardware context placement configuration is 
> > > invalid or if
> > > + * the placement configuration isn't supported on the platform / 
> > > submission
> > > + * interface.
> > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > submission
> > > + * interface.
> > > + */
> > > +struct i915_context_engines_parallel_submit {
> > > + struct i915_user_extension base;
> > > +
> > > + __u16 engine_index; /* slot for parallel engine */
> > 
> > Kernel doc here for the inline comments too.
> >
> 
> Yep.
>  
> > > + __u16 width;/* number of contexts per parallel engine */
> > > + __u16 num_siblings; /* number of siblings per context */
> > > + __u16 mbz16;
> > > +/*
> > > + * Default placement behavior (currently unsupported):
> > > + *
> > > + * Allow BBs to be placed on any available engine instance. In this case 
> > > each
> > > + * context's engine mask indicates where that context can be placed. It 
> > > is
> > > + * implied in this mode that all contexts have mutual exclusive 
> > > placement.
> > > + * e.g. If one context is running CSX[0] no other contexts can run on 
> > > CSX[0]).
> > > + *
> > > + * Example 1 pseudo code:
> > > + * CSX,Y[N] = generic engine class X or Y, logical instance N
> > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > + * set_engines(INVALID)
> > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > + *   engines=CSX[0],CSX[1],CSY[0],CSY[1])
> > > + *
> > > + * Results in the following valid placements:
> > > + * CSX[0], CSY[0]
> > > + * CSX[0], CSY[1]
> > > + * CSX[1], CSY[0]
> > > + * CSX[1], CSY[1]
> > > + *
> > > + * This can also be thought of as 2 virtual engines described by 2-D 
> > > array in
> > > + * the engines field:
> > > + * VE[0] = CSX[0], CSX[1]
> > > + * VE[1] = CSY[0], CSY[1]
> > > + *
> > > + * Example 2 pseudo code:
> > > + * CSX[N] = generic engine of same class X, logical instance N
> > > + * INVALID = 

Re: [PATCH v2 2/2] drm/bridge: ti-sn65dsi86: Implement the pwm_chip

2021-06-17 Thread Bjorn Andersson
On Thu 17 Jun 01:24 CDT 2021, Uwe Kleine-König wrote:

> Hello Bjorn,
> 
> On Wed, Jun 16, 2021 at 10:22:17PM -0500, Bjorn Andersson wrote:
> > > > +static int ti_sn_pwm_apply(struct pwm_chip *chip, struct pwm_device 
> > > > *pwm,
> > > > +  const struct pwm_state *state)
> > > > +{
> > > > +   struct ti_sn65dsi86 *pdata = pwm_chip_to_ti_sn_bridge(chip);
> > > > +   unsigned int pwm_en_inv;
> > > > +   unsigned int backlight;
> > > > +   unsigned int pre_div;
> > > > +   unsigned int scale;
> > > > +   int ret;
> > > > +
> > > > +   if (!pdata->pwm_enabled) {
> > > > +   ret = pm_runtime_get_sync(pdata->dev);
> > > > +   if (ret < 0)
> > > > +   return ret;
> > > > +
> > > > +   ret = regmap_update_bits(pdata->regmap, 
> > > > SN_GPIO_CTRL_REG,
> > > > +   SN_GPIO_MUX_MASK << (2 * 
> > > > SN_PWM_GPIO_IDX),
> > > > +   SN_GPIO_MUX_SPECIAL << (2 * 
> > > > SN_PWM_GPIO_IDX));
> > > > +   if (ret) {
> > > > +   dev_err(pdata->dev, "failed to mux in PWM 
> > > > function\n");
> > > > +   goto out;
> > > > +   }
> > > 
> > > Do you need to do this even if state->enabled is false?
> > 
> > I presume I should be able to explicitly mux in the GPIO function and
> > configure that to output low. But I am not able to find anything in the
> > data sheet that would indicate this to be preferred.
> 
> My question targetted a different case. If the PWM is off
> (!pdata->pwm_enabled) and should remain off (state->enabled is false)
> you can shortcut here, can you not?
> 

Right, if we're going off->off then we don't need to touch the hardware.

But am I expected to -EINVAL improper period and duty cycle even though
enabled is false?


And also, what is the supposed behavior of enabled = false? Is it
supposed to be equivalent to asking for a duty_cycle of 0?

> > > Does this already modify the output pin?
> > 
> > Yes, coming out of reset this pin is configured as input, so switching
> > the mux here will effectively start driving the pin.
> 
> So please document this in the format the recently added drivers do,
> too. See e.g. drivers/pwm/pwm-sifive.c. (The idea is to start that with
> " * Limitations:" to make it easy to grep it.)
> 

Okay, will do. Although I believe that for this driver it makes sense to
place such comment close to this function, rather than at the top of the
driver.

> > > Lets continue the above example with the fixed calculation. So we have:
> > > 
> > >   pdata->pwm_refclk_freq = 334
> > >   state->period = 10 [ns]
> > >   state->duty_cycle = 600
> > >   scale = 332
> > > 
> > > so the actually emitted period = 99899.98002000399 ns
> > > 
> > > Now you calculate:
> > > 
> > >   backlight = 1
> > > 
> > > which yields an actual duty_cycle of 299.4 ns, with backlight = 2
> > > you would get an actual duty_cycle of 599.99988 ns, which is better. The
> > > culprit here is that you divide by state->period but instead should
> > > divide by the actual period.
> > 
> > What do I do about the case where the actual period is lower than the
> > requested one and thereby the duty cycle becomes larger than the period?
> 
> The general principle is: Pick the biggest possible duty_cycle available
> for the just picked period. So in your example you have to clamp it to
> period (assuming you can, otherwise pick the next lower possible value).
> 

Sounds good.

Thank you,
Bjorn

> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K.   | Uwe Kleine-König|
> Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [PATCH v4 1/3] dt-bindings: msm: dsi: add missing 7nm bindings

2021-06-17 Thread Rob Clark
On Thu, Jun 17, 2021 at 8:09 AM Jonathan Marek  wrote:
>
> These got lost when going from .txt to .yaml bindings, add them back.
>

Fixes: 8fc939e72ff8 ("dt-bindings: msm: dsi: add yaml schemas for DSI
PHY bindings")

> Signed-off-by: Jonathan Marek 
> ---
>  .../bindings/display/msm/dsi-phy-7nm.yaml | 66 +++
>  1 file changed, 66 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
>
> diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml 
> b/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
> new file mode 100644
> index ..c0077ca7e9e7
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
> @@ -0,0 +1,66 @@
> +# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/display/msm/dsi-phy-7nm.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Qualcomm Display DSI 7nm PHY
> +
> +maintainers:
> +  - Jonathan Marek 
> +
> +allOf:
> +  - $ref: dsi-phy-common.yaml#
> +
> +properties:
> +  compatible:
> +    oneOf:
> +      - const: qcom,dsi-phy-7nm
> +      - const: qcom,dsi-phy-7nm-8150
> +
> +  reg:
> +    items:
> +      - description: dsi phy register set
> +      - description: dsi phy lane register set
> +      - description: dsi pll register set
> +
> +  reg-names:
> +    items:
> +      - const: dsi_phy
> +      - const: dsi_phy_lane
> +      - const: dsi_pll
> +
> +  vdds-supply:
> +    description: |
> +      Connected to VDD_A_DSI_PLL_0P9 pin (or VDDA_DSI{0,1}_PLL_0P9 for sm8150)
> +
> +required:
> +  - compatible
> +  - reg
> +  - reg-names
> +  - vdds-supply
> +
> +unevaluatedProperties: false
> +
> +examples:
> +  - |
> + #include 
> + #include 
> +
> + dsi-phy@ae94400 {
> + compatible = "qcom,dsi-phy-7nm";
> + reg = <0x0ae94400 0x200>,
> +   <0x0ae94600 0x280>,
> +   <0x0ae94900 0x260>;
> + reg-names = "dsi_phy",
> + "dsi_phy_lane",
> + "dsi_pll";
> +
> + #clock-cells = <1>;
> + #phy-cells = <0>;
> +
> + vdds-supply = <&vreg_l5a_0p88>;
> + clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> +  <&rpmhcc RPMH_CXO_CLK>;
> + clock-names = "iface", "ref";
> + };
> --
> 2.26.1
>


Re: [PATCH v3 1/8] ext4/xfs: add page refcount helper

2021-06-17 Thread Darrick J. Wong
On Thu, Jun 17, 2021 at 10:16:58AM -0500, Alex Sierra wrote:
> From: Ralph Campbell 
> 
> There are several places where ZONE_DEVICE struct pages assume a reference
> count == 1 means the page is idle and free. Instead of open coding this,
> add a helper function to hide this detail.
> 
> v2:
> [AS]: rename dax_layout_is_idle_page func to dax_page_unused
> 
> Signed-off-by: Ralph Campbell 
> Signed-off-by: Alex Sierra 
> ---
>  fs/dax.c|  4 ++--
>  fs/ext4/inode.c |  5 +
>  fs/xfs/xfs_file.c   |  4 +---
>  include/linux/dax.h | 10 ++
>  4 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 26d5dcd2d69e..321f4ddc6643 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -358,7 +358,7 @@ static void dax_disassociate_entry(void *entry, struct 
> address_space *mapping,
>   for_each_mapped_pfn(entry, pfn) {
>   struct page *page = pfn_to_page(pfn);
>  
> - WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> + WARN_ON_ONCE(trunc && !dax_layout_is_idle_page(page));
>   WARN_ON_ONCE(page->mapping && page->mapping != mapping);
>   page->mapping = NULL;
>   page->index = 0;
> @@ -372,7 +372,7 @@ static struct page *dax_busy_page(void *entry)
>   for_each_mapped_pfn(entry, pfn) {
>   struct page *page = pfn_to_page(pfn);
>  
> - if (page_ref_count(page) > 1)
> + if (!dax_layout_is_idle_page(page))
>   return page;
>   }
>   return NULL;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c173c8405856..9ee00186412f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3972,10 +3972,7 @@ int ext4_break_layouts(struct inode *inode)
>   if (!page)
>   return 0;
>  
> - error = ___wait_var_event(&page->_refcount,
> - atomic_read(&page->_refcount) == 1,
> - TASK_INTERRUPTIBLE, 0, 0,
> - ext4_wait_dax_page(ei));
> + error = dax_wait_page(ei, page, ext4_wait_dax_page);
>   } while (error == 0);
>  
>   return error;
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5b0f93f73837..39565fe5f817 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -782,9 +782,7 @@ xfs_break_dax_layouts(
>   return 0;
>  
>   *retry = true;
> - return ___wait_var_event(&page->_refcount,
> - atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
> - 0, 0, xfs_wait_dax_page(inode));
> + return dax_wait_page(inode, page, xfs_wait_dax_page);

Mechanically, this looks like a straightforward replacement, so:
Acked-by: Darrick J. Wong 

--D

>  }
>  
>  int
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index b52f084aa643..8b5da1d60dbc 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -243,6 +243,16 @@ static inline bool dax_mapping(struct address_space 
> *mapping)
>   return mapping->host && IS_DAX(mapping->host);
>  }
>  
> +static inline bool dax_page_unused(struct page *page)
> +{
> + return page_ref_count(page) == 1;
> +}
> +
> +#define dax_wait_page(_inode, _page, _wait_cb)   
> \
> + ___wait_var_event(&(_page)->_refcount,  \
> + dax_page_unused(_page), \
> + TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode))
> +
>  #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
>  void hmem_register_device(int target_nid, struct resource *r);
>  #else
> -- 
> 2.17.1
> 


Re: [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*

2021-06-17 Thread Sierra Guiza, Alejandro (Alex)



On 6/17/2021 10:16 AM, Alex Sierra wrote:

v1:
AMD is building a system architecture for the Frontier supercomputer with a
coherent interconnect between CPUs and GPUs. This hardware architecture allows
the CPUs to coherently access GPU device memory. We have hardware in our labs
and we are working with our partner HPE on the BIOS, firmware and software
for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver looks
it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
using devm_memremap_pages.

Now we're trying to migrate data to and from that memory using the migrate_vma_*
helpers so we can support page-based migration in our unified memory 
allocations,
while also supporting CPU access to those pages.

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
correctly in the migrate_vma_* helpers. We are looking for feedback about this
approach. If we're close, what's needed to make our patches acceptable upstream?
If we're not close, any suggestions how else to achieve what we are trying to do
(i.e. page migration and coherent CPU access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently 
upstreamed
to Dave Airlie's drm-next branch
https://lore.kernel.org/dri-devel/20210527205606.2660-6-felix.kuehl...@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc

Corrected link:

https://cgit.freedesktop.org/drm/drm/log/?h=drm-next

Regards,
Alex Sierra


On top of that we did some rework of our VRAM management for migrations to 
remove
some incorrect assumptions, allow partially successful migrations and GPU memory
mappings that mix pages in VRAM and system memory.
https://patchwork.kernel.org/project/dri-devel/list/?series=489811


Corrected link:

https://lore.kernel.org/dri-devel/20210527205606.2660-6-felix.kuehl...@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc

Regards,
Alex Sierra



v2:
This patch series version has merged "[RFC PATCH v3 0/2]
mm: remove extra ZONE_DEVICE struct page refcount" patch series made by
Ralph Campbell. It also applies at the top of these series, our changes
to support device generic type in migration_vma helpers.
This has been tested in systems with device memory that has coherent
access by CPU.

Also addresses the following feedback made in v1:
- Isolate in one patch kernel/resource.c modification, based
on Christoph's feedback.
- Add helpers check for generic and private type to avoid
duplicated long lines.

v3:
- Include cover letter from v1
- Rename dax_layout_is_idle_page func to dax_page_unused in patch
ext4/xfs: add page refcount helper

Patches 1-2 Rebased Ralph Campbell's ZONE_DEVICE page refcounting patches
Patches 4-5 are for context to show how we are looking up the SPM
memory and registering it with devmap.
Patches 3,6-8 are the changes we are trying to upstream or rework to
make them acceptable upstream.

Alex Sierra (6):
   kernel: resource: lookup_resource as exported symbol
   drm/amdkfd: add SPM support for SVM
   drm/amdkfd: generic type as sys mem on migration to ram
   include/linux/mm.h: helpers to check zone device generic type
   mm: add generic type support to migrate_vma helpers
   mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

Ralph Campbell (2):
   ext4/xfs: add page refcount helper
   mm: remove extra ZONE_DEVICE struct page refcount

  arch/powerpc/kvm/book3s_hv_uvmem.c   |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 --
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
  fs/dax.c |  8 +--
  fs/ext4/inode.c  |  5 +-
  fs/xfs/xfs_file.c|  4 +-
  include/linux/dax.h  | 10 
  include/linux/memremap.h |  7 +--
  include/linux/mm.h   | 52 +++---
  kernel/resource.c|  2 +-
  lib/test_hmm.c   |  2 +-
  mm/internal.h|  8 +++
  mm/memremap.c| 69 +++-
  mm/migrate.c | 13 ++---
  mm/page_alloc.c  |  3 ++
  mm/swap.c| 45 ++--
  16 files changed, 83 insertions(+), 164 deletions(-)



Re: [PATCH v4] Documentation: gpu: Mention the requirements for new properties

2021-06-17 Thread Philippe CORNU




On 6/16/21 4:38 PM, Maxime Ripard wrote:

New KMS properties come with a bunch of requirements to avoid each
driver from running their own, inconsistent, set of properties,
eventually leading to issues like property conflicts, inconsistencies
between drivers and semantics, etc.

Let's document what we expect.

Cc: Alexandre Belloni 
Cc: Alexandre Torgue 
Cc: Alex Deucher 
Cc: Alison Wang 
Cc: Alyssa Rosenzweig 
Cc: Andrew Jeffery 
Cc: Andrzej Hajda 
Cc: Anitha Chrisanthus 
Cc: Benjamin Gaignard 
Cc: Ben Skeggs 
Cc: Boris Brezillon 
Cc: Brian Starkey 
Cc: Chen Feng 
Cc: Chen-Yu Tsai 
Cc: Christian Gmeiner 
Cc: "Christian König" 
Cc: Chun-Kuang Hu 
Cc: Edmund Dea 
Cc: Eric Anholt 
Cc: Fabio Estevam 
Cc: Gerd Hoffmann 
Cc: Haneen Mohammed 
Cc: Hans de Goede 
Cc: "Heiko Stübner" 
Cc: Huang Rui 
Cc: Hyun Kwon 
Cc: Inki Dae 
Cc: Jani Nikula 
Cc: Jernej Skrabec 
Cc: Jerome Brunet 
Cc: Joel Stanley 
Cc: John Stultz 
Cc: Jonas Karlman 
Cc: Jonathan Hunter 
Cc: Joonas Lahtinen 
Cc: Joonyoung Shim 
Cc: Jyri Sarha 
Cc: Kevin Hilman 
Cc: Kieran Bingham 
Cc: Krzysztof Kozlowski 
Cc: Kyungmin Park 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Liviu Dudau 
Cc: Lucas Stach 
Cc: Ludovic Desroches 
Cc: Marek Vasut 
Cc: Martin Blumenstingl 
Cc: Matthias Brugger 
Cc: Maxime Coquelin 
Cc: Maxime Ripard 
Cc: Melissa Wen 
Cc: Neil Armstrong 
Cc: Nicolas Ferre 
Cc: "Noralf Trønnes" 
Cc: NXP Linux Team 
Cc: Oleksandr Andrushchenko 
Cc: Patrik Jakobsson 
Cc: Paul Cercueil 
Cc: Pekka Paalanen 
Cc: Pengutronix Kernel Team 
Cc: Philippe Cornu 
Cc: Philipp Zabel 
Cc: Qiang Yu 
Cc: Rob Clark 
Cc: Robert Foss 
Cc: Rob Herring 
Cc: Rodrigo Siqueira 
Cc: Rodrigo Vivi 
Cc: Roland Scheidegger 
Cc: Russell King 
Cc: Sam Ravnborg 
Cc: Sandy Huang 
Cc: Sascha Hauer 
Cc: Sean Paul 
Cc: Seung-Woo Kim 
Cc: Shawn Guo 
Cc: Simon Ser 
Cc: Stefan Agner 
Cc: Steven Price 
Cc: Sumit Semwal 
Cc: Thierry Reding 
Cc: Tian Tao 
Cc: Tomeu Vizoso 
Cc: Tomi Valkeinen 
Cc: VMware Graphics 
Cc: Xinliang Liu 
Cc: Xinwei Kong 
Cc: Yannick Fertre 
Cc: Zack Rusin 
Reviewed-by: Daniel Vetter 
Signed-off-by: Maxime Ripard 

---

Changes from v3:
   - Roll back to the v2
   - Add Simon and Pekka in Cc

Changes from v2:
   - Take into account the feedback from Laurent and Lidiu to no longer
 force generic properties, but prefix vendor-specific properties with
 the vendor name

Changes from v1:
   - Typos and wording reported by Daniel and Alex
---
  Documentation/gpu/drm-kms.rst | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
index 87e5023e3f55..c28b464dd397 100644
--- a/Documentation/gpu/drm-kms.rst
+++ b/Documentation/gpu/drm-kms.rst
@@ -463,6 +463,25 @@ KMS Properties
  This section of the documentation is primarily aimed at user-space developers.
  For the driver APIs, see the other sections.
  
+Requirements
+------------
+
+KMS drivers might need to add extra properties to support new features.
+Each new property introduced in a driver needs to meet a few
+requirements, in addition to the ones mentioned above:
+
+- It must be standardized, with some documentation to describe how the
+  property can be used.
+
+- It must provide a generic helper in the core code to register that
+  property on the object it attaches to.
+
+- Its content must be decoded by the core and provided in the object's
+  associated state structure. That includes anything drivers might want to
+  precompute, like :c:type:`struct drm_clip_rect <drm_clip_rect>` for planes.
+
+- An IGT test must be submitted where reasonable.
+
  Property Types and Blob Property Support
  
  



Hi,

Regarding properties, we have a “case study example” related in a 
certain way to this documentation update :-)


The use case: on a front desk at an exhibition, there is a welcome 
screen you can touch for searching various information. When this 
welcome screen is in idle, a small logo is displayed at its center 
(around 20% of the fullscreen). The logo has a white background color. 
We want to reduce the ddr usage for lowering the power (the board is 
battery powered) so the idea is to use a white background color around 
this logo, produced by the drm CRTC so the image in ddr is only the size 
of the logo.


Reading the thread 
https://lists.freedesktop.org/archives/dri-devel/2019-October/239733.html 
dissuaded us from coding a generic solution, so we started to implement a 
"STM_" private background color property. It works... but we are not at 
all convinced this is the right way and we clearly prefer 
mainline/generic sw for both kernel & userland.


So now, what are our options... well, this v4 documentation update is I 
think clear enough: we have to document + provide a generic helper in 
the core code (similar to the original patch) + update IGT test, right?


Thanks
Philippe :-)

Note: It is really a pleasure to read such an interesting thread, exposing 
the “complexity” of our job.

RE: [Intel-gfx] [PATCH] drm/i915: Perform execbuffer object locking as a separate step

2021-06-17 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Thomas Hellström
> Sent: Tuesday, June 15, 2021 4:36 AM
> To: intel-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Thomas Hellström ; Auld, Matthew
> 
> Subject: [Intel-gfx] [PATCH] drm/i915: Perform execbuffer object locking as a
> separate step
> 
> To help avoid evicting already resident buffers from the batch we're
> processing, perform locking as a separate step.
> 
> Signed-off-by: Thomas Hellström 
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 25 --
> -
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 201fed19d120..394eb40c95b5 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -922,21 +922,38 @@ static int eb_lookup_vmas(struct i915_execbuffer
> *eb)
>   return err;
>  }
> 
> -static int eb_validate_vmas(struct i915_execbuffer *eb)
> +static int eb_lock_vmas(struct i915_execbuffer *eb)
>  {
>   unsigned int i;
>   int err;
> 
> - INIT_LIST_HEAD(&eb->unbound);
> -
>   for (i = 0; i < eb->buffer_count; i++) {
> - struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
>   struct eb_vma *ev = &eb->vma[i];
>   struct i915_vma *vma = ev->vma;
> 
>   err = i915_gem_object_lock(vma->obj, &eb->ww);
>   if (err)
>   return err;
> + }
> +
> + return 0;
> +}
> +
> +static int eb_validate_vmas(struct i915_execbuffer *eb) {
> + unsigned int i;
> + int err;
> +
> + INIT_LIST_HEAD(&eb->unbound);
> +
> + err = eb_lock_vmas(eb);
> + if (err)
> + return err;
> +
> + for (i = 0; i < eb->buffer_count; i++) {
> + struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
> + struct eb_vma *ev = &eb->vma[i];
> + struct i915_vma *vma = ev->vma;
> 
>   err = eb_pin_vma(eb, entry, ev);
>   if (err == -EDEADLK)

Thomas, I just checked eb_pin_vma(); it calls i915_vma_pin_ww(). If the object is 
already locked, under what condition do these calls still return -EDEADLK?

--CQ

> --
> 2.31.1
> 
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: vc4: hdmi: audio: ASoC: error at snd_soc_dai_startup on fef00700.hdmi

2021-06-17 Thread Maxime Ripard
Hi Stefan,

On Sat, Jun 12, 2021 at 12:04:08PM +0200, Stefan Wahren wrote:
> Hi Maxime,
> 
> Am 04.06.21 um 11:02 schrieb Maxime Ripard:
> > Hi Stefan,
> >
> > I would assume it's due to this:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/vc4/vc4_hdmi.c#n1083
> >
> > It pre-dates my time working on the vc4 driver so I'm not really sure
> > what this is supposed to prevent, but my guess is that it's there to
> > avoid someone using the audio card before we have a display detected and
> > connected, and its capabilities known (the first and more obvious one
> > being does it support audio in the first place).
> >
> > It's nothing new though, maybe it's the error printing itself that is?
> 
> i'm sorry, i forgot about this discussion here:
> 
> https://lists.freedesktop.org/archives/dri-devel/2020-December/292701.html

It looks like there's no discussion on that link, is it the link you wanted to 
paste?

Maxime


signature.asc
Description: PGP signature


[PATCH v3 8/8] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

2021-06-17 Thread Alex Sierra
Add MEMORY_DEVICE_GENERIC case to free_zone_device_page
callback.
Device generic type memory case is now able to free its
pages properly.

Signed-off-by: Alex Sierra 
---
 mm/memremap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 614b3d600e95..6c884e2542a9 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -438,7 +438,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 EXPORT_SYMBOL_GPL(get_dev_pagemap);
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-static void free_device_private_page(struct page *page)
+static void free_device_page(struct page *page)
 {
 
__ClearPageWaiters(page);
@@ -477,7 +477,8 @@ void free_zone_device_page(struct page *page)
wake_up_var(&page->_refcount);
return;
case MEMORY_DEVICE_PRIVATE:
-   free_device_private_page(page);
+   case MEMORY_DEVICE_GENERIC:
+   free_device_page(page);
return;
default:
return;
-- 
2.17.1



[PATCH v3 7/8] mm: add generic type support to migrate_vma helpers

2021-06-17 Thread Alex Sierra
Device generic type case added for migrate_vma_pages and
migrate_vma_check_page helpers.
Both, generic and private device types have the same
conditions to decide to migrate pages from/to device
memory.

Signed-off-by: Alex Sierra 
---
 mm/migrate.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8c2430d3e77b..3b6aaba96fe6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2602,7 +2602,7 @@ static bool migrate_vma_check_page(struct page *page)
 * FIXME proper solution is to rework migration_entry_wait() so
 * it does not need to take a reference on page.
 */
-   return is_device_private_page(page);
+   return is_device_page(page);
}
 
/* For file back page */
@@ -3064,10 +3064,10 @@ void migrate_vma_pages(struct migrate_vma *migrate)
mapping = page_mapping(page);
 
if (is_zone_device_page(newpage)) {
-   if (is_device_private_page(newpage)) {
+   if (is_device_page(newpage)) {
/*
-* For now only support private anonymous when
-* migrating to un-addressable device memory.
+* For now only support private and generic
+* anonymous when migrating to device memory.
 */
if (mapping) {
migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-- 
2.17.1



[PATCH v3 6/8] include/linux/mm.h: helpers to check zone device generic type

2021-06-17 Thread Alex Sierra
Two helpers added. One checks if zone device page is generic
type. The other if page is either private or generic type.

Signed-off-by: Alex Sierra 
---
 include/linux/mm.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8d79bb94be8..f5b247a63044 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1125,6 +1125,14 @@ static inline bool is_device_private_page(const struct 
page *page)
page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
+static inline bool is_device_page(const struct page *page)
+{
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   is_zone_device_page(page) &&
+   (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
+page->pgmap->type == MEMORY_DEVICE_GENERIC);
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
-- 
2.17.1



[PATCH v3 5/8] drm/amdkfd: generic type as sys mem on migration to ram

2021-06-17 Thread Alex Sierra
Generic device type memory on VRAM to RAM migration,
has similar access as System RAM from the CPU. This flag sets
the source from the sender. Which in Generic type case,
should be set as SYSTEM.

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index f5939449a99f..7b41006c1164 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -653,8 +653,9 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate.vma = vma;
migrate.start = start;
migrate.end = end;
-   migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
+   migrate.flags = adev->gmc.xgmi.connected_to_cpu ?
+   MIGRATE_VMA_SELECT_SYSTEM : 
MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 
size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
size *= npages;
-- 
2.17.1



[PATCH v3 4/8] drm/amdkfd: add SPM support for SVM

2021-06-17 Thread Alex Sierra
When CPU is connected through XGMI, it has coherent
access to VRAM resource. In this case that resource
is taken from a table in the device gmc aperture base.
This resource is used along with the device type, which could
be DEVICE_PRIVATE or DEVICE_GENERIC to create the device
page map region.

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index c8ca3252cbc2..f5939449a99f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -895,6 +895,7 @@ int svm_migrate_init(struct amdgpu_device *adev)
struct resource *res;
unsigned long size;
void *r;
+   bool xgmi_connected_to_cpu = adev->gmc.xgmi.connected_to_cpu;
 
/* Page migration works on Vega10 or newer */
if (kfddev->device_info->asic_family < CHIP_VEGA10)
@@ -907,17 +908,22 @@ int svm_migrate_init(struct amdgpu_device *adev)
 * should remove reserved size
 */
size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
-   res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+   if (xgmi_connected_to_cpu)
+   res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
+   else
+   res = devm_request_free_mem_region(adev->dev, &iomem_resource, 
size);
+
if (IS_ERR(res))
return -ENOMEM;
 
-   pgmap->type = MEMORY_DEVICE_PRIVATE;
pgmap->nr_range = 1;
pgmap->range.start = res->start;
pgmap->range.end = res->end;
+   pgmap->type = xgmi_connected_to_cpu ?
+   MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;
pgmap->ops = &svm_migrate_pgmap_ops;
pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
-   pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   pgmap->flags = 0;
r = devm_memremap_pages(adev->dev, pgmap);
if (IS_ERR(r)) {
pr_err("failed to register HMM device memory\n");
-- 
2.17.1



[PATCH v3 3/8] kernel: resource: lookup_resource as exported symbol

2021-06-17 Thread Alex Sierra
The AMD architecture for the Frontier supercomputer will
have device memory which can be coherently accessed by
the CPU. The system BIOS advertises this memory as SPM
(special purpose memory) in the UEFI system address map.

The AMDGPU driver needs to be able to lookup this resource
in order to claim it as MEMORY_DEVICE_GENERIC using
devm_memremap_pages.

Signed-off-by: Alex Sierra 
---
 kernel/resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..269489bb7097 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -783,7 +783,7 @@ struct resource *lookup_resource(struct resource *root, 
resource_size_t start)
 
return res;
 }
-
+EXPORT_SYMBOL_GPL(lookup_resource);
 /*
  * Insert a resource into the resource tree. If successful, return NULL,
  * otherwise return the conflicting resource (compare to __request_resource())
-- 
2.17.1



[PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount

2021-06-17 Thread Alex Sierra
From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
 fs/dax.c   |  4 +-
 include/linux/dax.h|  2 +-
 include/linux/memremap.h   |  7 +--
 include/linux/mm.h | 44 -
 lib/test_hmm.c |  2 +-
 mm/internal.h  |  8 +++
 mm/memremap.c  | 68 +++---
 mm/migrate.c   |  5 --
 mm/page_alloc.c|  3 ++
 mm/swap.c  | 45 ++---
 12 files changed, 45 insertions(+), 147 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..acee67710620 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -711,7 +711,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 
dpage = pfn_to_page(uvmem_pfn);
dpage->zone_device_data = pvt;
-   get_page(dpage);
+   init_page_count(dpage);
lock_page(dpage);
return dpage;
 out_clear:
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 92987daa5e17..8bc7120e1216 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -324,7 +324,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
return NULL;
}
 
-   get_page(page);
+   init_page_count(page);
lock_page(page);
return page;
 }
diff --git a/fs/dax.c b/fs/dax.c
index 321f4ddc6643..7b4c6b35b098 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -560,14 +560,14 @@ static void *grab_mapping_entry(struct xa_state *xas,
 
 /**
  * dax_layout_busy_page_range - find first pinned page in @mapping
- * @mapping: address space to scan for a page with ref count > 1
+ * @mapping: address space to scan for a page with ref count > 0
  * @start: Starting offset. Page containing 'start' is included.
  * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX,
  *   pages from 'start' till the end of file are included.
  *
  * DAX requires ZONE_DEVICE mapped pages. These pages are never
  * 'onlined' to the page allocator so they are considered idle when
- * page->count == 1. A filesystem uses this interface to determine if
+ * page->count == 0. A filesystem uses this interface to determine if
  * any page in the mapping is busy, i.e. for DMA, or other
  * get_user_pages() usages.
  *
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8b5da1d60dbc..05fc982ce153 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -245,7 +245,7 @@ static inline bool dax_mapping(struct address_space *mapping)
 
 static inline bool dax_page_unused(struct page *page)
 {
-   return page_ref_count(page) == 1;
+   return page_ref_count(page) == 0;
 }
 
 #define dax_wait_page(_inode, _page, _wait_cb) \
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 79c49e7f5c30..327f32427d21 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -66,9 +66,10 @@ enum memory_type {
 
 struct dev_pagemap_ops {
/*
-* Called once the page refcount reaches 1.  (ZONE_DEVICE pages never
-* reach 0 refcount unless there is a refcount bug. This allows the
-* device driver to implement its own memory management.)
+* Called once the page refcount reaches 0. The reference count
+* should be reset to one with init_page_count(page) before reusing
+* the page. This allows the device driver to implement its own
+* memory management.
 */
void (*page_free)(struct page *page);
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c9900aedc195..d8d79bb94be8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1117,39 +1117,6 @@ static inline bool is_zone_device_page(const struct page *page)
 }
 #endif
 
-#ifdef CONFIG_DEV_PAGEMAP_OPS
-void free_devmap_managed_page(struct page *page);
-DECLARE_STATIC_KEY_FALSE(devmap_managed_key);
-
-static inline bool page_is_devmap_managed(struct page *page)
-{
-   if (!static_branch_unlikely(&devmap_managed_key))
-   return false;
-   if (!is_zone_device_page(page))
-   return false;
-   switch (page->pgmap->type) {
-   case MEMORY_DEVICE_PRIVATE:
-   case 

[PATCH v3 1/8] ext4/xfs: add page refcount helper

2021-06-17 Thread Alex Sierra
From: Ralph Campbell 

There are several places where ZONE_DEVICE struct pages assume a reference
count == 1 means the page is idle and free. Instead of open coding this,
add a helper function to hide this detail.
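The helper's effect can be shown with a small userspace model (struct and names here are illustrative, not the kernel's). ZONE_DEVICE pages are never onlined, so before the refcount rework later in this series, a count of exactly 1 means "idle and free"; the helper hides that magic value instead of open coding "== 1" at every call site:

```c
#include <stdbool.h>

/* Userspace stand-in for struct page's _refcount field. */
struct page_model {
	int _refcount;
};

/* Models the added dax_page_unused() helper: a ZONE_DEVICE page with
 * refcount 1 is idle, anything higher means an active pin (gup, DMA). */
static inline bool dax_page_unused_model(const struct page_model *page)
{
	return page->_refcount == 1;	/* mirrors page_ref_count(page) == 1 */
}
```

Centralizing the check means the later change of the idle value from 1 to 0 touches one helper rather than every open-coded comparison.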

v2:
[AS]: rename dax_layout_is_idle_page func to dax_page_unused

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
 fs/dax.c|  4 ++--
 fs/ext4/inode.c |  5 +
 fs/xfs/xfs_file.c   |  4 +---
 include/linux/dax.h | 10 ++
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 26d5dcd2d69e..321f4ddc6643 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -358,7 +358,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
 
-   WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
+   WARN_ON_ONCE(trunc && !dax_layout_is_idle_page(page));
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
page->mapping = NULL;
page->index = 0;
@@ -372,7 +372,7 @@ static struct page *dax_busy_page(void *entry)
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
 
-   if (page_ref_count(page) > 1)
+   if (!dax_layout_is_idle_page(page))
return page;
}
return NULL;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c173c8405856..9ee00186412f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3972,10 +3972,7 @@ int ext4_break_layouts(struct inode *inode)
if (!page)
return 0;
 
-   error = ___wait_var_event(&page->_refcount,
-   atomic_read(&page->_refcount) == 1,
-   TASK_INTERRUPTIBLE, 0, 0,
-   ext4_wait_dax_page(ei));
+   error = dax_wait_page(ei, page, ext4_wait_dax_page);
} while (error == 0);
 
return error;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..39565fe5f817 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -782,9 +782,7 @@ xfs_break_dax_layouts(
return 0;
 
*retry = true;
-   return ___wait_var_event(&page->_refcount,
-   atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
-   0, 0, xfs_wait_dax_page(inode));
+   return dax_wait_page(inode, page, xfs_wait_dax_page);
 }
 
 int
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b52f084aa643..8b5da1d60dbc 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -243,6 +243,16 @@ static inline bool dax_mapping(struct address_space *mapping)
return mapping->host && IS_DAX(mapping->host);
 }
 
+static inline bool dax_page_unused(struct page *page)
+{
+   return page_ref_count(page) == 1;
+}
+
+#define dax_wait_page(_inode, _page, _wait_cb) \
+   ___wait_var_event(&(_page)->_refcount,  \
+   dax_page_unused(_page), \
+   TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode))
+
 #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
 void hmem_register_device(int target_nid, struct resource *r);
 #else
-- 
2.17.1



[PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*

2021-06-17 Thread Alex Sierra
v1:
AMD is building a system architecture for the Frontier supercomputer with a
coherent interconnect between CPUs and GPUs. This hardware architecture allows
the CPUs to coherently access GPU device memory. We have hardware in our labs
and we are working with our partner HPE on the BIOS, firmware and software
for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver looks
it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
using devm_memremap_pages.

Now we're trying to migrate data to and from that memory using the migrate_vma_*
helpers so we can support page-based migration in our unified memory 
allocations,
while also supporting CPU access to those pages.

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
correctly in the migrate_vma_* helpers. We are looking for feedback about this
approach. If we're close, what's needed to make our patches acceptable upstream?
If we're not close, any suggestions how else to achieve what we are trying to do
(i.e. page migration and coherent CPU access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently 
upstreamed
to Dave Airlie's drm-next branch
https://lore.kernel.org/dri-devel/20210527205606.2660-6-felix.kuehl...@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc
On top of that we did some rework of our VRAM management for migrations to 
remove
some incorrect assumptions, allow partially successful migrations and GPU memory
mappings that mix pages in VRAM and system memory.
https://patchwork.kernel.org/project/dri-devel/list/?series=489811

v2:
This version of the patch series merges Ralph Campbell's "[RFC PATCH v3 0/2]
mm: remove extra ZONE_DEVICE struct page refcount" series. On top of that,
it applies our changes to support the device generic type in the
migrate_vma helpers.
This has been tested in systems with device memory that has coherent
access by CPU.

Also addresses the following feedback made in v1:
- Isolate in one patch kernel/resource.c modification, based
on Christoph's feedback.
- Add helpers check for generic and private type to avoid
duplicated long lines.

v3:
- Include cover letter from v1
- Rename dax_layout_is_idle_page func to dax_page_unused in patch
ext4/xfs: add page refcount helper

Patches 1-2 Rebased Ralph Campbell's ZONE_DEVICE page refcounting patches
Patches 4-5 are for context to show how we are looking up the SPM 
memory and registering it with devmap.
Patches 3,6-8 are the changes we are trying to upstream or rework to 
make them acceptable upstream.

Alex Sierra (6):
  kernel: resource: lookup_resource as exported symbol
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: generic type as sys mem on migration to ram
  include/linux/mm.h: helpers to check zone device generic type
  mm: add generic type support to migrate_vma helpers
  mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

Ralph Campbell (2):
  ext4/xfs: add page refcount helper
  mm: remove extra ZONE_DEVICE struct page refcount

 arch/powerpc/kvm/book3s_hv_uvmem.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 --
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
 fs/dax.c |  8 +--
 fs/ext4/inode.c  |  5 +-
 fs/xfs/xfs_file.c|  4 +-
 include/linux/dax.h  | 10 
 include/linux/memremap.h |  7 +--
 include/linux/mm.h   | 52 +++---
 kernel/resource.c|  2 +-
 lib/test_hmm.c   |  2 +-
 mm/internal.h|  8 +++
 mm/memremap.c| 69 +++-
 mm/migrate.c | 13 ++---
 mm/page_alloc.c  |  3 ++
 mm/swap.c| 45 ++--
 16 files changed, 83 insertions(+), 164 deletions(-)

-- 
2.17.1



[PATCH v4 3/3] drm/msm/dsi: support CPHY mode for 7nm pll/phy

2021-06-17 Thread Jonathan Marek
Add the required changes to support 7nm pll/phy in CPHY mode.

This adds a "qcom,dsi-phy-cphy-mode" property for the PHY node to enable
the CPHY mode.
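The clock relationships this patch encodes can be sketched as standalone arithmetic (function names and the example rates below are ours, not the driver's): in D-PHY mode "byte_clk" carries 8 bits per lane per cycle and byte_intf_clk runs at half of it, while in C-PHY mode "byte_clk" is treated as 16-bit units transmitted over 7 clock cycles, so byte_intf_clk equals byte_clk and the bit clock is byte_clk * 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* pclk_bpp = pixel clock * bits per pixel, divided down per lane count. */
static uint64_t byte_clk_rate(uint64_t pclk, unsigned int bpp,
			      unsigned int lanes, bool cphy)
{
	uint64_t pclk_bpp = pclk * bpp;

	/* C-PHY "byte_clk" is in units of 16 bits, D-PHY in units of 8. */
	return cphy ? pclk_bpp / (16 * lanes) : pclk_bpp / (8 * lanes);
}

static uint64_t byte_intf_rate(uint64_t byte_clk, bool cphy)
{
	return cphy ? byte_clk : byte_clk / 2;	/* same as byte_clk on C-PHY */
}

static uint64_t bitclk_rate(uint64_t byte_clk, bool cphy)
{
	/* C-PHY: 16 bits over 7 clock cycles; D-PHY: 8 bits per cycle. */
	return cphy ? byte_clk * 7 : byte_clk * 8;
}
```

For example, with a hypothetical 100 MHz pixel clock, 24 bpp and 4 lanes, the D-PHY byte clock comes out at 75 MHz against 37.5 MHz for C-PHY.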

Signed-off-by: Jonathan Marek 
Reviewed-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dsi/dsi.xml.h |   2 +
 drivers/gpu/drm/msm/dsi/dsi_host.c|  34 -
 drivers/gpu/drm/msm/dsi/phy/dsi_phy.c |  49 
 drivers/gpu/drm/msm/dsi/phy/dsi_phy.h |   3 +
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c | 145 +++---
 5 files changed, 186 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/msm/dsi/dsi.xml.h b/drivers/gpu/drm/msm/dsi/dsi.xml.h
index b8e9e608abfc..a59a9bd3f5d1 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.xml.h
+++ b/drivers/gpu/drm/msm/dsi/dsi.xml.h
@@ -621,6 +621,8 @@ static inline uint32_t DSI_VERSION_MAJOR(uint32_t val)
return ((val) << DSI_VERSION_MAJOR__SHIFT) & DSI_VERSION_MAJOR__MASK;
 }
 
+#define REG_DSI_CPHY_MODE_CTRL 0x02d4
+
 #define REG_DSI_PHY_PLL_CTRL_0 0x0200
 #define DSI_PHY_PLL_CTRL_0_ENABLE  0x0001
 
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c
index 809997f870f6..262d6d3b9c4b 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -27,6 +27,7 @@
 #include "dsi_cfg.h"
 #include "msm_kms.h"
 #include "msm_gem.h"
+#include "phy/dsi_phy.h"
 
 #define DSI_RESET_TOGGLE_DELAY_MS 20
 
@@ -170,6 +171,9 @@ struct msm_dsi_host {
int dlane_swap;
int num_data_lanes;
 
+   /* from phy DT */
+   bool cphy_mode;
+
u32 dma_cmd_ctrl_restore;
 
bool registered;
@@ -513,6 +517,7 @@ int msm_dsi_runtime_resume(struct device *dev)
 
 int dsi_link_clk_set_rate_6g(struct msm_dsi_host *msm_host)
 {
+   u32 byte_intf_rate;
int ret;
 
DBG("Set clk rates: pclk=%d, byteclk=%d",
@@ -532,8 +537,13 @@ int dsi_link_clk_set_rate_6g(struct msm_dsi_host *msm_host)
}
 
if (msm_host->byte_intf_clk) {
-   ret = clk_set_rate(msm_host->byte_intf_clk,
-  msm_host->byte_clk_rate / 2);
+   /* For CPHY, byte_intf_clk is same as byte_clk */
+   if (msm_host->cphy_mode)
+   byte_intf_rate = msm_host->byte_clk_rate;
+   else
+   byte_intf_rate = msm_host->byte_clk_rate / 2;
+
+   ret = clk_set_rate(msm_host->byte_intf_clk, byte_intf_rate);
if (ret) {
pr_err("%s: Failed to set rate byte intf clk, %d\n",
   __func__, ret);
@@ -721,7 +731,11 @@ static void dsi_calc_pclk(struct msm_dsi_host *msm_host, bool is_dual_dsi)
lanes = 1;
}
 
-   do_div(pclk_bpp, (8 * lanes));
+   /* CPHY "byte_clk" is in units of 16 bits */
+   if (msm_host->cphy_mode)
+   do_div(pclk_bpp, (16 * lanes));
+   else
+   do_div(pclk_bpp, (8 * lanes));
 
msm_host->pixel_clk_rate = pclk_rate;
msm_host->byte_clk_rate = pclk_bpp;
@@ -947,6 +961,9 @@ static void dsi_ctrl_config(struct msm_dsi_host *msm_host, bool enable,
data |= DSI_CTRL_ENABLE;
 
dsi_write(msm_host, REG_DSI_CTRL, data);
+
+   if (msm_host->cphy_mode)
+   dsi_write(msm_host, REG_DSI_CPHY_MODE_CTRL, BIT(0));
 }
 
 static void dsi_set_video_dsc(struct msm_dsi_host *msm_host,
@@ -2278,6 +2295,8 @@ int msm_dsi_host_set_src_pll(struct mipi_dsi_host *host,
struct clk *byte_clk_provider, *pixel_clk_provider;
int ret;
 
+   msm_host->cphy_mode = src_phy->cphy_mode;
+
ret = msm_dsi_phy_get_clk_provider(src_phy,
&byte_clk_provider, &pixel_clk_provider);
if (ret) {
@@ -2349,7 +2368,14 @@ void msm_dsi_host_get_phy_clk_req(struct mipi_dsi_host *host,
return;
}
 
-   clk_req->bitclk_rate = msm_host->byte_clk_rate * 8;
+   /* CPHY transmits 16 bits over 7 clock cycles
+* "byte_clk" is in units of 16-bits (see dsi_calc_pclk),
+* so multiply by 7 to get the "bitclk rate"
+*/
+   if (msm_host->cphy_mode)
+   clk_req->bitclk_rate = msm_host->byte_clk_rate * 7;
+   else
+   clk_req->bitclk_rate = msm_host->byte_clk_rate * 8;
clk_req->escclk_rate = msm_host->esc_clk_rate;
 }
 
diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy.c b/drivers/gpu/drm/msm/dsi/phy/dsi_phy.c
index 6ca6bfd4809b..3e64f1840672 100644
--- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy.c
+++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include <dt-bindings/phy/phy.h>
 
 #include "dsi_phy.h"
 
@@ -461,6 +462,51 @@ int msm_dsi_dphy_timing_calc_v4(struct msm_dsi_dphy_timing *timing,
return 0;
 }
 
+int msm_dsi_cphy_timing_calc_v4(struct msm_dsi_dphy_timing *timing,
+   struct msm_dsi_phy_clk_request *clk_req)
+{
+ 

[PATCH v4 2/3] dt-bindings: msm: dsi: document phy-type property for 7nm dsi phy

2021-06-17 Thread Jonathan Marek
Document a new phy-type property which will be used to determine whether
the phy should operate in D-PHY or C-PHY mode.
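The binding semantics can be sketched in a small C model: the property is optional and defaults to D-PHY, and only the two values this patch adds to include/dt-bindings/phy/phy.h are valid. The parse helper below stands in for an of_property_read_u32() lookup and is an assumption of ours, not code from the driver:

```c
#include <stdbool.h>

/* Constants as added to include/dt-bindings/phy/phy.h by this patch. */
#define PHY_TYPE_DPHY	10
#define PHY_TYPE_CPHY	11

/* Models parsing the optional phy-type property: "present" says whether
 * the DT node carries the property, "value" is its cell value.  Returns
 * false for unsupported values, otherwise sets *cphy_mode. */
static bool parse_phy_type(bool present, unsigned int value, bool *cphy_mode)
{
	unsigned int phy_type = present ? value : PHY_TYPE_DPHY;

	if (phy_type != PHY_TYPE_DPHY && phy_type != PHY_TYPE_CPHY)
		return false;

	*cphy_mode = (phy_type == PHY_TYPE_CPHY);
	return true;
}
```

An absent property behaves exactly like an explicit PHY_TYPE_DPHY, matching the `default: 10` in the schema.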

Signed-off-by: Jonathan Marek 
Reviewed-by: Laurent Pinchart 
---
 .../devicetree/bindings/display/msm/dsi-phy-7nm.yaml | 5 +
 include/dt-bindings/phy/phy.h| 2 ++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml b/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
index c0077ca7e9e7..70809d1cac54 100644
--- a/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
+++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml
@@ -34,6 +34,11 @@ properties:
 description: |
   Connected to VDD_A_DSI_PLL_0P9 pin (or VDDA_DSI{0,1}_PLL_0P9 for sm8150)
 
+  phy-type:
+description: D-PHY (default) or C-PHY mode
+enum: [ 10, 11 ]
+default: 10
+
 required:
   - compatible
   - reg
diff --git a/include/dt-bindings/phy/phy.h b/include/dt-bindings/phy/phy.h
index 887a31b250a8..f48c9acf251e 100644
--- a/include/dt-bindings/phy/phy.h
+++ b/include/dt-bindings/phy/phy.h
@@ -20,5 +20,7 @@
 #define PHY_TYPE_XPCS  7
 #define PHY_TYPE_SGMII 8
 #define PHY_TYPE_QSGMII9
+#define PHY_TYPE_DPHY  10
+#define PHY_TYPE_CPHY  11
 
 #endif /* _DT_BINDINGS_PHY */
-- 
2.26.1


