Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-22 Thread Christian König

On 22.12.21 21:53, Daniel Vetter wrote:

On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:

[SNIP]
Still sounds funky. I think minimally we should have an ack from CRIU
developers that this is officially the right way to solve this problem. I
really don't want to have random one-off hacks that don't work across the
board, for a problem where we (drm subsystem) really shouldn't be the only
one with this problem. Where "this problem" means that the mmap space is
per file description, and not per underlying inode or real device or
whatever. That part sounds like a CRIU problem, and I expect CRIU folks
want a consistent solution across the board for this. Hence please grab an
ack from them.


Unfortunately it's a KFD design problem. AMD used a single device node, 
then mmapped different objects from the same offset to different 
processes and expected it to work with the rest of the fs subsystem 
without churn.


So yes, this is indeed because the mmap space is per file descriptor for 
the use case here.


And thanks for pointing this out, this indeed makes the whole change 
extremely questionable.


Regards,
Christian.



Cheers, Daniel





Re: mmotm 2021-12-22-19-02 uploaded (drivers/gpu/drm/i915/display/intel_backlight.o)

2021-12-22 Thread Randy Dunlap


On 12/22/21 19:02, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2021-12-22-19-02 has been uploaded to
> 
>https://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> https://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.
> 


on x86_64:

ld: drivers/gpu/drm/i915/display/intel_backlight.o: in function 
`intel_backlight_device_register':
intel_backlight.c:(.text+0x27ba): undefined reference to 
`backlight_device_register'
ld: intel_backlight.c:(.text+0x2871): undefined reference to 
`backlight_device_register'
ld: drivers/gpu/drm/i915/display/intel_backlight.o: in function 
`intel_backlight_device_unregister':
intel_backlight.c:(.text+0x28c4): undefined reference to 
`backlight_device_unregister'



Full randconfig file is attached.
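
For context: an undefined reference to backlight_device_register() at the
final link usually means a Kconfig dependency mismatch where the consumer
is built-in but the provider is modular or disabled. A hypothetical
illustration of the offending combination (the attached randconfig is
authoritative):

    CONFIG_DRM_I915=y
    CONFIG_BACKLIGHT_CLASS_DEVICE=m    # modular provider; built-in i915 cannot link against it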

-- 
~Randy

config-intel-backlight.gz
Description: application/gzip


[PATCH V2] drm: nouveau: lsfw: cleanup coccinelle warning

2021-12-22 Thread Qing Wang
From: Wang Qing 

odd_ptr_err.cocci has complained about this warning for a long time:
lsfw.c:194:5-11: inconsistent IS_ERR and PTR_ERR on line 195.

Although there is no functional impact, fixing it improves scanning
efficiency.

Signed-off-by: Wang Qing 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c
index 9b1cf67..0f70d14
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c
@@ -191,7 +191,8 @@ nvkm_acr_lsfw_load_bl_inst_data_sig(struct nvkm_subdev 
*subdev,
u32 *bldata;
int ret;
 
-   if (IS_ERR((lsfw = nvkm_acr_lsfw_add(func, acr, falcon, id))))
+   lsfw = nvkm_acr_lsfw_add(func, acr, falcon, id);
+   if (IS_ERR(lsfw))
return PTR_ERR(lsfw);
 
	ret = nvkm_firmware_load_name(subdev, path, "bl", ver, &bl);
-- 
2.7.4



Re: [PATCH] drm/ast: Support 1600x900 with 108MHz PCLK

2021-12-22 Thread Dave Airlie
On Wed, 22 Dec 2021 at 11:19, Kuo-Hsiang Chou
 wrote:
>
> Hi
>
> -Original Message-
> From: Dave Airlie [mailto:airl...@gmail.com]
> Sent: Wednesday, December 22, 2021 5:56 AM
> To: Thomas Zimmermann 
>
> Subject: Re: [PATCH] drm/ast: Support 1600x900 with 108MHz PCLK
>
> On Mon, 2 Nov 2020 at 17:57, Thomas Zimmermann  wrote:
> >
> > Hi
> >
> > On 30.10.20 08:42, KuoHsiang Chou wrote:
> > > [New] Create the setting for 1600x900 @60Hz refresh rate
> > >   by 108MHz pixel-clock.
> > >
> > > Signed-off-by: KuoHsiang Chou 
> >
> > Acked-by: Thomas Zimmermann 
> >
> > I'll add your patch to drm-misc-next.
> >
> > As Sam mentioned, you should use scripts/get_maintainers.pl to
> > retrieve the relevant people. These include those in MAINTAINERS, but
> > also developers that have previously worked on the code.
>
> We are seeing a possible report of a regression on an ast2600 server with 
> this patch.
>
> I haven't ascertained that reverting it fixes it for the customer yet, but 
> this is a heads up in case anyone else has seen issues.
>
> Hi Dave,
>
> Yes, you're right, the patch needs to be removed. The patch causes incorrect 
> timing on CRT and ASTDP when 1600x900 is selected.
> So, do I need to commit a new patch to remove/revert it from drm/ast?

Yes, do a git revert 

Fix up the resulting message to say why, and add a
Fixes: <12 chars of sha1> ("commitmsg")

and send to the list.
Dave.


Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-22 Thread Bhardwaj, Rajneesh

Sorry for the typo in my previous email. Please read Adrian Reber*

On 12/22/2021 8:49 PM, Bhardwaj, Rajneesh wrote:


Adding Adrian Rebel who is the CRIU maintainer and CRIU list

On 12/22/2021 3:53 PM, Daniel Vetter wrote:

On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:

On 12/20/2021 4:29 AM, Daniel Vetter wrote:

On Fri, Dec 10, 2021 at 07:58:50AM +0100, Christian König wrote:

On 09.12.21 19:28, Felix Kuehling wrote:

On 2021-12-09 10:30 a.m., Christian König wrote:

That still won't work.

But I think we could do this change for the amdgpu mmap callback only.

If graphics user mode has problems with it, we could even make this
specific to KFD BOs in the amdgpu_gem_object_mmap callback.

I think it's fine for the whole amdgpu stack, my concern is more about
radeon, nouveau and the ARM stacks which are using this as well.

That blew up so nicely the last time we tried to change it and I know of at
least one case where radeon was/is used with BOs in a child process.

I'm way late and buried again, but I think it'd be good to be consistent



I had committed this change into our amd-staging-drm-next branch last 
week after I got the ACK and RB from Felix and Christian.




here across drivers. Or at least across drm drivers. And we've had the vma
open/close refcounting to make fork work since forever.

I think if we do this we should really only do this for mmap() where this
applies, but reading through the thread here I'm honestly confused why
this is a problem. If CRIU can't handle forked mmaps it needs to be
taught that, not hacked around. Or at least I'm not understanding why
this shouldn't work ...
-Daniel
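
For reference, the vma open/close refcounting mentioned above is the
standard GEM pattern; a minimal sketch based on drivers/gpu/drm/drm_gem.c:

void drm_gem_vm_open(struct vm_area_struct *vma)
{
	struct drm_gem_object *obj = vma->vm_private_data;

	/* every VMA, including one inherited across fork(), takes its
	 * own reference on the backing GEM object */
	drm_gem_object_get(obj);
}

void drm_gem_vm_close(struct vm_area_struct *vma)
{
	struct drm_gem_object *obj = vma->vm_private_data;

	/* dropped again when the VMA is torn down */
	drm_gem_object_put(obj);
}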


Hi Daniel

In the v2
https://lore.kernel.org/all/a1a865f5-ad2c-29c8-cbe4-2635d53eceb6@amd.com/T/
I pretty much limited the scope of the change to KFD BOs on mmap. Regarding
CRIU, I think it's not a CRIU problem: on restore, CRIU only tries to
recreate all the child processes and then mmaps all the VMAs it sees (as per
the checkpoint snapshot) in the new process address space after the VMA
placements are finalized in the position independent code phase. Since the
inherited VMAs don't have access rights, the criu mmap fails.

Still sounds funky. I think minimally we should have an ack from CRIU
developers that this is officially the right way to solve this problem. I
really don't want to have random one-off hacks that don't work across the
board, for a problem where we (drm subsystem) really shouldn't be the only
one with this problem. Where "this problem" means that the mmap space is
per file description, and not per underlying inode or real device or
whatever. That part sounds like a CRIU problem, and I expect CRIU folks
want a consistent solution across the board for this. Hence please grab an
ack from them.

Cheers, Daniel



Maybe Adrian can share his views on this.

Hi Adrian - For the context, on CRIU restore we see mmap failures (in the 
PIE restore phase) due to permission issues on the (render node) VMAs 
that were inherited because the application that checkpointed had 
forked.  The VMAs ideally should not be in the child process, but the 
smaps file shows these VMAs in the child address space. We didn't want 
to use madvise to avoid this copy, and would rather change the kernel mode 
to limit the impact to our user space library (thunk). Based on my 
understanding, during the PIE restore phase, after the VMA placements are 
finalized, CRIU does a sys_mmap on all the VMAs it sees in the VmaEntry 
list, and I think it's not an issue as per CRIU design - but do you think 
we could handle this corner case better inside CRIU?
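
An illustrative sketch of that PIE-phase replay (not actual CRIU code;
the VmaEntry field names are assumptions):

/* Each VMA recorded at checkpoint time is re-created at its original
 * address; an inherited GEM VMA whose file description no longer grants
 * access makes this mmap() fail - the failure mode discussed here. */
static int restore_vmas(struct vma_entry *vmas, int n_vmas)
{
	int i;

	for (i = 0; i < n_vmas; i++) {
		struct vma_entry *e = &vmas[i];

		if (mmap((void *)e->start, e->end - e->start, e->prot,
			 e->flags | MAP_FIXED, e->fd, e->pgoff) == MAP_FAILED)
			return -1;
	}
	return 0;
}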




Regards,

Rajneesh


Regards,
Christian.


Regards,
     Felix



Regards,
Christian.

On 09.12.21 16:29, Bhardwaj, Rajneesh wrote:

Sounds good. I will send a v2 with only ttm_bo_mmap_obj change. Thank
you!

On 12/9/2021 10:27 AM, Christian König wrote:

Hi Rajneesh,

yes, separating this from the drm_gem_mmap_obj() change is certainly
a good idea.


The child cannot access the BOs mapped by the parent anyway with
access restrictions applied

exactly that is not correct. That behavior is actively used by some
userspace stacks as far as I know.

Regards,
Christian.

On 09.12.21 16:23, Bhardwaj, Rajneesh wrote:

Thanks Christian. Would it make it less intrusive if I just use the
flag for ttm bo mmap and remove the drm_gem_mmap_obj change from
this patch? For our use case, just the ttm_bo_mmap_obj change
should suffice and we don't want to put any more work arounds in
the user space (thunk, in our case).

The child cannot access the BOs mapped by the parent anyway with
access 

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-22 Thread Bhardwaj, Rajneesh

Adding Adrian Rebel who is the CRIU maintainer and CRIU list

On 12/22/2021 3:53 PM, Daniel Vetter wrote:

On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:

On 12/20/2021 4:29 AM, Daniel Vetter wrote:

On Fri, Dec 10, 2021 at 07:58:50AM +0100, Christian König wrote:

On 09.12.21 19:28, Felix Kuehling wrote:

On 2021-12-09 10:30 a.m., Christian König wrote:

That still won't work.

But I think we could do this change for the amdgpu mmap callback only.

If graphics user mode has problems with it, we could even make this
specific to KFD BOs in the amdgpu_gem_object_mmap callback.

I think it's fine for the whole amdgpu stack, my concern is more about
radeon, nouveau and the ARM stacks which are using this as well.

That blew up so nicely the last time we tried to change it and I know of at
least one case where radeon was/is used with BOs in a child process.

I'm way late and buried again, but I think it'd be good to be consistent



I had committed this change into our amd-staging-drm-next branch last 
week after I got the ACK and RB from Felix and Christian.




here across drivers. Or at least across drm drivers. And we've had the vma
open/close refcounting to make fork work since forever.

I think if we do this we should really only do this for mmap() where this
applies, but reading through the thread here I'm honestly confused why
this is a problem. If CRIU can't handle forked mmaps it needs to be
taught that, not hacked around. Or at least I'm not understanding why
this shouldn't work ...
-Daniel


Hi Daniel

In the v2
https://lore.kernel.org/all/a1a865f5-ad2c-29c8-cbe4-2635d53eceb6@amd.com/T/
I pretty much limited the scope of the change to KFD BOs on mmap. Regarding
CRIU, I think it's not a CRIU problem: on restore, CRIU only tries to
recreate all the child processes and then mmaps all the VMAs it sees (as per
the checkpoint snapshot) in the new process address space after the VMA
placements are finalized in the position independent code phase. Since the
inherited VMAs don't have access rights, the criu mmap fails.

Still sounds funky. I think minimally we should have an ack from CRIU
developers that this is officially the right way to solve this problem. I
really don't want to have random one-off hacks that don't work across the
board, for a problem where we (drm subsystem) really shouldn't be the only
one with this problem. Where "this problem" means that the mmap space is
per file description, and not per underlying inode or real device or
whatever. That part sounds like a CRIU problem, and I expect CRIU folks
want a consistent solution across the board for this. Hence please grab an
ack from them.

Cheers, Daniel



Maybe Adrian can share his views on this.

Hi Adrian - For the context, on CRIU restore we see mmap failures (in the 
PIE restore phase) due to permission issues on the (render node) VMAs 
that were inherited because the application that checkpointed had 
forked.  The VMAs ideally should not be in the child process, but the 
smaps file shows these VMAs in the child address space. We didn't want 
to use madvise to avoid this copy, and would rather change the kernel mode 
to limit the impact to our user space library (thunk). Based on my 
understanding, during the PIE restore phase, after the VMA placements are 
finalized, CRIU does a sys_mmap on all the VMAs it sees in the VmaEntry 
list, and I think it's not an issue as per CRIU design - but do you think 
we could handle this corner case better inside CRIU?






Regards,

Rajneesh


Regards,
Christian.


Regards,
     Felix



Regards,
Christian.

On 09.12.21 16:29, Bhardwaj, Rajneesh wrote:

Sounds good. I will send a v2 with only ttm_bo_mmap_obj change. Thank
you!

On 12/9/2021 10:27 AM, Christian König wrote:

Hi Rajneesh,

yes, separating this from the drm_gem_mmap_obj() change is certainly
a good idea.


The child cannot access the BOs mapped by the parent anyway with
access restrictions applied

exactly that is not correct. That behavior is actively used by some
userspace stacks as far as I know.

Regards,
Christian.

On 09.12.21 16:23, Bhardwaj, Rajneesh wrote:

Thanks Christian. Would it make it less intrusive if I just use the
flag for ttm bo mmap and remove the drm_gem_mmap_obj change from
this patch? For our use case, just the ttm_bo_mmap_obj change
should suffice and we don't want to put any more work arounds in
the user space (thunk, in our case).

The child cannot access the BOs mapped by the parent anyway with
access restrictions applied so I wonder why even inherit the vma?

On 12/9/2021 2:54 AM, Christian König wrote:

On 08.12.21 at 

[PATCH] drm/i915/guc: Report error on invalid reset notification

2021-12-22 Thread John . C . Harrison
From: John Harrison 

Don't silently drop reset notifications from the GuC. It might not be
safe to do an error capture but we still want some kind of report that
the reset happened.

Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e7517206af82..0fbf24b8d5e1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3979,6 +3979,11 @@ static void guc_handle_context_reset(struct intel_guc 
*guc,
   !context_blocked(ce))) {
capture_error_state(guc, ce);
guc_context_replay(ce);
+   } else {
+   drm_err(&guc_to_gt(guc)->i915->drm,
+   "Invalid GuC engine reset notification for 0x%04X on %s: banned = %d, blocked = %d",
+   ce->guc_id.id, ce->engine->name, intel_context_is_banned(ce),
+   context_blocked(ce));
}
 }
 
-- 
2.25.1



Re: [PATCH] drm/i915/guc: Use lockless list for destroyed contexts

2021-12-22 Thread Matthew Brost
On Wed, Dec 22, 2021 at 04:48:36PM -0800, John Harrison wrote:
> On 12/22/2021 15:29, Matthew Brost wrote:
> > Use a lockless list structure for destroyed contexts to avoid hammering
> > on global submission spin lock.
> I thought the guidance was that lockless anything without an explanation
> longer than War And Peace comes with an automatic termination penalty?
> 

I was thinking that was for custom lockless algorithms, not uses of the
core APIs. If this is really a concern I could protect the llist_del_all
with a lock, but the doc explicitly says the way I'm using this API is
safe without a lock. 
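
For reference, the core pattern (APIs from include/linux/llist.h, names
from the patch) is just:

static void flush_destroyed(struct intel_guc *guc)
{
	/* llist_del_all() atomically detaches the whole list and is
	 * documented as safe against concurrent llist_add() producers
	 * without a lock */
	struct llist_node *batch =
		llist_del_all(&guc->submission_state.destroyed_contexts);
	struct intel_context *ce, *cn;

	/* the detached batch is private to this thread: walk it lock-free */
	llist_for_each_entry_safe(ce, cn, batch, destroyed_link) {
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
	}
}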

> Also, I thought the simple suggestion was to just move the entire list
> sideways under the existing lock and then loop through the local list safely
> without requiring locks because it is now local only.
> 

That's basically what this API does in a few simple calls, rather than
our own algorithm moving entries to a new list.

Matt

> John.
> 
> 
> > 
> > Suggested-by: Tvrtko Ursulin 
> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c   |  2 -
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  3 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h|  3 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 44 +--
> >   4 files changed, 16 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 5d0ec7c49b6a..4aacb4b0418d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -403,8 +403,6 @@ intel_context_init(struct intel_context *ce, struct 
> > intel_engine_cs *engine)
> > ce->guc_id.id = GUC_INVALID_LRC_ID;
> > INIT_LIST_HEAD(&ce->guc_id.link);
> > -   INIT_LIST_HEAD(&ce->destroyed_link);
> > -
> > INIT_LIST_HEAD(&ce->parallel.child_list);
> > /*
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 30cd81ad8911..4532d43ec9c0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -9,6 +9,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include <linux/llist.h>
> >   #include 
> >   #include 
> > @@ -224,7 +225,7 @@ struct intel_context {
> >  * list when context is pending to be destroyed (deregistered with the
> >  * GuC), protected by guc->submission_state.lock
> >  */
> > -   struct list_head destroyed_link;
> > +   struct llist_node destroyed_link;
> > /** @parallel: sub-structure for parallel submission members */
> > struct {
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index f9240d4baa69..705085058411 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -8,6 +8,7 @@
> >   #include 
> >   #include 
> > +#include <linux/llist.h>
> >   #include "intel_uncore.h"
> >   #include "intel_guc_fw.h"
> > @@ -112,7 +113,7 @@ struct intel_guc {
> >  * @destroyed_contexts: list of contexts waiting to be destroyed
> >  * (deregistered with the GuC)
> >  */
> > -   struct list_head destroyed_contexts;
> > +   struct llist_head destroyed_contexts;
> > /**
> >  * @destroyed_worker: worker to deregister contexts, need as we
> >  * need to take a GT PM reference and can't from destroy
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 0a03a30e4c6d..6f7643edc139 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1771,7 +1771,7 @@ int intel_guc_submission_init(struct intel_guc *guc)
> > spin_lock_init(&guc->submission_state.lock);
> > INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
> > ida_init(&guc->submission_state.guc_ids);
> > -   INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
> > +   init_llist_head(&guc->submission_state.destroyed_contexts);
> > INIT_WORK(&guc->submission_state.destroyed_worker,
> >   destroyed_worker_func);
> > @@ -2696,26 +2696,18 @@ static void __guc_context_destroy(struct 
> > intel_context *ce)
> > }
> >   }
> > +#define take_destroyed_contexts(guc) \
> > +   llist_del_all(&guc->submission_state.destroyed_contexts)
> > +
> >   static void guc_flush_destroyed_contexts(struct intel_guc *guc)
> >   {
> > -   struct intel_context *ce;
> > -   unsigned long flags;
> > +   struct intel_context *ce, *cn;
> > GEM_BUG_ON(!submission_disabled(guc) &&
> >guc_submission_initialized(guc));
> > -   while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> > -   spin_lock_irqsave(&guc->submission_state.lock, flags);
> > -   ce = 
> > list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> > - struct 

Re: [PATCH] drm/i915/execlists: Weak parallel submission support for execlists

2021-12-22 Thread John Harrison

On 12/22/2021 14:35, Matthew Brost wrote:

A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between the requests
generated; virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon to be deprecated) bonding interface.

We perma-pin these execlists contexts to align with the GuC implementation.

v2:
  (John Harrison)
   - Drop siblings array as num_siblings must be 1
v3:
  (John Harrison)
   - Drop single submission
v4:
  (John Harrison)
   - Actually drop single submission
   - Use IS_ERR check on return value from intel_context_create
   - Set last request to NULL on unpin

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 11 --
  drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
  .../drm/i915/gt/intel_execlists_submission.c  | 38 +++
  drivers/gpu/drm/i915/gt/intel_lrc.c   |  4 ++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
  5 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index cad3f0b2be9e..b0d2d81fc3b3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -570,10 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
  
-	/* FIXME: This is NIY for execlists */

-   if (!(intel_uc_uses_guc_submission(&to_gt(i915)->uc)))
-   return -ENODEV;
-
	if (get_user(slot, &ext->engine_index))
return -EFAULT;
  
@@ -583,6 +579,13 @@ set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,

	if (get_user(num_siblings, &ext->num_siblings))
return -EFAULT;
  
+	if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc) &&
+	    num_siblings != 1) {
+		drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC mode\n",
+			num_siblings);
+		return -EINVAL;
+	}
+
if (slot >= set->num_engines) {
		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ba083d800a08..5d0ec7c49b6a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
  
  	__i915_active_acquire(&ce->active);
  
-	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))

+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
+   intel_context_is_parallel(ce))
return 0;
  
  	/* Preallocate tracking nodes */

@@ -563,7 +564,6 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
 * Callers responsibility to validate that this function is used
 * correctly but we use GEM_BUG_ON here ensure that they do.
 */
-   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
GEM_BUG_ON(intel_context_is_pinned(parent));
GEM_BUG_ON(intel_context_is_child(parent));
GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index a69df5e9e77a..be56d0b41892 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2599,6 +2599,43 @@ static void execlists_context_cancel_request(struct 
intel_context *ce,
  current->comm);
  }
  
+static struct intel_context *

+execlists_create_parallel(struct intel_engine_cs **engines,
+ unsigned int num_siblings,
+ unsigned int width)
+{
+   struct intel_context *parent = NULL, *ce, *err;
+   int i;
+
+   GEM_BUG_ON(num_siblings != 1);
+
+   for (i = 0; i < width; ++i) {
+   ce = intel_context_create(engines[i]);
+   if (IS_ERR(ce)) {
+   err = ce;

Could get rid of 'err' and just say 'return ce;' at the end of 'unwind:'.

Either way:
Reviewed-by: John Harrison 



+   goto unwind;
+   }
+
+   if (i == 0)
+   parent = ce;
+   else
+   intel_context_bind_parent_child(parent, ce);
+   }
+
+   parent->parallel.fence_context = dma_fence_context_alloc(1);
+
+   intel_context_set_nopreempt(parent);
+   for_each_child(parent, ce)
+   intel_context_set_nopreempt(ce);
+
+   return parent;
+
+unwind:
+   if (parent)
+   

[PATCH v9 2/2] drm/msm/dp: do not initialize phy until plugin interrupt received

2021-12-22 Thread Kuogee Hsieh
Current DP drivers have regulators, clocks, irq and phy grouped together
within a function and executed in an asymmetric manner. This increases the
difficulty of code maintenance and limits code scalability. This patch
divides the driver life cycle of operation into four states: resume
(including booting up), dongle plugin, dongle unplug and suspend.
Regulators, core clocks and irq are grouped together and enabled at resume
(or booting up) so that the DP controller is armed and ready to receive HPD
plugin interrupts. An HPD plugin interrupt is generated when a dongle plugs
into the DUT (device under test). Once the HPD plugin interrupt is received,
the DP controller initializes the phy so that dpcd read/write will function
and the following link training can proceed successfully. The DP phy is
disabled after the main link is torn down, at the end of the unplug HPD
interrupt handling triggered by the dongle being unplugged from the DUT.
Finally, regulators, core clocks and irq are disabled at suspend.
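
A sketch of the resulting split, using the helpers this patch introduces
(the dispatch function itself is illustrative, not driver code):

enum dp_event { DP_RESUME, DP_HPD_PLUGIN, DP_HPD_UNPLUG, DP_SUSPEND };

static void dp_lifecycle(struct dp_ctrl *dp_ctrl, enum dp_event ev)
{
	switch (ev) {
	case DP_RESUME:		/* regulators, core clocks, irq armed */
		dp_ctrl_reset_irq_ctrl(dp_ctrl, true);
		break;
	case DP_HPD_PLUGIN:	/* phy up: dpcd r/w and link training work */
		dp_ctrl_phy_init(dp_ctrl);
		break;
	case DP_HPD_UNPLUG:	/* phy down after main link teardown */
		dp_ctrl_phy_exit(dp_ctrl);
		break;
	case DP_SUSPEND:	/* controller reset on the way down */
		dp_ctrl_reset_irq_ctrl(dp_ctrl, false);
		break;
	}
}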

Changes in V2:
-- removed unnecessary dp_ctrl NULL check
-- removed unnecessary phy init_count and power_count DRM_DEBUG_DP logs
-- remove flip parameter out of dp_ctrl_irq_enable()
-- add fixes tag

Changes in V3:
-- call dp_display_host_phy_init() instead of dp_ctrl_phy_init() at
dp_display_host_init() for eDP

Changes in V4:
-- rewording commit text to match this commit changes

Changes in V5:
-- rebase on top of msm-next branch

Changes in V6:
-- delete flip variable

Changes in V7:
-- dp_ctrl_irq_enable/disabe() merged into dp_ctrl_reset_irq_ctrl()

Changes in V8:
-- add more detail comment regrading dp phy at dp_display_host_init()

Changes in V9:
-- remove set phy_initialized to false when -ECONNRESET detected

Fixes: 8ede2ecc3e5e ("drm/msm/dp: Add DP compliance tests on Snapdragon 
Chipsets")
Signed-off-by: Kuogee Hsieh 
---
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 80 +
 drivers/gpu/drm/msm/dp/dp_ctrl.h|  8 ++--
 drivers/gpu/drm/msm/dp/dp_display.c | 89 -
 3 files changed, 94 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index c724cb0..9c80b49 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -1365,60 +1365,44 @@ static int dp_ctrl_enable_stream_clocks(struct 
dp_ctrl_private *ctrl)
return ret;
 }
 
-int dp_ctrl_host_init(struct dp_ctrl *dp_ctrl, bool flip, bool reset)
+void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
+{
+   struct dp_ctrl_private *ctrl;
+
+   ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
+
+   dp_catalog_ctrl_reset(ctrl->catalog);
+
+   if (enable)
+   dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
+}
+
+void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
struct dp_io *dp_io;
struct phy *phy;
 
-   if (!dp_ctrl) {
-   DRM_ERROR("Invalid input data\n");
-   return -EINVAL;
-   }
-
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
	dp_io = &ctrl->parser->io;
phy = dp_io->phy;
 
-   ctrl->dp_ctrl.orientation = flip;
-
-   if (reset)
-   dp_catalog_ctrl_reset(ctrl->catalog);
-
-   DRM_DEBUG_DP("flip=%d\n", flip);
dp_catalog_ctrl_phy_reset(ctrl->catalog);
phy_init(phy);
-   dp_catalog_ctrl_enable_irq(ctrl->catalog, true);
-
-   return 0;
 }
 
-/**
- * dp_ctrl_host_deinit() - Uninitialize DP controller
- * @dp_ctrl: Display Port Driver data
- *
- * Perform required steps to uninitialize DP controller
- * and its resources.
- */
-void dp_ctrl_host_deinit(struct dp_ctrl *dp_ctrl)
+void dp_ctrl_phy_exit(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
struct dp_io *dp_io;
struct phy *phy;
 
-   if (!dp_ctrl) {
-   DRM_ERROR("Invalid input data\n");
-   return;
-   }
-
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
	dp_io = &ctrl->parser->io;
phy = dp_io->phy;
 
-   dp_catalog_ctrl_enable_irq(ctrl->catalog, false);
+   dp_catalog_ctrl_phy_reset(ctrl->catalog);
phy_exit(phy);
-
-   DRM_DEBUG_DP("Host deinitialized successfully\n");
 }
 
 static bool dp_ctrl_use_fixed_nvid(struct dp_ctrl_private *ctrl)
@@ -1488,7 +1472,10 @@ static int dp_ctrl_deinitialize_mainlink(struct 
dp_ctrl_private *ctrl)
}
 
phy_power_off(phy);
+
+   /* aux channel down, reinit phy */
phy_exit(phy);
+   phy_init(phy);
 
return 0;
 }
@@ -1893,8 +1880,14 @@ int dp_ctrl_off_link_stream(struct dp_ctrl *dp_ctrl)
return ret;
}
 
+   DRM_DEBUG_DP("Before, phy=%x init_count=%d power_on=%d\n",
+   (u32)(uintptr_t)phy, phy->init_count, phy->power_count);
+
phy_power_off(phy);
 
+   DRM_DEBUG_DP("After, phy=%x init_count=%d 

[PATCH v9 1/2] drm/msm/dp: dp_link_parse_sink_count() return immediately if aux read failed

2021-12-22 Thread Kuogee Hsieh
Add checks of the aux read/write status in both dp_link_parse_sink_count()
and dp_link_parse_sink_status_field() to avoid a long timeout delay if a
dp aux read/write times out because the cable is unplugged. Also make
sure the dp controller has been initialized before starting dpcd reads
and writes.

Changes in V4:
-- split this patch as stand alone patch

Changes in v5:
-- rebase on msm-next branch

Changes in v6:
-- add more details commit text

Signed-off-by: Kuogee Hsieh 
Reviewed-by: Stephen Boyd 
Tested-by: Stephen Boyd 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 12 +---
 drivers/gpu/drm/msm/dp/dp_link.c| 19 ++-
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 3d61459..0766752 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -692,9 +692,15 @@ static int dp_irq_hpd_handle(struct dp_display_private 
*dp, u32 data)
return 0;
}
 
-   ret = dp_display_usbpd_attention_cb(&dp->pdev->dev);
-   if (ret == -ECONNRESET) { /* cable unplugged */
-   dp->core_initialized = false;
+   /*
+* dp core (ahb/aux clks) must be initialized before
+* irq_hpd is handled
+*/
+   if (dp->core_initialized) {
+   ret = dp_display_usbpd_attention_cb(&dp->pdev->dev);
+   if (ret == -ECONNRESET) { /* cable unplugged */
+   dp->core_initialized = false;
+   }
}
DRM_DEBUG_DP("hpd_state=%d\n", state);
 
diff --git a/drivers/gpu/drm/msm/dp/dp_link.c b/drivers/gpu/drm/msm/dp/dp_link.c
index a5bdfc5..d4d31e5 100644
--- a/drivers/gpu/drm/msm/dp/dp_link.c
+++ b/drivers/gpu/drm/msm/dp/dp_link.c
@@ -737,18 +737,25 @@ static int dp_link_parse_sink_count(struct dp_link 
*dp_link)
return 0;
 }
 
-static void dp_link_parse_sink_status_field(struct dp_link_private *link)
+static int dp_link_parse_sink_status_field(struct dp_link_private *link)
 {
int len = 0;
 
link->prev_sink_count = link->dp_link.sink_count;
-   dp_link_parse_sink_count(&link->dp_link);
+   len = dp_link_parse_sink_count(&link->dp_link);
+   if (len < 0) {
+   DRM_ERROR("DP parse sink count failed\n");
+   return len;
+   }
 
len = drm_dp_dpcd_read_link_status(link->aux,
link->link_status);
-   if (len < DP_LINK_STATUS_SIZE)
+   if (len < DP_LINK_STATUS_SIZE) {
DRM_ERROR("DP link status read failed\n");
-   dp_link_parse_request(link);
+   return len;
+   }
+
+   return dp_link_parse_request(link);
 }
 
 /**
@@ -1023,7 +1030,9 @@ int dp_link_process_request(struct dp_link *dp_link)
 
dp_link_reset_data(link);
 
-   dp_link_parse_sink_status_field(link);
+   ret = dp_link_parse_sink_status_field(link);
+   if (ret)
+   return ret;
 
if (link->request.test_requested == DP_TEST_LINK_EDID_READ) {
dp_link->sink_request |= DP_TEST_LINK_EDID_READ;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] drm/i915/guc: Use lockless list for destroyed contexts

2021-12-22 Thread John Harrison

On 12/22/2021 15:29, Matthew Brost wrote:

Use a lockless list structure for destroyed contexts to avoid hammering
on global submission spin lock.
I thought the guidance was that lockless anything without an explanation 
longer than War And Peace comes with an automatic termination penalty?


Also, I thought the simple suggestion was to just move the entire list 
sideways under the existing lock and then loop through the local list 
safely without requiring locks because it is now local only.


John.




Suggested-by: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c   |  2 -
  drivers/gpu/drm/i915/gt/intel_context_types.h |  3 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  3 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 44 +--
  4 files changed, 16 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5d0ec7c49b6a..4aacb4b0418d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -403,8 +403,6 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
ce->guc_id.id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id.link);
  
-	INIT_LIST_HEAD(&ce->destroyed_link);

-
INIT_LIST_HEAD(&ce->parallel.child_list);
  
  	/*

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 30cd81ad8911..4532d43ec9c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include <linux/llist.h>
  #include 
  #include 
  
@@ -224,7 +225,7 @@ struct intel_context {

 * list when context is pending to be destroyed (deregistered with the
 * GuC), protected by guc->submission_state.lock
 */
-   struct list_head destroyed_link;
+   struct llist_node destroyed_link;
  
  	/** @parallel: sub-structure for parallel submission members */

struct {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index f9240d4baa69..705085058411 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -8,6 +8,7 @@
  
  #include 

  #include 
+#include <linux/llist.h>
  
  #include "intel_uncore.h"

  #include "intel_guc_fw.h"
@@ -112,7 +113,7 @@ struct intel_guc {
 * @destroyed_contexts: list of contexts waiting to be destroyed
 * (deregistered with the GuC)
 */
-   struct list_head destroyed_contexts;
+   struct llist_head destroyed_contexts;
/**
 * @destroyed_worker: worker to deregister contexts, need as we
 * need to take a GT PM reference and can't from destroy
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0a03a30e4c6d..6f7643edc139 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1771,7 +1771,7 @@ int intel_guc_submission_init(struct intel_guc *guc)
spin_lock_init(&guc->submission_state.lock);
INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
ida_init(&guc->submission_state.guc_ids);
-   INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
+   init_llist_head(&guc->submission_state.destroyed_contexts);
INIT_WORK(&guc->submission_state.destroyed_worker,
  destroyed_worker_func);
  
@@ -2696,26 +2696,18 @@ static void __guc_context_destroy(struct intel_context *ce)

}
  }
  
+#define take_destroyed_contexts(guc) \

+   llist_del_all(&guc->submission_state.destroyed_contexts)
+
  static void guc_flush_destroyed_contexts(struct intel_guc *guc)
  {
-   struct intel_context *ce;
-   unsigned long flags;
+   struct intel_context *ce, *cn;
  
  	GEM_BUG_ON(!submission_disabled(guc) &&

   guc_submission_initialized(guc));
  
-	while (!list_empty(&guc->submission_state.destroyed_contexts)) {

-   spin_lock_irqsave(&guc->submission_state.lock, flags);
-   ce = 
list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
- struct intel_context,
- destroyed_link);
-   if (ce)
-   list_del_init(&ce->destroyed_link);
-   spin_unlock_irqrestore(&guc->submission_state.lock, flags);
-
-   if (!ce)
-   break;
-
+   llist_for_each_entry_safe(ce, cn, take_destroyed_contexts(guc),
+destroyed_link) {
release_guc_id(guc, ce);
__guc_context_destroy(ce);
}
@@ -2723,23 +2715,11 @@ static void guc_flush_destroyed_contexts(struct 
intel_guc *guc)
  
  static void deregister_destroyed_contexts(struct 

[Patch v4 12/24] drm/amdkfd: CRIU restore queue doorbell id

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.

Signed-off-by: David Yat Sin 

---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +--
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7e49f70b81b9..a0f5b8533a03 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -153,7 +153,13 @@ static void decrement_queue_count(struct 
device_queue_manager *dqm,
dqm->active_cp_queue_count--;
 }
 
-static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+/*
+ * Allocate a doorbell ID to this queue.
+ * If doorbell_id is passed in, make sure requested ID is valid then allocate 
it.
+ */
+static int allocate_doorbell(struct qcm_process_device *qpd,
+struct queue *q,
+uint32_t const *restore_id)
 {
struct kfd_dev *dev = qpd->dqm->dev;
 
@@ -161,6 +167,10 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
/* On pre-SOC15 chips we need to use the queue ID to
 * preserve the user mode ABI.
 */
+
+   if (restore_id && *restore_id != q->properties.queue_id)
+   return -EINVAL;
+
q->doorbell_id = q->properties.queue_id;
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
@@ -169,25 +179,37 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
 * The doobell index distance between RLC (2*i) and (2*i+1)
 * for a SDMA engine is 512.
 */
-   uint32_t *idx_offset =
-   dev->shared_resources.sdma_doorbell_idx;
 
-   q->doorbell_id = idx_offset[q->properties.sdma_engine_id]
-   + (q->properties.sdma_queue_id & 1)
-   * KFD_QUEUE_DOORBELL_MIRROR_OFFSET
-   + (q->properties.sdma_queue_id >> 1);
+   uint32_t *idx_offset = dev->shared_resources.sdma_doorbell_idx;
+   uint32_t valid_id = idx_offset[q->properties.sdma_engine_id]
+   + (q->properties.sdma_queue_id 
& 1)
+   * 
KFD_QUEUE_DOORBELL_MIRROR_OFFSET
+   + (q->properties.sdma_queue_id 
>> 1);
+
+   if (restore_id && *restore_id != valid_id)
+   return -EINVAL;
+   q->doorbell_id = valid_id;
} else {
-   /* For CP queues on SOC15 reserve a free doorbell ID */
-   unsigned int found;
-
-   found = find_first_zero_bit(qpd->doorbell_bitmap,
-   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
-   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
-   pr_debug("No doorbells available");
-   return -EBUSY;
+   /* For CP queues on SOC15 */
+   if (restore_id) {
+   /* make sure that ID is free  */
+   if (__test_and_set_bit(*restore_id, 
qpd->doorbell_bitmap))
+   return -EINVAL;
+
+   q->doorbell_id = *restore_id;
+   } else {
+   /* or reserve a free doorbell ID */
+   unsigned int found;
+
+   found = find_first_zero_bit(qpd->doorbell_bitmap,
+   
KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+   pr_debug("No doorbells available");
+   return -EBUSY;
+   }
+   set_bit(found, qpd->doorbell_bitmap);
+   q->doorbell_id = found;
}
-   set_bit(found, qpd->doorbell_bitmap);
-   q->doorbell_id = found;
}
 
q->properties.doorbell_off =
@@ -355,7 +377,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
}
 
-   retval = allocate_doorbell(qpd, q);
+   retval = allocate_doorbell(qpd, q, qd ? &qd->doorbell_id : NULL);
if (retval)
goto out_deallocate_hqd;
 
@@ -1338,7 +1360,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
goto out;
}
 
-   retval = allocate_doorbell(qpd, q);
+   retval = allocate_doorbell(qpd, q, qd ? &qd->doorbell_id : 

[Patch v4 19/24] drm/amdkfd: CRIU allow external mm for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
Both the svm_range_get_attr and svm_range_set_attr helpers use the mm
struct from current, but for a Checkpoint or Restore operation
current->mm will fetch the mm of the CRIU master process. So modify these
helpers to accept the task mm of the target kfd process to support
Checkpoint Restore.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 88360f23eb61..7c92116153fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3134,11 +3134,11 @@ static void svm_range_evict_svm_bo_worker(struct 
work_struct *work)
 }
 
 static int
-svm_range_set_attr(struct kfd_process *p, uint64_t start, uint64_t size,
-  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
+  uint64_t start, uint64_t size, uint32_t nattr,
+  struct kfd_ioctl_svm_attribute *attrs)
 {
struct amdkfd_process_info *process_info = p->kgd_process_info;
-   struct mm_struct *mm = current->mm;
struct list_head update_list;
struct list_head insert_list;
struct list_head remove_list;
@@ -3242,8 +3242,9 @@ svm_range_set_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
 }
 
 static int
-svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size,
-  uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs)
+svm_range_get_attr(struct kfd_process *p, struct mm_struct *mm,
+  uint64_t start, uint64_t size, uint32_t nattr,
+  struct kfd_ioctl_svm_attribute *attrs)
 {
DECLARE_BITMAP(bitmap_access, MAX_GPU_INSTANCE);
DECLARE_BITMAP(bitmap_aip, MAX_GPU_INSTANCE);
@@ -3253,7 +3254,6 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, 
uint64_t size,
bool get_accessible = false;
bool get_flags = false;
uint64_t last = start + size - 1UL;
-   struct mm_struct *mm = current->mm;
uint8_t granularity = 0xff;
struct interval_tree_node *node;
struct svm_range_list *svms;
@@ -3422,6 +3422,7 @@ int
 svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op op, uint64_t start,
  uint64_t size, uint32_t nattrs, struct kfd_ioctl_svm_attribute *attrs)
 {
+   struct mm_struct *mm = current->mm;
int r;
 
start >>= PAGE_SHIFT;
@@ -3429,10 +3430,10 @@ svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op 
op, uint64_t start,
 
switch (op) {
case KFD_IOCTL_SVM_OP_SET_ATTR:
-   r = svm_range_set_attr(p, start, size, nattrs, attrs);
+   r = svm_range_set_attr(p, mm, start, size, nattrs, attrs);
break;
case KFD_IOCTL_SVM_OP_GET_ATTR:
-   r = svm_range_get_attr(p, start, size, nattrs, attrs);
+   r = svm_range_get_attr(p, mm, start, size, nattrs, attrs);
break;
default:
r = EINVAL;
-- 
2.17.1



[Patch v4 16/24] drm/amdkfd: CRIU implement gpu_id remapping

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

When doing a restore on a different node, the gpu_ids on the restore
node may be different. But the user space application will still use
the original gpu_ids in the ioctl calls. Add code to create a gpu_id
mapping so that kfd can determine the actual gpu_id during the user
ioctls.
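
The assumed shape of the lookup, matching how the helper is used in the
hunks below (a sketch, not the literal implementation):

static struct kfd_process_device *
kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)
{
	int i;

	/* match the user-facing gpu_id saved at checkpoint time against
	 * each process device's user_gpu_id rather than the (possibly
	 * different) device id on the restore node */
	for (i = 0; i < p->n_pdds; i++)
		if (p->pdds[i]->user_gpu_id == gpu_id)
			return p->pdds[i];

	return NULL;
}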

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 465 --
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  45 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  11 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  32 ++
 .../amd/amdkfd/kfd_process_queue_manager.c|  18 +-
 5 files changed, 412 insertions(+), 159 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 08467fa2f514..20652d488cde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -294,18 +294,20 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
return err;
 
pr_debug("Looking for gpu id 0x%x\n", args->gpu_id);
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev) {
-   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
-   return -EINVAL;
-   }
 
	mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto err_unlock;
+   }
+   dev = pdd->dev;
 
pdd = kfd_bind_process_to_device(dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
-   goto err_bind_process;
+   goto err_unlock;
}
 
pr_debug("Creating queue for PASID 0x%x on gpu 0x%x\n",
@@ -315,7 +317,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, 
NULL, NULL, NULL,
&doorbell_offset_in_process);
if (err != 0)
-   goto err_create_queue;
+   goto err_unlock;
 
args->queue_id = queue_id;
 
@@ -344,8 +346,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
 
return 0;
 
-err_create_queue:
-err_bind_process:
+err_unlock:
	mutex_unlock(&p->mutex);
return err;
 }
@@ -492,7 +493,6 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_memory_policy_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
enum cache_policy default_policy, alternate_policy;
@@ -507,13 +507,15 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
return -EINVAL;
}
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
	mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto out;
+   }
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -526,7 +528,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
(args->alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
   ? cache_policy_coherent : cache_policy_noncoherent;
 
-   if (!dev->dqm->ops.set_cache_memory_policy(dev->dqm,
+   if (!pdd->dev->dqm->ops.set_cache_memory_policy(pdd->dev->dqm,
&pdd->qpd,
default_policy,
alternate_policy,
@@ -544,17 +546,18 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_trap_handler_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
	mutex_lock(&p->mutex);
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   err = -EINVAL;
+   goto out;
+   }
+
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -578,16 +581,20 @@ static int kfd_ioctl_dbg_register(struct file *filep,
bool create_ok;
long status = 0;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
+   

[Patch v4 22/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2021-12-22 Thread Rajneesh Bhardwaj
During the checkpoint stage, save the shared virtual memory ranges and
attributes for the target process. A process may contain a number of svm
ranges, and each range might contain a number of attributes. While not
all attributes may be applicable to a given prange, during checkpoint we
store all possible values for the maximum possible attribute types.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1c25d5e9067c..916b8d000317 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2186,7 +2186,9 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto close_bo_fds;
 
-   /* TODO: Dump SVM-Ranges */
+   ret = kfd_criu_checkpoint_svm(p, (uint8_t __user 
*)args->priv_data, &priv_offset);
+   if (ret)
+   goto close_bo_fds;
}
 
 close_bo_fds:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 49e05fb5c898..6d59f1bedcf2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3478,6 +3478,101 @@ int svm_range_get_info(struct kfd_process *p, uint32_t 
*num_svm_ranges,
return 0;
 }
 
+int kfd_criu_checkpoint_svm(struct kfd_process *p,
+   uint8_t __user *user_priv_data,
+   uint64_t *priv_data_offset)
+{
+   struct kfd_criu_svm_range_priv_data *svm_priv = NULL;
+   struct kfd_ioctl_svm_attribute *query_attr = NULL;
+   uint64_t svm_priv_data_size, query_attr_size = 0;
+   int index, nattr_common = 4, ret = 0;
+   struct svm_range_list *svms;
+   int num_devices = p->n_pdds;
+   struct svm_range *prange;
+   struct mm_struct *mm;
+
+   svms = &p->svms;
+   if (!svms)
+   return -EINVAL;
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_err("failed to get mm for the target process\n");
+   return -ESRCH;
+   }
+
+   query_attr_size = sizeof(struct kfd_ioctl_svm_attribute) *
+   (nattr_common + num_devices);
+
+   query_attr = kzalloc(query_attr_size, GFP_KERNEL);
+   if (!query_attr) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   query_attr[0].type = KFD_IOCTL_SVM_ATTR_PREFERRED_LOC;
+   query_attr[1].type = KFD_IOCTL_SVM_ATTR_PREFETCH_LOC;
+   query_attr[2].type = KFD_IOCTL_SVM_ATTR_SET_FLAGS;
+   query_attr[3].type = KFD_IOCTL_SVM_ATTR_GRANULARITY;
+
+   for (index = 0; index < num_devices; index++) {
+   struct kfd_process_device *pdd = p->pdds[index];
+
+   query_attr[index + nattr_common].type =
+   KFD_IOCTL_SVM_ATTR_ACCESS;
+   query_attr[index + nattr_common].value = pdd->user_gpu_id;
+   }
+
+   svm_priv_data_size = sizeof(*svm_priv) + query_attr_size;
+
+   svm_priv = kzalloc(svm_priv_data_size, GFP_KERNEL);
+   if (!svm_priv) {
+   ret = -ENOMEM;
+   goto exit_query;
+   }
+
+   index = 0;
+   list_for_each_entry(prange, &svms->list, list) {
+
+   svm_priv->object_type = KFD_CRIU_OBJECT_TYPE_SVM_RANGE;
+   svm_priv->start_addr = prange->start;
+   svm_priv->size = prange->npages;
+   memcpy(&svm_priv->attrs, query_attr, query_attr_size);
+   pr_debug("CRIU: prange: 0x%p start: 0x%lx\t npages: 0x%llx end: 
0x%llx\t size: 0x%llx\n",
+prange, prange->start, prange->npages,
+prange->start + prange->npages - 1,
+prange->npages * PAGE_SIZE);
+
+   ret = svm_range_get_attr(p, mm, svm_priv->start_addr,
+svm_priv->size,
+(nattr_common + num_devices),
+svm_priv->attrs);
+   if (ret) {
+   pr_err("CRIU: failed to obtain range attributes\n");
+   goto exit_priv;
+   }
+
+   ret = copy_to_user(user_priv_data + *priv_data_offset,
+  svm_priv, svm_priv_data_size);
+   if (ret) {
+   pr_err("Failed to copy svm priv to user\n");
+   goto exit_priv;
+   }
+
+   *priv_data_offset += svm_priv_data_size;
+
+   }
+
+
+exit_priv:
+   kfree(svm_priv);
+exit_query:
+   kfree(query_attr);
+exit:
+   mmput(mm);
+   return ret;
+}
+
 int
 svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op 

[Patch v4 23/24] drm/amdkfd: CRIU prepare for svm resume

2021-12-22 Thread Rajneesh Bhardwaj
During the CRIU restore phase, the VMAs for the virtual address ranges
are not at their final location yet, so in this stage only cache the data
required to successfully resume the svm ranges during the imminent CRIU
resume phase.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 99 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 12 +++
 4 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 916b8d000317..f7aa15b18f95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2638,8 +2638,8 @@ static int criu_restore_objects(struct file *filep,
goto exit;
break;
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
-   /* TODO: Implement SVM range */
-   *priv_offset += sizeof(struct 
kfd_criu_svm_range_priv_data);
+   ret = kfd_criu_restore_svm(p, (uint8_t __user 
*)args->priv_data,
+priv_offset, 
max_priv_data_size);
if (ret)
goto exit;
break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 87eb6739a78e..92191c541c29 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -790,6 +790,7 @@ struct svm_range_list {
struct list_headlist;
struct work_struct  deferred_list_work;
struct list_headdeferred_range_list;
+   struct list_headcriu_svm_metadata_list;
spinlock_t  deferred_list_lock;
atomic_tevicted_ranges;
booldrain_pagefaults;
@@ -1148,6 +1149,10 @@ int kfd_criu_restore_event(struct file *devkfd,
   uint8_t __user *user_priv_data,
   uint64_t *priv_data_offset,
   uint64_t max_priv_data_size);
+int kfd_criu_restore_svm(struct kfd_process *p,
+uint8_t __user *user_priv_data,
+uint64_t *priv_data_offset,
+uint64_t max_priv_data_size);
 /* CRIU - End */
 
 /* Queue Context Management */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 6d59f1bedcf2..e9f6c63c2a26 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -45,6 +45,14 @@
  */
 #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING   2000
 
+struct criu_svm_metadata {
+   struct list_head list;
+   __u64 start_addr;
+   __u64 size;
+   /* Variable length array of attributes */
+   struct kfd_ioctl_svm_attribute attrs[0];
+};
+
 static void svm_range_evict_svm_bo_worker(struct work_struct *work);
 static bool
 svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
@@ -2753,6 +2761,7 @@ int svm_range_list_init(struct kfd_process *p)
	INIT_DELAYED_WORK(&svms->restore_work, svm_range_restore_work);
	INIT_WORK(&svms->deferred_list_work, svm_range_deferred_list_work);
	INIT_LIST_HEAD(&svms->deferred_range_list);
+   INIT_LIST_HEAD(&svms->criu_svm_metadata_list);
	spin_lock_init(&svms->deferred_list_lock);
 
for (i = 0; i < p->n_pdds; i++)
@@ -3418,6 +3427,96 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int svm_criu_prepare_for_resume(struct kfd_process *p,
+   struct kfd_criu_svm_range_priv_data *svm_priv)
+{
+   int nattr_common = 4, nattr_accessibility = 1;
+   struct criu_svm_metadata *criu_svm_md = NULL;
+   uint64_t svm_attrs_size, svm_object_md_size;
+   struct svm_range_list *svms = &p->svms;
+   int num_devices = p->n_pdds;
+   int i, ret = 0;
+
+   svm_attrs_size = sizeof(struct kfd_ioctl_svm_attribute) *
+   (nattr_common + nattr_accessibility * num_devices);
+   svm_object_md_size = sizeof(struct criu_svm_metadata) + svm_attrs_size;
+
+   criu_svm_md = kzalloc(svm_object_md_size, GFP_KERNEL);
+   if (!criu_svm_md) {
+   pr_err("failed to allocate memory to store svm metadata\n");
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   criu_svm_md->start_addr = svm_priv->start_addr;
+   criu_svm_md->size = svm_priv->size;
+   for (i = 0; i < nattr_common + nattr_accessibility * num_devices; i++) {
+   criu_svm_md->attrs[i].type = svm_priv->attrs[i].type;
+   criu_svm_md->attrs[i].value = svm_priv->attrs[i].value;
+   }
+
+   list_add_tail(&criu_svm_md->list, &svms->criu_svm_metadata_list);
+
+exit:
+   return 
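
The metadata node above is the classic C flexible-array pattern: one
allocation covers the fixed header plus the variable-length attribute array.
A minimal user-space sketch of that pattern, with simplified, hypothetical
types (struct attr, struct svm_md) standing in for the KFD structs:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct attr { uint32_t type; uint32_t value; };

/* mirrors criu_svm_metadata: fixed header, then trailing attrs[] */
struct svm_md {
    uint64_t start_addr;
    uint64_t size;
    struct attr attrs[];        /* flexible array member */
};

static struct svm_md *svm_md_alloc(uint64_t start, uint64_t size,
                                   const struct attr *src, int nattrs)
{
    /* one allocation for the header and all attributes together */
    struct svm_md *md = calloc(1, sizeof(*md) + nattrs * sizeof(*src));

    if (!md)
        return NULL;
    md->start_addr = start;
    md->size = size;
    memcpy(md->attrs, src, nattrs * sizeof(*src));
    return md;
}

int main(void)
{
    struct attr a[2] = { { 1, 42 }, { 2, 7 } };
    struct svm_md *md = svm_md_alloc(0x7f0000000000ULL, 0x1000, a, 2);

    if (!md)
        return 1;
    printf("range 0x%llx size 0x%llx attr0=%u\n",
           (unsigned long long)md->start_addr,
           (unsigned long long)md->size, md->attrs[0].value);
    free(md);
    return 0;
}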

[Patch v4 14/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

Checkpoint the contents of the queue control stacks on CRIU dump and
restore them during CRIU restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 23 ---
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  9 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  | 11 +++-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 13 ++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 14 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 29 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 22 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  5 +-
 .../amd/amdkfd/kfd_process_queue_manager.c| 62 +--
 11 files changed, 139 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 146879cd3f2b..582b4a393f95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL, NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 3a5303ebcabf..8eca9ed3ab36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL, NULL);
+   , , NULL, NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a92274f9f1f7..248e69c7960b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -332,7 +332,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
struct queue *q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void 
*restore_ctl_stack)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -394,7 +394,8 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
 
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1347,7 +1348,7 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void *restore_ctl_stack)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1393,9 +1394,11 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
+
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1788,7 +1791,8 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
 
 static void get_queue_checkpoint_info(struct device_queue_manager *dqm,
const struct queue *q,
-   u32 *mqd_size)
+   u32 *mqd_size,
+   u32 *ctl_stack_size)
 {
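
The per-queue checkpoint blob effectively becomes [private header][MQD]
[control stack], with both sizes recorded so restore can slice the blob
apart again. A user-space sketch of that layout; struct q_hdr,
q_blob_write() and the byte sizes are illustrative, not the real KFD layout:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct q_hdr { uint32_t mqd_size; uint32_t ctl_stack_size; };

static size_t q_blob_write(uint8_t *dst, const void *mqd, uint32_t mqd_size,
                           const void *ctl, uint32_t ctl_size)
{
    struct q_hdr hdr = { mqd_size, ctl_size };

    memcpy(dst, &hdr, sizeof(hdr));                      /* sizes first */
    memcpy(dst + sizeof(hdr), mqd, mqd_size);            /* then the MQD */
    memcpy(dst + sizeof(hdr) + mqd_size, ctl, ctl_size); /* then the stack */
    return sizeof(hdr) + mqd_size + ctl_size;
}

int main(void)
{
    uint8_t blob[64], mqd[8] = { 1 }, ctl[4] = { 2 };

    printf("blob is %zu bytes\n",
           q_blob_write(blob, mqd, sizeof(mqd), ctl, sizeof(ctl)));
    return 0;
}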

[Patch v4 18/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2021-12-22 Thread Rajneesh Bhardwaj
Recoverable page faults are controlled by the xnack mode setting inside
a KFD process. For checkpoint/restore (CR), we don't consider negative
values, which are typically used to query the current xnack mode
without modifying it.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 15 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 178b0ccfb286..446eb9310915 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1845,6 +1845,11 @@ static int criu_checkpoint_process(struct kfd_process *p,
memset(_priv, 0, sizeof(process_priv));
 
process_priv.version = KFD_CRIU_PRIV_VERSION;
+   /* For CR, we don't consider negative xnack mode which is used for
+* querying without changing it, here 0 simply means disabled and 1
+* means enabled so retry for finding a valid PTE.
+*/
+   process_priv.xnack_mode = p->xnack_enabled ? 1 : 0;
 
ret = copy_to_user(user_priv_data + *priv_offset,
_priv, sizeof(process_priv));
@@ -2231,6 +2236,16 @@ static int criu_restore_process(struct kfd_process *p,
return -EINVAL;
}
 
+   pr_debug("Setting XNACK mode\n");
+   if (process_priv.xnack_mode && !kfd_process_xnack_mode(p, true)) {
+   pr_err("xnack mode cannot be set\n");
+   ret = -EPERM;
+   goto exit;
+   } else {
+   pr_debug("set xnack mode: %d\n", process_priv.xnack_mode);
+   p->xnack_enabled = process_priv.xnack_mode;
+   }
+
 exit:
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 855c162b85ea..d72dda84c18c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1057,6 +1057,7 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
 
 struct kfd_criu_process_priv_data {
uint32_t version;
+   uint32_t xnack_mode;
 };
 
 struct kfd_criu_device_priv_data {
-- 
2.17.1
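
A user-space model of the query-vs-set convention described above; struct
proc_state and set_or_query_xnack() are hypothetical stand-ins, with -1
loosely playing the role of -EPERM:

#include <stdio.h>

struct proc_state { int xnack_enabled; int xnack_supported; };

/* mode < 0: query only; 0: disable; 1: enable retry-on-fault */
static int set_or_query_xnack(struct proc_state *p, int mode)
{
    if (mode < 0)
        return p->xnack_enabled;    /* query, don't modify */
    if (mode && !p->xnack_supported)
        return -1;                  /* "xnack mode cannot be set" */
    p->xnack_enabled = mode ? 1 : 0;
    return p->xnack_enabled;
}

int main(void)
{
    struct proc_state p = { 0, 1 };

    set_or_query_xnack(&p, 1);                        /* restore: enable */
    printf("xnack=%d\n", set_or_query_xnack(&p, -1)); /* query -> 1 */
    return 0;
}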



[Patch v4 11/24] drm/amdkfd: CRIU restore sdma id for queues

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.

Signed-off-by: David Yat Sin 

---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  4 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 62fe28244a80..7e49f70b81b9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -58,7 +58,7 @@ static inline void deallocate_hqd(struct device_queue_manager 
*dqm,
struct queue *q);
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q);
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q);
+   struct queue *q, const uint32_t 
*restore_sdma_id);
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
@@ -308,7 +308,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
-   struct qcm_process_device *qpd)
+   struct qcm_process_device *qpd,
+   const struct kfd_criu_queue_priv_data *qd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -348,7 +349,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
q->pipe, q->queue);
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
-   retval = allocate_sdma_queue(dqm, q);
+   retval = allocate_sdma_queue(dqm, q, qd ? >sdma_id : NULL);
if (retval)
goto deallocate_vmid;
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1040,7 +1041,7 @@ static void pre_reset(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q)
+   struct queue *q, const uint32_t 
*restore_sdma_id)
 {
int bit;
 
@@ -1050,9 +1051,21 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
return -ENOMEM;
}
 
-   bit = __ffs64(dqm->sdma_bitmap);
-   dqm->sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->sdma_bitmap & (1ULL << *restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   /* Find first available sdma_id */
+   bit = __ffs64(dqm->sdma_bitmap);
+   dqm->sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
+
q->properties.sdma_engine_id = q->sdma_id %
get_num_sdma_engines(dqm);
q->properties.sdma_queue_id = q->sdma_id /
@@ -1062,9 +1075,19 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
pr_err("No more XGMI SDMA queue to allocate\n");
return -ENOMEM;
}
-   bit = __ffs64(dqm->xgmi_sdma_bitmap);
-   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->xgmi_sdma_bitmap & (1ULL << 
*restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   bit = __ffs64(dqm->xgmi_sdma_bitmap);
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
/* sdma_engine_id is sdma id including
 * both PCIe-optimized SDMAs and XGMI-
 * optimized SDMAs. The calculation below
@@ -1293,7 +1316,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 }
 
 static int create_queue_cpsch(struct 
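
The re-use logic in allocate_sdma_queue() reduces to a 64-bit bitmap
allocator that either takes the first free id or re-claims a specific one on
restore. A standalone sketch of that allocator; alloc_id() is a hypothetical
name and __builtin_ffsll stands in for the kernel's __ffs64:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

static int alloc_id(uint64_t *bitmap, const uint32_t *restore_id)
{
    int bit;

    if (!*bitmap)
        return -ENOMEM;                     /* no free ids at all */

    if (restore_id) {
        /* CRIU restore: re-claim exactly the checkpointed id */
        if (!(*bitmap & (1ULL << *restore_id)))
            return -EBUSY;                  /* id already in use */
        bit = (int)*restore_id;
    } else {
        /* normal create: first free id */
        bit = __builtin_ffsll((long long)*bitmap) - 1;
    }
    *bitmap &= ~(1ULL << bit);
    return bit;
}

int main(void)
{
    uint64_t map = ~0ULL;
    uint32_t want = 5;

    printf("fresh: %d\n", alloc_id(&map, NULL));            /* 0 */
    printf("restore 5: %d\n", alloc_id(&map, &want));       /* 5 */
    printf("restore 5 again: %d\n", alloc_id(&map, &want)); /* -16, EBUSY */
    return 0;
}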

[Patch v4 21/24] drm/amdkfd: CRIU Discover svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
A KFD process may contain a number of virtual address ranges for shared
virtual memory management, and each such range can have many SVM
attributes spanning various nodes within the process boundary. This
change reports the total number of such SVM ranges and their total
private data size by extending the PROCESS_INFO op of the CRIU IOCTL to
discover the SVM ranges in the target process; future patches bring in
the required support for checkpoint and restore of SVM ranges.


Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 60 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 11 +
 4 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 446eb9310915..1c25d5e9067c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2089,10 +2089,9 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
uint32_t *num_objects,
uint64_t *objs_priv_size)
 {
-   int ret;
-   uint64_t priv_size;
+   uint64_t queues_priv_data_size, svm_priv_data_size, priv_size;
uint32_t num_queues, num_events, num_svm_ranges;
-   uint64_t queues_priv_data_size;
+   int ret;
 
*num_devices = p->n_pdds;
*num_bos = get_process_num_bos(p);
@@ -2102,7 +2101,10 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
return ret;
 
num_events = kfd_get_num_events(p);
-   num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
+
+   ret = svm_range_get_info(p, _svm_ranges, _priv_data_size);
+   if (ret)
+   return ret;
 
*num_objects = num_queues + num_events + num_svm_ranges;
 
@@ -2112,7 +2114,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
priv_size += queues_priv_data_size;
priv_size += num_events * sizeof(struct 
kfd_criu_event_priv_data);
-   /* TODO: Add SVM ranges priv size */
+   priv_size += svm_priv_data_size;
*objs_priv_size = priv_size;
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d72dda84c18c..87eb6739a78e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1082,7 +1082,10 @@ enum kfd_criu_object_type {
 
 struct kfd_criu_svm_range_priv_data {
uint32_t object_type;
-   uint64_t reserved;
+   uint64_t start_addr;
+   uint64_t size;
+   /* Variable length array of attributes */
+   struct kfd_ioctl_svm_attribute attrs[0];
 };
 
 struct kfd_criu_queue_priv_data {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7c92116153fe..49e05fb5c898 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3418,6 +3418,66 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int svm_range_get_info(struct kfd_process *p, uint32_t *num_svm_ranges,
+  uint64_t *svm_priv_data_size)
+{
+   uint64_t total_size, accessibility_size, common_attr_size;
+   int nattr_common = 4, nattr_accessibility = 1;
+   int num_devices = p->n_pdds;
+   struct svm_range_list *svms;
+   struct svm_range *prange;
+   uint32_t count = 0;
+
+   *svm_priv_data_size = 0;
+
+   svms = >svms;
+   if (!svms)
+   return -EINVAL;
+
+   mutex_lock(>lock);
+   list_for_each_entry(prange, >list, list) {
+   pr_debug("prange: 0x%p start: 0x%lx\t npages: 0x%llx\t end: 
0x%llx\n",
+prange, prange->start, prange->npages,
+prange->start + prange->npages - 1);
+   count++;
+   }
+   mutex_unlock(>lock);
+
+   *num_svm_ranges = count;
+   /* Only the accessibility attributes need to be queried for all the gpus
+* individually, remaining ones are spanned across the entire process
+* regardless of the various gpu nodes. Of the remaining attributes,
+* KFD_IOCTL_SVM_ATTR_CLR_FLAGS need not be saved.
+*
+* KFD_IOCTL_SVM_ATTR_PREFERRED_LOC
+* KFD_IOCTL_SVM_ATTR_PREFETCH_LOC
+* KFD_IOCTL_SVM_ATTR_SET_FLAGS
+* KFD_IOCTL_SVM_ATTR_GRANULARITY
+*
+* ** ACCESSIBILITY ATTRIBUTES **
+* (Considered as one, type is altered during query, value is gpuid)
+* KFD_IOCTL_SVM_ATTR_ACCESS
+* KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE
+* KFD_IOCTL_SVM_ATTR_NO_ACCESS
+*/
+   if (*num_svm_ranges 
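
The estimate being built here is count * (header + attributes), with four
process-wide attributes plus one accessibility attribute per GPU. A toy
calculation under assumed sizes (the 8- and 24-byte figures are
illustrative placeholders, not the real struct layouts):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t num_ranges = 3, num_gpus = 2;
    uint64_t attr_sz = 8;   /* assumed: u32 type + u32 value */
    uint64_t hdr_sz = 24;   /* assumed: object_type + start_addr + size */
    uint64_t per_range = hdr_sz + attr_sz * (4 + 1 * num_gpus);

    printf("total priv size = %llu bytes\n",
           (unsigned long long)(num_ranges * per_range));
    return 0;
}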

[Patch v4 15/24] drm/amdkfd: CRIU checkpoint and restore events

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

Add support to the existing CRIU ioctls to save and restore events
during CRIU checkpoint and restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  70 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 272 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  27 ++-
 3 files changed, 280 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 582b4a393f95..08467fa2f514 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1009,57 +1009,11 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
 * through the event_page_offset field.
 */
if (args->event_page_offset) {
-   struct kfd_dev *kfd;
-   struct kfd_process_device *pdd;
-   void *mem, *kern_addr;
-   uint64_t size;
-
-   kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
-   if (!kfd) {
-   pr_err("Getting device by id failed in %s\n", __func__);
-   return -EINVAL;
-   }
-
mutex_lock(>mutex);
-
-   if (p->signal_page) {
-   pr_err("Event page is already set\n");
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   pdd = kfd_bind_process_to_device(kfd, p);
-   if (IS_ERR(pdd)) {
-   err = PTR_ERR(pdd);
-   goto out_unlock;
-   }
-
-   mem = kfd_process_device_translate_handle(pdd,
-   GET_IDR_HANDLE(args->event_page_offset));
-   if (!mem) {
-   pr_err("Can't find BO, offset is 0x%llx\n",
-  args->event_page_offset);
-   err = -EINVAL;
-   goto out_unlock;
-   }
-
-   err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->adev,
-   mem, _addr, );
-   if (err) {
-   pr_err("Failed to map event page to kernel\n");
-   goto out_unlock;
-   }
-
-   err = kfd_event_page_set(p, kern_addr, size);
-   if (err) {
-   pr_err("Failed to set event page\n");
-   amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(kfd->adev, 
mem);
-   goto out_unlock;
-   }
-
-   p->signal_handle = args->event_page_offset;
-
+   err = kfd_kmap_event_page(p, args->event_page_offset);
mutex_unlock(>mutex);
+   if (err)
+   return err;
}
 
err = kfd_event_create(filp, p, args->event_type,
@@ -1068,10 +1022,7 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
>event_page_offset,
>event_slot_index);
 
-   return err;
-
-out_unlock:
-   mutex_unlock(>mutex);
+   pr_debug("Created event (id:0x%08x) (%s)\n", args->event_id, __func__);
return err;
 }
 
@@ -2022,7 +1973,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
if (ret)
return ret;
 
-   num_events = 0; /* TODO: Implement Events */
+   num_events = kfd_get_num_events(p);
num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
 
*num_objects = num_queues + num_events + num_svm_ranges;
@@ -2031,7 +1982,7 @@ static int criu_get_process_object_info(struct 
kfd_process *p,
priv_size = sizeof(struct kfd_criu_process_priv_data);
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
priv_size += queues_priv_data_size;
-   /* TODO: Add Events priv size */
+   priv_size += num_events * sizeof(struct 
kfd_criu_event_priv_data);
/* TODO: Add SVM ranges priv size */
*objs_priv_size = priv_size;
}
@@ -2093,7 +2044,10 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto exit_unlock;
 
-   /* TODO: Dump Events */
+   ret = kfd_criu_checkpoint_events(p, (uint8_t __user 
*)args->priv_data,
+_offset);
+   if (ret)
+   goto exit_unlock;
 
/* TODO: Dump SVM-Ranges */
}
@@ -2406,8 +2360,8 @@ static int criu_restore_objects(struct file *filep,
goto exit;
break;
case KFD_CRIU_OBJECT_TYPE_EVENT:
-   /* TODO: Implement Events */
-   *priv_offset += sizeof(struct kfd_criu_event_priv_data);
+  
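
Each object's private data begins with a u32 object_type, so the restore
loop can peek the tag and hand the stream to the matching handler, which
consumes its own payload and advances the offset. A user-space sketch of
that tagged-stream dispatch (hypothetical OBJ_* tags; little-endian byte
order assumed in main):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { OBJ_QUEUE = 0, OBJ_EVENT = 1, OBJ_SVM_RANGE = 2 };

static int restore_one(const uint8_t *blob, uint64_t size, uint64_t *off)
{
    uint32_t type;

    if (*off + sizeof(type) > size)
        return -1;                          /* truncated stream */
    memcpy(&type, blob + *off, sizeof(type));

    switch (type) {
    case OBJ_QUEUE:     /* kfd_criu_restore_queue(...) */  break;
    case OBJ_EVENT:     /* kfd_criu_restore_event(...) */  break;
    case OBJ_SVM_RANGE: /* kfd_criu_restore_svm(...)   */  break;
    default:
        return -1;                          /* unknown object type */
    }
    /* each handler advances *off past its full record */
    return 0;
}

int main(void)
{
    uint8_t blob[4] = { OBJ_EVENT, 0, 0, 0 };
    uint64_t off = 0;

    printf("dispatch ok: %d\n", restore_one(blob, sizeof(blob), &off));
    return 0;
}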

[Patch v4 17/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2021-12-22 Thread Rajneesh Bhardwaj
KFD buffer objects do not have a GEM handle associated with them, so
they cannot directly be used with libdrm to initiate a system DMA
(sDMA) operation that speeds up the checkpoint and restore operation.
Export them as dmabuf objects instead and use them with the libdrm
helper (amdgpu_bo_import) to further process the sDMA command
submissions.

With sDMA, we see a huge improvement in checkpoint and restore
operations compared to the generic PCI-based access via the host data
path.

Suggested-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71 +++-
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 20652d488cde..178b0ccfb286 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
@@ -43,6 +44,7 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_object.h"
+#include "amdgpu_dma_buf.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1932,6 +1934,33 @@ uint64_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_get_prime_handle(struct drm_gem_object *gobj, int flags,
+ u32 *shared_fd)
+{
+   struct dma_buf *dmabuf;
+   int ret;
+
+   dmabuf = amdgpu_gem_prime_export(gobj, flags);
+   if (IS_ERR(dmabuf)) {
+   ret = PTR_ERR(dmabuf);
+   pr_err("dmabuf export failed for the BO\n");
+   return ret;
+   }
+
+   ret = dma_buf_fd(dmabuf, flags);
+   if (ret < 0) {
+   pr_err("dmabuf create fd failed, ret:%d\n", ret);
+   goto out_free_dmabuf;
+   }
+
+   *shared_fd = ret;
+   return 0;
+
+out_free_dmabuf:
+   dma_buf_put(dmabuf);
+   return ret;
+}
+
 static int criu_checkpoint_bos(struct kfd_process *p,
   uint32_t num_bos,
   uint8_t __user *user_bos,
@@ -1992,6 +2021,14 @@ static int criu_checkpoint_bos(struct kfd_process *p,
goto exit;
}
}
+   if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+   ret = 
criu_get_prime_handle(_bo->tbo.base,
+   bo_bucket->alloc_flags &
+   
KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DRM_RDWR : 0,
+   _bucket->dmabuf_fd);
+   if (ret)
+   goto exit;
+   }
if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
KFD_MMAP_GPU_ID(pdd->dev->id);
@@ -2031,6 +2068,10 @@ static int criu_checkpoint_bos(struct kfd_process *p,
*priv_offset += num_bos * sizeof(*bo_privs);
 
 exit:
+   while (ret && bo_index--) {
+   if (bo_buckets[bo_index].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[bo_index].dmabuf_fd);
+   }
 
kvfree(bo_buckets);
kvfree(bo_privs);
@@ -2131,16 +2172,28 @@ static int criu_checkpoint(struct file *filep,
ret = kfd_criu_checkpoint_queues(p, (uint8_t __user 
*)args->priv_data,
 _offset);
if (ret)
-   goto exit_unlock;
+   goto close_bo_fds;
 
ret = kfd_criu_checkpoint_events(p, (uint8_t __user 
*)args->priv_data,
 _offset);
if (ret)
-   goto exit_unlock;
+   goto close_bo_fds;
 
/* TODO: Dump SVM-Ranges */
}
 
+close_bo_fds:
+   if (ret) {
+   /* If IOCTL returns err, user assumes all FDs opened in 
criu_dump_bos are closed */
+   uint32_t i;
+   struct kfd_criu_bo_bucket *bo_buckets = (struct 
kfd_criu_bo_bucket *) args->bos;
+
+   for (i = 0; i < num_bos; i++) {
+   if (bo_buckets[i].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[i].dmabuf_fd);
+   }
+   }
+
 exit_unlock:
mutex_unlock(>mutex);
if (ret)
@@ -2335,6 +2388,7 @@ static int criu_restore_bos(struct kfd_process *p,
struct kfd_criu_bo_priv_data *bo_priv;
struct kfd_dev *dev;
struct 
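
On the user-space side, the dmabuf fd returned in bo_bucket->dmabuf_fd
would be imported through the libdrm helper the commit message names. A
sketch of that import, assuming a build against libdrm's amdgpu.h;
import_checkpointed_bo() is a hypothetical wrapper and error handling is
trimmed:

#include <stdint.h>
#include <amdgpu.h>

static int import_checkpointed_bo(amdgpu_device_handle dev, int dmabuf_fd,
                                  amdgpu_bo_handle *bo, uint64_t *size)
{
    struct amdgpu_bo_import_result res;
    int r;

    r = amdgpu_bo_import(dev, amdgpu_bo_handle_type_dma_buf_fd,
                         (uint32_t)dmabuf_fd, &res);
    if (r)
        return r;

    *bo = res.buf_handle;       /* usable for sDMA copy submissions */
    *size = res.alloc_size;
    return 0;
}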

[Patch v4 09/24] drm/amdkfd: CRIU add queues support

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

Add support to the existing CRIU ioctls to save the number of queues
and the properties of each queue during checkpoint, and to re-create
the queues on restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 110 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  43 +++-
 .../amd/amdkfd/kfd_process_queue_manager.c| 212 ++
 3 files changed, 357 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index db2bb302a8d4..9665c8657929 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2006,19 +2006,36 @@ static int criu_checkpoint_bos(struct kfd_process *p,
return ret;
 }
 
-static void criu_get_process_object_info(struct kfd_process *p,
-uint32_t *num_bos,
-uint64_t *objs_priv_size)
+static int criu_get_process_object_info(struct kfd_process *p,
+   uint32_t *num_bos,
+   uint32_t *num_objects,
+   uint64_t *objs_priv_size)
 {
+   int ret;
uint64_t priv_size;
+   uint32_t num_queues, num_events, num_svm_ranges;
+   uint64_t queues_priv_data_size;
 
*num_bos = get_process_num_bos(p);
 
+   ret = kfd_process_get_queue_info(p, _queues, 
_priv_data_size);
+   if (ret)
+   return ret;
+
+   num_events = 0; /* TODO: Implement Events */
+   num_svm_ranges = 0; /* TODO: Implement SVM-Ranges */
+
+   *num_objects = num_queues + num_events + num_svm_ranges;
+
if (objs_priv_size) {
priv_size = sizeof(struct kfd_criu_process_priv_data);
priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
+   priv_size += queues_priv_data_size;
+   /* TODO: Add Events priv size */
+   /* TODO: Add SVM ranges priv size */
*objs_priv_size = priv_size;
}
+   return 0;
 }
 
 static int criu_checkpoint(struct file *filep,
@@ -2026,7 +2043,7 @@ static int criu_checkpoint(struct file *filep,
   struct kfd_ioctl_criu_args *args)
 {
int ret;
-   uint32_t num_bos;
+   uint32_t num_bos, num_objects;
uint64_t priv_size, priv_offset = 0;
 
if (!args->bos || !args->priv_data)
@@ -2048,9 +2065,12 @@ static int criu_checkpoint(struct file *filep,
goto exit_unlock;
}
 
-   criu_get_process_object_info(p, _bos, _size);
+   ret = criu_get_process_object_info(p, _bos, _objects, 
_size);
+   if (ret)
+   goto exit_unlock;
 
if (num_bos != args->num_bos ||
+   num_objects != args->num_objects ||
priv_size != args->priv_data_size) {
 
ret = -EINVAL;
@@ -2067,6 +2087,17 @@ static int criu_checkpoint(struct file *filep,
if (ret)
goto exit_unlock;
 
+   if (num_objects) {
+   ret = kfd_criu_checkpoint_queues(p, (uint8_t __user 
*)args->priv_data,
+_offset);
+   if (ret)
+   goto exit_unlock;
+
+   /* TODO: Dump Events */
+
+   /* TODO: Dump SVM-Ranges */
+   }
+
 exit_unlock:
mutex_unlock(>mutex);
if (ret)
@@ -2340,6 +2371,62 @@ static int criu_restore_bos(struct kfd_process *p,
return ret;
 }
 
+static int criu_restore_objects(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   int ret = 0;
+   uint32_t i;
+
+   BUILD_BUG_ON(offsetof(struct kfd_criu_queue_priv_data, object_type));
+   BUILD_BUG_ON(offsetof(struct kfd_criu_event_priv_data, object_type));
+   BUILD_BUG_ON(offsetof(struct kfd_criu_svm_range_priv_data, 
object_type));
+
+   for (i = 0; i < args->num_objects; i++) {
+   uint32_t object_type;
+
+   if (*priv_offset + sizeof(object_type) > max_priv_data_size) {
+   pr_err("Invalid private data size\n");
+   return -EINVAL;
+   }
+
+   ret = get_user(object_type, (uint32_t __user *)(args->priv_data 
+ *priv_offset));
+   if (ret) {
+   pr_err("Failed to copy private information from 
user\n");
+   goto exit;
+   }
+
+   switch (object_type) {
+   case KFD_CRIU_OBJECT_TYPE_QUEUE:
+   ret = kfd_criu_restore_queue(p, (uint8_t __user 
*)args->priv_data,
+priv_offset, 
max_priv_data_size);
+   
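
The num_bos/num_objects/priv_size comparison above is a two-step handshake:
PROCESS_INFO reports the counts and sizes, userspace allocates its buffers,
and CHECKPOINT recomputes everything and rejects with -EINVAL if the
process changed in between. A toy model of that check (struct sizes and the
numbers are made up):

#include <stdint.h>
#include <stdio.h>

struct sizes { uint32_t num_bos, num_objects; uint64_t priv_size; };

static int checkpoint(const struct sizes *now, const struct sizes *reported)
{
    if (now->num_bos != reported->num_bos ||
        now->num_objects != reported->num_objects ||
        now->priv_size != reported->priv_size)
        return -22;             /* -EINVAL: state changed since info op */
    return 0;
}

int main(void)
{
    struct sizes info = { 4, 9, 4096 };   /* from PROCESS_INFO */
    struct sizes again = { 4, 9, 4096 };  /* recomputed at CHECKPOINT */

    printf("checkpoint: %d\n", checkpoint(&again, &info));
    return 0;
}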

[Patch v4 02/24] x86/configs: Add rock-rel_defconfig for amd-feature-criu branch

2021-12-22 Thread Rajneesh Bhardwaj
 - Add rock-rel_defconfig for release builds.

Signed-off-by: Rajneesh Bhardwaj 
---
 arch/x86/configs/rock-rel_defconfig | 4927 +++
 1 file changed, 4927 insertions(+)
 create mode 100644 arch/x86/configs/rock-rel_defconfig

diff --git a/arch/x86/configs/rock-rel_defconfig 
b/arch/x86/configs/rock-rel_defconfig
new file mode 100644
index ..f038ce7a0d06
--- /dev/null
+++ b/arch/x86/configs/rock-rel_defconfig
@@ -0,0 +1,4927 @@
+#
+# Automatically generated file; DO NOT EDIT.
+# Linux/x86 5.13.0 Kernel Configuration
+#
+CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
+CONFIG_CC_IS_GCC=y
+CONFIG_GCC_VERSION=70500
+CONFIG_CLANG_VERSION=0
+CONFIG_AS_IS_GNU=y
+CONFIG_AS_VERSION=23000
+CONFIG_LD_IS_BFD=y
+CONFIG_LD_VERSION=23000
+CONFIG_LLD_VERSION=0
+CONFIG_CC_CAN_LINK=y
+CONFIG_CC_CAN_LINK_STATIC=y
+CONFIG_CC_HAS_ASM_GOTO=y
+CONFIG_CC_HAS_ASM_INLINE=y
+CONFIG_IRQ_WORK=y
+CONFIG_BUILDTIME_TABLE_SORT=y
+CONFIG_THREAD_INFO_IN_TASK=y
+
+#
+# General setup
+#
+CONFIG_INIT_ENV_ARG_LIMIT=32
+# CONFIG_COMPILE_TEST is not set
+CONFIG_LOCALVERSION="-kfd"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_BUILD_SALT=""
+CONFIG_HAVE_KERNEL_GZIP=y
+CONFIG_HAVE_KERNEL_BZIP2=y
+CONFIG_HAVE_KERNEL_LZMA=y
+CONFIG_HAVE_KERNEL_XZ=y
+CONFIG_HAVE_KERNEL_LZO=y
+CONFIG_HAVE_KERNEL_LZ4=y
+CONFIG_HAVE_KERNEL_ZSTD=y
+CONFIG_KERNEL_GZIP=y
+# CONFIG_KERNEL_BZIP2 is not set
+# CONFIG_KERNEL_LZMA is not set
+# CONFIG_KERNEL_XZ is not set
+# CONFIG_KERNEL_LZO is not set
+# CONFIG_KERNEL_LZ4 is not set
+# CONFIG_KERNEL_ZSTD is not set
+CONFIG_DEFAULT_INIT=""
+CONFIG_DEFAULT_HOSTNAME="(none)"
+CONFIG_SWAP=y
+CONFIG_SYSVIPC=y
+CONFIG_SYSVIPC_SYSCTL=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_POSIX_MQUEUE_SYSCTL=y
+# CONFIG_WATCH_QUEUE is not set
+CONFIG_CROSS_MEMORY_ATTACH=y
+CONFIG_USELIB=y
+CONFIG_AUDIT=y
+CONFIG_HAVE_ARCH_AUDITSYSCALL=y
+CONFIG_AUDITSYSCALL=y
+
+#
+# IRQ subsystem
+#
+CONFIG_GENERIC_IRQ_PROBE=y
+CONFIG_GENERIC_IRQ_SHOW=y
+CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
+CONFIG_GENERIC_PENDING_IRQ=y
+CONFIG_GENERIC_IRQ_MIGRATION=y
+CONFIG_HARDIRQS_SW_RESEND=y
+CONFIG_IRQ_DOMAIN=y
+CONFIG_IRQ_DOMAIN_HIERARCHY=y
+CONFIG_GENERIC_MSI_IRQ=y
+CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
+CONFIG_IRQ_MSI_IOMMU=y
+CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
+CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
+CONFIG_IRQ_FORCED_THREADING=y
+CONFIG_SPARSE_IRQ=y
+# CONFIG_GENERIC_IRQ_DEBUGFS is not set
+# end of IRQ subsystem
+
+CONFIG_CLOCKSOURCE_WATCHDOG=y
+CONFIG_ARCH_CLOCKSOURCE_INIT=y
+CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
+CONFIG_GENERIC_TIME_VSYSCALL=y
+CONFIG_GENERIC_CLOCKEVENTS=y
+CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
+CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
+CONFIG_GENERIC_CMOS_UPDATE=y
+CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
+CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y
+
+#
+# Timers subsystem
+#
+CONFIG_TICK_ONESHOT=y
+CONFIG_NO_HZ_COMMON=y
+# CONFIG_HZ_PERIODIC is not set
+CONFIG_NO_HZ_IDLE=y
+# CONFIG_NO_HZ_FULL is not set
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+# end of Timers subsystem
+
+CONFIG_BPF=y
+CONFIG_HAVE_EBPF_JIT=y
+CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
+
+#
+# BPF subsystem
+#
+CONFIG_BPF_SYSCALL=y
+# CONFIG_BPF_JIT is not set
+# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
+# CONFIG_BPF_PRELOAD is not set
+# end of BPF subsystem
+
+# CONFIG_PREEMPT_NONE is not set
+CONFIG_PREEMPT_VOLUNTARY=y
+# CONFIG_PREEMPT is not set
+CONFIG_PREEMPT_COUNT=y
+
+#
+# CPU/Task time and stats accounting
+#
+CONFIG_TICK_CPU_ACCOUNTING=y
+# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
+# CONFIG_IRQ_TIME_ACCOUNTING is not set
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_BSD_PROCESS_ACCT_V3=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+# CONFIG_PSI is not set
+# end of CPU/Task time and stats accounting
+
+# CONFIG_CPU_ISOLATION is not set
+
+#
+# RCU Subsystem
+#
+CONFIG_TREE_RCU=y
+# CONFIG_RCU_EXPERT is not set
+CONFIG_SRCU=y
+CONFIG_TREE_SRCU=y
+CONFIG_TASKS_RCU_GENERIC=y
+CONFIG_TASKS_RUDE_RCU=y
+CONFIG_TASKS_TRACE_RCU=y
+CONFIG_RCU_STALL_COMMON=y
+CONFIG_RCU_NEED_SEGCBLIST=y
+# end of RCU Subsystem
+
+CONFIG_BUILD_BIN2C=y
+# CONFIG_IKCONFIG is not set
+# CONFIG_IKHEADERS is not set
+CONFIG_LOG_BUF_SHIFT=18
+CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
+CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
+CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
+
+#
+# Scheduler features
+#
+# CONFIG_UCLAMP_TASK is not set
+# end of Scheduler features
+
+CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
+CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
+CONFIG_CC_HAS_INT128=y
+CONFIG_ARCH_SUPPORTS_INT128=y
+CONFIG_NUMA_BALANCING=y
+CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
+CONFIG_CGROUPS=y
+CONFIG_PAGE_COUNTER=y
+CONFIG_MEMCG=y
+CONFIG_MEMCG_SWAP=y
+CONFIG_MEMCG_KMEM=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_WRITEBACK=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_FAIR_GROUP_SCHED=y
+CONFIG_CFS_BANDWIDTH=y
+# CONFIG_RT_GROUP_SCHED is not set
+CONFIG_CGROUP_PIDS=y
+# CONFIG_CGROUP_RDMA is not set
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_HUGETLB=y
+CONFIG_CPUSETS=y

[Patch v4 13/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

Checkpoint the contents of the queue MQDs on CRIU dump and restore them
during CRIU restore.

Signed-off-by: David Yat Sin 

---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |   2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  72 +++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   7 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  67 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  68 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  68 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  69 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 158 --
 11 files changed, 506 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3fb155f756fd..146879cd3f2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL, NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 0c50e67e2b51..3a5303ebcabf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL);
+   , , NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a0f5b8533a03..a92274f9f1f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -331,7 +331,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -390,8 +391,14 @@ static int create_queue_nocpsch(struct 
device_queue_manager *dqm,
retval = -ENOMEM;
goto out_deallocate_doorbell;
}
-   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
-   >gart_mqd_addr, >properties);
+
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
+>properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
+   >gart_mqd_addr, >properties);
+
if (q->properties.is_active) {
if (!dqm->sched_running) {
WARN_ONCE(1, "Load non-HWS mqd while stopped\n");
@@ -1339,7 +1346,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1385,8 +1393,12 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
-   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
-   >gart_mqd_addr, >properties);
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
+>properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
+   >gart_mqd_addr, >properties);
 
list_add(>list, >queues_list);
qpd->queue_count++;
@@ -1774,6 +1786,50 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
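
The init-vs-restore split above reduces to: a fresh queue gets a freshly
initialized MQD, while a restored queue gets the checkpointed bytes copied
back verbatim into the MQD memory object. A user-space sketch of that
round trip with a fake 16-byte MQD:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MQD_SIZE 16     /* placeholder; real MQDs are ASIC-specific */

static void init_mqd(uint8_t *mqd)       { memset(mqd, 0xab, MQD_SIZE); }
static void checkpoint_mqd(const uint8_t *mqd, uint8_t *out)
                                          { memcpy(out, mqd, MQD_SIZE); }
static void restore_mqd(uint8_t *mqd, const uint8_t *saved)
                                          { memcpy(mqd, saved, MQD_SIZE); }

int main(void)
{
    uint8_t mqd[MQD_SIZE], saved[MQD_SIZE];

    init_mqd(mqd);
    checkpoint_mqd(mqd, saved);       /* CRIU dump */
    memset(mqd, 0, sizeof(mqd));      /* original process goes away */
    restore_mqd(mqd, saved);          /* restore path, instead of init */
    printf("restored byte: 0x%x\n", mqd[0]);
    return 0;
}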

[Patch v4 06/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-12-22 Thread Rajneesh Bhardwaj
This implements the KFD CRIU restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to non-paged system memory
mapped for GPU and/or CPU access, and lays the basic foundation for the
userptr buffer objects which will be added in a separate patch. This
ioctl creates various types of buffer objects such as VRAM, MMIO,
doorbell and GTT based on the data sent from the userspace plugin. The
data mostly contains the previously checkpointed KFD images from some
KFD process.

While restoring a CRIU process, attach the old IDR values to the newly
created BOs. This also adds minimal GPU mapping support for the
single-GPU checkpoint/restore use case.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++-
 1 file changed, 297 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cdbb92972338..c93f74ad073f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2069,11 +2069,307 @@ static int criu_checkpoint(struct file *filep,
return ret;
 }
 
+static int criu_restore_process(struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   int ret = 0;
+   struct kfd_criu_process_priv_data process_priv;
+
+   if (*priv_offset + sizeof(process_priv) > max_priv_data_size)
+   return -EINVAL;
+
+   ret = copy_from_user(_priv,
+   (void __user *)(args->priv_data + *priv_offset),
+   sizeof(process_priv));
+   if (ret) {
+   pr_err("Failed to copy process private information from 
user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_offset += sizeof(process_priv);
+
+   if (process_priv.version != KFD_CRIU_PRIV_VERSION) {
+   pr_err("Invalid CRIU API version (checkpointed:%d 
current:%d)\n",
+   process_priv.version, KFD_CRIU_PRIV_VERSION);
+   return -EINVAL;
+   }
+
+exit:
+   return ret;
+}
+
+static int criu_restore_bos(struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args,
+   uint64_t *priv_offset,
+   uint64_t max_priv_data_size)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   struct kfd_criu_bo_priv_data *bo_privs;
+   bool flush_tlbs = false;
+   int ret = 0, j = 0;
+   uint32_t i;
+
+   if (*priv_offset + (args->num_bos * sizeof(*bo_privs)) > 
max_priv_data_size)
+   return -EINVAL;
+
+   bo_buckets = kvmalloc_array(args->num_bos, sizeof(*bo_buckets), 
GFP_KERNEL);
+   if (!bo_buckets)
+   return -ENOMEM;
+
+   ret = copy_from_user(bo_buckets, (void __user *)args->bos,
+args->num_bos * sizeof(*bo_buckets));
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+
+   bo_privs = kvmalloc_array(args->num_bos, sizeof(*bo_privs), GFP_KERNEL);
+   if (!bo_privs) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   ret = copy_from_user(bo_privs, (void __user *)args->priv_data + 
*priv_offset,
+args->num_bos * sizeof(*bo_privs));
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+   *priv_offset += args->num_bos * sizeof(*bo_privs);
+
+   /* Create and map new BOs */
+   for (i = 0; i < args->num_bos; i++) {
+   struct kfd_criu_bo_bucket *bo_bucket;
+   struct kfd_criu_bo_priv_data *bo_priv;
+   struct kfd_dev *dev;
+   struct kfd_process_device *pdd;
+   void *mem;
+   u64 offset;
+   int idr_handle;
+
+   bo_bucket = _buckets[i];
+   bo_priv = _privs[i];
+
+   dev = kfd_device_by_id(bo_bucket->gpu_id);
+   if (!dev) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+   pdd = kfd_get_process_device_data(dev, p);
+   if (!pdd) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+
+   pr_debug("kfd restore ioctl - bo_bucket[%d]:\n", i);
+   pr_debug("size = 0x%llx, bo_addr = 0x%llx bo_offset = 0x%llx\n"
+   "gpu_id = 0x%x alloc_flags = 0x%x\n"
+   
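
Every copy_from_user() in this path is guarded by a "*priv_offset + len >
max_priv_data_size" check before the offset advances. A user-space sketch
of that bounded-deserialization helper (pull() is a hypothetical name; the
guard is written overflow-safe):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int pull(const uint8_t *blob, uint64_t blob_size, uint64_t *off,
                void *dst, uint64_t len)
{
    /* reject truncated or malicious input before touching it */
    if (len > blob_size || *off > blob_size - len)
        return -EINVAL;
    memcpy(dst, blob + *off, len);
    *off += len;
    return 0;
}

int main(void)
{
    uint8_t blob[8] = { 0 };
    uint32_t v;
    uint64_t off = 0;

    printf("ok=%d\n", pull(blob, sizeof(blob), &off, &v, sizeof(v)));
    printf("overrun=%d\n", pull(blob, sizeof(blob), &off, &v, 8)); /* -22 */
    return 0;
}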

[Patch v4 24/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2021-12-22 Thread Rajneesh Bhardwaj
In the CRIU resume stage, resume all the shared virtual memory ranges
from the data stored inside the resuming KFD process during the CRIU
restore phase. Also set up the xnack mode and free the resources.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  6 +++
 3 files changed, 71 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f7aa15b18f95..6191e37656dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2759,7 +2759,17 @@ static int criu_resume(struct file *filep,
}
 
mutex_lock(>mutex);
+   ret = kfd_criu_resume_svm(target);
+   if (ret) {
+   pr_err("kfd_criu_resume_svm failed for %i\n", args->pid);
+   goto exit;
+   }
+
ret =  amdgpu_amdkfd_criu_resume(target->kgd_process_info);
+   if (ret)
+   pr_err("amdgpu_amdkfd_criu_resume failed for %i\n", args->pid);
+
+exit:
mutex_unlock(>mutex);
 
kfd_unref_process(target);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e9f6c63c2a26..bd2dce37f345 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3427,6 +3427,61 @@ svm_range_get_attr(struct kfd_process *p, struct 
mm_struct *mm,
return 0;
 }
 
+int kfd_criu_resume_svm(struct kfd_process *p)
+{
+   int nattr_common = 4, nattr_accessibility = 1;
+   struct criu_svm_metadata *criu_svm_md = NULL;
+   struct criu_svm_metadata *next = NULL;
+   struct svm_range_list *svms = >svms;
+   int i, j, num_attrs, ret = 0;
+   struct mm_struct *mm;
+
+   if (list_empty(>criu_svm_metadata_list)) {
+   pr_debug("No SVM data from CRIU restore stage 2\n");
+   return ret;
+   }
+
+   mm = get_task_mm(p->lead_thread);
+   if (!mm) {
+   pr_err("failed to get mm for the target process\n");
+   return -ESRCH;
+   }
+
+   num_attrs = nattr_common + (nattr_accessibility * p->n_pdds);
+
+   i = j = 0;
+   list_for_each_entry(criu_svm_md, >criu_svm_metadata_list, list) {
+   pr_debug("criu_svm_md[%d]\n\tstart: 0x%llx size: 0x%llx 
(npages)\n",
+i, criu_svm_md->start_addr, criu_svm_md->size);
+   for (j = 0; j < num_attrs; j++) {
+   pr_debug("\ncriu_svm_md[%d]->attrs[%d].type : 0x%x 
\ncriu_svm_md[%d]->attrs[%d].value : 0x%x\n",
+i,j, criu_svm_md->attrs[j].type,
+i,j, criu_svm_md->attrs[j].value);
+   }
+
+   ret = svm_range_set_attr(p, mm, criu_svm_md->start_addr,
+criu_svm_md->size, num_attrs,
+criu_svm_md->attrs);
+   if (ret) {
+   pr_err("CRIU: failed to set range attributes\n");
+   goto exit;
+   }
+
+   i++;
+   }
+
+exit:
+   list_for_each_entry_safe(criu_svm_md, next, 
>criu_svm_metadata_list, list) {
+   pr_debug("freeing criu_svm_md[]\n\tstart: 0x%llx\n",
+   criu_svm_md->start_addr);
+   kfree(criu_svm_md);
+   }
+
+   mmput(mm);
+   return ret;
+}
+
 int svm_criu_prepare_for_resume(struct kfd_process *p,
struct kfd_criu_svm_range_priv_data *svm_priv)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index e0c0853f085c..3b5bcb52723c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -195,6 +195,7 @@ int kfd_criu_restore_svm(struct kfd_process *p,
 uint8_t __user *user_priv_ptr,
 uint64_t *priv_data_offset,
 uint64_t max_priv_data_size);
+int kfd_criu_resume_svm(struct kfd_process *p);
 struct kfd_process_device *
 svm_range_get_pdd_by_adev(struct svm_range *prange, struct amdgpu_device 
*adev);
 void svm_range_list_lock_and_flush_work(struct svm_range_list *svms, struct 
mm_struct *mm);
@@ -256,6 +257,11 @@ static inline int kfd_criu_restore_svm(struct kfd_process 
*p,
return -EINVAL;
 }
 
+static inline int kfd_criu_resume_svm(struct kfd_process *p)
+{
+   return 0;
+}
+
 #define KFD_IS_SVM_API_SUPPORTED(dev) false
 
 #endif /* IS_ENABLED(CONFIG_HSA_AMD_SVM) */
-- 
2.17.1
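
The exit path above applies each saved range and then frees the metadata
with list_for_each_entry_safe(). The user-space analog of that safe drain,
grabbing the next pointer before freeing the current node:

#include <stdio.h>
#include <stdlib.h>

struct md { struct md *next; unsigned long start; };

static void drain(struct md **head)
{
    struct md *cur = *head, *next;

    while (cur) {
        next = cur->next;       /* save before the node is freed */
        /* ... apply the saved attributes for cur->start here ... */
        free(cur);
        cur = next;
    }
    *head = NULL;
}

int main(void)
{
    struct md *a = calloc(1, sizeof(*a)), *b = calloc(1, sizeof(*b));

    if (!a || !b)
        return 1;
    a->next = b;
    drain(&a);
    printf("list drained: %p\n", (void *)a);
    return 0;
}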



[Patch v4 20/24] drm/amdkfd: use user_gpu_id for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
Currently the SVM ranges use actual_gpu_id, but with checkpoint/restore
support it is possible that the SVM ranges are resumed on another node
where the actual_gpu_id may not be the same as the original
(user_gpu_id) gpu id. So modify the SVM code to use user_gpu_id.

Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 67e2432098d1..0769dc655e15 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1813,7 +1813,7 @@ int kfd_process_gpuidx_from_gpuid(struct kfd_process *p, 
uint32_t gpu_id)
int i;
 
for (i = 0; i < p->n_pdds; i++)
-   if (p->pdds[i] && gpu_id == p->pdds[i]->dev->id)
+   if (p->pdds[i] && gpu_id == p->pdds[i]->user_gpu_id)
return i;
return -EINVAL;
 }
@@ -1826,7 +1826,7 @@ kfd_process_gpuid_from_adev(struct kfd_process *p, struct 
amdgpu_device *adev,
 
for (i = 0; i < p->n_pdds; i++)
if (p->pdds[i] && p->pdds[i]->dev->adev == adev) {
-   *gpuid = p->pdds[i]->dev->id;
+   *gpuid = p->pdds[i]->user_gpu_id;
*gpuidx = i;
return 0;
}
-- 
2.17.1
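
The point of the switch is that checkpointed state stays keyed by the
stable user_gpu_id and is translated to whatever physical device backs it
after restore. A toy lookup illustrating the idea (struct pdd here is a
simplified stand-in for kfd_process_device):

#include <stdint.h>
#include <stdio.h>

struct pdd { uint32_t user_gpu_id; uint32_t actual_gpu_id; };

static int gpuidx_from_user_gpu_id(const struct pdd *pdds, int n,
                                   uint32_t user_gpu_id)
{
    for (int i = 0; i < n; i++)
        if (pdds[i].user_gpu_id == user_gpu_id)
            return i;
    return -1;      /* cf. -EINVAL in kfd_process_gpuidx_from_gpuid */
}

int main(void)
{
    /* after restore, actual ids differ but user ids stay stable */
    struct pdd pdds[2] = { { 1, 0x8a }, { 2, 0x9f } };

    printf("idx for user gpu 2: %d\n",
           gpuidx_from_user_gpu_id(pdds, 2, 2));
    return 0;
}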



[Patch v4 04/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-12-22 Thread Rajneesh Bhardwaj
This IOCTL is expected to be called as a precursor to the actual
checkpoint operation. It does basic discovery on the target process
seized by CRIU and relays the information to userspace, which uses it
to start the checkpoint operation via another dedicated IOCTL.

The process_info IOCTL determines the number of GPUs and buffer objects
that are associated with the target process, and its process id in the
caller's namespace, since the /proc/pid/mem interface may be used to
drain the contents of the discovered buffer objects in userspace and
getpid returns the pid of the CRIU dumper process. Also, the pid of a
process inside a container might be different from its global pid, so
return the ns pid.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 14 ++
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1b863bd84c96..53d7a20e3c06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1857,6 +1857,41 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+uint64_t get_process_num_bos(struct kfd_process *p)
+{
+   uint64_t num_of_bos = 0, i;
+
+   /* Run over all PDDs of the process */
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+   void *mem;
+   int id;
+
+   idr_for_each_entry(>alloc_idr, mem, id) {
+   struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;
+
+   if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
+   num_of_bos++;
+   }
+   }
+   return num_of_bos;
+}
+
+static void criu_get_process_object_info(struct kfd_process *p,
+uint32_t *num_bos,
+uint64_t *objs_priv_size)
+{
+   uint64_t priv_size;
+
+   *num_bos = get_process_num_bos(p);
+
+   if (objs_priv_size) {
+   priv_size = sizeof(struct kfd_criu_process_priv_data);
+   priv_size += *num_bos * sizeof(struct kfd_criu_bo_priv_data);
+   *objs_priv_size = priv_size;
+   }
+}
+
 static int criu_checkpoint(struct file *filep,
   struct kfd_process *p,
   struct kfd_ioctl_criu_args *args)
@@ -1889,7 +1924,25 @@ static int criu_process_info(struct file *filep,
struct kfd_process *p,
struct kfd_ioctl_criu_args *args)
 {
-   return 0;
+   int ret = 0;
+
+   mutex_lock(>mutex);
+
+   if (!kfd_has_process_device_data(p)) {
+   pr_err("No pdd for given process\n");
+   ret = -ENODEV;
+   goto err_unlock;
+   }
+
+   args->pid = task_pid_nr_ns(p->lead_thread,
+   task_active_pid_ns(p->lead_thread));
+
+   criu_get_process_object_info(p, >num_bos, >priv_data_size);
+
+   dev_dbg(kfd_device, "Num of bos:%u\n", args->num_bos);
+err_unlock:
+   mutex_unlock(>mutex);
+   return ret;
 }
 
 static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void 
*data)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e68f692362bb..4d9bc7af03af 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -950,6 +950,8 @@ void *kfd_process_device_translate_handle(struct 
kfd_process_device *p,
 void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
int handle);
 
+bool kfd_has_process_device_data(struct kfd_process *p);
+
 /* PASIDs */
 int kfd_pasid_init(void);
 void kfd_pasid_exit(void);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d4c8a6948a9f..f77d556ca0fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1456,6 +1456,20 @@ static int init_doorbell_bitmap(struct 
qcm_process_device *qpd,
return 0;
 }
 
+bool kfd_has_process_device_data(struct kfd_process *p)
+{
+   int i;
+
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (pdd)
+   return true;
+   }
+
+   return false;
+}
+
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
struct kfd_process *p)
 {
-- 
2.17.1
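
With the ns-correct pid from PROCESS_INFO, the userspace plugin can drain
BO contents through /proc/<pid>/mem, as the commit message notes. A sketch
of that read path; the pid, address and size are placeholders, ptrace
permission on the seized target is required, and a 64-bit off_t build is
assumed:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char path[64];
    static char buf[4096];
    long pid = 1234;                        /* pid returned by the ioctl */
    uint64_t bo_addr = 0x7f0000000000ULL;   /* bo_bucket->addr */

    snprintf(path, sizeof(path), "/proc/%ld/mem", pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 1;

    ssize_t n = pread(fd, buf, sizeof(buf), (off_t)bo_addr);
    printf("drained %zd bytes\n", n);
    close(fd);
    return 0;
}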



[Patch v4 10/24] drm/amdkfd: CRIU restore queue ids

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 

---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 37 +++
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 9665c8657929..3fb155f756fd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(>pqm, dev, filep, _properties, _id,
+   err = pqm_create_queue(>pqm, dev, filep, _properties, _id, 
NULL,
_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 1e30717b5253..0c50e67e2b51 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL);
+   , , NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7c2679a23aa3..8272bd5c4600 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -461,6 +461,7 @@ enum KFD_QUEUE_PRIORITY {
  * it's user mode or kernel mode queue.
  *
  */
+
 struct queue_properties {
enum kfd_queue_type type;
enum kfd_queue_format format;
@@ -1156,6 +1157,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue_properties(struct process_queue_manager *pqm, unsigned 
int qid,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 480ad794df4e..275aeebc58fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -42,6 +42,20 @@ static inline struct process_queue_node *get_queue_by_qid(
return NULL;
 }
 
+static int assign_queue_slot_by_qid(struct process_queue_manager *pqm,
+   unsigned int qid)
+{
+   if (qid >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+   return -EINVAL;
+
+   if (__test_and_set_bit(qid, pqm->queue_slot_bitmap)) {
+   pr_err("Cannot create new queue because requested qid(%u) is in 
use\n", qid);
+   return -ENOSPC;
+   }
+
+   return 0;
+}
+
 static int find_available_queue_slot(struct process_queue_manager *pqm,
unsigned int *qid)
 {
@@ -194,6 +208,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process)
 {
int retval;
@@ -225,7 +240,12 @@ int pqm_create_queue(struct process_queue_manager *pqm,
if (pdd->qpd.queue_count >= max_queues)
return -ENOSPC;
 
-   retval = find_available_queue_slot(pqm, qid);
+   if (q_data) {
+   retval = assign_queue_slot_by_qid(pqm, q_data->q_id);
+   *qid = q_data->q_id;
+   } else
+   retval = find_available_queue_slot(pqm, qid);
+
if (retval != 0)
return retval;
 
@@ -528,7 +548,7 @@ int kfd_process_get_queue_info(struct kfd_process *p,
return 0;
 }
 
-static void criu_dump_queue(struct kfd_process_device *pdd,
+static void criu_checkpoint_queue(struct kfd_process_device *pdd,
   struct queue *q,
   struct kfd_criu_queue_priv_data *q_data)
 {
@@ -560,7 +580,7 @@ static void criu_dump_queue(struct kfd_process_device *pdd,
pr_debug("Dumping Queue: gpu_id:%x 

[Patch v4 07/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-12-22 Thread Rajneesh Bhardwaj
This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs. When
doing a CRIU restore, MMU notifications can happen anytime after we
call amdgpu_mn_register. Prevent MMU notifications until we reach
stage-4 of the restore process, i.e. until the criu_resume ioctl is
received and the process is ready to be resumed. This ioctl is
different from the other KFD CRIU ioctls since it is called by the CRIU
master restore process for all the target processes being resumed by
CRIU.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  6 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 51 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 44 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 35 +++--
 5 files changed, 123 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index fcbc8a9c9e06..5c5fc839f701 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -131,6 +131,7 @@ struct amdkfd_process_info {
atomic_t evicted_bos;
struct delayed_work restore_userptr_work;
struct pid *pid;
+   bool block_mmu_notifications;
 };
 
 int amdgpu_amdkfd_init(void);
@@ -269,7 +270,7 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct amdgpu_device *adev, uint64_t va, uint64_t size,
void *drm_priv, struct kgd_mem **mem,
-   uint64_t *offset, uint32_t flags);
+   uint64_t *offset, uint32_t flags, bool criu_resume);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv,
uint64_t *size);
@@ -297,6 +298,9 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct amdgpu_device 
*adev,
 int amdgpu_amdkfd_get_tile_config(struct amdgpu_device *adev,
struct tile_config *config);
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev);
+void amdgpu_amdkfd_block_mmu_notifications(void *p);
+int amdgpu_amdkfd_criu_resume(void *p);
+
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 90b985436878..5679fb75ec88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -846,7 +846,8 @@ static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem 
*mem,
  *
  * Returns 0 for success, negative errno for errors.
  */
-static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
+static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
+  bool criu_resume)
 {
struct amdkfd_process_info *process_info = mem->process_info;
struct amdgpu_bo *bo = mem->bo;
@@ -868,6 +869,17 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
goto out;
}
 
+   if (criu_resume) {
+   /*
+* During a CRIU restore operation, the userptr buffer objects
+* will be validated in the restore_userptr_work worker at a
+* later stage when it is scheduled by another ioctl called by
+* CRIU master process for the target pid for restore.
+*/
+   atomic_inc(&mem->invalid);
+   mutex_unlock(&process_info->lock);
+   return 0;
+   }
ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
@@ -1240,6 +1252,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void 
**process_info,
INIT_DELAYED_WORK(&info->restore_userptr_work,
  amdgpu_amdkfd_restore_userptr_worker);
 
+   info->block_mmu_notifications = false;
*process_info = info;
*ef = dma_fence_get(&info->eviction_fence->base);
}
@@ -1456,10 +1469,37 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv)
return avm->pd_phys_addr;
 }
 
+void amdgpu_amdkfd_block_mmu_notifications(void *p)
+{
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   pinfo->block_mmu_notifications = true;
+}
+
+int amdgpu_amdkfd_criu_resume(void *p)
+{
+   int ret = 0;
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   mutex_lock(&pinfo->lock);
+   pr_debug("scheduling work\n");
+   atomic_inc(&pinfo->evicted_bos);
+   if (!pinfo->block_mmu_notifications) {
+ 

[Patch v4 05/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2021-12-22 Thread Rajneesh Bhardwaj
This adds support to discover the buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to the user space plugin running under the CRIU
master context, which then stores this info to recreate these buffer
objects during a restore operation.
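
The checkpoint path funnels everything through one user buffer; a
simplified sketch of the calling pattern (signatures as in this patch,
surrounding declarations elided):

	uint64_t priv_offset = 0;

	/* Each helper serializes its private blob at *priv_offset in the
	 * user buffer and advances the offset, so the checkpoint image is
	 * a plain concatenation of per-object records. */
	ret = criu_checkpoint_process(p, user_priv_data, &priv_offset);
	if (!ret)
		ret = criu_checkpoint_bos(p, num_bos, user_bos,
					  user_priv_data, &priv_offset);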

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  20 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   3 +-
 4 files changed, 195 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 56c5c4464829..4fd36bd9dcfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1173,6 +1173,26 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device 
*bdev,
return ttm_pool_free(&adev->mman.bdev.pool, ttm);
 }
 
+/**
+ * amdgpu_ttm_tt_get_userptr - Return the userptr GTT ttm_tt for the current
+ * task
+ *
+ * @tbo: The ttm_buffer_object that contains the userptr
+ * @user_addr:  The returned value
+ */
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr)
+{
+   struct amdgpu_ttm_tt *gtt;
+
+   if (!tbo->ttm)
+   return -EINVAL;
+
+   gtt = (void *)tbo->ttm;
+   *user_addr = gtt->userptr;
+   return 0;
+}
+
 /**
  * amdgpu_ttm_tt_set_userptr - Initialize userptr GTT ttm_tt for the current
  * task
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 7346ecff4438..6e6d67ec43f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -177,6 +177,8 @@ static inline bool amdgpu_ttm_tt_get_user_pages_done(struct 
ttm_tt *ttm)
 #endif
 
 void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr);
 int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo,
  uint64_t addr, uint32_t flags);
 bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 53d7a20e3c06..cdbb92972338 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -42,6 +42,7 @@
 #include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "amdgpu_object.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1857,6 +1858,29 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int criu_checkpoint_process(struct kfd_process *p,
+uint8_t __user *user_priv_data,
+uint64_t *priv_offset)
+{
+   struct kfd_criu_process_priv_data process_priv;
+   int ret;
+
+   memset(&process_priv, 0, sizeof(process_priv));
+
+   process_priv.version = KFD_CRIU_PRIV_VERSION;
+
+   ret = copy_to_user(user_priv_data + *priv_offset,
+   &process_priv, sizeof(process_priv));
+
+   if (ret) {
+   pr_err("Failed to copy process information to user\n");
+   ret = -EFAULT;
+   }
+
+   *priv_offset += sizeof(process_priv);
+   return ret;
+}
+
 uint64_t get_process_num_bos(struct kfd_process *p)
 {
uint64_t num_of_bos = 0, i;
@@ -1877,6 +1901,111 @@ uint64_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_checkpoint_bos(struct kfd_process *p,
+  uint32_t num_bos,
+  uint8_t __user *user_bos,
+  uint8_t __user *user_priv_data,
+  uint64_t *priv_offset)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   struct kfd_criu_bo_priv_data *bo_privs;
+   int ret = 0, pdd_index, bo_index = 0, id;
+   void *mem;
+
+   bo_buckets = kvzalloc(num_bos * sizeof(*bo_buckets), GFP_KERNEL);
+   if (!bo_buckets) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   bo_privs = kvzalloc(num_bos * sizeof(*bo_privs), GFP_KERNEL);
+   if (!bo_privs) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   for (pdd_index = 0; pdd_index < p->n_pdds; pdd_index++) {
+   struct kfd_process_device *pdd = p->pdds[pdd_index];
+   struct amdgpu_bo *dumper_bo;
+   struct kgd_mem *kgd_mem;
+
+   idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+   struct kfd_criu_bo_bucket *bo_bucket;
+   struct kfd_criu_bo_priv_data *bo_priv;
+
+ 

[Patch v4 08/24] drm/amdkfd: CRIU Implement KFD unpause operation

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin 

Introducing the UNPAUSE op. After the CRIU amdgpu plugin performs a
PROCESS_INFO op, the queues stay in an evicted state. Once the plugin is
done draining BO contents, it is safe to perform an UNPAUSE op for the
queues to resume.
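
A sketch of the plugin-side ordering this enables (op constants per the
uapi patch in this series; setup and error handling elided):

	args.op = KFD_CRIU_OP_PROCESS_INFO;	/* evicts queues, returns sizes */
	ioctl(kfd_fd, AMDKFD_IOC_CRIU_OP, &args);

	/* ... drain BO contents while the queues stay evicted ... */

	args.op = KFD_CRIU_OP_UNPAUSE;		/* queues resume on the live process */
	ioctl(kfd_fd, AMDKFD_IOC_CRIU_OP, &args);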

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 37 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  1 +
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 87b9f019e96e..db2bb302a8d4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2040,6 +2040,14 @@ static int criu_checkpoint(struct file *filep,
goto exit_unlock;
}
 
+   /* Confirm all process queues are evicted */
+   if (!p->queues_paused) {
+   pr_err("Cannot dump process when queues are not in evicted 
state\n");
+   /* CRIU plugin did not call op PROCESS_INFO before 
checkpointing */
+   ret = -EINVAL;
+   goto exit_unlock;
+   }
+
criu_get_process_object_info(p, &num_bos, &priv_size);
 
if (num_bos != args->num_bos ||
@@ -2382,7 +2390,24 @@ static int criu_unpause(struct file *filep,
struct kfd_process *p,
struct kfd_ioctl_criu_args *args)
 {
-   return 0;
+   int ret;
+
+   mutex_lock(&p->mutex);
+
+   if (!p->queues_paused) {
+   mutex_unlock(&p->mutex);
+   return -EINVAL;
+   }
+
+   ret = kfd_process_restore_queues(p);
+   if (ret)
+   pr_err("Failed to unpause queues ret:%d\n", ret);
+   else
+   p->queues_paused = false;
+
+   mutex_unlock(&p->mutex);
+
+   return ret;
 }
 
 static int criu_resume(struct file *filep,
@@ -2434,6 +2459,12 @@ static int criu_process_info(struct file *filep,
goto err_unlock;
}
 
+   ret = kfd_process_evict_queues(p);
+   if (ret)
+   goto err_unlock;
+
+   p->queues_paused = true;
+
args->pid = task_pid_nr_ns(p->lead_thread,
task_active_pid_ns(p->lead_thread));
 
@@ -2441,6 +2472,10 @@ static int criu_process_info(struct file *filep,
 
dev_dbg(kfd_device, "Num of bos:%u\n", args->num_bos);
 err_unlock:
+   if (ret) {
+   kfd_process_restore_queues(p);
+   p->queues_paused = false;
+   }
mutex_unlock(&p->mutex);
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cd72541a8f4f..f3a9f3de34e4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -875,6 +875,9 @@ struct kfd_process {
struct svm_range_list svms;
 
bool xnack_enabled;
+
+   /* Queues are in paused stated because we are in the process of doing a 
CRIU checkpoint */
+   bool queues_paused;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d2fcdc5e581f..e20fbb7ba9bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1364,6 +1364,7 @@ static struct kfd_process *create_process(const struct 
task_struct *thread)
process->mm = thread->mm;
process->lead_thread = thread->group_leader;
process->n_pdds = 0;
+   process->queues_paused = false;
INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
process->last_restore_timestamp = get_jiffies_64();
-- 
2.17.1



[Patch v4 01/24] x86/configs: CRIU update debug rock defconfig

2021-12-22 Thread Rajneesh Bhardwaj
 - Update debug config for Checkpoint-Restore (CR) support
 - Also include necessary options for CR with docker containers.

Signed-off-by: Rajneesh Bhardwaj 
---
 arch/x86/configs/rock-dbg_defconfig | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/arch/x86/configs/rock-dbg_defconfig 
b/arch/x86/configs/rock-dbg_defconfig
index 4877da183599..bc2a34666c1d 100644
--- a/arch/x86/configs/rock-dbg_defconfig
+++ b/arch/x86/configs/rock-dbg_defconfig
@@ -249,6 +249,7 @@ CONFIG_KALLSYMS_ALL=y
 CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
 CONFIG_KALLSYMS_BASE_RELATIVE=y
 # CONFIG_USERFAULTFD is not set
+CONFIG_USERFAULTFD=y
 CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
 CONFIG_KCMP=y
 CONFIG_RSEQ=y
@@ -1015,6 +1016,11 @@ CONFIG_PACKET_DIAG=y
 CONFIG_UNIX=y
 CONFIG_UNIX_SCM=y
 CONFIG_UNIX_DIAG=y
+CONFIG_SMC_DIAG=y
+CONFIG_XDP_SOCKETS_DIAG=y
+CONFIG_INET_MPTCP_DIAG=y
+CONFIG_TIPC_DIAG=y
+CONFIG_VSOCKETS_DIAG=y
 # CONFIG_TLS is not set
 CONFIG_XFRM=y
 CONFIG_XFRM_ALGO=y
@@ -1052,15 +1058,17 @@ CONFIG_SYN_COOKIES=y
 # CONFIG_NET_IPVTI is not set
 # CONFIG_NET_FOU is not set
 # CONFIG_NET_FOU_IP_TUNNELS is not set
-# CONFIG_INET_AH is not set
-# CONFIG_INET_ESP is not set
-# CONFIG_INET_IPCOMP is not set
-CONFIG_INET_TUNNEL=y
-CONFIG_INET_DIAG=y
-CONFIG_INET_TCP_DIAG=y
-# CONFIG_INET_UDP_DIAG is not set
-# CONFIG_INET_RAW_DIAG is not set
-# CONFIG_INET_DIAG_DESTROY is not set
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_ESP_OFFLOAD=m
+CONFIG_INET_TUNNEL=m
+CONFIG_INET_XFRM_TUNNEL=m
+CONFIG_INET_DIAG=m
+CONFIG_INET_TCP_DIAG=m
+CONFIG_INET_UDP_DIAG=m
+CONFIG_INET_RAW_DIAG=m
+CONFIG_INET_DIAG_DESTROY=y
 CONFIG_TCP_CONG_ADVANCED=y
 # CONFIG_TCP_CONG_BIC is not set
 CONFIG_TCP_CONG_CUBIC=y
@@ -1085,12 +1093,14 @@ CONFIG_TCP_MD5SIG=y
 CONFIG_IPV6=y
 # CONFIG_IPV6_ROUTER_PREF is not set
 # CONFIG_IPV6_OPTIMISTIC_DAD is not set
-CONFIG_INET6_AH=y
-CONFIG_INET6_ESP=y
-# CONFIG_INET6_ESP_OFFLOAD is not set
-# CONFIG_INET6_ESPINTCP is not set
-# CONFIG_INET6_IPCOMP is not set
-# CONFIG_IPV6_MIP6 is not set
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_ESP_OFFLOAD=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_INET6_XFRM_TUNNEL=m
+CONFIG_INET_DCCP_DIAG=m
+CONFIG_INET_SCTP_DIAG=m
 # CONFIG_IPV6_ILA is not set
 # CONFIG_IPV6_VTI is not set
 CONFIG_IPV6_SIT=y
@@ -1146,8 +1156,13 @@ CONFIG_NF_CT_PROTO_UDPLITE=y
 # CONFIG_NF_CONNTRACK_SANE is not set
 # CONFIG_NF_CONNTRACK_SIP is not set
 # CONFIG_NF_CONNTRACK_TFTP is not set
-# CONFIG_NF_CT_NETLINK is not set
-# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
+CONFIG_COMPAT_NETLINK_MESSAGES=y
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NF_CT_NETLINK_TIMEOUT=m
+CONFIG_NF_CT_NETLINK_HELPER=m
+CONFIG_NETFILTER_NETLINK_GLUE_CT=y
+CONFIG_SCSI_NETLINK=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
 CONFIG_NF_NAT=m
 CONFIG_NF_NAT_REDIRECT=y
 CONFIG_NF_NAT_MASQUERADE=y
@@ -1992,7 +2007,7 @@ CONFIG_NETCONSOLE_DYNAMIC=y
 CONFIG_NETPOLL=y
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_RIONET is not set
-# CONFIG_TUN is not set
+CONFIG_TUN=y
 # CONFIG_TUN_VNET_CROSS_LE is not set
 CONFIG_VETH=y
 # CONFIG_NLMON is not set
@@ -3990,7 +4005,7 @@ CONFIG_MANDATORY_FILE_LOCKING=y
 CONFIG_FSNOTIFY=y
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY_USER=y
-# CONFIG_FANOTIFY is not set
+CONFIG_FANOTIFY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
-- 
2.17.1



[Patch v4 03/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-12-22 Thread Rajneesh Bhardwaj
Checkpoint-Restore in Userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on the same or a remote
machine, but it expects processes that have a device file (e.g. GPU)
associated with them to provide the necessary driver support to assist
CRIU and its extensible plugin interface. Thus, in order to support
checkpoint-restore of any ROCm process, the AMD Radeon Open Compute
kernel driver needs to provide a set of new APIs that provide the
necessary VRAM metadata and its contents to a userspace component
(CRIU plugin) that can store it in the form of image files.

This introduces some new ioctls which will be used to checkpoint-restore
any KFD bound user process. KFD doesn't allow an arbitrary ioctl call
unless it is made by the group leader process. Since these ioctls are
expected to be called from a KFD CRIU plugin, which has elevated ptrace
privileges and the CAP_CHECKPOINT_RESTORE capability attached to the
file descriptors, modify KFD to allow such calls.
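
The exact gate sits in the part of the kfd_ioctl() diff that the archive
truncates below, so treat this as an assumption; roughly (locking
elided):

	/* Sketch only: allow a foreign caller on CHECKPOINT_RESTORE-flagged
	 * ioctls when it is ptrace-attached with CAP_CHECKPOINT_RESTORE. */
	if ((ioctl->flags & KFD_IOC_FLAG_CHECKPOINT_RESTORE) &&
	    ptrace_parent(process->lead_thread) == current &&
	    capable(CAP_CHECKPOINT_RESTORE))
		ptrace_attached = true;

	if (process->lead_thread != current->group_leader && !ptrace_attached)
		goto err_i1;		/* rejected with -EBADF */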

(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling 
Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 94 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 65 +++-
 include/uapi/linux/kfd_ioctl.h   | 79 +++-
 3 files changed, 235 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4bfc0c8ab764..1b863bd84c96 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include <linux/ptrace.h>
 #include 
 #include 
 #include "kfd_priv.h"
@@ -1856,6 +1857,75 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int criu_checkpoint(struct file *filep,
+  struct kfd_process *p,
+  struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_restore(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_unpause(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_resume(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int criu_process_info(struct file *filep,
+   struct kfd_process *p,
+   struct kfd_ioctl_criu_args *args)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu(struct file *filep, struct kfd_process *p, void 
*data)
+{
+   struct kfd_ioctl_criu_args *args = data;
+   int ret;
+
+   dev_dbg(kfd_device, "CRIU operation: %d\n", args->op);
+   switch (args->op) {
+   case KFD_CRIU_OP_PROCESS_INFO:
+   ret = criu_process_info(filep, p, args);
+   break;
+   case KFD_CRIU_OP_CHECKPOINT:
+   ret = criu_checkpoint(filep, p, args);
+   break;
+   case KFD_CRIU_OP_UNPAUSE:
+   ret = criu_unpause(filep, p, args);
+   break;
+   case KFD_CRIU_OP_RESTORE:
+   ret = criu_restore(filep, p, args);
+   break;
+   case KFD_CRIU_OP_RESUME:
+   ret = criu_resume(filep, p, args);
+   break;
+   default:
+   dev_dbg(kfd_device, "Unsupported CRIU operation:%d\n", 
args->op);
+   ret = -EINVAL;
+   break;
+   }
+
+   if (ret)
+   dev_dbg(kfd_device, "CRIU operation:%d err:%d\n", args->op, 
ret);
+
+   return ret;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -1959,6 +2029,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
kfd_ioctl_set_xnack_mode, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_OP,
+   kfd_ioctl_criu, KFD_IOC_FLAG_CHECKPOINT_RESTORE),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls)
@@ -1973,6 +2046,7 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
char *kdata = NULL;
unsigned int usize, asize;
int retcode = -EINVAL;
+   bool ptrace_attached = false;
 
if (nr >= AMDKFD_CORE_IOCTL_COUNT)
goto err_i1;
@@ -1998,7 +2072,15 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
 * processes need to create their own KFD device context.
 */
process = filep->private_data;
-   if (process->lead_thread != 

[Patch v4 00/24] CHECKPOINT RESTORE WITH ROCm

2021-12-22 Thread Rajneesh Bhardwaj
CRIU is a user space tool which is very popular for container live
migration in datacentres. It can checkpoint a running application, save
its complete state, memory contents and all system resources to images
on disk which can be migrated to another machine and restored later.
More information on CRIU can be found at https://criu.org/Main_Page

CRIU currently does not support checkpoint/restore with applications
that have device files open, so it cannot perform checkpoint and restore
on GPU devices, which are very complex and have their own VRAM managed
privately. CRIU, however, can support external devices by using a plugin
architecture. We feel that we are getting close to finalizing our IOCTL
APIs, which were again changed since V3 for an improved modular design.

Our changes to CRIU user space can be obtained from here:
https://github.com/RadeonOpenCompute/criu/tree/amdgpu_rfc-211222

We have tested the following scenarios:
 - Checkpoint / Restore of a Pytorch (BERT) workload
 - kfdtests with queues and events
 - Gfx9 and Gfx10 based multi GPU test systems 
 - On baremetal and inside a docker container
 - Restoring on a different system

V1: Initial
V2: Addressed review comments
V3: Rebased on latest amd-staging-drm-next (5.15 based)
v4: New API design and basic support for SVM; however, there is an
outstanding issue with SVM restore which is currently under debug, and
hopefully that won't impact the ioctl APIs, as SVMs are treated as
private data hidden from user space like queues and events with the new
approach.


David Yat Sin (9):
  drm/amdkfd: CRIU Implement KFD unpause operation
  drm/amdkfd: CRIU add queues support
  drm/amdkfd: CRIU restore queue ids
  drm/amdkfd: CRIU restore sdma id for queues
  drm/amdkfd: CRIU restore queue doorbell id
  drm/amdkfd: CRIU checkpoint and restore queue mqds
  drm/amdkfd: CRIU checkpoint and restore queue control stack
  drm/amdkfd: CRIU checkpoint and restore events
  drm/amdkfd: CRIU implement gpu_id remapping

Rajneesh Bhardwaj (15):
  x86/configs: CRIU update debug rock defconfig
  x86/configs: Add rock-rel_defconfig for amd-feature-criu branch
  drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  drm/amdkfd: CRIU Implement KFD process_info ioctl
  drm/amdkfd: CRIU Implement KFD checkpoint ioctl
  drm/amdkfd: CRIU Implement KFD restore ioctl
  drm/amdkfd: CRIU Implement KFD resume ioctl
  drm/amdkfd: CRIU export BOs as prime dmabuf objects
  drm/amdkfd: CRIU checkpoint and restore xnack mode
  drm/amdkfd: CRIU allow external mm for svm ranges
  drm/amdkfd: use user_gpu_id for svm ranges
  drm/amdkfd: CRIU Discover svm ranges
  drm/amdkfd: CRIU Save Shared Virtual Memory ranges
  drm/amdkfd: CRIU prepare for svm resume
  drm/amdkfd: CRIU resume shared virtual memory ranges

 arch/x86/configs/rock-dbg_defconfig   |   53 +-
 arch/x86/configs/rock-rel_defconfig   | 4927 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   51 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   20 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 1453 -
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  185 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  313 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   14 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |   72 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |   74 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |   89 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |   81 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  166 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   86 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  377 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  326 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h  |   39 +
 include/uapi/linux/kfd_ioctl.h|   79 +-
 22 files changed, 8099 insertions(+), 334 deletions(-)
 create mode 100644 arch/x86/configs/rock-rel_defconfig

-- 
2.17.1



[PATCH] drm/i915/guc: Use lockless list for destroyed contexts

2021-12-22 Thread Matthew Brost
Use a lockless list structure for destroyed contexts to avoid hammering
on the global submission spin lock.
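
For reference, the pattern this moves to: producers push with a single
atomic llist_add() and the consumer detaches the whole list at once,
walking it without any lock held (the producer-side call sits on the
context-release path, not visible in this diff):

	/* producer, no spin lock needed: */
	llist_add(&ce->destroyed_link,
		  &guc->submission_state.destroyed_contexts);

	/* consumer, detaches everything in one atomic exchange: */
	struct intel_context *ce, *cn;

	llist_for_each_entry_safe(ce, cn,
				  llist_del_all(&guc->submission_state.destroyed_contexts),
				  destroyed_link) {
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
	}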

Suggested-by: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |  2 -
 drivers/gpu/drm/i915/gt/intel_context_types.h |  3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|  3 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 44 +--
 4 files changed, 16 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5d0ec7c49b6a..4aacb4b0418d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -403,8 +403,6 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
ce->guc_id.id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id.link);
 
-   INIT_LIST_HEAD(&ce->destroyed_link);
-
INIT_LIST_HEAD(&ce->parallel.child_list);
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 30cd81ad8911..4532d43ec9c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <linux/llist.h>
 #include 
 #include 
 
@@ -224,7 +225,7 @@ struct intel_context {
 * list when context is pending to be destroyed (deregistered with the
 * GuC), protected by guc->submission_state.lock
 */
-   struct list_head destroyed_link;
+   struct llist_node destroyed_link;
 
/** @parallel: sub-structure for parallel submission members */
struct {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index f9240d4baa69..705085058411 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include <linux/llist.h>
 
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
@@ -112,7 +113,7 @@ struct intel_guc {
 * @destroyed_contexts: list of contexts waiting to be destroyed
 * (deregistered with the GuC)
 */
-   struct list_head destroyed_contexts;
+   struct llist_head destroyed_contexts;
/**
 * @destroyed_worker: worker to deregister contexts, need as we
 * need to take a GT PM reference and can't from destroy
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0a03a30e4c6d..6f7643edc139 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1771,7 +1771,7 @@ int intel_guc_submission_init(struct intel_guc *guc)
spin_lock_init(&guc->submission_state.lock);
INIT_LIST_HEAD(&guc->submission_state.guc_id_list);
ida_init(&guc->submission_state.guc_ids);
-   INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
+   init_llist_head(&guc->submission_state.destroyed_contexts);
INIT_WORK(&guc->submission_state.destroyed_worker,
  destroyed_worker_func);
 
@@ -2696,26 +2696,18 @@ static void __guc_context_destroy(struct intel_context 
*ce)
}
 }
 
+#define take_destroyed_contexts(guc) \
+   llist_del_all(&guc->submission_state.destroyed_contexts)
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc)
 {
-   struct intel_context *ce;
-   unsigned long flags;
+   struct intel_context *ce, *cn;
 
GEM_BUG_ON(!submission_disabled(guc) &&
   guc_submission_initialized(guc));
 
-   while (!list_empty(&guc->submission_state.destroyed_contexts)) {
-   spin_lock_irqsave(&guc->submission_state.lock, flags);
-   ce = 
list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
- struct intel_context,
- destroyed_link);
-   if (ce)
-   list_del_init(&ce->destroyed_link);
-   spin_unlock_irqrestore(&guc->submission_state.lock, flags);
-
-   if (!ce)
-   break;
-
+   llist_for_each_entry_safe(ce, cn, take_destroyed_contexts(guc),
+destroyed_link) {
release_guc_id(guc, ce);
__guc_context_destroy(ce);
}
@@ -2723,23 +2715,11 @@ static void guc_flush_destroyed_contexts(struct 
intel_guc *guc)
 
 static void deregister_destroyed_contexts(struct intel_guc *guc)
 {
-   struct intel_context *ce;
-   unsigned long flags;
-
-   while (!list_empty(&guc->submission_state.destroyed_contexts)) {
-   spin_lock_irqsave(&guc->submission_state.lock, flags);
-   ce = 
list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
- struct intel_context,
- 

[PATCH] drm/i915/execlists: Weak parallel submission support for execlists

2021-12-22 Thread Matthew Brost
A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between the requests
generated, and virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon deprecated) bonding interface.

We perma-pin these execlists contexts to align with GuC implementation.

v2:
 (John Harrison)
  - Drop siblings array as num_siblings must be 1
v3:
 (John Harrison)
  - Drop single submission
v4:
 (John Harrison)
  - Actually drop single submission
  - Use IS_ERR check on return value from intel_context_create
  - Set last request to NULL on unpin

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 11 --
 drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
 .../drm/i915/gt/intel_execlists_submission.c  | 38 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c   |  4 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
 5 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index cad3f0b2be9e..b0d2d81fc3b3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -570,10 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
 
-   /* FIXME: This is NIY for execlists */
-   if (!(intel_uc_uses_guc_submission(_gt(i915)->uc)))
-   return -ENODEV;
-
if (get_user(slot, &ext->engine_index))
return -EFAULT;
 
@@ -583,6 +579,13 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
if (get_user(num_siblings, &ext->num_siblings))
return -EFAULT;
 
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc) &&
+   num_siblings != 1) {
+   drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC 
mode\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
if (slot >= set->num_engines) {
drm_dbg(>drm, "Invalid placement value, %d >= %d\n",
slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index ba083d800a08..5d0ec7c49b6a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
 
__i915_active_acquire(&ce->active);
 
-   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
+   intel_context_is_parallel(ce))
return 0;
 
/* Preallocate tracking nodes */
@@ -563,7 +564,6 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
 * Callers responsibility to validate that this function is used
 * correctly but we use GEM_BUG_ON here ensure that they do.
 */
-   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
GEM_BUG_ON(intel_context_is_pinned(parent));
GEM_BUG_ON(intel_context_is_child(parent));
GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index a69df5e9e77a..be56d0b41892 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2599,6 +2599,43 @@ static void execlists_context_cancel_request(struct 
intel_context *ce,
  current->comm);
 }
 
+static struct intel_context *
+execlists_create_parallel(struct intel_engine_cs **engines,
+ unsigned int num_siblings,
+ unsigned int width)
+{
+   struct intel_context *parent = NULL, *ce, *err;
+   int i;
+
+   GEM_BUG_ON(num_siblings != 1);
+
+   for (i = 0; i < width; ++i) {
+   ce = intel_context_create(engines[i]);
+   if (IS_ERR(ce)) {
+   err = ce;
+   goto unwind;
+   }
+
+   if (i == 0)
+   parent = ce;
+   else
+   intel_context_bind_parent_child(parent, ce);
+   }
+
+   parent->parallel.fence_context = dma_fence_context_alloc(1);
+
+   intel_context_set_nopreempt(parent);
+   for_each_child(parent, ce)
+   intel_context_set_nopreempt(ce);
+
+   return parent;
+
+unwind:
+   if (parent)
+   intel_context_put(parent);
+   return err;
+}
+
 static const struct intel_context_ops execlists_context_ops = {
.flags = COPS_HAS_INFLIGHT,
 
@@ 

Re: [PATCH] drm/i915/execlists: Weak parallel submission support for execlists

2021-12-22 Thread Matthew Brost
On Mon, Dec 06, 2021 at 12:01:04PM -0800, John Harrison wrote:
> On 11/11/2021 13:20, Matthew Brost wrote:
> > A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
> > execlists. Doing as little as possible to support this interface for
> > execlists - basically just passing submit fences between each request
> > generated and virtual engines are not allowed. This is on par with what
> > is there for the existing (hopefully soon deprecated) bonding interface.
> > 
> > We perma-pin these execlists contexts to align with GuC implementation.
> > 
> > v2:
> >   (John Harrison)
> >- Drop siblings array as num_siblings must be 1
> > v3:
> >   (John Harrison)
> >- Drop single submission
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c   | 10 +++--
> >   drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
> >   .../drm/i915/gt/intel_execlists_submission.c  | 40 +++
> >   drivers/gpu/drm/i915/gt/intel_lrc.c   |  2 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
> >   5 files changed, 50 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index ebd775cb1661c..d7bf6c8f70b7b 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -570,10 +570,6 @@ set_proto_ctx_engines_parallel_submit(struct 
> > i915_user_extension __user *base,
> > struct intel_engine_cs **siblings = NULL;
> > intel_engine_mask_t prev_mask;
> > -   /* FIXME: This is NIY for execlists */
> > -   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
> > -   return -ENODEV;
> > -
> > if (get_user(slot, &ext->engine_index))
> > return -EFAULT;
> > @@ -583,6 +579,12 @@ set_proto_ctx_engines_parallel_submit(struct 
> > i915_user_extension __user *base,
> > if (get_user(num_siblings, &ext->num_siblings))
> > return -EFAULT;
> > +   if (!intel_uc_uses_guc_submission(&i915->gt.uc) && num_siblings != 1) {
> > +   drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC 
> > mode\n",
> > +   num_siblings);
> > +   return -EINVAL;
> > +   }
> > +
> > if (slot >= set->num_engines) {
> > drm_dbg(>drm, "Invalid placement value, %d >= %d\n",
> > slot, set->num_engines);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 5634d14052bc9..1bec92e1d8e63 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct 
> > intel_context *ce)
> > __i915_active_acquire(&ce->active);
> > -   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
> > +   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
> > +   intel_context_is_parallel(ce))
> > return 0;
> > /* Preallocate tracking nodes */
> > @@ -563,7 +564,6 @@ void intel_context_bind_parent_child(struct 
> > intel_context *parent,
> >  * Callers responsibility to validate that this function is used
> >  * correctly but we use GEM_BUG_ON here ensure that they do.
> >  */
> > -   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
> > GEM_BUG_ON(intel_context_is_pinned(parent));
> > GEM_BUG_ON(intel_context_is_child(parent));
> > GEM_BUG_ON(intel_context_is_pinned(child));
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index ca03880fa7e49..5fd49ee47096d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -2598,6 +2598,45 @@ static void execlists_context_cancel_request(struct 
> > intel_context *ce,
> >   current->comm);
> >   }
> > +static struct intel_context *
> > +execlists_create_parallel(struct intel_engine_cs **engines,
> > + unsigned int num_siblings,
> > + unsigned int width)
> > +{
> > +   struct intel_context *parent = NULL, *ce, *err;
> > +   int i;
> > +
> > +   GEM_BUG_ON(num_siblings != 1);
> > +
> > +   for (i = 0; i < width; ++i) {
> > +   ce = intel_context_create(engines[i]);
> > +   if (!ce) {
> > +   err = ERR_PTR(-ENOMEM);
> intel_context_create already checks for null and returns -ENOMEM. This needs
> to check for IS_ERR(ce).
> 

Yep.

> > +   goto unwind;
> > +   }
> > +
> > +   if (i == 0)
> > +   parent = ce;
> > +   else
> > +   intel_context_bind_parent_child(parent, ce);
> > +   }
> > +
> > +   parent->parallel.fence_context = dma_fence_context_alloc(1);
> > +
> > +   intel_context_set_nopreempt(parent);
> > +   

Re: completely rework the dma_resv semantic

2021-12-22 Thread Daniel Vetter
On Fri, Dec 17, 2021 at 03:39:52PM +0100, Christian König wrote:
> Hi Daniel,
> 
> looks like this is going nowhere and you don't seem to have time to review.
> 
> What can we do?

cc more people, you didn't cc any of the driver folks :-)

Also I did find some review before I disappeared, back on 10th Jan.

Cheers, Daniel

> 
> Thanks,
> Christian.
> 
> Am 07.12.21 um 13:33 schrieb Christian König:
> > Hi Daniel,
> > 
> > just a gentle ping that you wanted to take a look at this.
> > 
> > Not much changed compared to the last version, only a minor bugfix in
> > the dma_resv_get_singleton error handling.
> > 
> > Regards,
> > Christian.
> > 
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 22/24] dma-buf: wait for map to complete for static attachments

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:09PM +0100, Christian König wrote:
> We have previously done that in the individual drivers but it is
> more defensive to move that into the common code.
> 
> Dynamic attachments should wait for map operations to complete by themselves.
> 
> Signed-off-by: Christian König 

i915 should probably stop reinventing so much stuff here and align more
...

I do wonder whether we want the same for dma_buf_pin(), or at least
document that for dynamic attachments, you still need to sync even if it's
pinned. Especially since your kerneldoc for the usage flags suggests that
waiting isn't needed, but after this patch waiting _is_ needed even for
dynamic importers.

So there is a gap here I think, and I deleted my r-b tag that I already
typed again. Or do I miss something?

Minimally needs accurate docs, but I'm leaning towards an unconditional
dma_resv_wait() in dma_buf_pin() for safety's sake.
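
Roughly this (a sketch assuming the usage-based wait API from this
series, not a worked-out implementation):

	int dma_buf_pin(struct dma_buf_attachment *attach)
	{
		struct dma_buf *dmabuf = attach->dmabuf;
		long ret;

		dma_resv_assert_held(dmabuf->resv);

		ret = dmabuf->ops->pin ? dmabuf->ops->pin(attach) : 0;
		if (ret)
			return ret;

		/* unconditionally wait for kernel fences, e.g. pending moves */
		ret = dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_KERNEL,
					    true, MAX_SCHEDULE_TIMEOUT);
		return ret < 0 ? ret : 0;
	}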


> ---
>  drivers/dma-buf/dma-buf.c   | 18 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +-
>  drivers/gpu/drm/nouveau/nouveau_prime.c | 17 +
>  drivers/gpu/drm/radeon/radeon_prime.c   | 16 +++-
>  4 files changed, 20 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 528983d3ba64..d3dd602c4753 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -660,12 +660,24 @@ static struct sg_table * __map_dma_buf(struct 
> dma_buf_attachment *attach,
>  enum dma_data_direction direction)
>  {
>   struct sg_table *sg_table;
> + signed long ret;
>  
>   sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
> + if (IS_ERR_OR_NULL(sg_table))
> + return sg_table;
> +
> + if (!dma_buf_attachment_is_dynamic(attach)) {
> + ret = dma_resv_wait_timeout(attach->dmabuf->resv,

Another place where this dma_resv_wait() wrapper would be good. I think we
should have it :-)
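
Something as simple as this would do (hypothetical helper, name and
placement up for bikeshedding):

	static inline long dma_resv_wait(struct dma_resv *obj,
					 enum dma_resv_usage usage)
	{
		/* interruptible, unbounded wait on all fences of this usage */
		return dma_resv_wait_timeout(obj, usage, true,
					     MAX_SCHEDULE_TIMEOUT);
	}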

Cheers, Daniel

> + DMA_RESV_USAGE_KERNEL, true,
> + MAX_SCHEDULE_TIMEOUT);
> + if (ret < 0) {
> + attach->dmabuf->ops->unmap_dma_buf(attach, sg_table,
> +direction);
> + return ERR_PTR(ret);
> + }
> + }
>  
> - if (!IS_ERR_OR_NULL(sg_table))
> - mangle_sg_table(sg_table);
> -
> + mangle_sg_table(sg_table);
>   return sg_table;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index 4896c876ffec..33127bd56c64 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -102,21 +102,9 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment 
> *attach)
>  {
>   struct drm_gem_object *obj = attach->dmabuf->priv;
>   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> - int r;
>  
>   /* pin buffer into GTT */
> - r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
> - if (r)
> - return r;
> -
> - if (bo->tbo.moving) {
> - r = dma_fence_wait(bo->tbo.moving, true);
> - if (r) {
> - amdgpu_bo_unpin(bo);
> - return r;
> - }
> - }
> - return 0;
> + return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c 
> b/drivers/gpu/drm/nouveau/nouveau_prime.c
> index 60019d0532fc..347488685f74 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_prime.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
> @@ -93,22 +93,7 @@ int nouveau_gem_prime_pin(struct drm_gem_object *obj)
>   if (ret)
>   return -EINVAL;
>  
> - ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
> - if (ret)
> - goto error;
> -
> - if (nvbo->bo.moving)
> - ret = dma_fence_wait(nvbo->bo.moving, true);
> -
> - ttm_bo_unreserve(&nvbo->bo);
> - if (ret)
> - goto error;
> -
> - return ret;
> -
> -error:
> - nouveau_bo_unpin(nvbo);
> - return ret;
> + return 0;
>  }
>  
>  void nouveau_gem_prime_unpin(struct drm_gem_object *obj)
> diff --git a/drivers/gpu/drm/radeon/radeon_prime.c 
> b/drivers/gpu/drm/radeon/radeon_prime.c
> index 4a90807351e7..42a87948e28c 100644
> --- a/drivers/gpu/drm/radeon/radeon_prime.c
> +++ b/drivers/gpu/drm/radeon/radeon_prime.c
> @@ -77,19 +77,9 @@ int radeon_gem_prime_pin(struct drm_gem_object *obj)
>  
>   /* pin buffer into GTT */
>   ret = radeon_bo_pin(bo, RADEON_GEM_DOMAIN_GTT, NULL);
> - if (unlikely(ret))
> - goto error;
> -
> - if (bo->tbo.moving) {
> - ret = dma_fence_wait(bo->tbo.moving, false);
> - if (unlikely(ret)) {
> - 

[RFC v2 7/8] drm/amdgpu: Drop concurrent GPU reset protection for device

2021-12-22 Thread Andrey Grodzovsky
Since all GPU resets are now serialized, there is no need for this.

This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout'

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++
 1 file changed, 7 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 107a393ebbfd..fef952ca8db5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4763,11 +4763,10 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
return r;
 }
 
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
+static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
struct amdgpu_hive_info *hive)
 {
-   if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
-   return false;
+   atomic_set(&adev->in_gpu_reset, 1);
 
if (hive) {
down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
@@ -4786,8 +4785,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device 
*adev,
adev->mp1_state = PP_MP1_STATE_NONE;
break;
}
-
-   return true;
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
@@ -4798,46 +4795,6 @@ static void amdgpu_device_unlock_adev(struct 
amdgpu_device *adev)
up_write(&adev->reset_sem);
 }
 
-/*
- * to lockup a list of amdgpu devices in a hive safely, if not a hive
- * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
- *
- * unlock won't require roll back.
- */
-static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct 
amdgpu_hive_info *hive)
-{
-   struct amdgpu_device *tmp_adev = NULL;
-
-   if (adev->gmc.xgmi.num_physical_nodes > 1) {
-   if (!hive) {
-   dev_err(adev->dev, "Hive is NULL while device has 
multiple xgmi nodes");
-   return -ENODEV;
-   }
-   list_for_each_entry(tmp_adev, &hive->device_list, 
gmc.xgmi.head) {
-   if (!amdgpu_device_lock_adev(tmp_adev, hive))
-   goto roll_back;
-   }
-   } else if (!amdgpu_device_lock_adev(adev, hive))
-   return -EAGAIN;
-
-   return 0;
-roll_back:
-   if (!list_is_first(&tmp_adev->gmc.xgmi.head, &hive->device_list)) {
-   /*
-* if the lockup iteration break in the middle of a hive,
-* it may means there may has a race issue,
-* or a hive device locked up independently.
-* we may be in trouble and may not, so will try to roll back
-* the lock and give out a warnning.
-*/
-   dev_warn(tmp_adev->dev, "Hive lock iteration broke in the 
middle. Rolling back to unlock");
-   list_for_each_entry_continue_reverse(tmp_adev, 
&hive->device_list, gmc.xgmi.head) {
-   amdgpu_device_unlock_adev(tmp_adev);
-   }
-   }
-   return -EAGAIN;
-}
-
 static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
 {
struct pci_dev *p = NULL;
@@ -5023,22 +4980,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
reset_context.hive = hive;
clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 
-   /*
-* lock the device before we try to operate the linked list
-* if didn't get the device lock, don't touch the linked list since
-* others may iterating it.
-*/
-   r = amdgpu_device_lock_hive_adev(adev, hive);
-   if (r) {
-   dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another 
already in progress",
-   job ? job->base.id : -1);
-
-   /* even we skipped this reset, still need to set the job to 
guilty */
-   if (job && job->vm)
drm_sched_increase_karma(&job->base);
-   goto skip_recovery;
-   }
-
/*
 * Build list of devices to reset.
 * In case we are in XGMI hive mode, resort the device list
@@ -5058,6 +4999,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
 
/* block all schedulers and reset given job's ring */
list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+
+   amdgpu_device_lock_adev(tmp_adev, hive);
+
/*
 * Try to put the audio codec into suspend state
 * before gpu reset started.
@@ -5209,13 +5153,12 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
amdgpu_device_unlock_adev(tmp_adev);
}
 
-skip_recovery:
if (hive) {
mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
}
 
-   if (r && r != -EAGAIN)
+   if (r)
 

[RFC v2 6/8] drm/amdgpu: Drop hive->in_reset

2021-12-22 Thread Andrey Grodzovsky
Since we now serialize all resets, there is no need to protect against
concurrent resets.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 3 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 258ec3c0b2af..107a393ebbfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
dev_info(adev->dev, "GPU %s begin!\n",
need_emergency_restart ? "jobs stop":"reset");
 
-   /*
-* Here we trylock to avoid chain of resets executing from
-* either trigger by jobs on different adevs in XGMI hive or jobs on
-* different schedulers for same device while this TO handler is 
running.
-* We always reset all schedulers for device and all devices for XGMI
-* hive so that should take care of them too.
-*/
hive = amdgpu_get_xgmi_hive(adev);
-   if (hive) {
-   if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
-   DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as 
another already in progress",
-   job ? job->base.id : -1, hive->hive_id);
-   amdgpu_put_xgmi_hive(hive);
-   if (job && job->vm)
-   drm_sched_increase_karma(&job->base);
-   return 0;
-   }
+   if (hive)
mutex_lock(&hive->hive_lock);
-   }
 
reset_context.method = AMD_RESET_METHOD_NONE;
reset_context.reset_req_dev = adev;
@@ -5227,7 +5211,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
 
 skip_recovery:
if (hive) {
-   atomic_set(&hive->in_reset, 0);
mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index a858e3457c5c..9ad742039ac9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -404,7 +404,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
INIT_LIST_HEAD(&hive->device_list);
INIT_LIST_HEAD(&hive->node);
mutex_init(&hive->hive_lock);
-   atomic_set(&hive->in_reset, 0);
atomic_set(&hive->number_devices, 0);
task_barrier_init(&hive->tb);
hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6121aaa292cb..2f2ce53645a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -33,7 +33,6 @@ struct amdgpu_hive_info {
struct list_head node;
atomic_t number_devices;
struct mutex hive_lock;
-   atomic_t in_reset;
int hi_req_count;
struct amdgpu_device *hi_req_gpu;
struct task_barrier tb;
-- 
2.25.1



[RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2021-12-22 Thread Andrey Grodzovsky
Since the FLR work is now serialized against GPU resets, there is no
need for this.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 ---
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 ---
 2 files changed, 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 487cd654b69e..7d59a66e3988 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct 
*work)
struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, 
virt);
int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
 
-   /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-* otherwise the mailbox msg will be ruined/reseted by
-* the VF FLR.
-*/
-   if (!down_write_trylock(&adev->reset_sem))
-   return;
-
amdgpu_virt_fini_data_exchange(adev);
-   atomic_set(&adev->in_gpu_reset, 1);
 
xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
 
@@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct 
*work)
} while (timeout > 1);
 
 flr_done:
-   atomic_set(&adev->in_gpu_reset, 0);
-   up_write(&adev->reset_sem);
-
/* Trigger recovery for world switch failure if no TDR */
if (amdgpu_device_should_recover_gpu(adev)
&& (!amdgpu_device_has_job_running(adev) ||
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index e3869067a31d..f82c066c8e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct 
*work)
struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, 
virt);
int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
 
-   /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-* otherwise the mailbox msg will be ruined/reseted by
-* the VF FLR.
-*/
-   if (!down_write_trylock(&adev->reset_sem))
-   return;
-
amdgpu_virt_fini_data_exchange(adev);
-   atomic_set(&adev->in_gpu_reset, 1);
 
xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
 
@@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct 
*work)
} while (timeout > 1);
 
 flr_done:
-   atomic_set(&adev->in_gpu_reset, 0);
-   up_write(&adev->reset_sem);
-
/* Trigger recovery for world switch failure if no TDR */
if (amdgpu_device_should_recover_gpu(adev)
&& (!amdgpu_device_has_job_running(adev) ||
-- 
2.25.1



[RFC v2 5/8] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.

2021-12-22 Thread Andrey Grodzovsky
No need to trigger another work item from inside the work queue.

Suggested-by: Liu Shaoyun 
Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 7 +--
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 7 +--
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 7 +--
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 23b066bcffb2..487cd654b69e 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -276,7 +276,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct 
*work)
if (amdgpu_device_should_recover_gpu(adev)
&& (!amdgpu_device_has_job_running(adev) ||
adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT))
-   amdgpu_device_gpu_recover(adev, NULL);
+   amdgpu_device_gpu_recover_imp(adev, NULL);
 }
 
 static int xgpu_ai_set_mailbox_rcv_irq(struct amdgpu_device *adev,
@@ -302,7 +302,10 @@ static int xgpu_ai_mailbox_rcv_irq(struct amdgpu_device 
*adev,
switch (event) {
case IDH_FLR_NOTIFICATION:
if (amdgpu_sriov_runtime(adev))
-   schedule_work(>virt.flr_work);
+   WARN_ONCE(!queue_work(adev->reset_domain.wq,
+ &adev->virt.flr_work),
+ "Failed to queue work! at %s",
+ __FUNCTION__ );
break;
case IDH_QUERY_ALIVE:
xgpu_ai_mailbox_send_ack(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index a35e6d87e537..e3869067a31d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -308,7 +308,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct 
*work)
adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
-   amdgpu_device_gpu_recover(adev, NULL);
+   amdgpu_device_gpu_recover_imp(adev, NULL);
 }
 
 static int xgpu_nv_set_mailbox_rcv_irq(struct amdgpu_device *adev,
@@ -337,7 +337,10 @@ static int xgpu_nv_mailbox_rcv_irq(struct amdgpu_device 
*adev,
switch (event) {
case IDH_FLR_NOTIFICATION:
if (amdgpu_sriov_runtime(adev))
-   schedule_work(>virt.flr_work);
+   WARN_ONCE(!queue_work(adev->reset_domain.wq,
+ &adev->virt.flr_work),
+ "Failed to queue work! at %s",
+ __FUNCTION__ );
break;
/* READY_TO_ACCESS_GPU is fetched by kernel polling, IRQ can 
ignore
 * it byfar since that polling thread will handle it,
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
index aef9d059ae52..23e802cae2bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
@@ -521,7 +521,7 @@ static void xgpu_vi_mailbox_flr_work(struct work_struct 
*work)
 
/* Trigger recovery due to world switch failure */
if (amdgpu_device_should_recover_gpu(adev))
-   amdgpu_device_gpu_recover(adev, NULL);
+   amdgpu_device_gpu_recover_imp(adev, NULL);
 }
 
 static int xgpu_vi_set_mailbox_rcv_irq(struct amdgpu_device *adev,
@@ -551,7 +551,10 @@ static int xgpu_vi_mailbox_rcv_irq(struct amdgpu_device 
*adev,
 
/* only handle FLR_NOTIFY now */
if (!r)
-   schedule_work(>virt.flr_work);
+   WARN_ONCE(!queue_work(adev->reset_domain.wq,
+ &adev->virt.flr_work),
+ "Failed to queue work! at %s",
+ __FUNCTION__ );
}
 
return 0;
-- 
2.25.1



Re: [PATCH 21/24] dma-buf: add DMA_RESV_USAGE_BOOKKEEP

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:08PM +0100, Christian König wrote:
Add a usage for submissions that are independent of implicit sync but
still interesting for memory management.
> 
> Signed-off-by: Christian König 

Focusing on the kerneldoc first to get semantics agreed.

> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index 29d71496..07ae5b00c1fa 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -55,7 +55,7 @@ struct dma_resv_list;
>   * This enum describes the different use cases for a dma_resv object and
>   * controls which fences are returned when queried.
>   *
- * An important fact is that there is the order KERNEL<WRITE<READ and
+ * An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and
>   * when the dma_resv object is asked for fences for one use case the fences
>   * for the lower use case are returned as well.
>   *
> @@ -93,6 +93,22 @@ enum dma_resv_usage {
>* an implicit read dependency.
>*/
>   DMA_RESV_USAGE_READ,
> +
> + /**
> +  * @DMA_RESV_USAGE_BOOKKEEP: No implicit sync.
> +  *
> +  * This should be used by submissions which don't want to participate in
> +  * implicit synchronization.

Uh we might still have a disagreement, because that isn't really what
drivers which added opt-in implicit sync have done thus far. Minimally we
need a note that some drivers also use _READ for this.

> +  *
> +  * The most common case are submissions with explicit synchronization,
> +  * but also things like preemption fences as well as page table updates
> +  * might use this.
> +  *
> +  * The kernel memory management *always* need to wait for those fences
> +  * before moving or freeing the resource protected by the dma_resv
> +  * object.

Yeah this is the comment I wanted to see for READ, and which now is in
bookkeeping (where it's correct in the end). I think we still should have
something in the READ comment (and here) explaining that there could very
well be writes hiding behind this, and that the kernel cannot assume
anything about what's going on in general (maybe some drivers enforce
read/write through command parsers).

Also all the text in dma_buf.resv needs to be updated to use the right
constants instead of words.
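
For illustration, with these semantics a page table update fence would
be added as (sketch, assuming the unified dma_resv_add_fence() from this
series):

	/* visible to memory management only, never to implicit sync */
	dma_resv_add_fence(obj, fence, DMA_RESV_USAGE_BOOKKEEP);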
-Daniel


> +  */
> + DMA_RESV_USAGE_BOOKKEEP
>  };
>  
>  /**
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[RFC v2 2/8] drm/amdgpu: Move scheduler init to after XGMI is ready

2021-12-22 Thread Andrey Grodzovsky
Before we initialize the schedulers we must know which reset
domain we are in: for a single device there is one reset
domain per device and hence one wq per device. For XGMI
the reset domain spans the entire XGMI hive, so the
reset wq is per hive.
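
For reference, the reset domain referenced below boils down to a wq
handle shared by everything that must serialize; the struct itself is
introduced earlier in this series, so its shape here is assumed from how
it is used:

	struct amdgpu_reset_domain {
		/* single-threaded: every reset in the domain funnels through it */
		struct workqueue_struct *wq;
	};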

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0f3e6c078f88..7c063fd37389 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device 
*adev)
return r;
 }
 
+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
+{
+   long timeout;
+   int r, i;
+
+   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+   struct amdgpu_ring *ring = adev->rings[i];
+
+   /* No need to setup the GPU scheduler for rings that don't need 
it */
+   if (!ring || ring->no_scheduler)
+   continue;
+
+   switch (ring->funcs->type) {
+   case AMDGPU_RING_TYPE_GFX:
+   timeout = adev->gfx_timeout;
+   break;
+   case AMDGPU_RING_TYPE_COMPUTE:
+   timeout = adev->compute_timeout;
+   break;
+   case AMDGPU_RING_TYPE_SDMA:
+   timeout = adev->sdma_timeout;
+   break;
+   default:
+   timeout = adev->video_timeout;
+   break;
+   }
+
+   r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+  ring->num_hw_submission, 
amdgpu_job_hang_limit,
+  timeout, adev->reset_domain.wq, 
ring->sched_score, ring->name);
+   if (r) {
+   DRM_ERROR("Failed to create scheduler on ring %s.\n",
+ ring->name);
+   return r;
+   }
+   }
+
+   return 0;
+}
+
+
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
  *
@@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
}
}
 
+   r = amdgpu_device_init_schedulers(adev);
+   if (r)
+   goto init_failed;
+
/* Don't init kfd if whole hive need to be reset during init */
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3b7e86ea7167..5527c68c51de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
  atomic_t *sched_score)
 {
struct amdgpu_device *adev = ring->adev;
-   long timeout;
-   int r;
 
if (!adev)
return -EINVAL;
@@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
*ring,
	spin_lock_init(&ring->fence_drv.lock);
ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
 GFP_KERNEL);
-   if (!ring->fence_drv.fences)
-   return -ENOMEM;
 
-   /* No need to setup the GPU scheduler for rings that don't need it */
-   if (ring->no_scheduler)
-   return 0;
+   ring->num_hw_submission = num_hw_submission;
+   ring->sched_score = sched_score;
 
-   switch (ring->funcs->type) {
-   case AMDGPU_RING_TYPE_GFX:
-   timeout = adev->gfx_timeout;
-   break;
-   case AMDGPU_RING_TYPE_COMPUTE:
-   timeout = adev->compute_timeout;
-   break;
-   case AMDGPU_RING_TYPE_SDMA:
-   timeout = adev->sdma_timeout;
-   break;
-   default:
-   timeout = adev->video_timeout;
-   break;
-   }
-
-   r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
-  num_hw_submission, amdgpu_job_hang_limit,
-  timeout, NULL, sched_score, ring->name);
-   if (r) {
-   DRM_ERROR("Failed to create scheduler on ring %s.\n",
- ring->name);
-   return r;
-   }
+   if (!ring->fence_drv.fences)
+   return -ENOMEM;
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4d380e79752c..a4b8279e3011 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -253,6 +253,8 @@ struct 

[RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2021-12-22 Thread Andrey Grodzovsky
Use the reset domain wq also for non-TDR GPU recovery triggers
such as sysfs and RAS. We must serialize all possible
GPU recoveries to guarantee no concurrency there.
For TDR call the original recovery function directly since
it's already executed from within the wq. For the others just
use a wrapper to queue the work and wait on it to finish.

v2: Rename to amdgpu_recover_work_struct

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  2 +-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b5ff76aae7e0..8e96b9a14452 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev);
 bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job* job);
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+ struct amdgpu_job *job);
 void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
 bool amdgpu_device_need_post(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7c063fd37389..258ec3c0b2af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  * Returns 0 for success or an error on failure.
  */
 
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
  struct amdgpu_job *job)
 {
struct list_head device_list, *device_list_handle =  NULL;
@@ -5237,6 +5237,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
return r;
 }
 
+struct amdgpu_recover_work_struct {
+   struct work_struct base;
+   struct amdgpu_device *adev;
+   struct amdgpu_job *job;
+   int ret;
+};
+
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
+{
+   struct amdgpu_recover_work_struct *recover_work = container_of(work, 
struct amdgpu_recover_work_struct, base);
+
+   recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, 
recover_work->job);
+}
+/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+   struct amdgpu_job *job)
+{
+   struct amdgpu_recover_work_struct work = {.adev = adev, .job = job};
+
+   INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+
+   if (!queue_work(adev->reset_domain.wq, &work.base))
+   return -EAGAIN;
+
+   flush_work(&work.base);
+
+   return work.ret;
+}
+
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index bfc47bea23db..38c9fd7b7ad4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
if (amdgpu_device_should_recover_gpu(ring->adev)) {
-   amdgpu_device_gpu_recover(ring->adev, job);
+   amdgpu_device_gpu_recover_imp(ring->adev, job);
} else {
	drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
-- 
2.25.1
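
A note on the pattern above, since it is easy to misread: queueing a
stack-allocated work item and then flush_work()-ing it is safe because
flush_work() only returns once the handler has finished, so the stack frame
outlives the handler. A generic sketch of the idiom (do_the_actual_work()
is a hypothetical placeholder, not from the patch):

    struct sync_work {
        struct work_struct base;
        int ret;
    };

    static void sync_work_fn(struct work_struct *w)
    {
        struct sync_work *sw = container_of(w, struct sync_work, base);

        sw->ret = do_the_actual_work(); /* hypothetical placeholder */
    }

    static int run_on_wq_and_wait(struct workqueue_struct *wq)
    {
        struct sync_work work = { .ret = 0 };

        INIT_WORK(&work.base, sync_work_fn);
        if (!queue_work(wq, &work.base))
            return -EAGAIN;
        /* safe: flush_work() returns only after sync_work_fn finished,
         * so the on-stack work item cannot be used after free */
        flush_work(&work.base);
        return work.ret;
    }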



[RFC v2 3/8] drm/amdgpu: Fix crash on modprobe

2021-12-22 Thread Andrey Grodzovsky
Restrict job resubmission to the suspend case
only, since the schedulers are not initialised
yet on probe.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 5527c68c51de..8ebd954e06c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler) {
+   if (adev->in_suspend && !ring->no_scheduler) {
			drm_sched_resubmit_jobs(&ring->sched);
			drm_sched_start(&ring->sched, true);
}
-- 
2.25.1



[RFC v2 1/8] drm/amdgpu: Introduce reset domain

2021-12-22 Thread Andrey Grodzovsky
Define a reset_domain struct such that
all the entities that go through reset
together will be serialized one against
another. Do it for both the single device
and the XGMI hive cases.

Signed-off-by: Andrey Grodzovsky 
Suggested-by: Daniel Vetter 
Suggested-by: Christian König 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  2 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9f017663ac50..b5ff76aae7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -812,6 +812,11 @@ struct amd_powerplay {
 
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+
+struct amdgpu_reset_domain {
+   struct workqueue_struct *wq;
+};
+
 struct amdgpu_device {
struct device   *dev;
struct pci_dev  *pdev;
@@ -1096,6 +1101,8 @@ struct amdgpu_device {
 
struct amdgpu_reset_control *reset_cntl;
uint32_t
ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
+
+   struct amdgpu_reset_domain  reset_domain;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 90d22a376632..0f3e6c078f88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2391,9 +2391,27 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
if (r)
goto init_failed;
 
-   if (adev->gmc.xgmi.num_physical_nodes > 1)
+   if (adev->gmc.xgmi.num_physical_nodes > 1) {
+   struct amdgpu_hive_info *hive;
+
amdgpu_xgmi_add_device(adev);
 
+   hive = amdgpu_get_xgmi_hive(adev);
+   if (!hive || !hive->reset_domain.wq) {
+   DRM_ERROR("Failed to obtain reset domain info for XGMI 
hive:%llx", hive->hive_id);
+   r = -EINVAL;
+   goto init_failed;
+   }
+
+   adev->reset_domain.wq = hive->reset_domain.wq;
+   } else {
+   adev->reset_domain.wq = 
alloc_ordered_workqueue("amdgpu-reset-dev", 0);
+   if (!adev->reset_domain.wq) {
+   r = -ENOMEM;
+   goto init_failed;
+   }
+   }
+
/* Don't init kfd if whole hive need to be reset during init */
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 567df2db23ac..a858e3457c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -392,6 +392,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
goto pro_end;
}
 
+   hive->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-hive", 0);
+   if (!hive->reset_domain.wq) {
+   dev_err(adev->dev, "XGMI: failed allocating wq for reset 
domain!\n");
+   kfree(hive);
+   hive = NULL;
+   goto pro_end;
+   }
+
hive->hive_id = adev->gmc.xgmi.hive_id;
	INIT_LIST_HEAD(&hive->device_list);
	INIT_LIST_HEAD(&hive->node);
@@ -401,6 +409,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
	task_barrier_init(&hive->tb);
hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
hive->hi_req_gpu = NULL;
+
/*
 * hive pstate on boot is high in vega20 so we have to go to low
 * pstate on after boot.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index d2189bf7d428..6121aaa292cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,8 @@ struct amdgpu_hive_info {
AMDGPU_XGMI_PSTATE_MAX_VEGA20,
AMDGPU_XGMI_PSTATE_UNKNOWN
} pstate;
+
+   struct amdgpu_reset_domain reset_domain;
 };
 
 struct amdgpu_pcs_ras_field {
-- 
2.25.1



[RFC v2 0/8] Define and use reset domain for GPU recovery in amdgpu

2021-12-22 Thread Andrey Grodzovsky
This patchset is based on earlier work by Boris[1] that allowed having an
ordered workqueue at the driver level that is used by the different
schedulers to queue their timeout work. On top of that I also serialized
any GPU reset we trigger from within amdgpu code to go through the same
ordered wq, which somewhat simplifies our GPU reset code since we no longer
need to protect against concurrency between multiple GPU reset triggers such
as TDR on one hand and the sysfs trigger or a RAS trigger on the other hand.
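
The serialization property relied on here comes from
alloc_ordered_workqueue(): an ordered workqueue executes at most one work
item at a time, in queueing order. A tiny illustration (the names are made
up):

    /* inside some init path: */
    struct workqueue_struct *wq = alloc_ordered_workqueue("reset-domain", 0);

    /* even if queued concurrently from different threads, the handler
     * for reset_b will not start before the handler for reset_a has
     * returned; items run one at a time, in queueing order */
    queue_work(wq, &reset_a);
    queue_work(wq, &reset_b);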

As advised by Christian and Daniel I defined a reset_domain struct such that
all the entities that go through reset together will be serialized one against
another. 

A TDR triggered by multiple entities within the same domain for the same
reason will only run once, as the first such reset will cancel all the
pending ones. This is relevant only to TDR timers and not to resets
triggered from RAS or SYSFS; those will still happen after the in-flight
resets finish.

v2:
Add handling of the SRIOV configuration: the reset notification coming from
the host already triggers a work queue in the driver to handle the reset,
so drop this intermediate wq and send the work directly to the timeout
wq. (Shaoyun)

[1] 
https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezil...@collabora.com/

P.S. Going through drm-misc-next and not amd-staging-drm-next as Boris'
work hasn't landed there yet.

Andrey Grodzovsky (8):
  drm/amdgpu: Introduce reset domain
  drm/amdgpu: Move scheduler init to after XGMI is ready
  drm/amdgpu: Fix crash on modprobe
  drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.
  drm/amdgpu: Drop hive->in_reset
  drm/amdgpu: Drop concurrent GPU reset protection for device
  drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  18 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  18 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  |   7 +-
 10 files changed, 147 insertions(+), 164 deletions(-)

-- 
2.25.1



Re: [PATCH 20/24] dma-buf: add DMA_RESV_USAGE_KERNEL

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:07PM +0100, Christian König wrote:
> Add a usage for kernel submissions. Waiting for those
> is mandatory for dynamic DMA-bufs.
> 
> Signed-off-by: Christian König 

Again just skipping to the doc bikeshedding; maybe with more people on cc
others can help with some code review too.

>  EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index 4f3a6abf43c4..29d71496 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -54,8 +54,30 @@ struct dma_resv_list;
>   *
>   * This enum describes the different use cases for a dma_resv object and
>   * controls which fences are returned when queried.
> + *
> + * An important fact is that there is the order KERNEL<WRITE<READ and
> + * when the dma_resv object is asked for fences for one use case the fences
> + * for the lower use case are returned as well.
> + *
> + * For example when asking for WRITE fences then the KERNEL fences are 
> returned
> + * as well. Similar when asked for READ fences then both WRITE and KERNEL
> + * fences are returned as well.
>   */
>  enum dma_resv_usage {
> + /**
> +  * @DMA_RESV_USAGE_KERNEL: For in kernel memory management only.
> +  *
> +  * This should only be used for things like copying or clearing memory
> +  * with a DMA hardware engine for the purpose of kernel memory
> +  * management.
> +  *
> + * Drivers *always* need to wait for those fences before accessing 
> the

s/need to/must/ to stay with usual RFC wording. It's a hard requirement or
there's a security bug somewhere.

> +  * resource protected by the dma_resv object. The only exception for
> +  * that is when the resource is known to be locked down in place by
> +  * pinning it previously.

Is this true? This sounds more confusing than helpful, because afaik in
general our pin interfaces do not block for any kernel fences. dma_buf_pin
doesn't do that for sure. And I don't think ttm does that either.

I think the only safe thing here is to state that it's safe if a) the
resource is pinned down and b) the callers has previously waited for the
kernel fences.

I also think we should put that wait for kernel fences into dma_buf_pin(),
but that's maybe a later patch.
-Daniel



> +  */
> + DMA_RESV_USAGE_KERNEL,
> +
>   /**
>* @DMA_RESV_USAGE_WRITE: Implicit write synchronization.
>*
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
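
A rough sketch of what the dma_buf_pin() change suggested above could look
like; this is purely hypothetical, uses the bool-based
dma_resv_wait_timeout() signature still visible elsewhere in this series,
and is not what the current code does:

    int dma_buf_pin(struct dma_buf_attachment *attach)
    {
        struct dma_buf *dmabuf = attach->dmabuf;
        long lret;
        int ret = 0;

        dma_resv_assert_held(dmabuf->resv);

        if (dmabuf->ops->pin)
            ret = dmabuf->ops->pin(attach);
        if (ret)
            return ret;

        /* the suggested addition: block until in-flight kernel
         * moves/clears on the buffer have finished */
        lret = dma_resv_wait_timeout(dmabuf->resv, false, false,
                                     MAX_SCHEDULE_TIMEOUT);
        return lret < 0 ? lret : 0;
    }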


Re: [PATCH 18/24] dma-buf: add enum dma_resv_usage v3

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:05PM +0100, Christian König wrote:
> This change adds the dma_resv_usage enum and allows us to specify why a
> dma_resv object is queried for its containing fences.
> 
> In addition to that, a dma_resv_usage_rw() helper function is added to aid
> retrieving the fences for a read or write userspace submission.
> 
> This is then deployed to the different query functions of the dma_resv
> object and all of their users. When the write parameter was previously
> true we now use DMA_RESV_USAGE_WRITE and DMA_RESV_USAGE_READ otherwise.
> 
> v2: add KERNEL/OTHER in separate patch
> v3: some kerneldoc suggestions by Daniel
> 
> Signed-off-by: Christian König 

Just commenting on the kerneldoc here.

> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index 40ac9d486f8f..d96d8ca9af56 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -49,6 +49,49 @@ extern struct ww_class reservation_ww_class;
>  
>  struct dma_resv_list;
>  
> +/**
> + * enum dma_resv_usage - how the fences from a dma_resv obj are used
> + *
> + * This enum describes the different use cases for a dma_resv object and
> + * controls which fences are returned when queried.

We need to link here to both dma_buf.resv and from there to here.

Also we had a fair amount of text in the old dma_resv fields which should
probably be included here.

> + */
> +enum dma_resv_usage {
> + /**
> +  * @DMA_RESV_USAGE_WRITE: Implicit write synchronization.
> +  *
> +  * This should only be used for userspace command submissions which add
> +  * an implicit write dependency.
> +  */
> + DMA_RESV_USAGE_WRITE,
> +
> + /**
> +  * @DMA_RESV_USAGE_READ: Implicit read synchronization.
> +  *
> +  * This should only be used for userspace command submissions which add
> +  * an implicit read dependency.

I think the above would benefit from at least a link each to &dma_buf.resv
for further discusion.

Plus the READ flag needs a huge warning that in general it does _not_
guarantee that there are no writes possible, nor that any writes can be
assumed to be mistakes and dropped (on buffer moves e.g.).

Drivers can only make further assumptions for driver-internal dma_resv
objects (e.g. on vm/pagetables) or when the fences are all fences of the
same driver (e.g. the special sync rules amd has that take the fence
owner into account).

We have this documented in the dma_buf.resv rules, but since it came up
again in a discussion with Thomas H. somewhere, it's better to hammer this
in a few more times. Specifically, generally ignoring READ fences for
buffer moves (well, the copy job; memory freeing still has to wait for all
of them) is a correctness bug.

Maybe include a big warning that really the difference between READ and
WRITE should only matter for implicit sync, and _not_ for anything else
the kernel does.

I'm assuming the actual replacement is all mechanical, so I skipped that
one for now, that's for next year :-)
-Daniel

> +  */
> + DMA_RESV_USAGE_READ,
> +};
> +
> +/**
> + * dma_resv_usage_rw - helper for implicit sync
> + * @write: true if we create a new implicit sync write
> + *
> + * This returns the implicit synchronization usage for write or read 
> accesses,
> + * see enum dma_resv_usage.
> + */
> +static inline enum dma_resv_usage dma_resv_usage_rw(bool write)
> +{
> + /* This looks confusing at first sight, but is indeed correct.
> +  *
> +  * The rationale is that new write operations need to wait for the
> +  * existing read and write operations to finish.
> +  * But a new read operation only needs to wait for the existing write
> +  * operations to finish.
> +  */
> + return write ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE;
> +}
> +
>  /**
>   * struct dma_resv - a reservation object manages fences for a buffer
>   *
> @@ -147,8 +190,8 @@ struct dma_resv_iter {
>   /** @obj: The dma_resv object we iterate over */
>   struct dma_resv *obj;
>  
> - /** @all_fences: If all fences should be returned */
> - bool all_fences;
> + /** @usage: Controls which fences are returned */
> + enum dma_resv_usage usage;
>  
>   /** @fence: the currently handled fence */
>   struct dma_fence *fence;
> @@ -178,14 +221,14 @@ struct dma_fence *dma_resv_iter_next(struct 
> dma_resv_iter *cursor);
>   * dma_resv_iter_begin - initialize a dma_resv_iter object
>   * @cursor: The dma_resv_iter object to initialize
>   * @obj: The dma_resv object which we want to iterate over
> - * @all_fences: If all fences should be returned or just the exclusive one
> + * @usage: controls which fences to include, see enum dma_resv_usage.
>   */
>  static inline void dma_resv_iter_begin(struct dma_resv_iter *cursor,
>  struct dma_resv *obj,
> -bool all_fences)
> +enum dma_resv_usage usage)
>  {
>   

Re: [Intel-gfx] [PATCH] drm/i915/guc: Log engine resets

2021-12-22 Thread John Harrison

On 12/22/2021 08:21, Tvrtko Ursulin wrote:

On 21/12/2021 22:14, John Harrison wrote:

On 12/21/2021 05:37, Tvrtko Ursulin wrote:

On 20/12/2021 18:34, John Harrison wrote:

On 12/20/2021 07:00, Tvrtko Ursulin wrote:

On 17/12/2021 16:22, Matthew Brost wrote:

On Fri, Dec 17, 2021 at 12:15:53PM +, Tvrtko Ursulin wrote:


On 14/12/2021 15:07, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Log engine resets done by the GuC firmware in the similar way 
it is done

by the execlists backend.

This way we have notion of where the hangs are before the GuC 
gains

support for proper error capture.


Ping - any interest to log this info?

All there currently is a non-descriptive "[drm] GPU HANG: ecode
12:0:".



Yea, this could be helpful. One suggestion below.

Also, will GuC be reporting the reason for the engine reset at 
any point?




We are working on the error state capture, presumably the 
registers will

give a clue what caused the hang.

As for the GuC providing a reason, that isn't defined in the interface,
but providing a hint in the G2H of what the issue was is a decent idea.
Let me run that by the i915 GuC developers / GuC firmware team and see
what they think.

The GuC does not do any hang analysis. So as far as GuC is 
concerned, the reason is pretty much always going to be pre-emption 
timeout. There are a few ways the pre-emption itself could be 
triggered but basically, if GuC resets an active context then it is 
because it did not pre-empt quickly enough when requested.




Regards,

Tvrtko


Signed-off-by: Tvrtko Ursulin 
Cc: Matthew Brost 
Cc: John Harrison 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 
+++-

   1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 9739da6f..51512123dc1a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
   #include "gt/intel_context.h"
   #include "gt/intel_engine_pm.h"
   #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_engine_user.h"
   #include "gt/intel_gpu_commands.h"
   #include "gt/intel_gt.h"
   #include "gt/intel_gt_clock_utils.h"
@@ -3934,9 +3935,18 @@ static void capture_error_state(struct 
intel_guc *guc,

   {
   struct intel_gt *gt = guc_to_gt(guc);
   struct drm_i915_private *i915 = gt->i915;
-    struct intel_engine_cs *engine = 
__context_to_physical_engine(ce);

+    struct intel_engine_cs *engine = ce->engine;
   intel_wakeref_t wakeref;
+    if (intel_engine_is_virtual(engine)) {
+    drm_notice(&i915->drm, "%s class, engines 0x%x; GuC engine reset\n",
+ intel_engine_class_repr(engine->class),
+   engine->mask);
+    engine = guc_virtual_get_sibling(engine, 0);
+    } else {
+    drm_notice(&i915->drm, "%s GuC engine reset\n", engine->name);


Probably include the guc_id of the context too then?


Is the guc id stable and useful on its own - who would be the user?
The GuC id is the only thing that matters when trying to correlate 
KMD activity with a GuC log. So while it might not be of any use or 
interest to an end user, it is extremely important and useful to a 
kernel developer attempting to debug an issue. And that includes 
bug reports from end users that are hard to repro given that the 
standard error capture will include the GuC log.


On the topic of GuC log - is there a tool in IGT (or will be) which 
will parse the bit saved in the error capture or how is that 
supposed to be used?

Nope.

However, Alan is currently working on supporting the GuC error 
capture mechanism. Prior to sending the reset notification to the 
KMD, the GuC will save a whole bunch of register state to a memory 
buffer and send a notification to the KMD that this is available. 
When we then get the actual reset notification, we need to match the 
two together and include a parsed, human readable version of the 
GuC's capture state buffer in the sysfs error log output.


The GuC log should not be involved in this process. And note that any 
register dumps in the GuC log are limited in scope and only enabled 
at higher verbosity levels. Whereas, the official state capture is 
based on a register list provided by the KMD and is available 
irrespective of debug CONFIG settings, verbosity levels, etc.


Hm why should GuC log not be involved now? I thought earlier you said:

"""
And that includes bug reports from end users that are hard to repro 
given that the standard error capture will include the GuC log.

"""

Hence I thought there would be a tool in IGT which would parse the 
part saved inside the error capture.

Different things.

The GuC log is not involved in capturing hardware register state and 
reporting that as part of the sysfs error capture that users can read 
out. The GuC needs to do the state capture for us if it is doing the 
reset, but it is provided via a 

Re: [PATCH 17/24] drm/amdgpu: use dma_resv_get_singleton in amdgpu_pasid_free_cb

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:04PM +0100, Christian König wrote:
> Makes the code a bit simpler.
> 
> Signed-off-by: Christian König 

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 23 +++
>  1 file changed, 3 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index be48487e2ca7..888d97143177 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -107,36 +107,19 @@ static void amdgpu_pasid_free_cb(struct dma_fence 
> *fence,
>  void amdgpu_pasid_free_delayed(struct dma_resv *resv,
>  u32 pasid)
>  {
> - struct dma_fence *fence, **fences;
>   struct amdgpu_pasid_cb *cb;
> - unsigned count;
> + struct dma_fence *fence;
>   int r;
>  
> - r = dma_resv_get_fences(resv, true, &count, &fences);
> + r = dma_resv_get_singleton(resv, true, &fence);
>   if (r)
>   goto fallback;
>  
> - if (count == 0) {
> + if (!fence) {
>   amdgpu_pasid_free(pasid);
>   return;
>   }
>  
> - if (count == 1) {
> - fence = fences[0];
> - kfree(fences);
> - } else {
> - uint64_t context = dma_fence_context_alloc(1);
> - struct dma_fence_array *array;
> -
> - array = dma_fence_array_create(count, fences, context,
> -1, false);
> - if (!array) {
> - kfree(fences);
> - goto fallback;
> - }
> - fence = &array->base;
> - }
> -
>   cb = kmalloc(sizeof(*cb), GFP_KERNEL);
>   if (!cb) {
>   /* Last resort when we are OOM */
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 16/24] drm/nouveau: support more than one write fence in nv50_wndw_prepare_fb

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:03PM +0100, Christian König wrote:
> Use dma_resv_get_singleton() here to eventually get more than one write
> fence as a single fence.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/nouveau/dispnv50/wndw.c | 14 +-
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
> b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
> index 133c8736426a..b55a8a723581 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
> @@ -536,8 +536,6 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct 
> drm_plane_state *state)
>   struct nouveau_bo *nvbo;
>   struct nv50_head_atom *asyh;
>   struct nv50_wndw_ctxdma *ctxdma;
> - struct dma_resv_iter cursor;
> - struct dma_fence *fence;
>   int ret;
>  
>   NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, fb);
> @@ -560,13 +558,11 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct 
> drm_plane_state *state)
>   asyw->image.handle[0] = ctxdma->object.handle;
>   }
>  
> - dma_resv_iter_begin(&cursor, nvbo->bo.base.resv, false);
> - dma_resv_for_each_fence_unlocked(&cursor, fence) {
> - /* TODO: We only use the first writer here */
> - asyw->state.fence = dma_fence_get(fence);
> - break;
> - }
> - dma_resv_iter_end(&cursor);
> + ret = dma_resv_get_singleton(nvbo->bo.base.resv, false,
> +  &asyw->state.fence);

Needs nouveau-ack, but otherwise lgtm.

Reviewed-by: Daniel Vetter 

> + if (ret)
> + return ret;
> +
>   asyw->image.offset[0] = nvbo->offset;
>  
>   if (wndw->func->prepare) {
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 15/24] drm: support more than one write fence in drm_gem_plane_helper_prepare_fb

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:02PM +0100, Christian König wrote:
> Use dma_resv_get_singleton() here to eventually get more than one write
> fence as a single fence.
> 
> Signed-off-by: Christian König 

Patch title should be drm/atomic-helper: prefix, not just drm:

With that nit:

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/drm_gem_atomic_helper.c | 18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c 
> b/drivers/gpu/drm/drm_gem_atomic_helper.c
> index c3189afe10cb..9338ddb7edff 100644
> --- a/drivers/gpu/drm/drm_gem_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_gem_atomic_helper.c
> @@ -143,25 +143,21 @@
>   */
>  int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane, struct 
> drm_plane_state *state)
>  {
> - struct dma_resv_iter cursor;
>   struct drm_gem_object *obj;
>   struct dma_fence *fence;
> + int ret;
>  
>   if (!state->fb)
>   return 0;
>  
>   obj = drm_gem_fb_get_obj(state->fb, 0);
> - dma_resv_iter_begin(&cursor, obj->resv, false);
> - dma_resv_for_each_fence_unlocked(&cursor, fence) {
> - /* TODO: Currently there should be only one write fence, so this
> -  * here works fine. But drm_atomic_set_fence_for_plane() should
> -  * be changed to be able to handle more fences in general for
> -  * multiple BOs per fb anyway. */
> - dma_fence_get(fence);
> - break;
> - }
> - dma_resv_iter_end(&cursor);
> + ret = dma_resv_get_singleton(obj->resv, false, &fence);
> + if (ret)
> + return ret;
>  
> + /* TODO: drm_atomic_set_fence_for_plane() should be changed to be able
> +  * to handle more fences in general for multiple BOs per fb.
> +  */
>   drm_atomic_set_fence_for_plane(state, fence);
>   return 0;
>  }
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 14/24] dma-buf/drivers: make reserving a shared slot mandatory v2

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:01PM +0100, Christian König wrote:
> Audit all the users of dma_resv_add_excl_fence() and make sure they
> reserve a shared slot also when only trying to add an exclusive fence.
> 
> This is the next step towards handling the exclusive fence like a
> shared one.
> 
> v2: fix missed case in amdgpu
> 
> Signed-off-by: Christian König 

Needs all the driver maintainers on cc and also at least some acks/testing.

> ---
>  drivers/dma-buf/st-dma-resv.c | 64 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|  8 +++
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |  8 +--
>  drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |  3 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  8 +--
>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  5 +-
>  drivers/gpu/drm/i915/i915_vma.c   |  6 ++
>  .../drm/i915/selftests/intel_memory_region.c  |  7 ++
>  drivers/gpu/drm/lima/lima_gem.c   | 10 ++-
>  drivers/gpu/drm/msm/msm_gem_submit.c  | 18 +++---
>  drivers/gpu/drm/nouveau/nouveau_fence.c   |  9 +--
>  drivers/gpu/drm/panfrost/panfrost_job.c   |  4 ++
>  drivers/gpu/drm/ttm/ttm_bo_util.c | 12 +++-
>  drivers/gpu/drm/ttm/ttm_execbuf_util.c| 11 ++--

vc4 seems missing?

Also I think I found one bug below in the conversions.
-Daniel


>  drivers/gpu/drm/v3d/v3d_gem.c | 15 +++--
>  drivers/gpu/drm/vgem/vgem_fence.c | 12 ++--
>  drivers/gpu/drm/virtio/virtgpu_gem.c  |  9 +++
>  drivers/gpu/drm/vmwgfx/vmwgfx_bo.c| 16 +++--
>  18 files changed, 133 insertions(+), 92 deletions(-)
> 
> diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
> index cbe999c6e7a6..f33bafc78693 100644
> --- a/drivers/dma-buf/st-dma-resv.c
> +++ b/drivers/dma-buf/st-dma-resv.c
> @@ -75,17 +75,16 @@ static int test_signaling(void *arg, bool shared)
>   goto err_free;
>   }
>  
> - if (shared) {
> - r = dma_resv_reserve_shared(&resv, 1);
> - if (r) {
> - pr_err("Resv shared slot allocation failed\n");
> - goto err_unlock;
> - }
> + r = dma_resv_reserve_shared(&resv, 1);
> + if (r) {
> + pr_err("Resv shared slot allocation failed\n");
> + goto err_unlock;
> + }
>  
> + if (shared)
>   dma_resv_add_shared_fence(&resv, f);
> - } else {
> + else
>   dma_resv_add_excl_fence(&resv, f);
> - }
>  
>   if (dma_resv_test_signaled(&resv, shared)) {
>   pr_err("Resv unexpectedly signaled\n");
> @@ -134,17 +133,16 @@ static int test_for_each(void *arg, bool shared)
>   goto err_free;
>   }
>  
> - if (shared) {
> - r = dma_resv_reserve_shared(&resv, 1);
> - if (r) {
> - pr_err("Resv shared slot allocation failed\n");
> - goto err_unlock;
> - }
> + r = dma_resv_reserve_shared(&resv, 1);
> + if (r) {
> + pr_err("Resv shared slot allocation failed\n");
> + goto err_unlock;
> + }
>  
> + if (shared)
>   dma_resv_add_shared_fence(&resv, f);
> - } else {
> + else
>   dma_resv_add_excl_fence(&resv, f);
> - }
>  
>   r = -ENOENT;
>   dma_resv_for_each_fence(&cursor, &resv, shared, fence) {
> @@ -206,18 +204,17 @@ static int test_for_each_unlocked(void *arg, bool 
> shared)
>   goto err_free;
>   }
>  
> - if (shared) {
> - r = dma_resv_reserve_shared(&resv, 1);
> - if (r) {
> - pr_err("Resv shared slot allocation failed\n");
> - dma_resv_unlock(&resv);
> - goto err_free;
> - }
> + r = dma_resv_reserve_shared(&resv, 1);
> + if (r) {
> + pr_err("Resv shared slot allocation failed\n");
> + dma_resv_unlock(&resv);
> + goto err_free;
> + }
>  
> + if (shared)
>   dma_resv_add_shared_fence(&resv, f);
> - } else {
> + else
>   dma_resv_add_excl_fence(&resv, f);
> - }
>   dma_resv_unlock(&resv);
>  
>   r = -ENOENT;
> @@ -290,18 +287,17 @@ static int test_get_fences(void *arg, bool shared)
>   goto err_resv;
>   }
>  
> - if (shared) {
> - r = dma_resv_reserve_shared(&resv, 1);
> - if (r) {
> - pr_err("Resv shared slot allocation failed\n");
> - dma_resv_unlock(&resv);
> - goto err_resv;
> - }
> + r = dma_resv_reserve_shared(&resv, 1);
> + if (r) {
> + pr_err("Resv shared slot allocation failed\n");
> - dma_resv_unlock(&resv);
> + goto err_resv;
> + }
>  
> + if (shared)
>   dma_resv_add_shared_fence(&resv, f);
> - } else {
> + else
>   dma_resv_add_excl_fence(&resv, f);
> - }
>   dma_resv_unlock(&resv);
>  
>   r = dma_resv_get_fences(&resv, shared, &i, &fences);
> 
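
The calling convention this patch establishes, condensed into a sketch
(illustrative; this mirrors what the converted call sites above now do):

    dma_resv_lock(resv, NULL);

    /* always make room first, even when only an exclusive fence
     * will be added */
    ret = dma_resv_reserve_shared(resv, 1);
    if (!ret)
        dma_resv_add_excl_fence(resv, fence);

    dma_resv_unlock(resv);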

Re: [PATCH 13/24] dma-buf: drop the DAG approach for the dma_resv object

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:34:00PM +0100, Christian König wrote:
> So far we had the approach of using a directed acyclic
> graph with the dma_resv obj.
> 
> This turned out to have many downsides, especially it means
> that every single driver and user of this interface needs
> to be aware of this restriction when adding fences. If the
> rules for the DAG are not followed then we end up with
> potential hard to debug memory corruption, information
> leaks or even elephant big security holes because we allow
> userspace to access freed up memory.
> 
> Since we already took a step back from that by always
> looking at all fences we now go a step further and stop
> dropping the shared fences when a new exclusive one is
> added.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c | 13 -
>  1 file changed, 13 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 9acceabc9399..ecb2ff606bac 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c

No doc update at all!

I checked, we're not that shitty with docs. Minimally the DOC: section
header and also the struct dma_resv kerneldoc need updating. Also there's maybe more
references and stuff I've missed on a quick look, please check for them
(e.g. dma_buf.resv kerneldoc is rather important to keep correct too).

Code itself does what it says in the commit message, but we really need
the most accurate docs we can get for this stuff, or the confusion will
persist :-/

Cheers, Daniel

> @@ -383,29 +383,16 @@ EXPORT_SYMBOL(dma_resv_replace_fences);
>  void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
>  {
>   struct dma_fence *old_fence = dma_resv_excl_fence(obj);
> - struct dma_resv_list *old;
> - u32 i = 0;
>  
>   dma_resv_assert_held(obj);
>  
> - old = dma_resv_shared_list(obj);
> - if (old)
> - i = old->shared_count;
> -
>   dma_fence_get(fence);
>  
>   write_seqcount_begin(&obj->seq);
>   /* write_seqcount_begin provides the necessary memory barrier */
>   RCU_INIT_POINTER(obj->fence_excl, fence);
> - if (old)
> - old->shared_count = 0;
>   write_seqcount_end(&obj->seq);
>  
> - /* inplace update, no shared fences */
> - while (i--)
> - dma_fence_put(rcu_dereference_protected(old->shared[i],
> - dma_resv_held(obj)));
> -
>   dma_fence_put(old_fence);
>  }
>  EXPORT_SYMBOL(dma_resv_add_excl_fence);
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
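
For readers skimming the series, the behavioral change of this patch in a
nutshell (an illustration, not code from the patch):

    /* before: adding an exclusive fence emptied the shared list
     *
     *   dma_resv_add_excl_fence(obj, f);   // shared fences dropped
     *
     * after: the shared fences are kept
     *
     *   dma_resv_add_excl_fence(obj, f);   // shared fences remain
     *
     * so readers may no longer assume the exclusive fence supersedes
     * previously added shared fences; they have to consider both.
     */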


Re: [PATCH 12/24] dma-buf: finally make dma_resv_excl_fence private

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:59PM +0100, Christian König wrote:
> Drivers should never touch this directly.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c | 17 +
>  include/linux/dma-resv.h   | 17 -
>  2 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 694716a3d66d..9acceabc9399 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -147,6 +147,23 @@ void dma_resv_fini(struct dma_resv *obj)
>  }
>  EXPORT_SYMBOL(dma_resv_fini);
>  
> +/**
> + * dma_resv_excl_fence - return the object's exclusive fence
> + * @obj: the reservation object
> + *
> + * Returns the exclusive fence (if any). Caller must either hold the objects
> + * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
> + * or one of the variants of each
> + *
> + * RETURNS
> + * The exclusive fence or NULL
> + */

Same thing with us not documenting internals, pls drop the comment
outright it doesn't really explain anything. With that:

Reviewed-by: Daniel Vetter 

> +static inline struct dma_fence *
> +dma_resv_excl_fence(struct dma_resv *obj)
> +{
> + return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj));
> +}
> +
>  /**
>   * dma_resv_shared_list - get the reservation object's shared fence list
>   * @obj: the reservation object
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index cdfbbda6f600..40ac9d486f8f 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -412,23 +412,6 @@ static inline void dma_resv_unlock(struct dma_resv *obj)
>   ww_mutex_unlock(&obj->lock);
>  }
>  
> -/**
> - * dma_resv_excl_fence - return the object's exclusive fence
> - * @obj: the reservation object
> - *
> - * Returns the exclusive fence (if any). Caller must either hold the objects
> - * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
> - * or one of the variants of each
> - *
> - * RETURNS
> - * The exclusive fence or NULL
> - */
> -static inline struct dma_fence *
> -dma_resv_excl_fence(struct dma_resv *obj)
> -{
> - return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj));
> -}
> -
>  void dma_resv_init(struct dma_resv *obj);
>  void dma_resv_fini(struct dma_resv *obj);
>  int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 11/24] drm/amdgpu: use dma_resv_for_each_fence for CS workaround

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:58PM +0100, Christian König wrote:
> Get the write fence using dma_resv_for_each_fence instead of accessing
> it manually.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 53e407ea4c89..7facd614e50a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1268,6 +1268,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>   struct dma_resv *resv = e->tv.bo->base.resv;
>   struct dma_fence_chain *chain = e->chain;
> + struct dma_resv_iter cursor;
> + struct dma_fence *fence;
>  
>   if (!chain)
>   continue;
> @@ -1277,9 +1279,10 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>* submission in a dma_fence_chain and add it as exclusive
>* fence.
>*/
> - dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
> -  dma_fence_get(p->fence), 1);
> -
> + dma_resv_for_each_fence(&cursor, resv, false, fence) {
> + break;
> + }
> + dma_fence_chain_init(chain, fence, dma_fence_get(p->fence), 1);

Uh this needs a TODO. I'm assuming you'll fix this up later on when
there's more than one write fence, but in case of a bisect or whatever this is a
bit too clever. Like you just replace one "dig around in dma-resv
implementation details" with one that's not even a documented interface
:-)

With an adequately loud comment added interim:

Reviewed-by: Daniel Vetter 

>   rcu_assign_pointer(resv->fence_excl, &chain->base);
>   e->chain = NULL;
>   }
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 10/24] drm/amdgpu: remove excl as shared workarounds

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:57PM +0100, Christian König wrote:
> This was added because of the now dropped shared on excl dependency.
> 
> Signed-off-by: Christian König 

I didn't do a full re-audit of whether you got them all; I think at the latest
with the semantic change to allow more kinds of fence types in dma-resv
we should catch them all.

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 5 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 --
>  2 files changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 0311d799a010..53e407ea4c89 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1275,14 +1275,11 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser 
> *p,
>   /*
>* Work around dma_resv shortcommings by wrapping up the
>* submission in a dma_fence_chain and add it as exclusive
> -  * fence, but first add the submission as shared fence to make
> -  * sure that shared fences never signal before the exclusive
> -  * one.
> +  * fence.
>*/
>   dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
>dma_fence_get(p->fence), 1);
>  
> - dma_resv_add_shared_fence(resv, p->fence);
>   rcu_assign_pointer(resv->fence_excl, &chain->base);
>   e->chain = NULL;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index a1e63ba4c54a..85d31d85c384 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -226,12 +226,6 @@ static void amdgpu_gem_object_close(struct 
> drm_gem_object *obj,
>   if (!amdgpu_vm_ready(vm))
>   goto out_unlock;
>  
> - fence = dma_resv_excl_fence(bo->tbo.base.resv);
> - if (fence) {
> - amdgpu_bo_fence(bo, fence, true);
> - fence = NULL;
> - }
> -
>   r = amdgpu_vm_clear_freed(adev, vm, &fence);
>   if (r || !fence)
>   goto out_unlock;
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 08/24] drm/vmwgfx: stop using dma_resv_excl_fence

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:55PM +0100, Christian König wrote:
> Instead use the new dma_resv_get_singleton function.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> index 8d1e869cc196..23c3fc2cbf10 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> @@ -1168,8 +1168,10 @@ int vmw_resources_clean(struct vmw_buffer_object *vbo, 
> pgoff_t start,
>   vmw_bo_fence_single(bo, NULL);
>   if (bo->moving)
>   dma_fence_put(bo->moving);
> - bo->moving = dma_fence_get
> - (dma_resv_excl_fence(bo->base.resv));
> +
> + /* TODO: This is actually a memory management dependency */
> + return dma_resv_get_singleton(bo->base.resv, false,
> +   &bo->moving);

Reviewed-by: Daniel Vetter 

>   }
>  
>   return 0;
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 09/24] drm/radeon: stop using dma_resv_excl_fence

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:56PM +0100, Christian König wrote:
> Instead use the new dma_resv_get_singleton function.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/radeon/radeon_display.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
> b/drivers/gpu/drm/radeon/radeon_display.c
> index 573154268d43..a6f875118f01 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -533,7 +533,12 @@ static int radeon_crtc_page_flip_target(struct drm_crtc 
> *crtc,
>   DRM_ERROR("failed to pin new rbo buffer before flip\n");
>   goto cleanup;
>   }
> - work->fence = 
> dma_fence_get(dma_resv_excl_fence(new_rbo->tbo.base.resv));
> + r = dma_resv_get_singleton(new_rbo->tbo.base.resv, false, &work->fence);
> + if (r) {
> + radeon_bo_unreserve(new_rbo);
> + DRM_ERROR("failed to get new rbo buffer fences\n");
> + goto cleanup;
> + }

Reviewed-by: Daniel Vetter 

>   radeon_bo_get_tiling_flags(new_rbo, &tiling_flags, NULL);
>   radeon_bo_unreserve(new_rbo);
>  
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 07/24] drm/nouveau: stop using dma_resv_excl_fence

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:54PM +0100, Christian König wrote:
> Instead use the new dma_resv_get_singleton function.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index fa73fe57f97b..74f8652d2bd3 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -959,7 +959,14 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
>  {
>   struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
>   struct drm_device *dev = drm->dev;
> - struct dma_fence *fence = dma_resv_excl_fence(bo->base.resv);
> + struct dma_fence *fence;
> + int ret;
> +
> + /* TODO: This is actually a memory management dependency */
> + ret = dma_resv_get_singleton(bo->base.resv, false, &fence);
> + if (ret)
> + dma_resv_wait_timeout(bo->base.resv, false, false,
> +   MAX_SCHEDULE_TIMEOUT);

Needs ack from nouveau folks.

Reviewed-by: Daniel Vetter 

>  
>   nv10_bo_put_tile_region(dev, *old_tile, fence);
>   *old_tile = new_tile;
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 06/24] drm/etnaviv: stop using dma_resv_excl_fence

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:53PM +0100, Christian König wrote:
> We can get the excl fence together with the shared ones as well.
> 
> Signed-off-by: Christian König 

Pls cc driver maintainers.

dim add-missing-cc

is your friend if you're lazy; you can even combine that with git rebase -x.
Same for all the other driver patches: some acks/testing would be good to
avoid fallout (we had a bit much of that with all these, I think).

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.h|  1 -
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 14 +-
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c  | 10 --
>  3 files changed, 5 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> index 98e60df882b6..f596d743baa3 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> @@ -80,7 +80,6 @@ struct etnaviv_gem_submit_bo {
>   u64 va;
>   struct etnaviv_gem_object *obj;
>   struct etnaviv_vram_mapping *mapping;
> - struct dma_fence *excl;
>   unsigned int nr_shared;
>   struct dma_fence **shared;
>  };
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> index 64c90ff348f2..4286dc93fdaa 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> @@ -188,15 +188,11 @@ static int submit_fence_sync(struct etnaviv_gem_submit 
> *submit)
>   if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
>   continue;
>  
> - if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> - ret = dma_resv_get_fences(robj, true, &bo->nr_shared,
> -   &bo->shared);
> - if (ret)
> - return ret;
> - } else {
> - bo->excl = dma_fence_get(dma_resv_excl_fence(robj));
> - }
> -
> + ret = dma_resv_get_fences(robj,
> +   !!(bo->flags & ETNA_SUBMIT_BO_WRITE),

Afaik the cast to bool !! here is overkill, compiler will do that for you
or something like that. With that dropped:

Reviewed-by: Daniel Vetter 
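
For reference on the !! remark: conversion to a bool parameter already
normalizes the value to 0 or 1 in C, so the two calls in this stand-alone
sketch are equivalent (illustration only, not kernel code):

    #include <stdbool.h>

    static void take(bool write)
    {
        /* write is exactly 0 or 1 here, per the C conversion rules */
    }

    static void demo(unsigned int flags)
    {
        take(flags & 0x4);       /* normalized by the bool parameter */
        take(!!(flags & 0x4));   /* equivalent; the !! is redundant */
    }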


> +   &bo->nr_shared, &bo->shared);
> + if (ret)
> + return ret;
>   }
>  
>   return ret;
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 180bb633d5c5..8c038a363d15 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -39,16 +39,6 @@ etnaviv_sched_dependency(struct drm_sched_job *sched_job,
>   struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
>   int j;
>  
> - if (bo->excl) {
> - fence = bo->excl;
> - bo->excl = NULL;
> -
> - if (!dma_fence_is_signaled(fence))
> - return fence;
> -
> - dma_fence_put(fence);
> - }
> -
>   for (j = 0; j < bo->nr_shared; j++) {
>   if (!bo->shared[j])
>   continue;
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 05/24] RDMA: use dma_resv_wait() instead of extracting the fence

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:52PM +0100, Christian König wrote:
> Use dma_resv_wait() instead of extracting the exclusive fence and
> waiting on it manually.
> 
> Signed-off-by: Christian König 

No rdma lists nor maintainers on cc, so no chances to get the ack you need
to merge this through drm-misc-next.


> ---
>  drivers/infiniband/core/umem_dmabuf.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem_dmabuf.c 
> b/drivers/infiniband/core/umem_dmabuf.c
> index f0760741f281..d32cd7538835 100644
> --- a/drivers/infiniband/core/umem_dmabuf.c
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -16,7 +16,6 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf 
> *umem_dmabuf)
>  {
>   struct sg_table *sgt;
>   struct scatterlist *sg;
> - struct dma_fence *fence;
>   unsigned long start, end, cur = 0;
>   unsigned int nmap = 0;
>   int i;
> @@ -68,11 +67,8 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf 
> *umem_dmabuf)
>* may be not up-to-date. Wait for the exporter to finish
>* the migration.
>*/
> - fence = dma_resv_excl_fence(umem_dmabuf->attach->dmabuf->resv);
> - if (fence)
> - return dma_fence_wait(fence, false);
> -
> - return 0;
> + return dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv, false,
> +  false, MAX_SCHEDULE_TIMEOUT);

I think a wrapper for dma_resv_wait() without timeout would be neat, which
we lack. Either way:

Reviewed-by: Daniel Vetter 
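
The wrapper mentioned above could look like the following (hypothetical; no
such helper exists in the tree at this point):

    static inline long dma_resv_wait(struct dma_resv *obj, bool wait_all,
                                     bool intr)
    {
        return dma_resv_wait_timeout(obj, wait_all, intr,
                                     MAX_SCHEDULE_TIMEOUT);
    }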

>  }
>  EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
>  
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 04/24] dma-buf: add dma_resv_get_singleton v2

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
> Add a function to simplify getting a single fence for all the fences in
> the dma_resv object.
> 
> v2: fix ref leak in error handling
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c | 52 ++
>  include/linux/dma-resv.h   |  2 ++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 480c305554a1..694716a3d66d 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -34,6 +34,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -657,6 +658,57 @@ int dma_resv_get_fences(struct dma_resv *obj, bool write,
>  }
>  EXPORT_SYMBOL_GPL(dma_resv_get_fences);
>  
> +/**
> + * dma_resv_get_singleton - Get a single fence for all the fences
> + * @obj: the reservation object
> + * @write: true if we should return all fences
> + * @fence: the resulting fence
> + *
> + * Get a single fence representing all the fences inside the resv object.
> + * Returns either 0 for success or -ENOMEM.
> + *
> + * Warning: This can't be used like this when adding the fence back to the 
> resv
> + * object since that can lead to stack corruption when finalizing the
> + * dma_fence_array.

Uh I don't get this one? I thought the only problem with nested fences is
the signalling recursion, which we work around with the irq_work?

Also if there's really an issue with dma_fence_array fences, then that
warning should be on the dma_resv kerneldoc, not somewhere hidden like
this. And finally I really don't see what can go wrong: sure, we'll end up
with the same fence once in the dma_resv_list and then once more in the
fence array. But they're all refcounted, so really shouldn't matter.

The code itself looks correct, but me not understanding what even goes
wrong here freaks me out a bit.

I guess something to figure out next year, I kinda hoped I could squeeze a
review in before I disappear :-/
-Daniel

> + */
> +int dma_resv_get_singleton(struct dma_resv *obj, bool write,
> +struct dma_fence **fence)
> +{
> + struct dma_fence_array *array;
> + struct dma_fence **fences;
> + unsigned count;
> + int r;
> +
> + r = dma_resv_get_fences(obj, write, &count, &fences);
> + if (r)
> + return r;
> +
> + if (count == 0) {
> + *fence = NULL;
> + return 0;
> + }
> +
> + if (count == 1) {
> + *fence = fences[0];
> + kfree(fences);
> + return 0;
> + }
> +
> + array = dma_fence_array_create(count, fences,
> +dma_fence_context_alloc(1),
> +1, false);
> + if (!array) {
> + while (count--)
> + dma_fence_put(fences[count]);
> + kfree(fences);
> + return -ENOMEM;
> + }
> +
> + *fence = &array->base;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dma_resv_get_singleton);
> +
>  /**
>   * dma_resv_wait_timeout - Wait on reservation's objects
>   * shared and/or exclusive fences.
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index fa2002939b19..cdfbbda6f600 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -438,6 +438,8 @@ void dma_resv_replace_fences(struct dma_resv *obj, 
> uint64_t context,
>  void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
>  int dma_resv_get_fences(struct dma_resv *obj, bool write,
>   unsigned int *num_fences, struct dma_fence ***fences);
> +int dma_resv_get_singleton(struct dma_resv *obj, bool write,
> +struct dma_fence **fence);
>  int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
>  long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr,
>  unsigned long timeout);
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
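
One plausible reading of the warning being discussed, hedged since the
thread leaves the question open, is the known dma_fence_array nesting
problem:

    /* if the returned singleton is added back into the resv object and
     * dma_resv_get_singleton() is called again, the arrays nest:
     *
     *   a1 = array(f1, f2);   // first call
     *   a2 = array(a1, f3);   // second call wraps a1
     *   a3 = array(a2, f4);   // and so on
     *
     * releasing/signalling such a chain recurses once per level, so an
     * unbounded chain can overflow the kernel stack when the final
     * array is torn down, i.e. "finalized".
     */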


Re: [PATCH 03/24] dma-buf: drop excl_fence parameter from dma_resv_get_fences

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:50PM +0100, Christian König wrote:
> The ability to return the exclusive fence separately is no longer used.
> 
> Instead add a write parameter to indicate the use case.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c   | 48 
>  drivers/dma-buf/st-dma-resv.c| 26 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c  |  6 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c  |  2 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  3 +-
>  include/linux/dma-resv.h |  4 +-
>  6 files changed, 31 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index a12a3a39f280..480c305554a1 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -611,57 +611,45 @@ EXPORT_SYMBOL(dma_resv_copy_fences);
>   * dma_resv_get_fences - Get an object's shared and exclusive
>   * fences without update side lock held
>   * @obj: the reservation object
> - * @fence_excl: the returned exclusive fence (or NULL)
> - * @shared_count: the number of shared fences returned
> - * @shared: the array of shared fence ptrs returned (array is krealloc'd to
> - * the required size, and must be freed by caller)
> - *
> - * Retrieve all fences from the reservation object. If the pointer for the
> - * exclusive fence is not specified the fence is put into the array of the
> - * shared fences as well. Returns either zero or -ENOMEM.
> + * @write: true if we should return all fences

I'm assuming that this will be properly documented later on in the series
...

> + * @num_fences: the number of fences returned
> + * @fences: the array of fence ptrs returned (array is krealloc'd to the
> + * required size, and must be freed by caller)
> + *
> + * Retrieve all fences from the reservation object.
> + * Returns either zero or -ENOMEM.
>   */
> -int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **fence_excl,
> - unsigned int *shared_count, struct dma_fence ***shared)
> +int dma_resv_get_fences(struct dma_resv *obj, bool write,
> + unsigned int *num_fences, struct dma_fence ***fences)
>  {
>   struct dma_resv_iter cursor;
>   struct dma_fence *fence;
>  
> - *shared_count = 0;
> - *shared = NULL;
> -
> - if (fence_excl)
> - *fence_excl = NULL;
> + *num_fences = 0;
> + *fences = NULL;
>  
> - dma_resv_iter_begin(&cursor, obj, true);
> + dma_resv_iter_begin(&cursor, obj, write);
>   dma_resv_for_each_fence_unlocked(&cursor, fence) {
>  
>   if (dma_resv_iter_is_restarted(&cursor)) {
>   unsigned int count;
>  
> - while (*shared_count)
> - dma_fence_put((*shared)[--(*shared_count)]);
> + while (*num_fences)
> + dma_fence_put((*fences)[--(*num_fences)]);
>  
> - if (fence_excl)
> - dma_fence_put(*fence_excl);
> -
> - count = cursor.shared_count;
> - count += fence_excl ? 0 : 1;
> + count = cursor.shared_count + 1;
>  
>   /* Eventually re-allocate the array */
> - *shared = krealloc_array(*shared, count,
> + *fences = krealloc_array(*fences, count,
>sizeof(void *),
>GFP_KERNEL);
> - if (count && !*shared) {
> + if (count && !*fences) {
>   dma_resv_iter_end(&cursor);
>   return -ENOMEM;
>   }
>   }
>  
> - dma_fence_get(fence);
> - if (dma_resv_iter_is_exclusive(&cursor) && fence_excl)
> - *fence_excl = fence;
> - else
> - (*shared)[(*shared_count)++] = fence;
> + (*fences)[(*num_fences)++] = dma_fence_get(fence);
>   }
>   dma_resv_iter_end(&cursor);
>  
> diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
> index bc32b3eedcb6..cbe999c6e7a6 100644
> --- a/drivers/dma-buf/st-dma-resv.c
> +++ b/drivers/dma-buf/st-dma-resv.c
> @@ -275,7 +275,7 @@ static int test_shared_for_each_unlocked(void *arg)
>  
>  static int test_get_fences(void *arg, bool shared)
>  {
> - struct dma_fence *f, *excl = NULL, **fences = NULL;
> + struct dma_fence *f, **fences = NULL;
>   struct dma_resv resv;
>   int r, i;
>  
> @@ -304,35 +304,19 @@ static int test_get_fences(void *arg, bool shared)
>   }
>   dma_resv_unlock();
>  
> - r = dma_resv_get_fences(&resv, &excl, &i, &fences);
> + r = dma_resv_get_fences(&resv, shared, &i, &fences);
>   if (r) {
>   pr_err("get_fences failed\n");
>   goto err_free;
>   }
>  
> - if (shared) {
> - if (excl != NULL) {
> -  
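
The hunk above leans on the unlocked iterator's restart semantics. Sketched
from the API exactly as this patch uses it, the general pattern is:

	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	dma_resv_iter_begin(&cursor, obj, write);
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
		if (dma_resv_iter_is_restarted(&cursor)) {
			/* The fence list changed under us: discard
			 * anything collected in earlier iterations. */
		}
		/* 'fence' is only guaranteed valid in this iteration. */
	}
	dma_resv_iter_end(&cursor);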

Re: [PATCH 02/24] dma-buf: finally make the dma_resv_list private

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:49PM +0100, Christian König wrote:
> Drivers should never touch this directly.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c | 26 ++
>  include/linux/dma-resv.h   | 26 +-
>  2 files changed, 27 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index a688dbded3d3..a12a3a39f280 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -56,6 +56,19 @@
>  DEFINE_WD_CLASS(reservation_ww_class);
>  EXPORT_SYMBOL(reservation_ww_class);
>  
> +/**
> + * struct dma_resv_list - a list of shared fences
> + * @rcu: for internal use
> + * @shared_count: table of shared fences
> + * @shared_max: for growing shared fence table
> + * @shared: shared fence table
> + */

Imo drop the kerneldoc here and just make these comments before the right
member if you feel like keeping them. Imo it's obvious enough what's going
on that the comments aren't necessary, and we don't kerneldoc document
internals generally at all - only interfaces relevant by drivers and
things outside of a subsystem.

> +struct dma_resv_list {
> + struct rcu_head rcu;
> + u32 shared_count, shared_max;
> + struct dma_fence __rcu *shared[];
> +};
> +
>  /**
>   * dma_resv_list_alloc - allocate fence list
>   * @shared_max: number of fences we need space for
> @@ -133,6 +146,19 @@ void dma_resv_fini(struct dma_resv *obj)
>  }
>  EXPORT_SYMBOL(dma_resv_fini);
>  
> +/**
> + * dma_resv_shared_list - get the reservation object's shared fence list
> + * @obj: the reservation object
> + *
> + * Returns the shared fence list. Caller must either hold the objects
> + * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
> + * or one of the variants of each
> + */

Same here. With that:

Reviewed-by: Daniel Vetter 

> +static inline struct dma_resv_list *dma_resv_shared_list(struct dma_resv 
> *obj)
> +{
> + return rcu_dereference_check(obj->fence, dma_resv_held(obj));
> +}
> +
>  /**
>   * dma_resv_reserve_shared - Reserve space to add shared fences to
>   * a dma_resv.
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index e0be34265eae..3baf2a4a9a0d 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -47,18 +47,7 @@
>  
>  extern struct ww_class reservation_ww_class;
>  
> -/**
> - * struct dma_resv_list - a list of shared fences
> - * @rcu: for internal use
> - * @shared_count: table of shared fences
> - * @shared_max: for growing shared fence table
> - * @shared: shared fence table
> - */
> -struct dma_resv_list {
> - struct rcu_head rcu;
> - u32 shared_count, shared_max;
> - struct dma_fence __rcu *shared[];
> -};
> +struct dma_resv_list;
>  
>  /**
>   * struct dma_resv - a reservation object manages fences for a buffer
> @@ -440,19 +429,6 @@ dma_resv_excl_fence(struct dma_resv *obj)
>   return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj));
>  }
>  
> -/**
> - * dma_resv_shared_list - get the reservation object's shared fence list
> - * @obj: the reservation object
> - *
> - * Returns the shared fence list. Caller must either hold the objects
> - * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
> - * or one of the variants of each
> - */
> -static inline struct dma_resv_list *dma_resv_shared_list(struct dma_resv 
> *obj)
> -{
> - return rcu_dereference_check(obj->fence, dma_resv_held(obj));
> -}
> -
>  void dma_resv_init(struct dma_resv *obj);
>  void dma_resv_fini(struct dma_resv *obj);
>  int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 01/24] dma-buf: add dma_resv_replace_fences

2021-12-22 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 01:33:48PM +0100, Christian König wrote:
> This function allows to replace fences from the shared fence list when
> we can gurantee that the operation represented by the original fence has
> finished or no accesses to the resources protected by the dma_resv
> object any more when the new fence finishes.
> 
> Then use this function in the amdkfd code when BOs are unmapped from the
> process.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c| 43 
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 49 +++
>  include/linux/dma-resv.h  |  2 +
>  3 files changed, 52 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 4deea75c0b9c..a688dbded3d3 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -284,6 +284,49 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, 
> struct dma_fence *fence)
>  }
>  EXPORT_SYMBOL(dma_resv_add_shared_fence);
>  
> +/**
> + * dma_resv_replace_fences - replace fences in the dma_resv obj
> + * @obj: the reservation object
> + * @context: the context of the fences to replace
> + * @replacement: the new fence to use instead
> + *
> + * Replace fences with a specified context with a new fence. Only valid if 
> the
> + * operation represented by the original fences is completed or has no longer
> + * access to the resources protected by the dma_resv object when the new 
> fence
> + * completes.
> + */
> +void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
> +  struct dma_fence *replacement)
> +{
> + struct dma_resv_list *list;
> + struct dma_fence *old;
> + unsigned int i;
> +
> + dma_resv_assert_held(obj);
> +
> + write_seqcount_begin(&obj->seq);
> +
> + old = dma_resv_excl_fence(obj);
> + if (old->context == context) {
> + RCU_INIT_POINTER(obj->fence_excl, dma_fence_get(replacement));
> + dma_fence_put(old);
> + }
> +
> + list = dma_resv_shared_list(obj);
> + for (i = 0; list && i < list->shared_count; ++i) {
> + old = rcu_dereference_protected(list->shared[i],
> + dma_resv_held(obj));
> + if (old->context != context)
> + continue;
> +
> + rcu_assign_pointer(list->shared[i], dma_fence_get(replacement));
> + dma_fence_put(old);

Since the fences are all guaranteed to be from the same context, maybe we
should have a WARN_ON(__dma_fence_is_later()); here just to be safe?

With that added:

Reviewed-by: Daniel Vetter 

> + }
> +
> + write_seqcount_end(&obj->seq);
> +}
> +EXPORT_SYMBOL(dma_resv_replace_fences);
> +
>  /**
>   * dma_resv_add_excl_fence - Add an exclusive fence.
>   * @obj: the reservation object
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 71acd577803e..b558ef0f8c4a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -236,53 +236,18 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
>  static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>   struct amdgpu_amdkfd_fence *ef)
>  {
> - struct dma_resv *resv = bo->tbo.base.resv;
> - struct dma_resv_list *old, *new;
> - unsigned int i, j, k;
> + struct dma_fence *replacement;
>  
>   if (!ef)
>   return -EINVAL;
>  
> - old = dma_resv_shared_list(resv);
> - if (!old)
> - return 0;
> -
> - new = kmalloc(struct_size(new, shared, old->shared_max), GFP_KERNEL);
> - if (!new)
> - return -ENOMEM;
> -
> - /* Go through all the shared fences in the resevation object and sort
> -  * the interesting ones to the end of the list.
> + /* TODO: Instead of block before we should use the fence of the page
> +  * table update and TLB flush here directly.
>*/
> - for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
> - struct dma_fence *f;
> -
> - f = rcu_dereference_protected(old->shared[i],
> -   dma_resv_held(resv));
> -
> - if (f->context == ef->base.context)
> - RCU_INIT_POINTER(new->shared[--j], f);
> - else
> - RCU_INIT_POINTER(new->shared[k++], f);
> - }
> - new->shared_max = old->shared_max;
> - new->shared_count = k;
> -
> - /* Install the new fence list, seqcount provides the barriers */
> - write_seqcount_begin(&resv->seq);
> - RCU_INIT_POINTER(resv->fence, new);
> - write_seqcount_end(&resv->seq);
> -
> - /* Drop the references to the removed fences or move them to ef_list */
> - for (i = j; i < old->shared_count; ++i) {
> - 

Re: [PATCH] drm/ttm: fix compilation on ARCH=um

2021-12-22 Thread Daniel Vetter
On Mon, Dec 20, 2021 at 11:15:22AM +0100, Johannes Berg wrote:
> From: Johannes Berg 
> 
> Even if it's probably not really useful, it can get selected
> by e.g. randconfig builds, and then failing to compile is an
> annoyance. Unfortunately, it's hard to fix in Kconfig, since
> DRM_TTM is selected by many things that don't really depend
> on any specific architecture, and just depend on PCI (which
> is indeed now available in ARCH=um via simulation/emulation).
> 
> Fix this in the code instead by just ifdef'ing the relevant
> two lines that depend on "real X86".
> 
> Reported-by: Geert Uytterhoeven 
> Signed-off-by: Johannes Berg 

Probably the last thing before I disappear until 2022 :-)

Merged into drm-misc-fixes, thanks for your patch.
-Daniel

> ---
>  drivers/gpu/drm/ttm/ttm_module.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_module.c 
> b/drivers/gpu/drm/ttm/ttm_module.c
> index 0037eefe3239..a3ad7c9736ec 100644
> --- a/drivers/gpu/drm/ttm/ttm_module.c
> +++ b/drivers/gpu/drm/ttm/ttm_module.c
> @@ -68,9 +68,11 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching caching, 
> pgprot_t tmp)
>  #if defined(__i386__) || defined(__x86_64__)
>   if (caching == ttm_write_combined)
>   tmp = pgprot_writecombine(tmp);
> +#ifndef CONFIG_UML
>   else if (boot_cpu_data.x86 > 3)
>   tmp = pgprot_noncached(tmp);
> -#endif
> +#endif /* CONFIG_UML */
> +#endif /* __i386__ || __x86_64__ */
>  #if defined(__ia64__) || defined(__arm__) || defined(__aarch64__) || \
>   defined(__powerpc__) || defined(__mips__)
>   if (caching == ttm_write_combined)
> -- 
> 2.33.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-22 Thread Daniel Vetter
On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:
> 
> On 12/20/2021 4:29 AM, Daniel Vetter wrote:
> > On Fri, Dec 10, 2021 at 07:58:50AM +0100, Christian König wrote:
> > > Am 09.12.21 um 19:28 schrieb Felix Kuehling:
> > > > Am 2021-12-09 um 10:30 a.m. schrieb Christian König:
> > > > > That still won't work.
> > > > > 
> > > > > But I think we could do this change for the amdgpu mmap callback only.
> > > > If graphics user mode has problems with it, we could even make this
> > > > specific to KFD BOs in the amdgpu_gem_object_mmap callback.
> > > I think it's fine for the whole amdgpu stack, my concern is more about
> > > radeon, nouveau and the ARM stacks which are using this as well.
> > > 
> > > That blew up so nicely the last time we tried to change it and I know of 
> > > at
> > > least one case where radeon was/is used with BOs in a child process.
> > I'm way late and burried again, but I think it'd be good to be consistent
> > here across drivers. Or at least across drm drivers. And we've had the vma
> > open/close refcounting to make fork work since forever.
> > 
> > I think if we do this we should really only do this for mmap() where this
> > applies, but reading through the thread here I'm honestly confused why
> > this is a problem. If CRIU can't handle forked mmaps it needs to be
> > thought that, not hacked around. Or at least I'm not understanding why
> > this shouldn't work ...
> > -Daniel
> > 
> 
> Hi Daniel
> 
> In the v2
> https://lore.kernel.org/all/a1a865f5-ad2c-29c8-cbe4-2635d53ec...@amd.com/T/
> I pretty much limited the scope of the change to KFD BOs on mmap. Regarding
> CRIU, I think its not a CRIU problem as CRIU on restore, only tries to
> recreate all the child processes and then mmaps all the VMAs it sees (as per
> checkpoint snapshot) in the new process address space after the VMA
> placements are finalized in the position independent code phase. Since the
> inherited VMAs don't have access rights the criu mmap fails.

Still sounds funky. I think minimally we should have an ack from CRIU
developers that this is officially the right way to solve this problem. I
really don't want to have random one-off hacks that don't work across the
board, for a problem where we (drm subsystem) really shouldn't be the only
one with this problem. Where "this problem" means that the mmap space is
per file description, and not per underlying inode or real device or
whatever. That part sounds like a CRIU problem, and I expect CRIU folks
want a consistent solution across the board for this. Hence please grab an
ack from them.

Cheers, Daniel

> 
> Regards,
> 
> Rajneesh
> 
> > > Regards,
> > > Christian.
> > > 
> > > > Regards,
> > > >     Felix
> > > > 
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > Am 09.12.21 um 16:29 schrieb Bhardwaj, Rajneesh:
> > > > > > Sounds good. I will send a v2 with only ttm_bo_mmap_obj change. 
> > > > > > Thank
> > > > > > you!
> > > > > > 
> > > > > > On 12/9/2021 10:27 AM, Christian König wrote:
> > > > > > > Hi Rajneesh,
> > > > > > > 
> > > > > > > yes, separating this from the drm_gem_mmap_obj() change is 
> > > > > > > certainly
> > > > > > > a good idea.
> > > > > > > 
> > > > > > > > The child cannot access the BOs mapped by the parent anyway with
> > > > > > > > access restrictions applied
> > > > > > > exactly that is not correct. That behavior is actively used by 
> > > > > > > some
> > > > > > > userspace stacks as far as I know.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > Am 09.12.21 um 16:23 schrieb Bhardwaj, Rajneesh:
> > > > > > > > Thanks Christian. Would it make it less intrusive if I just use 
> > > > > > > > the
> > > > > > > > flag for ttm bo mmap and remove the drm_gem_mmap_obj change from
> > > > > > > > this patch? For our use case, just the ttm_bo_mmap_obj change
> > > > > > > > should suffice and we don't want to put any more work arounds in
> > > > > > > > the user space (thunk, in our case).
> > > > > > > > 
> > > > > > > > The child cannot access the BOs mapped by the parent anyway with
> > > > > > > > access restrictions applied so I wonder why even inherit the 
> > > > > > > > vma?
> > > > > > > > 
> > > > > > > > On 12/9/2021 2:54 AM, Christian König wrote:
> > > > > > > > > Am 08.12.21 um 21:53 schrieb Rajneesh Bhardwaj:
> > > > > > > > > > When an application having open file access to a node 
> > > > > > > > > > forks, its
> > > > > > > > > > shared
> > > > > > > > > > mappings also get reflected in the address space of child 
> > > > > > > > > > process
> > > > > > > > > > even
> > > > > > > > > > though it cannot access them with the object permissions 
> > > > > > > > > > applied.
> > > > > > > > > > With the
> > > > > > > > > > existing permission checks on the gem objects, it might be
> > > > > > > > > > reasonable to
> > > > > > > > > > also create the VMAs with VM_DONTCOPY flag so a user space
> > > > > > > > > > application
> > > > > > 
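
For reference, the change being debated boils down to setting VM_DONTCOPY at
mmap time, along these lines; the function name below is made up for
illustration and this is a sketch, not the actual v2 patch:

	static int kfd_bo_mmap(struct ttm_buffer_object *bo,
			       struct vm_area_struct *vma)
	{
		/* fork() then leaves no VMA for this BO in the child,
		 * which could not pass the access checks on it anyway.
		 */
		vma->vm_flags |= VM_DONTCOPY;

		return ttm_bo_mmap_obj(vma, bo);
	}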

Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts

2021-12-22 Thread Matthew Brost
On Wed, Dec 22, 2021 at 04:25:13PM +, Tvrtko Ursulin wrote:
> 
> Ping?
>

Missed this.

This was merged before your comments landed on the list.
 
> Main two points being:
> 
> 1) Commit message seems in contradiction with the change in
> guc_flush_destroyed_contexts. And the lock drop to immediately re-acquire it
> looks questionable to start with.
> 
> 2) And in deregister_destroyed_contexts and in 1) I was therefore asking if
> you can unlink all at once and process with reduced hammering on the lock.
> 

Probably can address both concerns by using a llist, right?

Be on the look out for this rework patch over the next week or so.

Matt
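
For the record, an llist-based rework would detach the whole destroyed list in
one lockless operation and then walk it without the spinlock, roughly as below
(the destroyed_llist/destroyed_llink field names are made up for illustration):

	struct llist_node *freed =
		llist_del_all(&guc->submission_state.destroyed_llist);
	struct intel_context *ce, *cn;

	llist_for_each_entry_safe(ce, cn, freed, destroyed_llink) {
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
	}

That would address both points: no lock drop and re-acquire per context, and
the unlink happens all at once.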

> Regards,
> 
> Tvrtko
> 
> On 17/12/2021 11:14, Tvrtko Ursulin wrote:
> > 
> > On 17/12/2021 11:06, Tvrtko Ursulin wrote:
> > > On 14/12/2021 17:04, Matthew Brost wrote:
> > > > From: John Harrison 
> > > > 
> > > > While attempting to debug a CT deadlock issue in various CI failures
> > > > (most easily reproduced with gem_ctx_create/basic-files), I was seeing
> > > > CPU deadlock errors being reported. This were because the context
> > > > destroy loop was blocking waiting on H2G space from inside an IRQ
> > > > spinlock. There no was deadlock as such, it's just that the H2G queue
> > > > was full of context destroy commands and GuC was taking a long time to
> > > > process them. However, the kernel was seeing the large amount of time
> > > > spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
> > > > then happen (heartbeat failures, CT deadlock errors, outstanding H2G
> > > > WARNs, etc.).
> > > > 
> > > > Re-working the loop to only acquire the spinlock around the list
> > > > management (which is all it is meant to protect) rather than the
> > > > entire destroy operation seems to fix all the above issues.
> > > > 
> > > > v2:
> > > >   (John Harrison)
> > > >    - Fix typo in comment message
> > > > 
> > > > Signed-off-by: John Harrison 
> > > > Signed-off-by: Matthew Brost 
> > > > Reviewed-by: Matthew Brost 
> > > > ---
> > > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ---
> > > >   1 file changed, 28 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 36c2965db49b..96fcf869e3ff 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -2644,7 +2644,6 @@ static inline void
> > > > guc_lrc_desc_unpin(struct intel_context *ce)
> > > >   unsigned long flags;
> > > >   bool disabled;
> > > > -    lockdep_assert_held(&guc->submission_state.lock);
> > > >   GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
> > > >   GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
> > > >   GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
> > > > @@ -2660,7 +2659,7 @@ static inline void
> > > > guc_lrc_desc_unpin(struct intel_context *ce)
> > > >   }
> > > >   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > >   if (unlikely(disabled)) {
> > > > -    __release_guc_id(guc, ce);
> > > > +    release_guc_id(guc, ce);
> > > >   __guc_context_destroy(ce);
> > > >   return;
> > > >   }
> > > > @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct
> > > > intel_context *ce)
> > > >   static void guc_flush_destroyed_contexts(struct intel_guc *guc)
> > > >   {
> > > > -    struct intel_context *ce, *cn;
> > > > +    struct intel_context *ce;
> > > >   unsigned long flags;
> > > >   GEM_BUG_ON(!submission_disabled(guc) &&
> > > >  guc_submission_initialized(guc));
> > > > -    spin_lock_irqsave(&guc->submission_state.lock, flags);
> > > > -    list_for_each_entry_safe(ce, cn,
> > > > - &guc->submission_state.destroyed_contexts,
> > > > - destroyed_link) {
> > > > -    list_del_init(&ce->destroyed_link);
> > > > -    __release_guc_id(guc, ce);
> > > > +    while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> > > 
> > > Are lockless false negatives a concern here - I mean this thread not
> > > seeing something just got added to the list?
> > > 
> > > > +    spin_lock_irqsave(&guc->submission_state.lock, flags);
> > > > +    ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> > > > +  struct intel_context,
> > > > +  destroyed_link);
> > > > +    if (ce)
> > > > +    list_del_init(&ce->destroyed_link);
> > > > +    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > > +
> > > > +    if (!ce)
> > > > +    break;
> > > > +
> > > > +    release_guc_id(guc, ce);
> > > 
> > > This looks suboptimal and in conflict with this part of the commit
> > > message:
> > > 
> > > """
> > >   Re-working the loop to only acquire the spinlock around the list
> > >   management (which is all it is meant to protect) 

[Bug 201957] amdgpu: ring gfx timeout

2021-12-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=201957

roman (cool...@gmx.at) changed:

   What|Removed |Added

 CC||cool...@gmx.at

--- Comment #52 from roman (cool...@gmx.at) ---
I can confirm that amdgpu.dpm=0 removes the issue on an AMD Radeon PRO FIJI
(Dual Fury). Kernel: 5.15.10 | FW: 20211027.1d00989-1 | mesa: 21.3.2-1

Works perfectly fine in Gnome as long as there is no application accessing the
2nd GPU. 

When opening Radeon-profile, as long as card0 is selected there is no issue,
but as soon as I select card1 I instantly get:
Dec 22 21:15:46 Workstation kernel: amdgpu: failed to send message 171 ret is 0
Dec 22 21:15:49 Workstation kernel: amdgpu: last message was failed ret is 0

The application Radeon-profile freezes but the desktop is still responsive.



When opening CS:GO with mangohud and configuring either

pci_dev = 0000:3d:00.0 # primary card works fine
or 
pci_dev = 0000:3e:00.0 # secondary card, the errors from above occur, CS:GO
loads very slowly, and once the menu is visible it is stuck

When CSM is disabled in BIOS I have 2 GPUs 

Dec 22 20:45:50 Workstation kernel: [drm] amdgpu kernel modesetting enabled.
Dec 22 20:45:50 Workstation kernel: amdgpu: CRAT table not found
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for CPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add CPU node
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: vgaarb: deactivate vga
console
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: enabling device (0106
-> 0107)
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 22 20:45:50 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801MS-102
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: amdgpu: VRAM: 4096M
0x00F4 - 0x00F4 (4096M used)
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: amdgpu: GART: 1024M
0x00FF - 0x00FF3FFF
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:50 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 22 20:45:50 Workstation kernel: snd_hda_intel :3d:00.1: bound
:3d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dec 22 20:45:50 Workstation kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR*
retrieve_link_cap: Read receiver caps dpcd data failed.
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:50 Workstation kernel: amdgpu :3d:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:50 Workstation kernel: fbcon: amdgpu (fb0) is primary device
Dec 22 20:45:51 Workstation kernel: amdgpu :3d:00.0: [drm] fb0: amdgpu
frame buffer device
Dec 22 20:45:51 Workstation kernel: amdgpu :3d:00.0: amdgpu: Using BACO for
runtime pm
Dec 22 20:45:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for :3d:00.0 on minor 0
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: enabling device (0106
-> 0107)
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 22 20:45:51 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801SL-102
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: amdgpu: VRAM: 4096M
0x00F4 - 0x00F4 (4096M used)
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: amdgpu: GART: 1024M
0x00FF - 0x00FF3FFF
Dec 22 20:45:51 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:51 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:51 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 22 20:45:51 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 22 20:45:51 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:51 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 22 20:45:51 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:51 Workstation kernel: amdgpu :3e:00.0: amdgpu: Using BACO for
runtime pm
Dec 22 20:45:51 Workstation kernel: 

Re: [PATCH 08/22] dt-bindings: display: rockchip: dw-hdmi: use "ref" as clock name

2021-12-22 Thread Rob Herring
On Wed, Dec 22, 2021 at 3:40 PM Heiko Stübner  wrote:
>
> Am Mittwoch, 22. Dezember 2021, 14:52:51 CET schrieb Rob Herring:
> > On Wed, Dec 22, 2021 at 6:47 AM Sascha Hauer  wrote:
> > >
> > > On Tue, Dec 21, 2021 at 10:31:23AM -0400, Rob Herring wrote:
> > > > On Mon, Dec 20, 2021 at 12:06:16PM +0100, Sascha Hauer wrote:
> > > > > "vpll" is a misnomer. A clock input to a device should be named after
> > > > > the usage in the device, not after the clock that drives it. On the
> > > > > rk3568 the same clock is driven by the HPLL.
> > > > > To fix that, this patch renames the vpll clock to ref clock.
> > > >
> > > > The problem with this series is it breaks an old kernel with new dt. You
> > > > can partially mitigate that with stable kernel backport, but IMO keeping
> > > > the old name is not a burden to maintain.
> > >
> > > As suggested I only removed vpll from the binding document, but not from
> > > the code. The code still handles the old binding as well.
> >
> > The problem is updating rk3399.dtsi. That change won't work with old
> > kernels because they won't look for 'ref'. Since you shouldn't change
> > it, the binding needs to cover both the old and new cases.
>
> is "newer dt with old kernel" really a case these days?

I've had complaints about it. In particular from SUSE folks that were
shipping new dtbs with old (stable) kernels.

> I do understand the new kernel old dt case - for example with the
> dtb being provided by firmware.

Yes, so update your firmware that contains a newer dtb and then you
stop booting or a device stops working.

> But which user would get the idea of updating only the devicetree
> while staying with an older kernel?

Any synchronization between firmware and OS updates is a problem.

Rob
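
As an aside, the driver-side compatibility Sascha refers to is typically a
two-step clock lookup of this shape (a sketch with assumed variable names,
not the actual dw-hdmi-rockchip code):

	/* Prefer the new name, quietly fall back to the legacy one. */
	clk = devm_clk_get_optional(dev, "ref");
	if (!clk)
		clk = devm_clk_get_optional(dev, "vpll");
	if (IS_ERR(clk))
		return PTR_ERR(clk);

devm_clk_get_optional() returns NULL rather than an error when the name is
simply absent, which is what makes the silent fallback work.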


Re: [PATCH 1/2] drm/tegra: dpaux: Populate AUX bus

2021-12-22 Thread Dmitry Osipenko
20.12.2021 13:48, Thierry Reding пишет:
> From: Thierry Reding 
> 
> The DPAUX hardware block exposes an DP AUX interface that provides
> access to an AUX bus and the devices on that bus. Use the DP AUX bus
> infrastructure that was recently introduced to probe devices on this
> bus from DT.
> 
> Signed-off-by: Thierry Reding 
> ---
>  drivers/gpu/drm/tegra/Kconfig | 1 +
>  drivers/gpu/drm/tegra/dpaux.c | 7 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
> index 8cf5aeb9db6c..201f5175ecfe 100644
> --- a/drivers/gpu/drm/tegra/Kconfig
> +++ b/drivers/gpu/drm/tegra/Kconfig
> @@ -5,6 +5,7 @@ config DRM_TEGRA
>   depends on COMMON_CLK
>   depends on DRM
>   depends on OF
> + select DRM_DP_AUX_BUS
>   select DRM_KMS_HELPER
>   select DRM_MIPI_DSI
>   select DRM_PANEL
> diff --git a/drivers/gpu/drm/tegra/dpaux.c b/drivers/gpu/drm/tegra/dpaux.c
> index 1f96e416fa08..9da1edcdc835 100644
> --- a/drivers/gpu/drm/tegra/dpaux.c
> +++ b/drivers/gpu/drm/tegra/dpaux.c
> @@ -18,6 +18,7 @@
>  #include <linux/reset.h>
>  #include <linux/workqueue.h>
>  
> +#include <drm/drm_dp_aux_bus.h>
>  #include <drm/drm_dp_helper.h>
>  #include <drm/drm_panel.h>
>  
> @@ -570,6 +571,12 @@ static int tegra_dpaux_probe(struct platform_device 
> *pdev)
>   list_add_tail(&dpaux->list, &dpaux_list);
>   mutex_unlock(&dpaux_lock);
>  
> + err = devm_of_dp_aux_populate_ep_devices(&dpaux->aux);
> + if (err < 0) {
> + dev_err(dpaux->dev, "failed to populate AUX bus: %d\n", err);
> + return err;
> + }
> +
>   return 0;
>  }

Needs stable tag for 5.15+.



Re: [PATCH 08/22] dt-bindings: display: rockchip: dw-hdmi: use "ref" as clock name

2021-12-22 Thread Nicolas Frattaroli
On Mittwoch, 22. Dezember 2021 20:39:58 CET Heiko Stübner wrote:
> Am Mittwoch, 22. Dezember 2021, 14:52:51 CET schrieb Rob Herring:
> > On Wed, Dec 22, 2021 at 6:47 AM Sascha Hauer  wrote:
> > >
> > > On Tue, Dec 21, 2021 at 10:31:23AM -0400, Rob Herring wrote:
> > > > On Mon, Dec 20, 2021 at 12:06:16PM +0100, Sascha Hauer wrote:
> > > > > "vpll" is a misnomer. A clock input to a device should be named after
> > > > > the usage in the device, not after the clock that drives it. On the
> > > > > rk3568 the same clock is driven by the HPLL.
> > > > > To fix that, this patch renames the vpll clock to ref clock.
> > > >
> > > > The problem with this series is it breaks an old kernel with new dt. You
> > > > can partially mitigate that with stable kernel backport, but IMO keeping
> > > > the old name is not a burden to maintain.
> > >
> > > As suggested I only removed vpll from the binding document, but not from
> > > the code. The code still handles the old binding as well.
> > 
> > The problem is updating rk3399.dtsi. That change won't work with old
> > kernels because they won't look for 'ref'. Since you shouldn't change
> > it, the binding needs to cover both the old and new cases.
> 
> is "newer dt with old kernel" really a case these days?
> 
> I do understand the new kernel old dt case - for example with the
> dtb being provided by firmware.
> 
> But which user would get the idea of updating only the devicetree
> while staying with an older kernel?
> 

Side-by-side installations of LTS kernels with new kernels. LTS kernel
uses same DT as new kernel because distribution set it up this way.

Other scenario: user wants to modify their device tree. They download
the latest kernel sources from kernel.org because they can't use over-
lays and they don't want to fiddle with decompiled device trees.




Re: [Intel-gfx] [PATCH] drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.

2021-12-22 Thread Intel



On 12/22/21 16:56, Maarten Lankhorst wrote:

Convert free_work into delayed_work, similar to ttm to allow converting the
blocking lock in __i915_gem_free_objects to a trylock.

Unlike ttm, the object should already be idle, as it's kept alive
by a reference through struct i915_vma->active, which is dropped
after all vma's are idle.

Because of this, we can use a no wait by default, or when the lock
is contested, we use ttm's 10 ms.

The trylock should only fail when the object is sharing its resv with
other objects, and typically objects are not kept locked for a long
time, so we can safely retry on failure.

Fixes: be7612fd6665 ("drm/i915: Require object lock when freeing pages during 
destruction")
Testcase: igt/gem_exec_alignment/pi*
Signed-off-by: Maarten Lankhorst 
---
  drivers/gpu/drm/i915/gem/i915_gem_object.c | 14 ++
  drivers/gpu/drm/i915/i915_drv.h|  4 ++--
  2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 39cd563544a5..d87b508b59b1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -331,7 +331,13 @@ static void __i915_gem_free_objects(struct 
drm_i915_private *i915,
continue;
}
  
-		i915_gem_object_lock(obj, NULL);

+   if (!i915_gem_object_trylock(obj, NULL)) {
+   /* busy, toss it back to the pile */
+   if (llist_add(&obj->freed, &i915->mm.free_list))
+   queue_delayed_work(i915->wq, &i915->mm.free_work, msecs_to_jiffies(10));


i915->wq is ordered. From what I can tell, with queue_delayed_work(), 
the work doesn't get inserted into the queue order until the delay 
expires, right? So we don't unnecessarily hold up other objects getting 
freed?



+   continue;
+   }
+
__i915_gem_object_pages_fini(obj);
i915_gem_object_unlock(obj);
__i915_gem_free_object(obj);
@@ -353,7 +359,7 @@ void i915_gem_flush_free_objects(struct drm_i915_private 
*i915)
  static void __i915_gem_free_work(struct work_struct *work)
  {
struct drm_i915_private *i915 =
-   container_of(work, struct drm_i915_private, mm.free_work);
+   container_of(work, struct drm_i915_private, mm.free_work.work);
  
  	i915_gem_flush_free_objects(i915);

  }
@@ -385,7 +391,7 @@ static void i915_gem_free_object(struct drm_gem_object 
*gem_obj)
 */
  
  	if (llist_add(&obj->freed, &i915->mm.free_list))

-   queue_work(i915->wq, &i915->mm.free_work);
+   queue_delayed_work(i915->wq, &i915->mm.free_work, 0);
  }
  
  void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,

@@ -710,7 +716,7 @@ bool i915_gem_object_placement_possible(struct 
drm_i915_gem_object *obj,
  
  void i915_gem_init__objects(struct drm_i915_private *i915)

  {
-   INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
+   INIT_DELAYED_WORK(&i915->mm.free_work, __i915_gem_free_work);
  }
  
  void i915_objects_module_exit(void)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c8fddb7e61c9..beeb42a14aae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -465,7 +465,7 @@ struct i915_gem_mm {
 * List of objects which are pending destruction.
 */
struct llist_head free_list;
-   struct work_struct free_work;
+   struct delayed_work free_work;
/**
 * Count of objects pending destructions. Used to skip needlessly
 * waiting on an RCU barrier if no objects are waiting to be freed.
@@ -1625,7 +1625,7 @@ static inline void i915_gem_drain_freed_objects(struct 
drm_i915_private *i915)
 * armed the work again.
 */
while (atomic_read(>mm.free_count)) {
-   flush_work(&i915->mm.free_work);
+   flush_delayed_work(&i915->mm.free_work);
flush_delayed_work(&i915->bdev.wq);
rcu_barrier();
}


Otherwise LGTM.

Reviewed-by: Thomas Hellström 






Re: [PATCH 08/22] dt-bindings: display: rockchip: dw-hdmi: use "ref" as clock name

2021-12-22 Thread Heiko Stübner
Am Mittwoch, 22. Dezember 2021, 14:52:51 CET schrieb Rob Herring:
> On Wed, Dec 22, 2021 at 6:47 AM Sascha Hauer  wrote:
> >
> > On Tue, Dec 21, 2021 at 10:31:23AM -0400, Rob Herring wrote:
> > > On Mon, Dec 20, 2021 at 12:06:16PM +0100, Sascha Hauer wrote:
> > > > "vpll" is a misnomer. A clock input to a device should be named after
> > > > the usage in the device, not after the clock that drives it. On the
> > > > rk3568 the same clock is driven by the HPLL.
> > > > To fix that, this patch renames the vpll clock to ref clock.
> > >
> > > The problem with this series is it breaks an old kernel with new dt. You
> > > can partially mitigate that with stable kernel backport, but IMO keeping
> > > the old name is not a burden to maintain.
> >
> > As suggested I only removed vpll from the binding document, but not from
> > the code. The code still handles the old binding as well.
> 
> The problem is updating rk3399.dtsi. That change won't work with old
> kernels because they won't look for 'ref'. Since you shouldn't change
> it, the binding needs to cover both the old and new cases.

is "newer dt with old kernel" really a case these days?

I do understand the new kernel old dt case - for example with the
dtb being provided by firmware.

But which user would get the idea of updating only the devicetree
while staying with an older kernel?





[PATCH] drm/msm/dp: Simplify dp_debug_init() and dp_debug_get()

2021-12-22 Thread Christophe JAILLET
dp_debug_init() always returns 0. So, make it a void function and simplify
the only caller accordingly.

While at it remove a useless 'rc' initialization in dp_debug_get()

Signed-off-by: Christophe JAILLET 
---
 drivers/gpu/drm/msm/dp/dp_debug.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_debug.c 
b/drivers/gpu/drm/msm/dp/dp_debug.c
index da4323556ef3..338f1f9c4d14 100644
--- a/drivers/gpu/drm/msm/dp/dp_debug.c
+++ b/drivers/gpu/drm/msm/dp/dp_debug.c
@@ -207,9 +207,8 @@ static const struct file_operations test_active_fops = {
.write = dp_test_active_write
 };
 
-static int dp_debug_init(struct dp_debug *dp_debug, struct drm_minor *minor)
+static void dp_debug_init(struct dp_debug *dp_debug, struct drm_minor *minor)
 {
-   int rc = 0;
struct dp_debug_private *debug = container_of(dp_debug,
struct dp_debug_private, dp_debug);
 
@@ -229,17 +228,15 @@ static int dp_debug_init(struct dp_debug *dp_debug, 
struct drm_minor *minor)
debug, _test_type_fops);
 
debug->root = minor->debugfs_root;
-
-   return rc;
 }
 
 struct dp_debug *dp_debug_get(struct device *dev, struct dp_panel *panel,
struct dp_usbpd *usbpd, struct dp_link *link,
struct drm_connector *connector, struct drm_minor *minor)
 {
-   int rc = 0;
struct dp_debug_private *debug;
struct dp_debug *dp_debug;
+   int rc;
 
if (!dev || !panel || !usbpd || !link) {
DRM_ERROR("invalid input\n");
@@ -266,11 +263,7 @@ struct dp_debug *dp_debug_get(struct device *dev, struct 
dp_panel *panel,
dp_debug->hdisplay = 0;
dp_debug->vrefresh = 0;
 
-   rc = dp_debug_init(dp_debug, minor);
-   if (rc) {
-   devm_kfree(dev, debug);
-   goto error;
-   }
+   dp_debug_init(dp_debug, minor);
 
return dp_debug;
  error:
-- 
2.32.0



Re: [PATCH v16 08/40] gpu: host1x: Add initial runtime PM and OPP support

2021-12-22 Thread Dmitry Osipenko
22.12.2021 22:30, Jon Hunter пишет:
> 
> On 22/12/2021 19:01, Dmitry Osipenko wrote:
> 
> ...
> 
>> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
>> index e08e331e46ae..8194826c9ce3 100644
>> --- a/drivers/gpu/host1x/syncpt.c
>> +++ b/drivers/gpu/host1x/syncpt.c
>> @@ -137,6 +137,15 @@ void host1x_syncpt_restore(struct host1x *host)
>>   struct host1x_syncpt *sp_base = host->syncpt;
>>   unsigned int i;
>>
>> +    for (i = 0; i < host->info->nb_pts; i++) {
>> +    /*
>> + * Unassign syncpt from channels for purposes of Tegra186
>> + * syncpoint protection. This prevents any channel from
>> + * accessing it until it is reassigned.
>> + */
>> +    host1x_hw_syncpt_assign_to_channel(host, sp_base + i, NULL);
>> +    }
>> +
>>   for (i = 0; i < host1x_syncpt_nb_pts(host); i++)
>>   host1x_hw_syncpt_restore(host, sp_base + i);
>>
>> @@ -352,13 +361,6 @@ int host1x_syncpt_init(struct host1x *host)
>>   for (i = 0; i < host->info->nb_pts; i++) {
>>   syncpt[i].id = i;
>>   syncpt[i].host = host;
>> -
>> -    /*
>> - * Unassign syncpt from channels for purposes of Tegra186
>> - * syncpoint protection. This prevents any channel from
>> - * accessing it until it is reassigned.
>> - */
>> -    host1x_hw_syncpt_assign_to_channel(host, &syncpt[i], NULL);
>>   }
>>
>>   for (i = 0; i < host->info->nb_bases; i++)
>>
> 
> 
> Thanks! This fixed it!

I'll prepare proper patch with yours t-b, thank you.



Re: [PATCH v16 08/40] gpu: host1x: Add initial runtime PM and OPP support

2021-12-22 Thread Jon Hunter



On 22/12/2021 19:01, Dmitry Osipenko wrote:

...


diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index e08e331e46ae..8194826c9ce3 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -137,6 +137,15 @@ void host1x_syncpt_restore(struct host1x *host)
struct host1x_syncpt *sp_base = host->syncpt;
unsigned int i;

+   for (i = 0; i < host->info->nb_pts; i++) {
+   /*
+* Unassign syncpt from channels for purposes of Tegra186
+* syncpoint protection. This prevents any channel from
+* accessing it until it is reassigned.
+*/
+   host1x_hw_syncpt_assign_to_channel(host, sp_base + i, NULL);
+   }
+
for (i = 0; i < host1x_syncpt_nb_pts(host); i++)
host1x_hw_syncpt_restore(host, sp_base + i);

@@ -352,13 +361,6 @@ int host1x_syncpt_init(struct host1x *host)
for (i = 0; i < host->info->nb_pts; i++) {
syncpt[i].id = i;
syncpt[i].host = host;
-
-   /*
-* Unassign syncpt from channels for purposes of Tegra186
-* syncpoint protection. This prevents any channel from
-* accessing it until it is reassigned.
-*/
-   host1x_hw_syncpt_assign_to_channel(host, &syncpt[i], NULL);
}

for (i = 0; i < host->info->nb_bases; i++)




Thanks! This fixed it!

Jon

--
nvpublic


Re: [PATCH 2/2] ARM: tegra: Move panels to AUX bus

2021-12-22 Thread Dmitry Osipenko
20.12.2021 13:48, Thierry Reding пишет:
> From: Thierry Reding 
> 
> Move the eDP panel on Venice 2 and Nyan boards into the corresponding
> AUX bus device tree node. This allows us to avoid a nasty circular
> dependency that would otherwise be created between the DPAUX and panel
> nodes via the DDC/I2C phandle.
> 
> Signed-off-by: Thierry Reding 
> ---
>  arch/arm/boot/dts/tegra124-nyan-big.dts   | 15 +--
>  arch/arm/boot/dts/tegra124-nyan-blaze.dts | 15 +--
>  arch/arm/boot/dts/tegra124-venice2.dts| 14 +++---
>  3 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/tegra124-nyan-big.dts 
> b/arch/arm/boot/dts/tegra124-nyan-big.dts
> index 1d2aac2cb6d0..fdc1d64dfff9 100644
> --- a/arch/arm/boot/dts/tegra124-nyan-big.dts
> +++ b/arch/arm/boot/dts/tegra124-nyan-big.dts
> @@ -13,12 +13,15 @@ / {
>"google,nyan-big-rev1", "google,nyan-big-rev0",
>"google,nyan-big", "google,nyan", "nvidia,tegra124";
>  
> - panel: panel {
> - compatible = "auo,b133xtn01";
> -
> - power-supply = <&vdd_3v3_panel>;
> - backlight = <&backlight>;
> - ddc-i2c-bus = <&dpaux>;
> + host1x@50000000 {
> + dpaux@545c0000 {
> + aux-bus {
> + panel: panel {
> + compatible = "auo,b133xtn01";
> + backlight = <&backlight>;
> + };
> + };
> + };
>   };
>  
>   mmc@700b0400 { /* SD Card on this bus */
> diff --git a/arch/arm/boot/dts/tegra124-nyan-blaze.dts 
> b/arch/arm/boot/dts/tegra124-nyan-blaze.dts
> index 677babde6460..abdf4456826f 100644
> --- a/arch/arm/boot/dts/tegra124-nyan-blaze.dts
> +++ b/arch/arm/boot/dts/tegra124-nyan-blaze.dts
> @@ -15,12 +15,15 @@ / {
>"google,nyan-blaze-rev0", "google,nyan-blaze",
>"google,nyan", "nvidia,tegra124";
>  
> - panel: panel {
> - compatible = "samsung,ltn140at29-301";
> -
> - power-supply = <&vdd_3v3_panel>;
> - backlight = <&backlight>;
> - ddc-i2c-bus = <&dpaux>;
> + host1x@50000000 {
> + dpaux@545c0000 {
> + aux-bus {
> + panel: panel {
> + compatible = "samsung,ltn140at29-301";
> + backlight = <&backlight>;
> + };
> + };
> + };
>   };
>  
>   sound {
> diff --git a/arch/arm/boot/dts/tegra124-venice2.dts 
> b/arch/arm/boot/dts/tegra124-venice2.dts
> index 232c90604df9..6a9592ceb5f2 100644
> --- a/arch/arm/boot/dts/tegra124-venice2.dts
> +++ b/arch/arm/boot/dts/tegra124-venice2.dts
> @@ -48,6 +48,13 @@ sor@54540000 {
>   dpaux@545c0000 {
>   vdd-supply = <&vdd_3v3_panel>;
>   status = "okay";
> +
> + aux-bus {
> + panel: panel {
> + compatible = "lg,lp129qe";
> + backlight = <&backlight>;
> + };
> + };
>   };
>   };
>  
> @@ -1080,13 +1087,6 @@ power {
>   };
>   };
>  
> - panel: panel {
> - compatible = "lg,lp129qe";
> - power-supply = <&vdd_3v3_panel>;
> - backlight = <&backlight>;
> - ddc-i2c-bus = <&dpaux>;
> - };
> -
>   vdd_mux: regulator-mux {
>   compatible = "regulator-fixed";
>   regulator-name = "+VDD_MUX";
> 

You should add stable tag for 5.15 and also add separate patch to update
the new arch/arm/boot/dts/tegra124-nyan-big-fhd.dts which we have in
-next now.


Re: [PATCH 0/2] drm/tegra: Fix panel support on Venice 2 and Nyan

2021-12-22 Thread Dmitry Osipenko
22.12.2021 14:53, Thierry Reding пишет:
> On Wed, Dec 22, 2021 at 06:01:26AM +0300, Dmitry Osipenko wrote:
>> 21.12.2021 21:01, Thierry Reding пишет:
>>> On Tue, Dec 21, 2021 at 07:45:31PM +0300, Dmitry Osipenko wrote:
 21.12.2021 19:17, Thierry Reding пишет:
> On Tue, Dec 21, 2021 at 06:47:31PM +0300, Dmitry Osipenko wrote:
>> 21.12.2021 13:58, Thierry Reding пишет:
>> ..
>> The panel->ddc isn't used by the new panel-edp driver unless panel is
>> compatible with "edp-panel". Hence the generic_edp_panel_probe() 
>> should
>> either fail or crash for a such "edp-panel" since panel->ddc isn't 
>> fully
>> instantiated, AFAICS.
>
> I've tested this and it works fine on Venice 2. Since that was the
> reference design for Nyan, I suspect that Nyan's will also work.
>
> It'd be great if Thomas or anyone else with access to a Nyan could
> test this to verify that.

 There is no panel-edp driver in the v5.15. The EOL of v5.15 is Oct,
 2023, hence we need to either use:
>>>
>>> All the (at least relevant) functionality that is in panel-edp was in
>>> panel-simple before it was moved to panel-edp. I've backported this set
>>> of patches to v5.15 and it works just fine there.
>>
>> Will we be able to add patch to bypass the panel's DT ddc-i2c-bus on
>> Nyan to keep the older DTBs working?
>
> I don't see why we would want to do that. It's quite clear that the DTB
> is buggy in this case and we have a more accurate way to describe what's
> really there in hardware. In addition that more accurate representation
> also gets rid of a bug. Obviously because the bug is caused by the
> previous representation that was not accurate.
>
> Given that we can easily replace the DTBs on these devices there's no
> reason to make this any more complicated than it has to be.

 Don't you care about normal people at all? Do you assume that everyone
 must be a kernel developer to be able to use Tegra devices? :/
>>>
>>> If you know how to install a custom kernel you also know how to replace
>>> the DTB on these devices.
>>>
>>> For everyone else, once these patches are merged upstream and
>>> distributions start shipping the new version, they will get this
>>> automatically by updating their kernel package since most distributions
>>> actually ship the DTB files as part of that.
>>>
 It's not a problem for you to figure out why display is broken, for
 other people it's a problem. Usually nobody will update DTB without a
 well known reason, instead device will be dusted on a shelf. In the end
 you won't have any users at all.
>>>
>>> Most "normal" people aren't even going to notice that their DTB is going
>>> to be updated. They would actually have to do extra work *not* to update
>>> it.
>>
>> My past experience tells that your assumption is incorrect. There are
>> quite a lot of people who will update kernel, but not DTB.
> 
> People that do this will have to do it manually because most
> distributions I know of will actually ship the DTBs. If they know how to
> update the kernel separately, I'm sure they will manage to update the
> DTB as well. It's really not more complicated that updating the kernel
> image.
> 
>> ARM devices have endless variations of bootloaders and individual quirks
>> required for a successful installation of a kernel. Kernel update by
>> distro usually isn't a thing on ARM.
> 
> I'm not sure what distribution you have been using, but the ones that
> I'm familiar with all install the DTBs along with the kernel. Most Tegra
> devices (newer ones at least) do also support booting with U-Boot which
> supports standard ways to boot a system (which were co-developed with
> distributions precisely so that it would become easier for users to keep
> their systems up-to-date), so there's really nothing magical anyone
> should need to do in order to get an updated DTB along with the updated
> kernel.
> 
> It's a simple fact that sometimes a DTB contains a bug and we have to
> fix it.
> 
> In general we try to fix things up in the driver code when reasonable so
> that people don't have to update the DTB. This is for the (mostly hypo-
> thetical) case where updating the DTB is not possible or very
> complicated.
> 
> However, that's not the case on the Venice 2 or Nyan boards. And looking
> at the alternative in this case, I don't think it's reasonable compared
> to just fixing the problem at the root, which is in the DTB.

My understanding is that U-Boot isn't the only available bootloader
option for Nyan. I don't feel happy about the ABI breakage, but at the
same time I don't feel strongly about the need to care about it in the
case of Nyan, since its DT already had a preexisting problem with the
wrong panel model used for the FHD variant. The decision will be on your
conscience :)


Re: [PATCH] dt-bindings: display: bridge: lvds-codec: Fix duplicate key

2021-12-22 Thread Marek Vasut

On 12/22/21 19:03, Rob Herring wrote:

On Mon, 20 Dec 2021 13:51:47 +0100, Thierry Reding wrote:

From: Thierry Reding 

In order to validate multiple "if" conditionals, they must be part of an
"allOf:" list, otherwise they will cause a failure in parsing the schema
because of the duplicated "if" property.

Fixes: d7df3948eb49 ("dt-bindings: display: bridge: lvds-codec: Document pixel data 
sampling edge select")
Signed-off-by: Thierry Reding 
---
  .../bindings/display/bridge/lvds-codec.yaml   | 43 ++-
  1 file changed, 22 insertions(+), 21 deletions(-)



I went ahead and applied to drm-misc, so linux-next is fixed.


Thank you


Re: [PATCH v16 08/40] gpu: host1x: Add initial runtime PM and OPP support

2021-12-22 Thread Dmitry Osipenko
22.12.2021 21:41, Jon Hunter пишет:
> 
> On 22/12/2021 09:47, Jon Hunter wrote:
>>
>> On 21/12/2021 20:58, Dmitry Osipenko wrote:
>>> Hi,
>>>
>>> Thank you for testing it all.
>>>
>>> 21.12.2021 21:55, Jon Hunter пишет:
 Hi Dmitry, Thierry,

 On 30/11/2021 23:23, Dmitry Osipenko wrote:
> Add runtime PM and OPP support to the Host1x driver. For the
> starter we
> will keep host1x always-on because dynamic power management require a
> major
> refactoring of the driver code since lot's of code paths are
> missing the
> RPM handling and we're going to remove some of these paths in the
> future.


 Unfortunately, this change is breaking boot on Tegra186. Bisect points
 to this and reverting on top of -next gets the board booting again.
 Sadly, there is no panic or error reported, it is just a hard hang. I
 will not have time to look at this this week and so we may need to
 revert for the moment.
>>>
>>> Only T186 broken? What about T194?
>>
>> Yes interestingly only Tegra186 and no other board.
>>
>>> Which board model fails to boot? Is it running in hypervisor mode?
>>
>> This is Jetson TX2. No hypervisor.
>>
>>> Do you use any additional patches?
>>
>> No just plain -next. The tests run every day on top of tree.
>>
>>> Could you please test the below diff? I suspect that
>>> host1x_syncpt_save/restore may be entirely broken for T186 since we
>>> never used these funcs before.
>>>
>>> --- >8 ---
>>>
>>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>>> index f5b4dcded088..fd5dfb875422 100644
>>> --- a/drivers/gpu/host1x/dev.c
>>> +++ b/drivers/gpu/host1x/dev.c
>>> @@ -580,7 +580,6 @@ static int __maybe_unused
>>> host1x_runtime_suspend(struct device *dev)
>>>   int err;
>>>
>>>   host1x_intr_stop(host);
>>> -    host1x_syncpt_save(host);
>>>
>>>   err = reset_control_bulk_assert(host->nresets, host->resets);
>>>   if (err) {
>>> @@ -596,9 +595,8 @@ static int __maybe_unused
>>> host1x_runtime_suspend(struct device *dev)
>>>   return 0;
>>>
>>>   resume_host1x:
>>> -    host1x_setup_sid_table(host);
>>> -    host1x_syncpt_restore(host);
>>>   host1x_intr_start(host);
>>> +    host1x_setup_sid_table(host);
>>>
>>>   return err;
>>>   }
>>> @@ -626,9 +624,8 @@ static int __maybe_unused
>>> host1x_runtime_resume(struct device *dev)
>>>   goto disable_clk;
>>>   }
>>>
>>> -    host1x_setup_sid_table(host);
>>> -    host1x_syncpt_restore(host);
>>>   host1x_intr_start(host);
>>> +    host1x_setup_sid_table(host);
>>
>>
>> Thanks! Will try this later, once the next bisect is finished :-)
> 
> I tested the above, but this did not fix it. It still hangs on boot.

Thank you, now I see where the problem should be. Apparently host1x is
disabled at boot time on T186 and we touch the h/w before RPM is resumed.

Could you please revert the above change and try this instead:

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index e08e331e46ae..8194826c9ce3 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -137,6 +137,15 @@ void host1x_syncpt_restore(struct host1x *host)
struct host1x_syncpt *sp_base = host->syncpt;
unsigned int i;

+   for (i = 0; i < host->info->nb_pts; i++) {
+   /*
+* Unassign syncpt from channels for purposes of Tegra186
+* syncpoint protection. This prevents any channel from
+* accessing it until it is reassigned.
+*/
+   host1x_hw_syncpt_assign_to_channel(host, sp_base + i, NULL);
+   }
+
for (i = 0; i < host1x_syncpt_nb_pts(host); i++)
host1x_hw_syncpt_restore(host, sp_base + i);

@@ -352,13 +361,6 @@ int host1x_syncpt_init(struct host1x *host)
for (i = 0; i < host->info->nb_pts; i++) {
syncpt[i].id = i;
syncpt[i].host = host;
-
-   /*
-* Unassign syncpt from channels for purposes of Tegra186
-* syncpoint protection. This prevents any channel from
-* accessing it until it is reassigned.
-*/
-   host1x_hw_syncpt_assign_to_channel(host, &syncpt[i], NULL);
}

for (i = 0; i < host->info->nb_bases; i++)


Re: [PATCH v16 08/40] gpu: host1x: Add initial runtime PM and OPP support

2021-12-22 Thread Jon Hunter



On 22/12/2021 09:47, Jon Hunter wrote:


On 21/12/2021 20:58, Dmitry Osipenko wrote:

Hi,

Thank you for testing it all.

21.12.2021 21:55, Jon Hunter пишет:

Hi Dmitry, Thierry,

On 30/11/2021 23:23, Dmitry Osipenko wrote:

Add runtime PM and OPP support to the Host1x driver. For starters we
will keep host1x always-on, because dynamic power management requires a
major refactoring of the driver code, since lots of code paths are
missing the RPM handling and we're going to remove some of these paths
in the future.



Unfortunately, this change is breaking boot on Tegra186. Bisect points
to this and reverting on top of -next gets the board booting again.
Sadly, there is no panic or error reported, it is just a hard hang. I
will not have time to look at this this week and so we may need to
revert for the moment.


Only T186 broken? What about T194?


Yes interestingly only Tegra186 and no other board.


Which board model fails to boot? Is it running in hypervisor mode?


This is Jetson TX2. No hypervisor.


Do you use any additional patches?


No just plain -next. The tests run every day on top of tree.


Could you please test the below diff? I suspect that
host1x_syncpt_save/restore may be entirely broken for T186 since we
never used these funcs before.

--- >8 ---

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index f5b4dcded088..fd5dfb875422 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -580,7 +580,6 @@ static int __maybe_unused
host1x_runtime_suspend(struct device *dev)
  int err;

  host1x_intr_stop(host);
-    host1x_syncpt_save(host);

  err = reset_control_bulk_assert(host->nresets, host->resets);
  if (err) {
@@ -596,9 +595,8 @@ static int __maybe_unused
host1x_runtime_suspend(struct device *dev)
  return 0;

  resume_host1x:
-    host1x_setup_sid_table(host);
-    host1x_syncpt_restore(host);
  host1x_intr_start(host);
+    host1x_setup_sid_table(host);

  return err;
  }
@@ -626,9 +624,8 @@ static int __maybe_unused
host1x_runtime_resume(struct device *dev)
  goto disable_clk;
  }

-    host1x_setup_sid_table(host);
-    host1x_syncpt_restore(host);
  host1x_intr_start(host);
+    host1x_setup_sid_table(host);



Thanks! Will try this later, once the next bisect is finished :-)


I tested the above, but this did not fix it. It still hangs on boot.

Jon

--
nvpublic


Re: [PATCH] dt-bindings: display: novatek,nt36672a: Fix unevaluated properties warning

2021-12-22 Thread Rob Herring
On Tue, 21 Dec 2021 08:51:26 -0400, Rob Herring wrote:
> With 'unevaluatedProperties' support enabled, the novatek,nt36672a
> binding has a new warning:
> 
> Documentation/devicetree/bindings/display/panel/novatek,nt36672a.example.dt.yaml:
>  panel@0: Unevaluated properties are not allowed ('vddi0-supply', 
> '#address-cells', '#size-cells' were unexpected)
> 
> Based on dts files, 'vddi0-supply' does appear to be the correct name.
> Drop '#address-cells' and '#size-cells' which aren't needed.
> 
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/bindings/display/panel/novatek,nt36672a.yaml   | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 

Applied, thanks!


Re: [PATCH] dt-bindings: msm: disp: remove bus from dpu bindings

2021-12-22 Thread Rob Herring
On Mon, 20 Dec 2021 19:42:20 +0100, David Heidelberg wrote:
> The driver and dts have already been adjusted and the bus moved out of
> dpu; let's also update the dt-bindings.
> 
> Fixes warnings as:
> arch/arm64/boot/dts/qcom/sdm845-oneplus-fajita.dt.yaml: mdss@ae0: clock-names: ['iface', 'core'] is too short
> From schema: 
> Documentation/devicetree/bindings/display/msm/dpu-sdm845.yaml
> 
> Ref: 
> https://lore.kernel.org/all/20210803101657.1072358-1-dmitry.barysh...@linaro.org/
> 
> Signed-off-by: David Heidelberg 
> ---
>  .../devicetree/bindings/display/msm/dpu-sdm845.yaml  | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 

Applied, thanks!


Re: [PATCH] dt-bindings: display: bridge: lvds-codec: Fix duplicate key

2021-12-22 Thread Rob Herring
On Mon, 20 Dec 2021 13:51:47 +0100, Thierry Reding wrote:
> From: Thierry Reding 
> 
> In order to validate multiple "if" conditionals, they must be part of an
> "allOf:" list, otherwise they will cause a failure in parsing the schema
> because of the duplicated "if" property.
> 
> Fixes: d7df3948eb49 ("dt-bindings: display: bridge: lvds-codec: Document 
> pixel data sampling edge select")
> Signed-off-by: Thierry Reding 
> ---
>  .../bindings/display/bridge/lvds-codec.yaml   | 43 ++-
>  1 file changed, 22 insertions(+), 21 deletions(-)
> 

I went ahead and applied to drm-misc, so linux-next is fixed. 

Rob


Re: make dt_binding_check broken by drm & lvds-codec

2021-12-22 Thread Marek Vasut

On 12/22/21 18:43, Rafał Miłecki wrote:

Hi,


Hi,

ba3e86789eaf ("dt-bindings: display: bridge: lvds-codec: Document LVDS 
data mapping select")
d7df3948eb49 ("dt-bindings: display: bridge: lvds-codec: Document pixel 
data sampling edge select")


Both commits add "if" and "then" at YAML "root" level.

Can you take a look at that, please?


This should already be fixed by:
[PATCH] dt-bindings: display: bridge: lvds-codec: Fix duplicate key

+CC Thomas/Thierry, can you please pick the aforementioned patch?


make dt_binding_check broken by drm & lvds-codec

2021-12-22 Thread Rafał Miłecki

Hi,

I just noticed that "make dt_binding_check" doesn't work in linux-next:

  SCHEMA  Documentation/devicetree/bindings/processed-schema-examples.json
Traceback (most recent call last):
  File "/home/rmilecki/.local/bin/dt-mk-schema", line 38, in 
schemas = dtschema.process_schemas(args.schemas, core_schema=(not 
args.useronly))
  File "/home/rmilecki/.local/lib/python3.6/site-packages/dtschema/lib.py", 
line 587, in process_schemas
sch = process_schema(os.path.abspath(filename))
  File "/home/rmilecki/.local/lib/python3.6/site-packages/dtschema/lib.py", 
line 561, in process_schema
schema = load_schema(filename)
  File "/home/rmilecki/.local/lib/python3.6/site-packages/dtschema/lib.py", 
line 126, in load_schema
return do_load(os.path.join(schema_basedir, schema))
  File "/home/rmilecki/.local/lib/python3.6/site-packages/dtschema/lib.py", 
line 112, in do_load
return yaml.load(tmp)
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/main.py", line 343, in load
return constructor.get_single_data()
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 113, 
in get_single_data
return self.construct_document(node)
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 123, 
in construct_document
for _dummy in generator:
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 723, 
in construct_yaml_map
value = self.construct_mapping(node)
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 440, 
in construct_mapping
return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 257, 
in construct_mapping
if self.check_mapping_key(node, key_node, mapping, key, value):
  File "/usr/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 295, 
in check_mapping_key
raise DuplicateKeyError(*args)
ruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping
  in "", line 4, column 1
found duplicate key "if" with value "{}" (original value: "{}")
  in "", line 113, column 1

It's caused by two commits:
ba3e86789eaf ("dt-bindings: display: bridge: lvds-codec: Document LVDS data mapping 
select")
d7df3948eb49 ("dt-bindings: display: bridge: lvds-codec: Document pixel data 
sampling edge select")

Both commits add "if" and "then" at YAML "root" level.

Can you take a look at that, please?


Re: [PATCH 22/22] drm: rockchip: Add VOP2 driver

2021-12-22 Thread Nicolas Frattaroli
On Dienstag, 21. Dezember 2021 14:44:39 CET Nicolas Frattaroli wrote:
> On Montag, 20. Dezember 2021 12:06:30 CET Sascha Hauer wrote:
> > From: Andy Yan 
> >
> > The VOP2 unit is found on Rockchip SoCs beginning with rk3566/rk3568.
> > It replaces the VOP unit found in the older Rockchip SoCs.
> >
> > This driver has been derived from the downstream Rockchip Kernel and
> > heavily modified:
> >
> > - All nonstandard DRM properties have been removed
> > - dropped struct vop2_plane_state and pass around less data between
> >   functions
> > - Dropped all DRM_FORMAT_* not known on upstream
> > - rework register access to get rid of excessively used macros
> > - Drop all waiting for framesyncs
> >
> > The driver is tested with HDMI and MIPI-DSI display on a RK3568-EVB
> > board. Overlay support is tested with the modetest utility. AFBC support
> > on the cluster windows is tested with weston-simple-dmabuf-egl on
> > weston using the (yet to be upstreamed) panfrost driver support.
> >
> > Signed-off-by: Sascha Hauer 
> > ---
>
> Hi Sascha,
>
> quick partial review of the code in-line.
>
> For reference, I debugged locking issues with the kernel lock
> debug config options and assert_spin_locked in the reg write
> functions, as well as some manual deduction.
>

As a small follow-up, I've completely mapped out the calls to
vop2_writel, vop2_readl, vop2_vp_write and vop2_win_write and
coloured in whether they were called with the lock held or not.

The conclusion is startling: Most of the code absolutely does
not care about the reg_lock.

Here's the graph as an SVG: 
https://overviewer.org/~pillow/up/6800427ef3/vop2_callgraph_modified.svg

vop2_isr here needs to be paid special attention, as it also
acquires a different spinlock, and we want to avoid deadlocks.

Perhaps we should precisely define which lock must be held for
what registers, such that the vop2_isr can write its interrupt
related registers without acquiring the "big" reg_lock.
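
To sketch what I mean (a hypothetical illustration, not the actual
VOP2 structures; the point is the assert_spin_locked() annotations in
the register write helpers):

#include <linux/io.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/*
 * Hypothetical split: reg_lock covers only window/port configuration
 * registers, while a separate irq_lock covers the interrupt registers,
 * so vop2_isr() never has to take the "big" reg_lock.
 */
struct vop2 {
	void __iomem *regs;
	spinlock_t reg_lock;	/* window/port configuration registers */
	spinlock_t irq_lock;	/* interrupt status/enable registers */
};

static void vop2_cfg_write(struct vop2 *vop2, u32 offset, u32 v)
{
	/* Catches any caller that forgot to take the register lock. */
	assert_spin_locked(&vop2->reg_lock);
	writel(v, vop2->regs + offset);
}

static void vop2_irq_write(struct vop2 *vop2, u32 offset, u32 v)
{
	/* The ISR path only ever takes irq_lock. */
	assert_spin_locked(&vop2->irq_lock);
	writel(v, vop2->regs + offset);
}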

I'm also not entirely sure whether I should assume vop2_readl
needs to be called with the lock held. This needs some
investigating both in terms of whether the hardware presents
a writel as an atomic write of a long, and whether the code ever
assumes a consistent view of the state across readl calls.

Regards,
Nicolas Frattaroli





Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts

2021-12-22 Thread Tvrtko Ursulin



Ping?

Main two points being:

1) Commit message seems in contradiction with the change in 
guc_flush_destroyed_contexts. And the lock drop to immediately 
re-acquire it looks questionable to start with.


2) And in deregister_destroyed_contexts and in 1) I was therefore asking 
if you can unlink all at once and process with reduced hammering on the 
lock.
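
To be concrete, I was thinking of something along these lines (an
untested sketch using the names from the patch; note release_guc_id()
would still take the lock once per context internally, so that would
want the same treatment):

static void guc_flush_destroyed_contexts(struct intel_guc *guc)
{
	struct intel_context *ce, *cn;
	unsigned long flags;
	LIST_HEAD(destroyed);

	/* Take the lock once and detach the whole list. */
	spin_lock_irqsave(&guc->submission_state.lock, flags);
	list_splice_init(&guc->submission_state.destroyed_contexts,
			 &destroyed);
	spin_unlock_irqrestore(&guc->submission_state.lock, flags);

	/* Destroy the detached entries without holding the lock. */
	list_for_each_entry_safe(ce, cn, &destroyed, destroyed_link) {
		list_del_init(&ce->destroyed_link);
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
	}
}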


Regards,

Tvrtko

On 17/12/2021 11:14, Tvrtko Ursulin wrote:


On 17/12/2021 11:06, Tvrtko Ursulin wrote:

On 14/12/2021 17:04, Matthew Brost wrote:

From: John Harrison 

While attempting to debug a CT deadlock issue in various CI failures
(most easily reproduced with gem_ctx_create/basic-files), I was seeing
CPU deadlock errors being reported. This was because the context
destroy loop was blocking waiting on H2G space from inside an IRQ
spinlock. There was no deadlock as such, it's just that the H2G queue
was full of context destroy commands and GuC was taking a long time to
process them. However, the kernel was seeing the large amount of time
spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
then happen (heartbeat failures, CT deadlock errors, outstanding H2G
WARNs, etc.).

Re-working the loop to only acquire the spinlock around the list
management (which is all it is meant to protect) rather than the
entire destroy operation seems to fix all the above issues.

v2:
  (John Harrison)
   - Fix typo in comment message

Signed-off-by: John Harrison 
Signed-off-by: Matthew Brost 
Reviewed-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ---
  1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 36c2965db49b..96fcf869e3ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct 
intel_context *ce)

  unsigned long flags;
  bool disabled;
-    lockdep_assert_held(&guc->submission_state.lock);
  GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
  GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
  GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
@@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct 
intel_context *ce)

  }
  spin_unlock_irqrestore(&ce->guc_state.lock, flags);
  if (unlikely(disabled)) {
-    __release_guc_id(guc, ce);
+    release_guc_id(guc, ce);
  __guc_context_destroy(ce);
  return;
  }
@@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct 
intel_context *ce)

  static void guc_flush_destroyed_contexts(struct intel_guc *guc)
  {
-    struct intel_context *ce, *cn;
+    struct intel_context *ce;
  unsigned long flags;
  GEM_BUG_ON(!submission_disabled(guc) &&
 guc_submission_initialized(guc));
-    spin_lock_irqsave(&guc->submission_state.lock, flags);
-    list_for_each_entry_safe(ce, cn,
- &guc->submission_state.destroyed_contexts,
- destroyed_link) {
-    list_del_init(&ce->destroyed_link);
-    __release_guc_id(guc, ce);
+    while (!list_empty(&guc->submission_state.destroyed_contexts)) {


Are lockless false negatives a concern here - I mean this thread not
seeing something that was just added to the list?



+    spin_lock_irqsave(&guc->submission_state.lock, flags);
+    ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
+  struct intel_context,
+  destroyed_link);
+    if (ce)
+    list_del_init(&ce->destroyed_link);
+    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
+
+    if (!ce)
+    break;
+
+    release_guc_id(guc, ce);


This looks suboptimal and in conflict with this part of the commit 
message:


"""
  Re-working the loop to only acquire the spinlock around the list
  management (which is all it is meant to protect) rather than the
  entire destroy operation seems to fix all the above issues.
"""

Because you end up doing:

... loop ...
   spin_lock_irqsave(&guc->submission_state.lock, flags);
   list_del_init(&ce->destroyed_link);
   spin_unlock_irqrestore(&guc->submission_state.lock, flags);

   release_guc_id, which calls:
 spin_lock_irqsave(&guc->submission_state.lock, flags);
 __release_guc_id(guc, ce);
 spin_unlock_irqrestore(&guc->submission_state.lock, flags);

So a) the lock seems to be protecting more than just list management, 
or release_guc_id is wrong, and b) the loop ends up with highly 
questionable hammering on the lock.


Is there any point to this part of the patch? Or the only business end 
of the patch is below:



  __guc_context_destroy(ce);
  }
-    spin_unlock_irqrestore(&guc->submission_state.lock, flags);
  }
  static void deregister_destroyed_contexts(struct intel_guc *guc)
  {
-    struct intel_context *ce, *cn;
+    struct intel_context *ce;
  unsigned long flags;
-    

Re: [Intel-gfx] [PATCH] drm/i915/guc: Log engine resets

2021-12-22 Thread Tvrtko Ursulin



On 21/12/2021 22:14, John Harrison wrote:

On 12/21/2021 05:37, Tvrtko Ursulin wrote:

On 20/12/2021 18:34, John Harrison wrote:

On 12/20/2021 07:00, Tvrtko Ursulin wrote:

On 17/12/2021 16:22, Matthew Brost wrote:

On Fri, Dec 17, 2021 at 12:15:53PM +, Tvrtko Ursulin wrote:


On 14/12/2021 15:07, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Log engine resets done by the GuC firmware in a similar way to how it
is done by the execlists backend.

This way we have a notion of where the hangs are before the GuC gains
support for proper error capture.


Ping - any interest to log this info?

All we currently have is a non-descriptive "[drm] GPU HANG: ecode
12:0:".



Yea, this could be helpful. One suggestion below.

Also, will GuC be reporting the reason for the engine reset at any 
point?




We are working on the error state capture, presumably the registers
will give a clue what caused the hang.

As for the GuC providing a reason, that isn't defined in the interface,
but providing a hint in the G2H about what the issue was is a decent
idea. Let me run that by the i915 GuC developers / GuC firmware team
and see what they think.

The GuC does not do any hang analysis. So as far as GuC is concerned, 
the reason is pretty much always going to be pre-emption timeout. 
There are a few ways the pre-emption itself could be triggered but 
basically, if GuC resets an active context then it is because it did 
not pre-empt quickly enough when requested.




Regards,

Tvrtko


Signed-off-by: Tvrtko Ursulin 
Cc: Matthew Brost 
Cc: John Harrison 
---
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 
+++-

   1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 9739da6f..51512123dc1a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
   #include "gt/intel_context.h"
   #include "gt/intel_engine_pm.h"
   #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_engine_user.h"
   #include "gt/intel_gpu_commands.h"
   #include "gt/intel_gt.h"
   #include "gt/intel_gt_clock_utils.h"
@@ -3934,9 +3935,18 @@ static void capture_error_state(struct 
intel_guc *guc,

   {
   struct intel_gt *gt = guc_to_gt(guc);
   struct drm_i915_private *i915 = gt->i915;
-    struct intel_engine_cs *engine = 
__context_to_physical_engine(ce);

+    struct intel_engine_cs *engine = ce->engine;
   intel_wakeref_t wakeref;
+    if (intel_engine_is_virtual(engine)) {
+    drm_notice(>drm, "%s class, engines 0x%x; GuC 
engine reset\n",

+ intel_engine_class_repr(engine->class),
+   engine->mask);
+    engine = guc_virtual_get_sibling(engine, 0);
+    } else {
+    drm_notice(>drm, "%s GuC engine reset\n", 
engine->name);


Probably include the guc_id of the context too then?


Is the guc id stable and useful on its own - who would be the user?
The GuC id is the only thing that matters when trying to correlate 
KMD activity with a GuC log. So while it might not be of any use or 
interest to an end user, it is extremely important and useful to a 
kernel developer attempting to debug an issue. And that includes bug 
reports from end users that are hard to repro given that the standard 
error capture will include the GuC log.
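
For illustration, the kind of message being discussed might then look
roughly like this (hypothetical, built only from the fields already
visible in this patch):

	drm_notice(&i915->drm, "%s GuC engine reset, guc_id %u\n",
		   engine->name, ce->guc_id.id);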


On the topic of GuC log - is there a tool in IGT (or will be) which 
will parse the bit saved in the error capture or how is that supposed 
to be used?

Nope.

However, Alan is currently working on supporting the GuC error capture 
mechanism. Prior to sending the reset notification to the KMD, the GuC 
will save a whole bunch of register state to a memory buffer and send a 
notification to the KMD that this is available. When we then get the 
actual reset notification, we need to match the two together and include 
a parsed, human readable version of the GuC's capture state buffer in 
the sysfs error log output.


The GuC log should not be involved in this process. And note that any 
register dumps in the GuC log are limited in scope and only enabled at 
higher verbosity levels. Whereas, the official state capture is based on 
a register list provided by the KMD and is available irrespective of 
debug CONFIG settings, verbosity levels, etc.


Hm why should GuC log not be involved now? I thought earlier you said:

"""
And that includes bug reports from end users that are hard to repro 
given that the standard error capture will include the GuC log.

"""

Hence I thought there would be a tool in IGT which would parse the part 
saved inside the error capture.


Also, note that GuC really resets contexts rather than engines. What 
it reports back to i915 on a reset is simply the GuC id of the 
context. It is up to i915 to work back from that to determine engine 
instances/classes if required. And in the case of a virtual context, 
it is impossible to extract the actual 

[PATCH] drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.

2021-12-22 Thread Maarten Lankhorst
Convert free_work into delayed_work, similar to ttm to allow converting the
blocking lock in __i915_gem_free_objects to a trylock.

Unlike ttm, the object should already be idle, as it's kept alive
by a reference through struct i915_vma->active, which is dropped
after all vmas are idle.

Because of this, we can use no wait by default; when the lock is
contended, we use ttm's 10 ms delay before retrying.

The trylock should only fail when the object is sharing its resv with
other objects, and typically objects are not kept locked for a long
time, so we can safely retry on failure.

Fixes: be7612fd6665 ("drm/i915: Require object lock when freeing pages during 
destruction")
Testcase: igt/gem_exec_alignment/pi*
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 14 ++
 drivers/gpu/drm/i915/i915_drv.h|  4 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 39cd563544a5..d87b508b59b1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -331,7 +331,13 @@ static void __i915_gem_free_objects(struct 
drm_i915_private *i915,
continue;
}
 
-   i915_gem_object_lock(obj, NULL);
+   if (!i915_gem_object_trylock(obj, NULL)) {
+   /* busy, toss it back to the pile */
+   if (llist_add(&obj->freed, &i915->mm.free_list))
+   queue_delayed_work(i915->wq, &i915->mm.free_work, msecs_to_jiffies(10));
+   continue;
+   }
+
__i915_gem_object_pages_fini(obj);
i915_gem_object_unlock(obj);
__i915_gem_free_object(obj);
@@ -353,7 +359,7 @@ void i915_gem_flush_free_objects(struct drm_i915_private 
*i915)
 static void __i915_gem_free_work(struct work_struct *work)
 {
struct drm_i915_private *i915 =
-   container_of(work, struct drm_i915_private, mm.free_work);
+   container_of(work, struct drm_i915_private, mm.free_work.work);
 
i915_gem_flush_free_objects(i915);
 }
@@ -385,7 +391,7 @@ static void i915_gem_free_object(struct drm_gem_object 
*gem_obj)
 */
 
	if (llist_add(&obj->freed, &i915->mm.free_list))
-   queue_work(i915->wq, &i915->mm.free_work);
+   queue_delayed_work(i915->wq, &i915->mm.free_work, 0);
 }
 
 void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
@@ -710,7 +716,7 @@ bool i915_gem_object_placement_possible(struct 
drm_i915_gem_object *obj,
 
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
-   INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
+   INIT_DELAYED_WORK(&i915->mm.free_work, __i915_gem_free_work);
 }
 
 void i915_objects_module_exit(void)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c8fddb7e61c9..beeb42a14aae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -465,7 +465,7 @@ struct i915_gem_mm {
 * List of objects which are pending destruction.
 */
struct llist_head free_list;
-   struct work_struct free_work;
+   struct delayed_work free_work;
/**
 * Count of objects pending destructions. Used to skip needlessly
 * waiting on an RCU barrier if no objects are waiting to be freed.
@@ -1625,7 +1625,7 @@ static inline void i915_gem_drain_freed_objects(struct 
drm_i915_private *i915)
 * armed the work again.
 */
	while (atomic_read(&i915->mm.free_count)) {
-   flush_work(&i915->mm.free_work);
+   flush_delayed_work(&i915->mm.free_work);
		flush_delayed_work(&i915->bdev.wq);
rcu_barrier();
}
-- 
2.34.1



[PATCH v2 1/1] drm/i915/dsi: Drop double check ACPI companion device for NULL

2021-12-22 Thread Andy Shevchenko
acpi_dev_get_resources() already performs the NULL pointer check on the
ACPI companion device that is given as a function parameter. Thus,
there is no need to duplicate this check in the caller.

Signed-off-by: Andy Shevchenko 
---
v2: used LIST_HEAD() (Ville), initialized lookup directly on stack (Ville)
 drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 28 +++-
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c 
b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
index 0da91849efde..da0bd056f3d3 100644
--- a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
+++ b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
@@ -426,24 +426,16 @@ static void i2c_acpi_find_adapter(struct intel_dsi 
*intel_dsi,
  const u16 slave_addr)
 {
struct drm_device *drm_dev = intel_dsi->base.base.dev;
-   struct device *dev = drm_dev->dev;
-   struct acpi_device *acpi_dev;
-   struct list_head resource_list;
-   struct i2c_adapter_lookup lookup;
-
-   acpi_dev = ACPI_COMPANION(dev);
-   if (acpi_dev) {
-   memset(&lookup, 0, sizeof(lookup));
-   lookup.slave_addr = slave_addr;
-   lookup.intel_dsi = intel_dsi;
-   lookup.dev_handle = acpi_device_handle(acpi_dev);
-
-   INIT_LIST_HEAD(&resource_list);
-   acpi_dev_get_resources(acpi_dev, &resource_list,
-  i2c_adapter_lookup,
-  &lookup);
-   acpi_dev_free_resource_list(&resource_list);
-   }
+   struct acpi_device *adev = ACPI_COMPANION(drm_dev->dev);
+   struct i2c_adapter_lookup lookup = {
+   .slave_addr = slave_addr,
+   .intel_dsi = intel_dsi,
+   .dev_handle = acpi_device_handle(adev),
+   };
+   LIST_HEAD(resource_list);
+
+   acpi_dev_get_resources(adev, &resource_list, i2c_adapter_lookup, &lookup);
+   acpi_dev_free_resource_list(&resource_list);
 }
 #else
 static inline void i2c_acpi_find_adapter(struct intel_dsi *intel_dsi,
-- 
2.34.1



Re: [PATCH] drm/amd/display: Fix the uninitialized variable in enable_stream_features()

2021-12-22 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Dec 17, 2021 at 11:22 PM Yizhuo Zhai  wrote:
>
> In function enable_stream_features(), the variable "old_downspread.raw"
> could be uninitialized if core_link_read_dpcd() fails; however, it is
> used in a later if statement, and further, core_link_write_dpcd()
> may then write a random value, which is potentially unsafe.
>
> Fixes: 6016cd9dba0f ("drm/amd/display: add helper for enabling mst stream 
> features")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Yizhuo Zhai 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> index c8457babfdea..fd5a0e7eb029 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
> @@ -1844,6 +1844,8 @@ static void enable_stream_features(struct pipe_ctx 
> *pipe_ctx)
> union down_spread_ctrl old_downspread;
> union down_spread_ctrl new_downspread;
>
> +   memset(&old_downspread, 0, sizeof(old_downspread));
> +
> core_link_read_dpcd(link, DP_DOWNSPREAD_CTRL,
> &old_downspread.raw, sizeof(old_downspread));
>
> --
> 2.25.1
>

