Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-08 Thread Matthew Brost
On Fri, Jul 07, 2023 at 02:52:41PM +0200, Boris Brezillon wrote:
> On Fri, 7 Jul 2023 14:41:23 +0200
> Danilo Krummrich  wrote:
> 
> > >> + va__ && (va__->va.addr < (end__)) && \
> > >> + !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> > >> + va__ = list_next_entry(va__, rb.entry))  
> > > 
> > > If you define:
> > > 
> > > static inline struct drm_gpuva *
> > > drm_gpuva_next(struct drm_gpuva *va)
> > > {
> > >   if (va && !list_is_last(&va->rb.entry, &va->mgr->rb.list))
> > >   return list_next_entry(va, rb.entry);
> > > 
> > >   return NULL;
> > > }
> > > 
> > > the for loop becomes a bit more readable:  
> > 
> > Yes, it would. However, I don't want it to be confused with 
> > drm_gpuva_find_next(). Maybe I should rename the latter to something 
> > like drm_gpuva_find_next_neighbor() then.
> 
> If you want to keep drm_gpuva_find_next(), feel free to rename/prefix
> the drm_gpuva_next() function. I was just posting it as a reference.
> 
> > 
> > > 
> > > 	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \
> > > 	     va__ && (va__->va.addr < (end__)); \
> > > 	     va__ = drm_gpuva_next(va__))
> > >   
> > >> +
> > >> +/**
> > >> + * drm_gpuva_for_each_va_range_safe - iterator to safely walk over a range of
> > >> + * &drm_gpuvas
> > >> + * @va__: &drm_gpuva to assign to in each iteration step
> > >> + * @next__: another &drm_gpuva to use as temporary storage
> > >> + * @mgr__: &drm_gpuva_manager to walk over
> > >> + * @start__: starting offset, the first gpuva will overlap this
> > >> + * @end__: ending offset, the last gpuva will start before this (but may
> > >> + * overlap)
> > >> + *
> > >> + * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
> > >> + * between @start__ and @end__. It is implemented similarly to
> > >> + * list_for_each_safe(), but is using the &drm_gpuva_manager's internal interval
> > >> + * tree to accelerate the search for the starting &drm_gpuva, and hence is safe
> > >> + * against removal of elements. It assumes that @end__ is within (or is the
> > >> + * upper limit of) the &drm_gpuva_manager. This iterator does not skip over the
> > >> + * &drm_gpuva_manager's @kernel_alloc_node.
> > >> + */
> > >> +#define drm_gpuva_for_each_va_range_safe(va__, next__, mgr__, start__, end__) \
> > >> +	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)), \
> > >> +	     next__ = va ? list_next_entry(va__, rb.entry) : NULL; \
> > >> +	     va__ && (va__->va.addr < (end__)) && \
> > >> +	     !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> > >> +	     va__ = next__, next__ = list_next_entry(va__, rb.entry))  
> > > 
> > > And this is the safe version using the drm_gpuva_next() helper:
> > > 
> > > 	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)), \
> > > 	     next__ = drm_gpuva_next(va__); \
> > > 	     va__ && (va__->va.addr < (end__)); \
> > > 	     va__ = next__, next__ = drm_gpuva_next(va__))
> > > 
> > > Those changes fixed an invalid pointer access I had in the sm_unmap()
> > > path.
> > >   
> > 
> > Sorry you ran into this bug.
> 
> No worries, that's what testing/debugging/reviewing is for. And I'm glad
> someone decided to work on this gpuva stuff so I don't have to code it
> myself, so that's the least I can do.

With Boris's changes this version works in Xe.

With that:

Acked-by: Matthew Brost 
Tested-by: Matthew Brost 


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-07 Thread Boris Brezillon
On Fri, 7 Jul 2023 14:41:23 +0200
Danilo Krummrich  wrote:

> >> +   va__ && (va__->va.addr < (end__)) && \
> >> +   !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> >> +   va__ = list_next_entry(va__, rb.entry))  
> > 
> > If you define:
> > 
> > static inline struct drm_gpuva *
> > drm_gpuva_next(struct drm_gpuva *va)
> > {
> > if (va && !list_is_last(&va->rb.entry, &va->mgr->rb.list))
> > return list_next_entry(va, rb.entry);
> > 
> > return NULL;
> > }
> > 
> > the for loop becomes a bit more readable:  
> 
> Yes, it would. However, I don't want it to be confused with 
> drm_gpuva_find_next(). Maybe I should rename the latter to something 
> like drm_gpuva_find_next_neighbor() then.

If you want to keep drm_gpuva_find_next(), feel free to rename/prefix
the drm_gpuva_next() function. I was just posting it as a reference.

> 
> > 
> > 	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \
> > 	     va__ && (va__->va.addr < (end__)); \
> > 	     va__ = drm_gpuva_next(va__))
> >   
> >> +
> >> +/**
> >> + * drm_gpuva_for_each_va_range_safe - iterator to safely walk over a range of
> >> + * &drm_gpuvas
> >> + * @va__: &drm_gpuva to assign to in each iteration step
> >> + * @next__: another &drm_gpuva to use as temporary storage
> >> + * @mgr__: &drm_gpuva_manager to walk over
> >> + * @start__: starting offset, the first gpuva will overlap this
> >> + * @end__: ending offset, the last gpuva will start before this (but may
> >> + * overlap)
> >> + *
> >> + * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
> >> + * between @start__ and @end__. It is implemented similarly to
> >> + * list_for_each_safe(), but is using the &drm_gpuva_manager's internal interval
> >> + * tree to accelerate the search for the starting &drm_gpuva, and hence is safe
> >> + * against removal of elements. It assumes that @end__ is within (or is the
> >> + * upper limit of) the &drm_gpuva_manager. This iterator does not skip over the
> >> + * &drm_gpuva_manager's @kernel_alloc_node.
> >> + */
> >> +#define drm_gpuva_for_each_va_range_safe(va__, next__, mgr__, start__, end__) \
> >> +	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)), \
> >> +	     next__ = va ? list_next_entry(va__, rb.entry) : NULL; \
> >> +	     va__ && (va__->va.addr < (end__)) && \
> >> +	     !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> >> +	     va__ = next__, next__ = list_next_entry(va__, rb.entry))  
> > 
> > And this is the safe version using the drm_gpuva_next() helper:
> > 
> > 	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)), \
> > 	     next__ = drm_gpuva_next(va__); \
> > 	     va__ && (va__->va.addr < (end__)); \
> > 	     va__ = next__, next__ = drm_gpuva_next(va__))
> > 
> > Those changes fixed an invalid pointer access I had in the sm_unmap()
> > path.
> >   
> 
> Sorry you ran into this bug.

No worries, that's what testing/debugging/reviewing is for. And I'm glad
someone decided to work on this gpuva stuff so I don't have to code it
myself, so that's the least I can do.


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-07 Thread Danilo Krummrich

On 7/7/23 13:00, Boris Brezillon wrote:

On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich  wrote:


+/**
+ * drm_gpuva_for_each_va_range - iterator to walk over a range of &drm_gpuvas
+ * @va__: &drm_gpuva structure to assign to in each iteration step
+ * @mgr__: &drm_gpuva_manager to walk over
+ * @start__: starting offset, the first gpuva will overlap this
+ * @end__: ending offset, the last gpuva will start before this (but may
+ * overlap)
+ *
+ * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
+ * between @start__ and @end__. It is implemented similarly to list_for_each(),
+ * but is using the &drm_gpuva_manager's internal interval tree to accelerate
+ * the search for the starting &drm_gpuva, and hence isn't safe against removal
+ * of elements. It assumes that @end__ is within (or is the upper limit of) the
+ * &drm_gpuva_manager. This iterator does not skip over the &drm_gpuva_manager's
+ * @kernel_alloc_node.
+ */
+#define drm_gpuva_for_each_va_range(va__, mgr__, start__, end__) \
+	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)); \


drm_gpuva_find_first() takes the range size as its last argument, not
the range end:

	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \



Good catch! Originally this was

drm_gpuva_it_iter_first(&(mgr)->rb.tree, (start__), (end__) - 1)

but then I changed it since I did not want to expose the interval tree 
functions directly.
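
For reference, the `- 1` in the original call matters because the interval tree takes an inclusive last address, while the iterator's @end__ is exclusive. A minimal, hypothetical overlap check (not part of the patch) illustrates the conversion between the half-open [start, end) request and an inclusive [addr, last] node:

```c
#include <stdbool.h>
#include <stdint.h>

/* A node spanning [addr, addr + range) is stored as the inclusive
 * interval [addr, last] with last = addr + range - 1, mirroring
 * interval tree semantics. The request [start, end) is half-open. */
static bool overlaps(uint64_t addr, uint64_t range,
		     uint64_t start, uint64_t end)
{
	uint64_t last = addr + range - 1;

	return addr < end && last >= start;
}
```

E.g. a node at 0x1000 with range 0x1000 covers 0x1000..0x1FFF inclusive, so it overlaps a request starting at 0x1800 but not one starting at 0x2000.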





+	     va__ && (va__->va.addr < (end__)) && \
+	     !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
+	     va__ = list_next_entry(va__, rb.entry))


If you define:

static inline struct drm_gpuva *
drm_gpuva_next(struct drm_gpuva *va)
{
	if (va && !list_is_last(&va->rb.entry, &va->mgr->rb.list))
return list_next_entry(va, rb.entry);

return NULL;
}

the for loop becomes a bit more readable:


Yes, it would. However, I don't want it to be confused with 
drm_gpuva_find_next(). Maybe I should rename the latter to something 
like drm_gpuva_find_next_neighbor() then.




	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \
	     va__ && (va__->va.addr < (end__)); \
	     va__ = drm_gpuva_next(va__))


+
+/**
+ * drm_gpuva_for_each_va_range_safe - iterator to safely walk over a range of
+ * &drm_gpuvas
+ * @va__: &drm_gpuva to assign to in each iteration step
+ * @next__: another &drm_gpuva to use as temporary storage
+ * @mgr__: &drm_gpuva_manager to walk over
+ * @start__: starting offset, the first gpuva will overlap this
+ * @end__: ending offset, the last gpuva will start before this (but may
+ * overlap)
+ *
+ * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
+ * between @start__ and @end__. It is implemented similarly to
+ * list_for_each_safe(), but is using the &drm_gpuva_manager's internal interval
+ * tree to accelerate the search for the starting &drm_gpuva, and hence is safe
+ * against removal of elements. It assumes that @end__ is within (or is the
+ * upper limit of) the &drm_gpuva_manager. This iterator does not skip over the
+ * &drm_gpuva_manager's @kernel_alloc_node.
+ */
+#define drm_gpuva_for_each_va_range_safe(va__, next__, mgr__, start__, end__) \
+	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)), \
+	     next__ = va ? list_next_entry(va__, rb.entry) : NULL; \
+	     va__ && (va__->va.addr < (end__)) && \
+	     !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
+	     va__ = next__, next__ = list_next_entry(va__, rb.entry))


And this is the safe version using the drm_gpuva_next() helper:

	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)), \
	     next__ = drm_gpuva_next(va__); \
	     va__ && (va__->va.addr < (end__)); \
	     va__ = next__, next__ = drm_gpuva_next(va__))

Those changes fixed an invalid pointer access I had in the sm_unmap()
path.



Sorry you ran into this bug.

- Danilo



Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-07 Thread Danilo Krummrich

On 7/7/23 09:57, Boris Brezillon wrote:

On Thu, 6 Jul 2023 20:26:42 +0200
Boris Brezillon  wrote:


On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich  wrote:


+#ifdef CONFIG_LOCKDEP
+typedef struct lockdep_map *lockdep_map_p;
+#define drm_gpuva_manager_ext_assert_held(mgr) \
+   lockdep_assert(lock_is_held((mgr)->ext_lock) != LOCK_STATE_NOT_HELD)
+/**
+ * drm_gpuva_manager_set_ext_lock - set the external lock according to
+ * @DRM_GPUVA_MANAGER_LOCK_EXTERN
+ * @mgr: the &drm_gpuva_manager to set the lock for
+ * @lock: the lock to set
+ *
+ * If @DRM_GPUVA_MANAGER_LOCK_EXTERN is set, drivers need to call this function
+ * to provide the lock used to lock linking and unlinking of &drm_gpuvas to the
+ * &drm_gem_objects GPUVA list.
+ */
+#define drm_gpuva_manager_set_ext_lock(mgr, lock)  \
+	(mgr)->ext_lock = &(lock)->dep_map


Okay, so, IIUC, this is the lock protecting the GEM's active mappings
list, meaning the lock is likely to be attached to the GEM object. Are
we expected to call drm_gpuva_manager_set_ext_lock() every time we call
drm_gpuva_[un]link(), or are we supposed to have some lock at the
device level serializing all drm_gpuva_[un]link() calls across VMs? The
latter doesn't sound like a good option to me, and the former feels a bit
weird. I'm wondering if we shouldn't just drop this assert_held() check
when DRM_GPUVA_MANAGER_LOCK_EXTERN is set. Alternatively, we could say
that any driver wanting to use a custom lock (which is basically all
drivers modifying the VA space asynchronously in the ::run_job() path)
has to provide its own variant of drm_gpuva_[un]link() (maybe with its
own VA list too), which doesn't sound like a good idea either.


Or we could just attach the dep_map to drm_gem_object::gpuva::lock, and
let drivers overload the default lock in their GEM creation function if
they want to use a custom lock (see the following diff).


Uh, I like that. Will pick it up, thanks!



---

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index e47747f22126..6427c88c22ba 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -675,8 +675,7 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
   const char *name,
   u64 start_offset, u64 range,
   u64 reserve_offset, u64 reserve_range,
-  const struct drm_gpuva_fn_ops *ops,
-  enum drm_gpuva_manager_flags flags)
+  const struct drm_gpuva_fn_ops *ops)
  {
mgr->rb.tree = RB_ROOT_CACHED;
	INIT_LIST_HEAD(&mgr->rb.list);
@@ -686,7 +685,6 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
mgr->mm_range = range;
  
  	mgr->name = name ? name : "unknown";

-   mgr->flags = flags;
mgr->ops = ops;
  
	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_gpuva));

@@ -822,16 +820,12 @@ EXPORT_SYMBOL(drm_gpuva_remove);
  void
  drm_gpuva_link(struct drm_gpuva *va)
  {
-   struct drm_gpuva_manager *mgr = va->mgr;
struct drm_gem_object *obj = va->gem.obj;
  
  	if (unlikely(!obj))

return;
  
-	if (drm_gpuva_manager_external_lock(mgr))
-		drm_gpuva_manager_ext_assert_held(mgr);
-	else
-		dma_resv_assert_held(obj->resv);
+	drm_gem_gpuva_assert_lock_held(obj);
  
	list_add_tail(&va->gem.entry, &obj->gpuva.list);

  }
@@ -850,16 +844,12 @@ EXPORT_SYMBOL(drm_gpuva_link);
  void
  drm_gpuva_unlink(struct drm_gpuva *va)
  {
-   struct drm_gpuva_manager *mgr = va->mgr;
struct drm_gem_object *obj = va->gem.obj;
  
  	if (unlikely(!obj))

return;
  
-	if (drm_gpuva_manager_external_lock(mgr))
-		drm_gpuva_manager_ext_assert_held(mgr);
-	else
-		dma_resv_assert_held(obj->resv);
+	drm_gem_gpuva_assert_lock_held(obj);
  
	list_del_init(&va->gem.entry);

  }
@@ -1680,10 +1670,7 @@ drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager 
*mgr,
struct drm_gpuva *va;
int ret;
  
-	if (drm_gpuva_manager_external_lock(mgr))
-		drm_gpuva_manager_ext_assert_held(mgr);
-	else
-		dma_resv_assert_held(obj->resv);
+	drm_gem_gpuva_assert_lock_held(obj);
  
  	ops = kzalloc(sizeof(*ops), GFP_KERNEL);

if (!ops)
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 5ec8148a30ee..572d7a538324 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -387,10 +387,14 @@ struct drm_gem_object {
 * Provides the list of GPU VAs attached to this GEM object.
 *
	 * Drivers should lock list accesses with the GEMs &dma_resv lock
-	 * (&drm_gem_object.resv).
+	 * (&drm_gem_object.resv) or a custom lock if one is provided.
 */
struct {
struct list_head list;
+
+#ifdef CONFIG_LOCKDEP
+   struct lockdep_map *lock_dep_map;
+#endif
} gpuva;
  
  	/**

@@ -540,6 +544,26 @@ unsigned long 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-07 Thread Boris Brezillon
On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich  wrote:

> +/**
> + * drm_gpuva_for_each_va_range - iterator to walk over a range of &drm_gpuvas
> + * @va__: &drm_gpuva structure to assign to in each iteration step
> + * @mgr__: &drm_gpuva_manager to walk over
> + * @start__: starting offset, the first gpuva will overlap this
> + * @end__: ending offset, the last gpuva will start before this (but may
> + * overlap)
> + *
> + * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
> + * between @start__ and @end__. It is implemented similarly to list_for_each(),
> + * but is using the &drm_gpuva_manager's internal interval tree to accelerate
> + * the search for the starting &drm_gpuva, and hence isn't safe against removal
> + * of elements. It assumes that @end__ is within (or is the upper limit of) the
> + * &drm_gpuva_manager. This iterator does not skip over the &drm_gpuva_manager's
> + * @kernel_alloc_node.
> + */
> +#define drm_gpuva_for_each_va_range(va__, mgr__, start__, end__) \
> +	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)); \

drm_gpuva_find_first() takes the range size as its last argument, not
the range end:

	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \


> +  va__ && (va__->va.addr < (end__)) && \
> +  !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> +  va__ = list_next_entry(va__, rb.entry))

If you define:

static inline struct drm_gpuva *
drm_gpuva_next(struct drm_gpuva *va)
{
	if (va && !list_is_last(&va->rb.entry, &va->mgr->rb.list))
return list_next_entry(va, rb.entry);

return NULL;
}

the for loop becomes a bit more readable:

	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)); \
	     va__ && (va__->va.addr < (end__)); \
	     va__ = drm_gpuva_next(va__))

> +
> +/**
> + * drm_gpuva_for_each_va_range_safe - iterator to safely walk over a range of
> + * &drm_gpuvas
> + * @va__: &drm_gpuva to assign to in each iteration step
> + * @next__: another &drm_gpuva to use as temporary storage
> + * @mgr__: &drm_gpuva_manager to walk over
> + * @start__: starting offset, the first gpuva will overlap this
> + * @end__: ending offset, the last gpuva will start before this (but may
> + * overlap)
> + *
> + * This iterator walks over all &drm_gpuvas in the &drm_gpuva_manager that lie
> + * between @start__ and @end__. It is implemented similarly to
> + * list_for_each_safe(), but is using the &drm_gpuva_manager's internal interval
> + * tree to accelerate the search for the starting &drm_gpuva, and hence is safe
> + * against removal of elements. It assumes that @end__ is within (or is the
> + * upper limit of) the &drm_gpuva_manager. This iterator does not skip over the
> + * &drm_gpuva_manager's @kernel_alloc_node.
> + */
> +#define drm_gpuva_for_each_va_range_safe(va__, next__, mgr__, start__, end__) \
> +	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__)), \
> +	     next__ = va ? list_next_entry(va__, rb.entry) : NULL; \
> +	     va__ && (va__->va.addr < (end__)) && \
> +	     !list_entry_is_head(va__, &(mgr__)->rb.list, rb.entry); \
> +	     va__ = next__, next__ = list_next_entry(va__, rb.entry))

And this is the safe version using the drm_gpuva_next() helper:

	for (va__ = drm_gpuva_find_first((mgr__), (start__), (end__) - (start__)), \
	     next__ = drm_gpuva_next(va__); \
	     va__ && (va__->va.addr < (end__)); \
	     va__ = next__, next__ = drm_gpuva_next(va__))

Those changes fixed an invalid pointer access I had in the sm_unmap()
path.
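
The failure mode described above can be reproduced with any linked list: dereferencing a node after it has been removed. A small userspace C sketch (hypothetical names, not DRM code) shows the cached-successor pattern that makes the `_safe` iterator safe against removal:

```c
#include <stdlib.h>

/* Hypothetical minimal list node standing in for a drm_gpuva entry. */
struct node {
	int addr;
	struct node *next;
};

/* An unsafe walk would free(cur) and then read cur->next: a
 * use-after-free. The safe walk caches the successor before any
 * removal, exactly as list_for_each_entry_safe() style iterators do.
 * Frees every node with start <= addr < end; returns how many. */
static int free_range_safe(struct node **head, int start, int end)
{
	struct node **prev = head;
	struct node *cur = *head, *next;
	int freed = 0;

	for (; cur && cur->addr < end; cur = next) {
		next = cur->next;		/* grab successor first */
		if (cur->addr >= start) {
			*prev = next;		/* unlink */
			free(cur);
			freed++;
		} else {
			prev = &cur->next;
		}
	}
	return freed;
}
```

The same hazard is why the safe macro above keeps a `next__` temporary across each step of the loop.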


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-07 Thread Boris Brezillon
On Thu, 6 Jul 2023 20:26:42 +0200
Boris Brezillon  wrote:

> On Fri, 30 Jun 2023 00:25:18 +0200
> Danilo Krummrich  wrote:
> 
> > +#ifdef CONFIG_LOCKDEP
> > +typedef struct lockdep_map *lockdep_map_p;
> > +#define drm_gpuva_manager_ext_assert_held(mgr) \
> > +   lockdep_assert(lock_is_held((mgr)->ext_lock) != LOCK_STATE_NOT_HELD)
> > +/**
> > + * drm_gpuva_manager_set_ext_lock - set the external lock according to
> > + * @DRM_GPUVA_MANAGER_LOCK_EXTERN
> > + * @mgr: the &drm_gpuva_manager to set the lock for
> > + * @lock: the lock to set
> > + *
> > + * If @DRM_GPUVA_MANAGER_LOCK_EXTERN is set, drivers need to call this function
> > + * to provide the lock used to lock linking and unlinking of &drm_gpuvas to the
> > + * &drm_gem_objects GPUVA list.
> > + */
> > +#define drm_gpuva_manager_set_ext_lock(mgr, lock)  \
> > +	(mgr)->ext_lock = &(lock)->dep_map  
> 
> Okay, so, IIUC, this is the lock protecting the GEM's active mappings
> list, meaning the lock is likely to be attached to the GEM object. Are
> we expected to call drm_gpuva_manager_set_ext_lock() every time we call
> drm_gpuva_[un]link(), or are we supposed to have some lock at the
> device level serializing all drm_gpuva_[un]link() calls across VMs? The
> latter doesn't sound like a good option to me, and the former feels a bit
> weird. I'm wondering if we shouldn't just drop this assert_held() check
> when DRM_GPUVA_MANAGER_LOCK_EXTERN is set. Alternatively, we could say
> that any driver wanting to use a custom lock (which is basically all
> drivers modifying the VA space asynchronously in the ::run_job() path)
> has to provide its own variant of drm_gpuva_[un]link() (maybe with its
> own VA list too), which doesn't sound like a good idea either.

Or we could just attach the dep_map to drm_gem_object::gpuva::lock, and
let drivers overload the default lock in their GEM creation function if
they want to use a custom lock (see the following diff).

---

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index e47747f22126..6427c88c22ba 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -675,8 +675,7 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
   const char *name,
   u64 start_offset, u64 range,
   u64 reserve_offset, u64 reserve_range,
-  const struct drm_gpuva_fn_ops *ops,
-  enum drm_gpuva_manager_flags flags)
+  const struct drm_gpuva_fn_ops *ops)
 {
mgr->rb.tree = RB_ROOT_CACHED;
	INIT_LIST_HEAD(&mgr->rb.list);
@@ -686,7 +685,6 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
mgr->mm_range = range;
 
mgr->name = name ? name : "unknown";
-   mgr->flags = flags;
mgr->ops = ops;
 
	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
@@ -822,16 +820,12 @@ EXPORT_SYMBOL(drm_gpuva_remove);
 void
 drm_gpuva_link(struct drm_gpuva *va)
 {
-   struct drm_gpuva_manager *mgr = va->mgr;
struct drm_gem_object *obj = va->gem.obj;
 
if (unlikely(!obj))
return;
 
-   if (drm_gpuva_manager_external_lock(mgr))
-   drm_gpuva_manager_ext_assert_held(mgr);
-   else
-   dma_resv_assert_held(obj->resv);
+   drm_gem_gpuva_assert_lock_held(obj);
 
	list_add_tail(&va->gem.entry, &obj->gpuva.list);
 }
@@ -850,16 +844,12 @@ EXPORT_SYMBOL(drm_gpuva_link);
 void
 drm_gpuva_unlink(struct drm_gpuva *va)
 {
-   struct drm_gpuva_manager *mgr = va->mgr;
struct drm_gem_object *obj = va->gem.obj;
 
if (unlikely(!obj))
return;
 
-   if (drm_gpuva_manager_external_lock(mgr))
-   drm_gpuva_manager_ext_assert_held(mgr);
-   else
-   dma_resv_assert_held(obj->resv);
+   drm_gem_gpuva_assert_lock_held(obj);
 
	list_del_init(&va->gem.entry);
 }
@@ -1680,10 +1670,7 @@ drm_gpuva_gem_unmap_ops_create(struct drm_gpuva_manager 
*mgr,
struct drm_gpuva *va;
int ret;
 
-   if (drm_gpuva_manager_external_lock(mgr))
-   drm_gpuva_manager_ext_assert_held(mgr);
-   else
-   dma_resv_assert_held(obj->resv);
+   drm_gem_gpuva_assert_lock_held(obj);
 
ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (!ops)
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 5ec8148a30ee..572d7a538324 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -387,10 +387,14 @@ struct drm_gem_object {
 * Provides the list of GPU VAs attached to this GEM object.
 *
 * Drivers should lock list accesses with the GEMs &dma_resv lock
-* (&drm_gem_object.resv).
+* (&drm_gem_object.resv) or a custom lock if one is provided.
 */
struct {
struct list_head list;
+
+#ifdef CONFIG_LOCKDEP
+   struct lockdep_map *lock_dep_map;
+#endif
} gpuva;
 
/**
@@ -540,6 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Boris Brezillon
On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich  wrote:

> +#ifdef CONFIG_LOCKDEP
> +typedef struct lockdep_map *lockdep_map_p;
> +#define drm_gpuva_manager_ext_assert_held(mgr)   \
> + lockdep_assert(lock_is_held((mgr)->ext_lock) != LOCK_STATE_NOT_HELD)
> +/**
> + * drm_gpuva_manager_set_ext_lock - set the external lock according to
> + * @DRM_GPUVA_MANAGER_LOCK_EXTERN
> + * @mgr: the &drm_gpuva_manager to set the lock for
> + * @lock: the lock to set
> + *
> + * If @DRM_GPUVA_MANAGER_LOCK_EXTERN is set, drivers need to call this function
> + * to provide the lock used to lock linking and unlinking of &drm_gpuvas to the
> + * &drm_gem_objects GPUVA list.
> + */
> +#define drm_gpuva_manager_set_ext_lock(mgr, lock)	\
> +	(mgr)->ext_lock = &(lock)->dep_map

Okay, so, IIUC, this is the lock protecting the GEM's active mappings
list, meaning the lock is likely to be attached to the GEM object. Are
we expected to call drm_gpuva_manager_set_ext_lock() every time we call
drm_gpuva_[un]link(), or are we supposed to have some lock at the
device level serializing all drm_gpuva_[un]link() calls across VMs? The
> latter doesn't sound like a good option to me, and the former feels a bit
weird. I'm wondering if we shouldn't just drop this assert_held() check
when DRM_GPUVA_MANAGER_LOCK_EXTERN is set. Alternatively, we could say
that any driver wanting to use a custom lock (which is basically all
drivers modifying the VA space asynchronously in the ::run_job() path)
has to provide its own variant of drm_gpuva_[un]link() (maybe with its
own VA list too), which doesn't sound like a good idea either.

> +#else
> +typedef struct { /* nothing */ } lockdep_map_p;
> +#define drm_gpuva_manager_ext_assert_held(mgr)   do { 
> (void)(mgr); } while (0)
> +#define drm_gpuva_manager_set_ext_lock(mgr, lock)do { } while (0)
> +#endif
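
The per-object lock override Boris suggests can be modeled in plain userspace C: each object carries a pointer to whichever lock actually guards its VA list, defaulting to the object's own "resv" lock, and the link path asserts against that pointer. All names here are hypothetical stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A trivial lock stand-in: it only tracks whether it is held. */
struct simple_lock { bool held; };
static void lock(struct simple_lock *l)   { l->held = true; }
static void unlock(struct simple_lock *l) { l->held = false; }

struct gem_obj {
	struct simple_lock resv;	/* default lock */
	struct simple_lock *gpuva_lock;	/* lock guarding the VA list */
	int va_count;			/* stands in for the gpuva list */
};

/* Drivers wanting a custom lock repoint gpuva_lock at creation time,
 * mirroring the "attach the dep_map to the GEM" idea. */
static void gem_init(struct gem_obj *obj, struct simple_lock *custom)
{
	obj->resv.held = false;
	obj->gpuva_lock = custom ? custom : &obj->resv;
	obj->va_count = 0;
}

/* Models drm_gem_gpuva_assert_lock_held(): whichever lock the object
 * points at must be held while the VA list is modified. */
static void gpuva_link(struct gem_obj *obj)
{
	assert(obj->gpuva_lock->held);
	obj->va_count++;
}
```

The point of the indirection is that the assert stays a single call site regardless of whether the driver uses the default resv lock or its own VM lock.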


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Intel



On 7/6/23 17:48, Danilo Krummrich wrote:

Hi Thomas,

On 7/6/23 10:49, Thomas Hellström (Intel) wrote:

Hi, Danilo

Some review comments below:

On 6/30/23 00:25, Danilo Krummrich wrote:

Add infrastructure to keep track of GPU virtual address (VA) mappings
with a dedicated VA space manager implementation.

New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
start implementing, allow userspace applications to request multiple and
arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide infrastructure to track GPU VA allocations and mappings,
    making use of the maple_tree.


It looks like we're not using the maple tree anymore, but rather an 
instantiation of an interval tree.


(Perhaps as a follow-up it makes sense to provide a pre-instantiated 
common u64 version of the interval tree in addition to the unsigned 
long one since it appears to be used in multiple places in graphics 
drivers).



2) Generically connect GPU VA mappings to their backing buffers, in
    particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
    operations on the GPU VA space. In particular splitting and merging
    of GPU VA mappings, e.g. for intersecting mapping requests or 
partial

    unmap requests.

Tested-by: Donald Robson
Reviewed-by: Boris Brezillon
Suggested-by: Dave Airlie
Signed-off-by: Danilo Krummrich
---
  Documentation/gpu/drm-mm.rst    |   36 +
  drivers/gpu/drm/Makefile    |    1 +
  drivers/gpu/drm/drm_gem.c   |    3 +
  drivers/gpu/drm/drm_gpuva_mgr.c | 1743 
+++

  include/drm/drm_drv.h   |    6 +
  include/drm/drm_gem.h   |   52 +
  include/drm/drm_gpuva_mgr.h |  756 ++
  7 files changed, 2597 insertions(+)
  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
  create mode 100644 include/drm/drm_gpuva_mgr.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a52e6f4117d6..3d5dc9dc1bfe 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -466,6 +466,42 @@ DRM MM Range Allocator Function References
  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
 :export:
+DRM GPU VA Manager
+==================
+
+Overview
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Overview
+
+Split and Merge
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Split and Merge
+
+Locking
+-------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Locking
+
+Examples
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Examples
+
+DRM GPU VA Manager Function References
+--------------------------------------
+
+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :export:
+
  DRM Buddy Allocator
===================
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 414855e2a463..6d6c9dec66e8 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -45,6 +45,7 @@ drm-y := \
  drm_vblank.o \
  drm_vblank_work.o \
  drm_vma_manager.o \
+    drm_gpuva_mgr.o \
  drm_writeback.o
  drm-$(CONFIG_DRM_LEGACY) += \
  drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 1a5a2cd0d4ec..cd878ebddbd0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
  if (!obj->resv)
  obj->resv = &obj->_resv;
+    if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
+    drm_gem_gpuva_init(obj);
+
  drm_vma_node_reset(&obj->vma_node);
  INIT_LIST_HEAD(&obj->lru_node);
  }
diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
new file mode 100644
index ..4414990c05cc
--- /dev/null
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -0,0 +1,1743 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR MIT" or does something restrict it to GPL-only?

+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Boris Brezillon
On Thu, 6 Jul 2023 17:06:08 +0200
Danilo Krummrich  wrote:

> Hi Boris,
> 
> On 6/30/23 10:02, Boris Brezillon wrote:
> > Hi Danilo,
> > 
> > On Fri, 30 Jun 2023 00:25:18 +0200
> > Danilo Krummrich  wrote:
> >   
> >> + *	int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
> >> + *	{
> >> + *		struct driver_context *ctx = __ctx;
> >> + *
> >> + *		drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
> >> + *
> >> + *		drm_gpuva_unlink(op->remap.unmap->va);
> >> + *		kfree(op->remap.unmap->va);
> >> + *
> >> + *		if (op->remap.prev) {
> >> + *			drm_gpuva_link(ctx->prev_va);  
> > 
> > I ended up switching to dma_resv-based locking for the GEMs and I
> > wonder what the locking is supposed to look like in the async-mapping
> > case, where we insert/remove the VA nodes in the drm_sched::run_job()
> > path.  
> 
> If you decide to pick the interface where you just call 
> drm_gpuva_sm_[un]map() and receive a callback for each operation it 
> takes to fulfill the request, you probably do this because you want to 
> do everything in one shot: updating the VA space, linking/unlinking GPUVAs 
> to/from their corresponding backing GEMs, and doing the actual GPU mappings.
> 
> This has a few advantages over generating a list of operations when the 
> job is submitted. You've pointed out one of them, when you noticed that 
> with a list of operations one can't sneak in a synchronous job between 
> already queued up asynchronous jobs.
> 
> However, for the asynchronous path it has the limitation that the 
> dma-resv lock can't be used to link/unlink GPUVAs to/from its 
> corresponding backing GEMs, since this would happen in the fence 
> signalling critical path and we're not allowed to hold the dma-resv lock 
> there. Hence, as we discussed I added the option for drivers to provide 
> an external lock for that, just to be able to keep some lockdep checks.

Uh, okay, I guess that means I need to go back to a custom lock for VM
operations then.

> 
> > 
> > What I have right now is something like:
> > 
> > 	dma_resv_lock(vm->resv);
> > 
> > 	// split done in drm_gpuva_sm_map(), each iteration
> > 	// of the loop is a call to the driver ->[re,un]map()
> > 	// hook
> > 	for_each_sub_op() {
> > 
> > 		// Private BOs have their resv field pointing to the
> > 		// VM resv and we take the VM resv lock before calling
> > 		// drm_gpuva_sm_map()
> > 		if (vm->resv != gem->resv)
> > 			dma_resv_lock(gem->resv);
> > 
> > 		drm_gpuva_[un]link(va);
> > 		gem_[un]pin(gem);
> > 
> > 		if (vm->resv != gem->resv)
> > 			dma_resv_unlock(gem->resv);
> > 	}
> > 
> > 	dma_resv_unlock(vm->resv);
> >   
> 
> I'm not sure I get this code right, reading "for_each_sub_op()" and 
> "drm_gpuva_sm_map()" looks a bit like things are mixed up?
> 
> Or do you mean to represent the sum of all callbacks with 
> "for_each_sub_op()"?

That ^.

> In this case I assume this code runs in 
> drm_sched::run_job() and hence isn't allowed to take the dma-resv lock.

Yeah, I didn't realize that taking the dma-resv lock in the
dma-signaling path was forbidden. I think it's fine for the drm_gpuva
destroy operation (which calls drm_gem_shmem_unpin(), which in turn
acquires the resv lock) because I can move that to a worker and get it
out of the dma-signaling path. The problem remains for remap operations
though. I need to call drm_gem_shmem_pin() so we retain the pages even
after the unmapped gpuva object that's in the middle of a mapping is
released. I guess one option would be to use an atomic_t for
drm_gem_shmem_object::pages_use_count, and have something like:

int drm_gem_shmem_pin(struct drm_gem_shmem_object *shmem)
{
	int ret;

	if (atomic_inc_not_zero(&shmem->pages_use_count))
		return 0;

	dma_resv_lock(shmem->base.resv, NULL);
	ret = drm_gem_shmem_pin_locked(shmem);
	dma_resv_unlock(shmem->base.resv);

	return ret;
}

Given the object already had its pages pinned when we remap, we're sure
the fast path will be taken, and no dma-resv lock acquired.

> 
> > In practice, I don't expect things to deadlock, because the VM resv is
> > not supposed to be taken outside the VM context and the locking order
> > is always the same (VM lock first, and then each shared BO
> > taken/released independently), but I'm not super thrilled by this
> > nested lock, and I'm wondering if we shouldn't have a pass collecting
> > locks in a drm_exec context first, and then have
> > the operations executed. IOW, something like that:
> > 
> > 	drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES)
> > 	drm_exec_until_all_locked(exec) {
> > 		// Dummy GEM is the dummy GEM object I use to make the VM
> > 		// participate in the locking without having to teach
> > 		// drm_exec 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Danilo Krummrich

Hi Donald,

On 7/6/23 17:45, Donald Robson wrote:

On Fri, 2023-06-30 at 00:25 +0200, Danilo Krummrich wrote:


+#ifdef CONFIG_LOCKDEP
+typedef struct lockdep_map *lockdep_map_p;
+#define drm_gpuva_manager_ext_assert_held(mgr) \
+   lockdep_assert(lock_is_held((mgr)->ext_lock) != LOCK_STATE_NOT_HELD)
+/**
+ * drm_gpuva_manager_set_ext_lock - set the external lock according to
+ * @DRM_GPUVA_MANAGER_LOCK_EXTERN
+ * @mgr: the &drm_gpuva_manager to set the lock for
+ * @lock: the lock to set
+ *
+ * If @DRM_GPUVA_MANAGER_LOCK_EXTERN is set, drivers need to call this function
+ * to provide the lock used to lock linking and unlinking of &drm_gpuvas to the
+ * &drm_gem_objects GPUVA list.
+ */
+#define drm_gpuva_manager_set_ext_lock(mgr, lock)  \
+   (mgr)->ext_lock = &(lock)->dep_map
+#else
+typedef struct { /* nothing */ } lockdep_map_p;


lockdep_map_p conflicts with an identical typedef in maple_tree.h when 
CONFIG_LOCKDEP is
not set (it's being pulled in by mm.h in drm_vma_manager.h). I'll just comment 
the line
out for now.


Good catch! I got this trick from maple_tree.h and intended to move it 
to the lockdep header in a separate patch to avoid such collisions. 
Unfortunately, I forgot about it. Gonna fix it up.


- Danilo




+#define drm_gpuva_manager_ext_assert_held(mgr)		do { (void)(mgr); } while (0)
+#define drm_gpuva_manager_set_ext_lock(mgr, lock)	do { } while (0)
+#endif
+




Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Danilo Krummrich

Hi Thomas,

On 7/6/23 10:49, Thomas Hellström (Intel) wrote:

Hi, Danilo

Some review comments below:

On 6/30/23 00:25, Danilo Krummrich wrote:

Add infrastructure to keep track of GPU virtual address (VA) mappings
with a dedicated VA space manager implementation.

New UAPIs, motivated by the Vulkan sparse memory bindings graphics drivers
are starting to implement, allow userspace applications to request multiple
and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide infrastructure to track GPU VA allocations and mappings,
    making use of the maple_tree.


It looks like we're not using the maple tree anymore, but rather an 
instantiation of an interval tree.


(Perhaps as a follow-up it makes sense to provide a pre-instantiated 
common u64 version of the interval tree in addition to the unsigned long 
one since it appears to be used in multiple places in graphics drivers).



2) Generically connect GPU VA mappings to their backing buffers, in
    particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
    operations on the GPU VA space. In particular splitting and merging
    of GPU VA mappings, e.g. for intersecting mapping requests or partial
    unmap requests.

Tested-by: Donald Robson
Reviewed-by: Boris Brezillon
Suggested-by: Dave Airlie
Signed-off-by: Danilo Krummrich
---
  Documentation/gpu/drm-mm.rst    |   36 +
  drivers/gpu/drm/Makefile    |    1 +
  drivers/gpu/drm/drm_gem.c   |    3 +
  drivers/gpu/drm/drm_gpuva_mgr.c | 1743 +++
  include/drm/drm_drv.h   |    6 +
  include/drm/drm_gem.h   |   52 +
  include/drm/drm_gpuva_mgr.h |  756 ++
  7 files changed, 2597 insertions(+)
  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
  create mode 100644 include/drm/drm_gpuva_mgr.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a52e6f4117d6..3d5dc9dc1bfe 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -466,6 +466,42 @@ DRM MM Range Allocator Function References
  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
 :export:
+DRM GPU VA Manager
+==================
+
+Overview
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Overview
+
+Split and Merge
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Split and Merge
+
+Locking
+-------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Locking
+
+Examples
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Examples
+
+DRM GPU VA Manager Function References
+--------------------------------------
+
+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :export:
+
  DRM Buddy Allocator
  ===================
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 414855e2a463..6d6c9dec66e8 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -45,6 +45,7 @@ drm-y := \
  drm_vblank.o \
  drm_vblank_work.o \
  drm_vma_manager.o \
+    drm_gpuva_mgr.o \
  drm_writeback.o
  drm-$(CONFIG_DRM_LEGACY) += \
  drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 1a5a2cd0d4ec..cd878ebddbd0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 
 	if (!obj->resv)
 		obj->resv = &obj->_resv;
+	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
+		drm_gem_gpuva_init(obj);
+
 	drm_vma_node_reset(&obj->vma_node);
 	INIT_LIST_HEAD(&obj->lru_node);
 }
diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
new file mode 100644
index ..4414990c05cc
--- /dev/null
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -0,0 +1,1743 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR 
MIT" or does something restrict it to GPL-only?

+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Donald Robson
On Fri, 2023-06-30 at 00:25 +0200, Danilo Krummrich wrote:
> 
> +#ifdef CONFIG_LOCKDEP
> +typedef struct lockdep_map *lockdep_map_p;
> +#define drm_gpuva_manager_ext_assert_held(mgr)   \
> + lockdep_assert(lock_is_held((mgr)->ext_lock) != LOCK_STATE_NOT_HELD)
> +/**
> + * drm_gpuva_manager_set_ext_lock - set the external lock according to
> + * @DRM_GPUVA_MANAGER_LOCK_EXTERN
> + * @mgr: the &drm_gpuva_manager to set the lock for
> + * @lock: the lock to set
> + *
> + * If @DRM_GPUVA_MANAGER_LOCK_EXTERN is set, drivers need to call this 
> function
> + * to provide the lock used to lock linking and unlinking of &drm_gpuvas to 
> the
> + * &drm_gem_objects GPUVA list.
> + */
> +#define drm_gpuva_manager_set_ext_lock(mgr, lock)\
> + (mgr)->ext_lock = &(lock)->dep_map
> +#else
> +typedef struct { /* nothing */ } lockdep_map_p;

lockdep_map_p conflicts with an identical typedef in maple_tree.h when 
CONFIG_LOCKDEP is
not set (it's being pulled in by mm.h in drm_vma_manager.h). I'll just comment 
the line
out for now.

> +#define drm_gpuva_manager_ext_assert_held(mgr)	do { (void)(mgr); } while (0)
> +#define drm_gpuva_manager_set_ext_lock(mgr, lock)	do { } while (0)
> +#endif
> +


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Danilo Krummrich

Hi Boris,

On 6/30/23 10:02, Boris Brezillon wrote:

Hi Danilo,

On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich  wrote:


+ *	int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
+ *	{
+ *		struct driver_context *ctx = __ctx;
+ *
+ *		drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
+ *
+ *		drm_gpuva_unlink(op->remap.unmap->va);
+ *		kfree(op->remap.unmap->va);
+ *
+ *		if (op->remap.prev) {
+ *			drm_gpuva_link(ctx->prev_va);


I ended up switching to dma_resv-based locking for the GEMs and I
wonder what the locking is supposed to look like in the async-mapping
case, where we insert/remove the VA nodes in the drm_sched::run_job()
path.


If you decide to pick the interface where you just call 
drm_gpuva_sm_[un]map() and receive a callback for each operation it 
takes to fulfill the request, you probably do this because you want to 
do everything one shot, updating the VA space, link/unlink GPUVAs 
to/from its corresponding backing GEMs, do the actual GPU mappings.


This has a few advantages over generating a list of operations when the 
job is submitted. You've pointed out one of them, when you noticed that 
with a list of operations one can't sneak in a synchronous job between 
already queued up asynchronous jobs.


However, for the asynchronous path it has the limitation that the 
dma-resv lock can't be used to link/unlink GPUVAs to/from their 
corresponding backing GEMs, since this would happen in the fence 
signalling critical path and we're not allowed to hold the dma-resv lock 
there. Hence, as we discussed I added the option for drivers to provide 
an external lock for that, just to be able to keep some lockdep checks.




What I have right now is something like:

dma_resv_lock(vm->resv);

// split done in drm_gpuva_sm_map(), each iteration
// of the loop is a call to the driver ->[re,un]map()
// hook
for_each_sub_op() {

// Private BOs have their resv field pointing to the
// VM resv and we take the VM resv lock before calling
// drm_gpuva_sm_map()
if (vm->resv != gem->resv)
dma_resv_lock(gem->resv);

drm_gpuva_[un]link(va);
gem_[un]pin(gem);

if (vm->resv != gem->resv)
dma_resv_unlock(gem->resv);
}

dma_resv_unlock(vm->resv);



I'm not sure I get this code right, reading "for_each_sub_op()" and 
"drm_gpuva_sm_map()" looks a bit like things are mixed up?


Or do you mean to represent the sum of all callbacks with 
"for_each_sub_op()"? In this case I assume this code runs in 
drm_sched::run_job() and hence isn't allowed to take the dma-resv lock.



In practice, I don't expect things to deadlock, because the VM resv is
not supposed to be taken outside the VM context and the locking order
is always the same (VM lock first, and then each shared BO
taken/released independently), but I'm not super thrilled by this
nested lock, and I'm wondering if we shouldn't have a pass collecting
locks in a drm_exec context first, and then have
the operations executed. IOW, something like that:

	drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES)
	drm_exec_until_all_locked(exec) {
		// Dummy GEM is the dummy GEM object I use to make the VM
		// participate in the locking without having to teach
		// drm_exec how to deal with raw dma_resv objects.
		ret = drm_exec_lock_obj(exec, vm->dummy_gem);
		drm_exec_retry_on_contention(exec);
		if (ret)
			return ret;

		// Could take the form of drm_gpuva_sm_[un]map_acquire_locks()
		// helpers
		for_each_sub_op() {
			ret = drm_exec_lock_obj(exec, gem);
			if (ret)
				return ret;
		}
	}

	// each iteration of the loop is a call to the driver
	// ->[re,un]map() hook
	for_each_sub_op() {
		...
		gem_[un]pin_locked(gem);
		drm_gpuva_[un]link(va);
		...
	}

	drm_exec_fini(exec);


I have a follow-up patch (still WIP) in the queue to generalize dma-resv 
handling, fence handling and GEM validation within the GPUVA manager as 
optional helper functions: 
https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/a5fc29f3b1edbf3f96fb5a21b858ffe00a3f2584


This was suggested by Matt Brost.

- Danilo



Don't know if I got this right, or if I'm just confused again by how
the drm_gpuva API is supposed to be used.

Regards,

Boris


+ *		ctx->prev_va = NULL;
+ *	}
+ *
+ *	if (op->remap.next) {
+ *		drm_gpuva_link(ctx->next_va);
+ *		ctx->next_va = 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-07-06 Thread Intel

Hi, Danilo

Some review comments below:

On 6/30/23 00:25, Danilo Krummrich wrote:

Add infrastructure to keep track of GPU virtual address (VA) mappings
with a dedicated VA space manager implementation.

New UAPIs, motivated by the Vulkan sparse memory bindings graphics drivers
are starting to implement, allow userspace applications to request multiple
and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide infrastructure to track GPU VA allocations and mappings,
making use of the maple_tree.


It looks like we're not using the maple tree anymore, but rather an 
instantiation of an interval tree.


(Perhaps as a follow-up it makes sense to provide a pre-instantiated 
common u64 version of the interval tree in addition to the unsigned long 
one since it appears to be used in multiple places in graphics drivers).



2) Generically connect GPU VA mappings to their backing buffers, in
particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
operations on the GPU VA space. In particular splitting and merging
of GPU VA mappings, e.g. for intersecting mapping requests or partial
unmap requests.

Tested-by: Donald Robson
Reviewed-by: Boris Brezillon
Suggested-by: Dave Airlie
Signed-off-by: Danilo Krummrich
---
  Documentation/gpu/drm-mm.rst|   36 +
  drivers/gpu/drm/Makefile|1 +
  drivers/gpu/drm/drm_gem.c   |3 +
  drivers/gpu/drm/drm_gpuva_mgr.c | 1743 +++
  include/drm/drm_drv.h   |6 +
  include/drm/drm_gem.h   |   52 +
  include/drm/drm_gpuva_mgr.h |  756 ++
  7 files changed, 2597 insertions(+)
  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
  create mode 100644 include/drm/drm_gpuva_mgr.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a52e6f4117d6..3d5dc9dc1bfe 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -466,6 +466,42 @@ DRM MM Range Allocator Function References
  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
 :export:
  
+DRM GPU VA Manager
+==================
+
+Overview
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Overview
+
+Split and Merge
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Split and Merge
+
+Locking
+-------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Locking
+
+Examples
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Examples
+
+DRM GPU VA Manager Function References
+--------------------------------------
+
+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :export:
+
  DRM Buddy Allocator
  ===================
  
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 414855e2a463..6d6c9dec66e8 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -45,6 +45,7 @@ drm-y := \
drm_vblank.o \
drm_vblank_work.o \
drm_vma_manager.o \
+   drm_gpuva_mgr.o \
drm_writeback.o
  drm-$(CONFIG_DRM_LEGACY) += \
drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 1a5a2cd0d4ec..cd878ebddbd0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	if (!obj->resv)
 		obj->resv = &obj->_resv;
 
+	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
+		drm_gem_gpuva_init(obj);
+
 	drm_vma_node_reset(&obj->vma_node);
 	INIT_LIST_HEAD(&obj->lru_node);
 }
diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
new file mode 100644
index ..4414990c05cc
--- /dev/null
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -0,0 +1,1743 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR 
MIT" or does something restrict it to GPL-only?

+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE 

Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-06-30 Thread Boris Brezillon
On Fri, 30 Jun 2023 10:02:52 +0200
Boris Brezillon  wrote:

> In practice, I don't expect things to deadlock, because the VM resv is
> not supposed to be taken outside the VM context and the locking order
> is always the same (VM lock first, and then each shared BO
> taken/released independently), but I'm not super thrilled by this
> nested lock, and I'm wondering if we shouldn't have a pass collecting
> locks in a drm_exec context first, and then have
> the operations executed. IOW, something like that:
> 
>   drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES)
>   drm_exec_until_all_locked(exec) {
>   // Dummy GEM is the dummy GEM object I use to make the VM
>   // participate in the locking without having to teach
>   // drm_exec how to deal with raw dma_resv objects.
>   ret = drm_exec_lock_obj(exec, vm->dummy_gem);
>   drm_exec_retry_on_contention(exec);
>   if (ret)
>   return ret;
> 
>   // Could take the form of drm_gpuva_sm_[un]map_acquire_locks()
>   // helpers

Nevermind, I implemented a driver specific acquire_op_locks(), and it's
fairly simple with the gpuva iter (we just have to iterate over all VAs
covered by the operation range and call drm_exec_lock_obj() on the GEM
attached to these VAs), so it's probably not worth providing a generic
helper for that.


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-06-30 Thread Boris Brezillon
On Fri, 30 Jun 2023 10:02:52 +0200
Boris Brezillon  wrote:

> Hi Danilo,
> 
> On Fri, 30 Jun 2023 00:25:18 +0200
> Danilo Krummrich  wrote:
> 
> > + * int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
> > + * {
> > + * struct driver_context *ctx = __ctx;
> > + *
> > + * drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
> > + *
> > + * drm_gpuva_unlink(op->remap.unmap->va);
> > + * kfree(op->remap.unmap->va);
> > + *
> > + * if (op->remap.prev) {
> > + * drm_gpuva_link(ctx->prev_va);
> 
> I ended up switching to dma_resv-based locking for the GEMs and I
> wonder what the locking is supposed to look like in the async-mapping
> case, where we insert/remove the VA nodes in the drm_sched::run_job()
> path.
> 
> What I have right now is something like:
> 
>   dma_resv_lock(vm->resv);
> 
>   // split done in drm_gpuva_sm_map(), each iteration
>   // of the loop is a call to the driver ->[re,un]map()
>   // hook
>   for_each_sub_op() {
>   
>   // Private BOs have their resv field pointing to the
>   // VM resv and we take the VM resv lock before calling
>   // drm_gpuva_sm_map()
>   if (vm->resv != gem->resv)
>   dma_resv_lock(gem->resv);
> 
>   drm_gpuva_[un]link(va);
>   gem_[un]pin(gem);
> 
>   if (vm->resv != gem->resv)
>   dma_resv_unlock(gem->resv);
>   }
> 
>   dma_resv_unlock(vm->resv);
> 
> In practice, I don't expect things to deadlock, because the VM resv is
> not supposed to be taken outside the VM context and the locking order
> is always the same (VM lock first, and then each shared BO
> taken/released independently), but I'm not super thrilled by this
> nested lock, and I'm wondering if we shouldn't have a pass collecting
> locks in a drm_exec context first, and then have
> the operations executed. IOW, something like that:
> 
>   drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES)
>   drm_exec_until_all_locked(exec) {
>   // Dummy GEM is the dummy GEM object I use to make the VM
>   // participate in the locking without having to teach
>   // drm_exec how to deal with raw dma_resv objects.
>   ret = drm_exec_lock_obj(exec, vm->dummy_gem);
>   drm_exec_retry_on_contention(exec);
>   if (ret)
>   return ret;
> 
>   // Could take the form of drm_gpuva_sm_[un]map_acquire_locks()
>   // helpers
>   for_each_sub_op() {
>   ret = drm_exec_lock_obj(exec, gem);
>   if (ret)
>   return ret;
>   }
>   }
> 
>   // each iteration of the loop is a call to the driver
>   // ->[re,un]map() hook
>   for_each_sub_op() {
>   ...
>   gem_[un]pin_locked(gem);

Just wanted to clarify that the pages have been pinned at VM_BIND job
creation time, so this gem_pin_locked() call is effectively just a
pin_count++, not the whole page allocation, which we don't want to
happen in a dma-signaling path.

>   drm_gpuva_[un]link(va);
>   ...
>   }
> 
>   drm_exec_fini(exec);


Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

2023-06-30 Thread Boris Brezillon
Hi Danilo,

On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich  wrote:

> + *   int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
> + *   {
> + *   struct driver_context *ctx = __ctx;
> + *
> + *   drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
> + *
> + *   drm_gpuva_unlink(op->remap.unmap->va);
> + *   kfree(op->remap.unmap->va);
> + *
> + *   if (op->remap.prev) {
> + *   drm_gpuva_link(ctx->prev_va);

I ended up switching to dma_resv-based locking for the GEMs and I
wonder what the locking is supposed to look like in the async-mapping
case, where we insert/remove the VA nodes in the drm_sched::run_job()
path.

What I have right now is something like:

	dma_resv_lock(vm->resv);

	// split done in drm_gpuva_sm_map(), each iteration
	// of the loop is a call to the driver ->[re,un]map()
	// hook
	for_each_sub_op() {

		// Private BOs have their resv field pointing to the
		// VM resv and we take the VM resv lock before calling
		// drm_gpuva_sm_map()
		if (vm->resv != gem->resv)
			dma_resv_lock(gem->resv);

		drm_gpuva_[un]link(va);
		gem_[un]pin(gem);

		if (vm->resv != gem->resv)
			dma_resv_unlock(gem->resv);
	}

	dma_resv_unlock(vm->resv);

In practice, I don't expect things to deadlock, because the VM resv is
not supposed to be taken outside the VM context and the locking order
is always the same (VM lock first, and then each shared BO
taken/released independently), but I'm not super thrilled by this
nested lock, and I'm wondering if we shouldn't have a pass collecting
locks in a drm_exec context first, and then have
the operations executed. IOW, something like that:

	drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES)
	drm_exec_until_all_locked(exec) {
		// Dummy GEM is the dummy GEM object I use to make the VM
		// participate in the locking without having to teach
		// drm_exec how to deal with raw dma_resv objects.
		ret = drm_exec_lock_obj(exec, vm->dummy_gem);
		drm_exec_retry_on_contention(exec);
		if (ret)
			return ret;

		// Could take the form of drm_gpuva_sm_[un]map_acquire_locks()
		// helpers
		for_each_sub_op() {
			ret = drm_exec_lock_obj(exec, gem);
			if (ret)
				return ret;
		}
	}

	// each iteration of the loop is a call to the driver
	// ->[re,un]map() hook
	for_each_sub_op() {
		...
		gem_[un]pin_locked(gem);
		drm_gpuva_[un]link(va);
		...
	}

	drm_exec_fini(exec);

Don't know if I got this right, or if I'm just confused again by how
the drm_gpuva API is supposed to be used.

Regards,

Boris

> + *   ctx->prev_va = NULL;
> + *   }
> + *
> + *   if (op->remap.next) {
> + *   drm_gpuva_link(ctx->next_va);
> + *   ctx->next_va = NULL;
> + *   }
> + *
> + *   return 0;
> + *   }